Hello everybody,
Michael here, and this lesson will be on sorting and filtering NumPy arrays, part 5 in my NumPy series.
First, let’s discuss sorting a NumPy array. Sorting elements, in any context, means putting them in an ordered sequence (whether numeric or alphabetical, ascending or descending).
NumPy has a sort function, np.sort, that sorts an array. Here’s a simple example of sorting an array:
import numpy as np
numpyA = np.array([3.6, 2.4, 6.0, 1.2, 9.6, 7.2, 8.4, 4.8])
print(np.sort(numpyA))
[1.2 2.4 3.6 4.8 6. 7.2 8.4 9.6]
In this example, I sorted the array in ascending numerical order; np.sort sorts arrays in ascending (least-to-greatest) order by default.
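One detail worth knowing: np.sort doesn’t change the array you pass in; it returns a new, sorted copy. NumPy arrays also have their own .sort() method, which reorders the array in place instead. Here’s a minimal sketch of the difference (the names sortedA and result are just for illustration):

```python
import numpy as np

numpyA = np.array([3.6, 2.4, 6.0, 1.2, 9.6, 7.2, 8.4, 4.8])

# np.sort builds and returns a new sorted array; numpyA keeps its original order
sortedA = np.sort(numpyA)
print(sortedA)
print(numpyA)

# The array's own .sort() method reorders numpyA in place and returns None
result = numpyA.sort()
print(result)   # None
print(numpyA)
```

Use np.sort when you still need the original order, and .sort() when you don’t.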
What if you wanted to sort the elements in descending order (greatest-to-least)? Take a look at this example:
numpyA = np.array([3.6, 2.4, 6.0, 1.2, 9.6, 7.2, 8.4, 4.8])
print(-np.sort(-numpyA))
[9.6 8.4 7.2 6. 4.8 3.6 2.4 1.2]
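To sort elements in descending order, place a minus sign (-) in front of np.sort and in front of the array you are sorting. Negating the array reverses the order of its values, np.sort puts the negated values in ascending order, and the outer minus sign turns them back into the original values, which are now in descending order.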
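Here’s the same trick written out one step at a time, along with an alternative that sorts in ascending order and then reverses the result with a [::-1] slice. The intermediate names (negated, ascending_negated, descending_alt) are just for illustration:

```python
import numpy as np

numpyA = np.array([3.6, 2.4, 6.0, 1.2, 9.6, 7.2, 8.4, 4.8])

# Step by step: negate, sort ascending, then negate back
negated = -numpyA                  # the largest value becomes the most negative
ascending_negated = np.sort(negated)
descending = -ascending_negated    # undo the negation to recover the original values

# Alternative: sort ascending, then reverse the order with a [::-1] slice
descending_alt = np.sort(numpyA)[::-1]

print(descending)
print(descending_alt)
```

The slicing version does no arithmetic on the values, so it isn’t limited to numeric arrays.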
Now, I’ve shown you plenty of NumPy examples with floats and integers, but let’s discover how to work with NumPy arrays of strings; more specifically, let’s discuss sorting NumPy string arrays:
numpyB = np.array(['United States', 'Canada', 'Mexico', 'Brazil', 'Guatemala', 'Chile', 'Argentina'])
print(np.sort(numpyB))
['Argentina' 'Brazil' 'Canada' 'Chile' 'Guatemala' 'Mexico'
'United States']
When you’re sorting NumPy arrays of strings, the sort function sorts all elements in the array in ascending (alphabetical) order by default.
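Strictly speaking, "alphabetical" here means NumPy compares strings character by character using their Unicode code points, so every uppercase ASCII letter sorts before every lowercase one. A short sketch with a hypothetical mixed-case array (words), plus one possible workaround that orders by a lowercased copy using np.argsort and np.char.lower:

```python
import numpy as np

# Hypothetical mixed-case example data
words = np.array(['banana', 'Cherry', 'apple', 'Date'])

# Uppercase letters have smaller code points than lowercase ones,
# so the capitalized words come first
print(np.sort(words))   # ['Cherry' 'Date' 'apple' 'banana']

# Workaround: find the order of the lowercased strings, then apply it to words
order = np.argsort(np.char.lower(words))
print(words[order])     # ['apple' 'banana' 'Cherry' 'Date']
```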
Unfortunately, if you want to sort the elements in descending (reverse-alphabetical) order, the minus-sign trick I demonstrated with numpyA won’t work here, because NumPy can’t negate strings:
numpyB = np.array(['United States', 'Canada', 'Mexico', 'Brazil', 'Guatemala', 'Chile', 'Argentina'])
print(-np.sort(-numpyB))
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
<ipython-input-3-bc1f48d843fe> in <module>
1 numpyB = np.array(['United States', 'Canada', 'Mexico', 'Brazil', 'Guatemala', 'Chile', 'Argentina'])
----> 2 print(-np.sort(-numpyB))
UFuncTypeError: ufunc 'negative' did not contain a loop with signature matching types dtype('<U13') -> dtype('<U13')
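A workaround that does work for strings is to sort in ascending order and then reverse the result with a [::-1] slice. Slicing only rearranges elements and never does arithmetic on them, so the string dtype isn’t a problem. A minimal sketch (reverse_alpha is an illustrative name):

```python
import numpy as np

numpyB = np.array(['United States', 'Canada', 'Mexico', 'Brazil', 'Guatemala', 'Chile', 'Argentina'])

# Sort alphabetically, then walk the sorted array backwards (step of -1)
reverse_alpha = np.sort(numpyB)[::-1]
print(reverse_alpha)
```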
Now let’s discover how to sort NumPy boolean arrays:
numpyC = np.array([True, False, False, True, True])
print(np.sort(numpyC))
[False False True True True]
In this example, I had an array filled with True and False values. When I sorted the array, all of the Falses were placed before all of the Trues, because False counts as less than True (think of them as 0 and 1).
Unfortunately, if you want to use the minus-sign trick I used for numpyA to place all the Trues before all the Falses, that won’t work here either:
numpyC = np.array([True, False, False, True, True])
print(-np.sort(-numpyC))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-5-62530c5ec9a2> in <module>
1 numpyC = np.array([True, False, False, True, True])
----> 2 print(-np.sort(-numpyC))
TypeError: The numpy boolean negative, the `-` operator, is not supported, use the `~` operator or the logical_not function instead.
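The error message itself points at a fix: the ~ operator flips every boolean, so ~np.sort(~numpyC) mirrors the minus-sign trick using logical NOT instead of negation. Sorting ascending and reversing with [::-1] also works. A short sketch of both (trues_first and trues_first_alt are illustrative names):

```python
import numpy as np

numpyC = np.array([True, False, False, True, True])

# Option 1: flip, sort ascending, flip back (the ~ version of the minus-sign trick)
trues_first = ~np.sort(~numpyC)

# Option 2: sort ascending (False before True), then reverse the result
trues_first_alt = np.sort(numpyC)[::-1]

print(trues_first)       # [ True  True  True False False]
print(trues_first_alt)   # [ True  True  True False False]
```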
One thing to keep in mind with boolean NumPy arrays is that you don’t have to type the elements as True or False; you can also use boolean expressions, which Python evaluates to True or False. Take a look at this example:
arrayD = np.array([4 > 5, 5 + 2 != 8, 9 <= 10, 4 - 1 > 3, 41 >= 2**2])
print(np.sort(arrayD))
[False False True True True]
In this example, I have a NumPy array built from five boolean expressions (as opposed to literal Trues and Falses). Python evaluates each expression to True or False before the array is created, so when the array is sorted, the False results are placed before the True results.
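To confirm that the expressions are evaluated before NumPy ever sees them, here’s a quick check comparing arrayD with an array of literal booleans (literal is an illustrative name; its values come from working out each expression by hand):

```python
import numpy as np

arrayD = np.array([4 > 5, 5 + 2 != 8, 9 <= 10, 4 - 1 > 3, 41 >= 2**2])

# Each comparison is already True or False by the time np.array receives it
literal = np.array([False, True, True, False, True])

print(arrayD.dtype)                      # bool
print(np.array_equal(arrayD, literal))   # True
```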
Now, I’ve shown you several examples of sorting 1-D arrays. What happens when you sort a multi-dimensional array? Take a look at this example:
arrayE = np.array([[0.8, 0.2, 0, 0.4, 0.6], [1.2, 1.8, 1.6, 1, 1.4], [2.4, 2, 2.8, 2.2, 2.6]])
print(np.sort(arrayE))
[[0. 0.2 0.4 0.6 0.8]
[1. 1.2 1.4 1.6 1.8]
[2. 2.2 2.4 2.6 2.8]]
In this example, when I sorted this 2-D array, the elements within each inner array (each row) were sorted in ascending order, independently of the other rows.
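That row-by-row behavior comes from np.sort’s axis parameter, which defaults to -1 (the last axis). The sketch below uses a small hypothetical 3-by-3 integer array, grid, rather than arrayE, because arrayE’s columns already happen to be in ascending order, so sorting them wouldn’t visibly change anything:

```python
import numpy as np

# Hypothetical example array
grid = np.array([[3, 1, 2],
                 [9, 7, 8],
                 [6, 4, 5]])

# Default (axis=-1): sort within each row
by_row = np.sort(grid)

# axis=0: sort down each column instead
by_column = np.sort(grid, axis=0)

# axis=None: flatten the array first, then sort everything as one 1-D array
flat = np.sort(grid, axis=None)

print(by_row)
print(by_column)
print(flat)
```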
Now, would it be possible to sort the elements in arrayE in descending order?
arrayE = np.array([[0.8, 0.2, 0, 0.4, 0.6], [1.2, 1.8, 1.6, 1, 1.4], [2.4, 2, 2.8, 2.2, 2.6]])
print(-np.sort(-arrayE))
[[0.8 0.6 0.4 0.2 0. ]
[1.8 1.6 1.4 1.2 1. ]
[2.8 2.6 2.4 2.2 2. ]]
Using the minus-sign trick, we can sort each 1-D array (each row) of the 2-D array in descending order.
- Since the minus-sign trick doesn’t work with 1-D string or boolean arrays, it won’t work with multi-dimensional string or boolean arrays either; NumPy doesn’t support negating those types, whatever the array’s shape.
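For multi-dimensional string or boolean arrays, one option is to sort each row in ascending order and then reverse the columns with a [:, ::-1] slice. A minimal sketch using two hypothetical 2-D arrays (cities and flags):

```python
import numpy as np

# Hypothetical 2-D string and boolean arrays
cities = np.array([['Lima', 'Quito', 'Bogota'],
                   ['Austin', 'Denver', 'Boston']])
flags = np.array([[True, False, True],
                  [False, False, True]])

# Sort each row ascending, then keep every row (:) but walk its columns backwards (::-1)
cities_desc = np.sort(cities)[:, ::-1]
flags_desc = np.sort(flags)[:, ::-1]

print(cities_desc)
print(flags_desc)
```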
Next, let’s discuss filtering a NumPy array. In the context of NumPy, filtering an array means creating a new array from certain elements in an existing array.
Filtering a NumPy array works a little differently from sorting one: instead of calling a function like np.sort, you usually filter an array by indexing it with booleans. Here’s a basic example of filtering a NumPy array:
arrayF = np.array([15, 30, 45, 60, 75])
X = [False, True, True, False, False]
arrayG = arrayF[X]
print(arrayG)
[30 45]
To filter a NumPy array, use a boolean index list (X in this example): a list of booleans with one entry for each element of the main array (arrayF in this example), matched up by position. For instance, the first False in X corresponds to 15 in arrayF, the first True in X corresponds to 30 in arrayF, and so on. Oftentimes, the filtered array is stored as a new array (arrayG in this example); the general syntax for filtering goes like this: array you're filtering[boolean index list].
Any index that corresponds to True in the boolean index list is included in the filtered array, while any index that corresponds to False in the boolean index list is excluded from the filtered array.
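Because the booleans are matched to elements by position, the boolean index list needs exactly one entry per element. Here’s a quick sketch of what happens when it’s too short (short_mask is a hypothetical name); recent NumPy versions raise an IndexError rather than guessing what the missing positions should be:

```python
import numpy as np

arrayF = np.array([15, 30, 45, 60, 75])

# Hypothetical mask with only 2 entries for a 5-element array
short_mask = [False, True]

mismatch_caught = False
try:
    arrayF[short_mask]
except IndexError as err:
    # The mask length (2) doesn't match the array length (5)
    mismatch_caught = True
    print("IndexError:", err)
```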
In the arrayF example, I created the boolean index list by hard-coding the True and False values into the list. A more common way to create a boolean index list is to fill the list based on certain conditions. Take a look at the example below:
arrayH = np.array([45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210])
Y = []
for i in arrayH:
    if i % 6 == 0:
        Y.append(True)
    else:
        Y.append(False)
arrayI = arrayH[Y]
print(arrayI)
[ 60 90 120 150 180 210]
In this example, I first created my main array (arrayH) along with an empty boolean index list (Y). I then filled Y using a for loop that iterates through arrayH, with an if-else statement that appends True if the value is evenly divisible by 6 and False for all other values. Finally, I created a new array (arrayI) by filtering arrayH with Y. After printing out arrayI, you can see that six values passed the filter.
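Since i % 6 == 0 already evaluates to True or False, the if-else isn’t strictly needed: a list comprehension can build the same boolean index list in one line. Here’s a minimal sketch; note that this is still a Python-level loop, just a shorter way to write it:

```python
import numpy as np

arrayH = np.array([45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210])

# Same condition as the for-loop/if-else version; each comparison yields a boolean
Y = [i % 6 == 0 for i in arrayH]

arrayI = arrayH[Y]
print(arrayI)   # [ 60  90 120 150 180 210]
```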
Now, the for-loop/if-else statement approach to filtering a NumPy array works, and it’s a fairly common approach. However, it checks the elements one at a time in plain Python, and there’s a more efficient method of NumPy array filtering. Take a look at the example below:
arrayJ = np.array([2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035])
Z = arrayJ % 4 == 0
arrayK = arrayJ[Z]
print(arrayK)
[2024 2028 2032]
In this example, I am demonstrating a direct-from-array filtering approach. I first created my main array (arrayJ) along with my boolean index list (Z). This time, however, I didn’t fill the boolean index list using a for-loop/if-else statement pair; instead, I set Z equal to the filter condition itself (arrayJ % 4 == 0). NumPy evaluates that condition for every element of arrayJ at once and returns a boolean array with the same length as arrayJ.
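Splitting the condition into its two steps shows what happens under the hood: % produces an array of remainders, and == 0 then turns each remainder into a boolean. A short sketch (remainders is an illustrative name):

```python
import numpy as np

arrayJ = np.array([2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035])

# Step 1: % is applied to every element at once, giving an array of remainders
remainders = arrayJ % 4
print(remainders)   # [1 2 3 0 1 2 3 0 1 2 3 0 1 2 3]

# Step 2: == 0 is also applied element by element, giving one boolean per element
Z = remainders == 0
print(Z)
```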
After creating the boolean index list, I then created a new NumPy array (arrayK), which is derived from the filtering I did for arrayJ. After printing out the new NumPy array, you can see that three elements met the filtering criterion (being evenly divisible by 4).
Now, the key to the direct-from-array filtering approach is that operators like % and == work element by element on NumPy arrays. So even without a for-loop/if-else statement pair, the single line of code that states your filter condition still puts each True and False in the position of the element it describes.
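To check that the one-line condition places its Trues and Falses exactly where a loop would, here’s a sketch that builds the same index both ways and compares them (Z_loop and year are illustrative names; the loop mirrors the arrayH example):

```python
import numpy as np

arrayJ = np.array([2021, 2022, 2023, 2024, 2025, 2026, 2027, 2028, 2029, 2030, 2031, 2032, 2033, 2034, 2035])

# Direct-from-array: one expression produces the whole boolean index
Z = arrayJ % 4 == 0

# Loop version, built the same way as Y in the arrayH example
Z_loop = []
for year in arrayJ:
    if year % 4 == 0:
        Z_loop.append(True)
    else:
        Z_loop.append(False)

# Both approaches put True in exactly the same positions
print(Z.tolist() == Z_loop)   # True
print(arrayJ[Z])              # [2024 2028 2032]
```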
Thanks for reading,
Michael