Python Lesson 21: Joining & Splitting NumPy Arrays (NumPy pt 4)

Hello everybody,

Michael here, and today’s lesson will be on joining and splitting NumPy arrays.

Joining NumPy arrays simply involves concatenating two or more NumPy arrays. However, joining two NumPy arrays isn’t as simple as joining two strings.

To join two or more NumPy arrays together, use the np.concatenate function. Here’s a simple example of NumPy array concatenation:

import numpy as np

numpy1 = np.array([0, 0.4, 0.8, 1.2, 1.6, 2])
numpy2 = np.array([2.4, 2.8, 3.2, 3.6, 4, 4.4])
numpy3 = np.concatenate((numpy1, numpy2))
print(numpy3)

[0.  0.4 0.8 1.2 1.6 2.  2.4 2.8 3.2 3.6 4.  4.4]

For the np.concatenate function to work properly, pass in all the arrays you’d like to concatenate as a single sequence, such as a tuple or a list. If you pass in the arrays one by one, the function won’t work, because the second positional argument is interpreted as the axis.
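To make that concrete, here’s a minimal sketch using the same two arrays. The names from_tuple, from_list, and failed are just ones I made up for illustration, and I’m catching both TypeError and ValueError because the exact error can depend on what you pass in:

```python
import numpy as np

numpy1 = np.array([0, 0.4, 0.8, 1.2, 1.6, 2])
numpy2 = np.array([2.4, 2.8, 3.2, 3.6, 4, 4.4])

# A tuple and a list are both accepted as the sequence of arrays.
from_tuple = np.concatenate((numpy1, numpy2))
from_list = np.concatenate([numpy1, numpy2])
print(np.array_equal(from_tuple, from_list))

# Passing the arrays one by one fails: numpy2 ends up in the axis parameter.
try:
    np.concatenate(numpy1, numpy2)
    failed = False
except (TypeError, ValueError) as error:
    failed = True
    print(type(error).__name__)
```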

Also, when you concatenate multiple NumPy arrays, the number of dimensions stays the same:

print(numpy1.ndim)
print(numpy2.ndim)
print(numpy3.ndim)

1
1
1

Both numpy1 and numpy2 are 1-D arrays; the combined array numpy3 is also a 1-D array.

Now, can we join two arrays with different dimensions together? Let’s take a look:

numpy4 = np.array([[3, 6, 9], [4, 8, 12]])
numpy5 = np.array([5, 10, 15])
numpy6 = np.concatenate((numpy4, numpy5))
print(numpy6)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-14-479431df35d4> in <module>
      1 numpy4 = np.array([[3, 6, 9], [4, 8, 12]])
      2 numpy5 = np.array([5, 10, 15])
----> 3 numpy6 = np.concatenate((numpy4, numpy5))
      4 print(numpy6)

<__array_function__ internals> in concatenate(*args, **kwargs)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

As you can see, trying to concatenate two arrays with different numbers of dimensions doesn’t work; all of the arrays you’re trying to concatenate must have the same number of dimensions. Their shapes must also match along every axis except the one you’re joining on.
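One possible way around this error (just a sketch, not the only approach) is to reshape the 1-D array into a 2-D array with a single row before concatenating. The name numpy5_row is my own:

```python
import numpy as np

numpy4 = np.array([[3, 6, 9], [4, 8, 12]])  # shape (2, 3)
numpy5 = np.array([5, 10, 15])              # shape (3,)

# Turn the 1-D array into a 1x3 2-D array so both inputs are 2-D.
numpy5_row = numpy5.reshape(1, 3)

numpy6 = np.concatenate((numpy4, numpy5_row))
print(numpy6.shape)
print(numpy6)
```

Here numpy5 becomes a third row underneath the two rows of numpy4.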

Now, in the array concatenation examples I’ve shown you, the arrays are joined along an axis they already have. How could you join arrays along a new axis?

You would stack the arrays together. Here’s an example of NumPy array stacking:

numpy7 = np.array([14, 28, 42])
numpy8 = np.array([15, 30, 45])
numpy9 = np.stack((numpy7, numpy8), axis=1)
print(numpy9)

[[14 15]
 [28 30]
 [42 45]]

Stacking arrays is similar to concatenation, except that np.stack always joins the arrays along a new axis, so the result has one more dimension than the inputs. To stack NumPy arrays, use the np.stack function and pass a tuple containing the arrays you want to stack and the axis you want to stack them on; if you don’t pass an axis into np.stack, arrays will automatically be stacked along the first axis.

What would it look like if these arrays were stacked along the first axis rather than the second axis? Take a look:

numpy7 = np.array([14, 28, 42])
numpy8 = np.array([15, 30, 45])
numpy9 = np.stack((numpy7, numpy8))
print(numpy9)

[[14 28 42]
 [15 30 45]]

All of the elements from both arrays are still present; however, stacking along the first axis creates a 2×3 array, while stacking along the second axis creates a 3×2 array. Both stacked arrays are 2-D.

  • In case you didn’t figure it out, axis=0 refers to the first axis while axis=1 refers to the second axis.
  • You can’t stack along a non-existent axis; in this example, the stacked array only has two dimensions, therefore you can’t stack along axis=2 because the stacked array has no third dimension and thus has no third axis.
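Here’s a small sketch that checks both points with the same two arrays. The variable names are my own, and I’m catching ValueError since NumPy’s axis error is a subclass of it:

```python
import numpy as np

numpy7 = np.array([14, 28, 42])
numpy8 = np.array([15, 30, 45])

# axis=0 puts each input array in its own row; axis=1 puts each in its own column.
shape_axis0 = np.stack((numpy7, numpy8), axis=0).shape
shape_axis1 = np.stack((numpy7, numpy8), axis=1).shape
print(shape_axis0, shape_axis1)

# The stacked result is only 2-D, so there is no axis=2 to stack along.
try:
    np.stack((numpy7, numpy8), axis=2)
    axis2_failed = False
except ValueError as error:
    axis2_failed = True
    print(error)
```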

Stacking along axes works well, but what are some other NumPy array stacking methods?

Let’s say you wanted to stack along rows (horizontally, so the arrays sit side by side). Here’s how to do so:

numpy7 = np.array([14, 28, 42])
numpy8 = np.array([15, 30, 45])
numpy9 = np.hstack((numpy7, numpy8))
print(numpy9)

[14 28 42 15 30 45]

To stack arrays along rows, use the np.hstack function and pass in a tuple containing the arrays you want to stack. In this example, since both arrays are 1-D, stacking along rows simply merged the two arrays into a single 1-D array with the elements of numpy7 listed before the elements of numpy8.

Now what if you wanted to stack along columns (vertically, so one array sits on top of the other)? Here’s how to do so:

numpy7 = np.array([14, 28, 42])
numpy8 = np.array([15, 30, 45])
numpy9 = np.vstack((numpy7, numpy8))
print(numpy9)

[[14 28 42]
 [15 30 45]]

To stack arrays along columns, use the np.vstack function; its parameter is the same as the parameter for the np.hstack function (a tuple containing the arrays you want to stack). In the example, stacking along columns created a 2×3 2-D array (this is the same outcome as stacking along the first axis with np.stack).
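The difference between np.hstack and np.vstack is easier to see with 2-D inputs. Here’s a rough sketch with two made-up 2×2 arrays (left and right are hypothetical names, not from the examples above), comparing each function to np.concatenate along the matching axis:

```python
import numpy as np

# Hypothetical 2x2 arrays, used only for illustration.
left = np.array([[1, 2], [3, 4]])
right = np.array([[5, 6], [7, 8]])

side_by_side = np.hstack((left, right))  # arrays placed next to each other
on_top = np.vstack((left, right))        # arrays placed one above the other
print(side_by_side.shape, on_top.shape)

# For 2-D inputs, these match concatenation along axis=1 and axis=0.
same_h = np.array_equal(side_by_side, np.concatenate((left, right), axis=1))
same_v = np.array_equal(on_top, np.concatenate((left, right), axis=0))
print(same_h, same_v)
```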

Now, there’s another way that you can stack your arrays: along depth (or height). Here’s how to do so:

numpy7 = np.array([14, 28, 42])
numpy8 = np.array([15, 30, 45])
numpy9 = np.dstack((numpy7, numpy8))
print(numpy9)

[[[14 15]
  [28 30]
  [42 45]]]

Stacking along depth/height follows the same gist as stacking along rows or columns; the only difference is that you’d use the np.dstack function. In this example, stacking along depth/height created a 3-D array with a shape of (1, 3, 2), which is interesting because when I stacked the same two NumPy arrays along the second axis in a previous example, I got a 3×2 2-D array.
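To compare the two results directly, here’s a short sketch (depth and second_axis are names I picked for illustration). With 1-D inputs like these, the depth-stacked array holds the same values as the array stacked along the second axis, wrapped in one extra outer dimension:

```python
import numpy as np

numpy7 = np.array([14, 28, 42])
numpy8 = np.array([15, 30, 45])

depth = np.dstack((numpy7, numpy8))
second_axis = np.stack((numpy7, numpy8), axis=1)
print(depth.shape, second_axis.shape)

# Dropping the extra outer dimension gives the same 3x2 array.
print(np.array_equal(depth[0], second_axis))
```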

Now that we’ve discussed the basics of joining NumPy arrays, let’s discover how to do the reverse (splitting arrays).

Here’s a simple example of splitting a NumPy array:

numpy10 = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
numpy11 = np.array_split(numpy10, 2)
print(numpy11)

[array([100, 200, 300, 400, 500]), array([ 600,  700,  800,  900, 1000])]

To split up a NumPy array, use the np.array_split function and pass in two parameters: the array you want to split up and the number of sections you want to split it into. In this example, I have an array of 10 elements that I split in two.

Something to note when splitting arrays is that the number of elements in the array doesn’t need to be divisible by the number of splits. Granted, I did use two splits on a 10-element array. Watch what happens when I use three splits on the same 10-element array:

numpy10 = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])
numpy11 = np.array_split(numpy10, 3)
print(numpy11)

[array([100, 200, 300, 400]), array([500, 600, 700]), array([ 800,  900, 1000])]

The split still works, though the array isn’t split evenly; NumPy automatically decides how to split the array, giving the extra elements to the first sub-arrays.
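For comparison, NumPy also has a stricter np.split function, which raises an error when the array can’t be divided evenly. Here’s a quick sketch of the difference (sizes and strict_failed are names I made up):

```python
import numpy as np

numpy10 = np.array([100, 200, 300, 400, 500, 600, 700, 800, 900, 1000])

# np.array_split tolerates an uneven division.
parts = np.array_split(numpy10, 3)
sizes = [len(part) for part in parts]
print(sizes)

# np.split insists on equal-sized sections and raises otherwise.
try:
    np.split(numpy10, 3)
    strict_failed = False
except ValueError as error:
    strict_failed = True
    print(error)
```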

Now let’s say you wanted to access one of the individual arrays. Here’s how to do so (using the 3-split example):

print(numpy11[0])
print(numpy11[1])
print(numpy11[2])

[100 200 300 400]
[500 600 700]
[ 800  900 1000]

Since np.array_split returns a list of arrays, accessing an individual array from the result works the same as accessing an element of a list; in this case, the first array is index 0, the second array is index 1, and so on.

Now what if we wanted to access individual elements from these split arrays? Take a look at this example:

print(numpy11[0][1])
print(numpy11[1][1])
print(numpy11[2][1])

200
600
900

To access an individual element in each individual array, you’d need to add another indexing call. In this example, the first indexing call refers to the array itself while the second indexing call refers to the element inside the array.

Now, how would you split a multi-dimensional array? Take a look at this example:

numpy12 = np.array([[200, 400, 600, 800, 1000, 1200], [300, 600, 900, 1200, 1500, 1800]])
numpy13 = np.array_split(numpy12, 2)
print(numpy13)

[array([[ 200,  400,  600,  800, 1000, 1200]]), array([[ 300,  600,  900, 1200, 1500, 1800]])]

In this example, I am splitting a 2-D array in two; interestingly enough, both of the split arrays are still 2-D.

As you can see in the example, there are two rows (1-D arrays) inside the 2-D array, so splitting the 2-D array in two made sense. However, what if you wanted to split this 2-D array another way? Let’s see what happens when we split this array in three:

numpy12 = np.array([[200, 400, 600, 800, 1000, 1200], [300, 600, 900, 1200, 1500, 1800]])
numpy13 = np.array_split(numpy12, 3)
print(numpy13)

[array([[ 200,  400,  600,  800, 1000, 1200]]), array([[ 300,  600,  900, 1200, 1500, 1800]]), array([], shape=(0, 6), dtype=int32)]

When you split this array in three, you get both rows of the 2-D array (each as a 1×6 2-D array) plus an empty array with a shape of (0, 6). If I were to split this array in four, I would get both rows plus two empty arrays with a shape of (0, 6).
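If the printed output is hard to read, checking the shape and size of each piece can help. Here’s a small sketch (shapes and sizes are names I chose for illustration):

```python
import numpy as np

numpy12 = np.array([[200, 400, 600, 800, 1000, 1200],
                    [300, 600, 900, 1200, 1500, 1800]])
numpy13 = np.array_split(numpy12, 3)

# Each piece is still 2-D; the last one simply has zero rows.
shapes = [piece.shape for piece in numpy13]
sizes = [piece.size for piece in numpy13]
print(shapes)
print(sizes)
```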

A neat thing about splitting arrays is that, just like with joining arrays, you can split the arrays along a certain axis. Let’s see how axis-splitting an array works with a 2-D array:

numpy14 = np.array([[22, 44, 66, 88, 110], [33, 66, 99, 132, 165]])
numpy15 = np.array_split(numpy14, 5, axis=1)
print(numpy15)

[array([[22],
       [33]]), array([[44],
       [66]]), array([[66],
       [99]]), array([[ 88],
       [132]]), array([[110],
       [165]])]

In this example, I split the 2-D array numpy14 in five along the second axis, resulting in five 2×1 arrays, each holding one column with one element from each of the two rows. To split a NumPy array along a certain axis, specify the axis you want to split along after the number of splits you want to perform.

  • Just as with axis-joining an array, if you don’t specify an axis to split along in the np.array_split function, the array will automatically split along the first axis.
  • You can’t split along an axis beyond the scope of the array. In this example, since numpy14 is a 2-D array, you can’t split along axis=2 since a 2-D array doesn’t have a third axis.
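As a sanity check, splitting along an axis and then concatenating the pieces along the same axis should give back the original array. Here’s a sketch of that round trip (columns and rebuilt are hypothetical names):

```python
import numpy as np

numpy14 = np.array([[22, 44, 66, 88, 110], [33, 66, 99, 132, 165]])

# Split into five single-column pieces along the second axis.
columns = np.array_split(numpy14, 5, axis=1)
shapes = [column.shape for column in columns]
print(shapes)

# Joining the pieces along the same axis restores the original array.
rebuilt = np.concatenate(columns, axis=1)
print(np.array_equal(rebuilt, numpy14))
```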

Now, I mentioned that you can join arrays along rows, columns, and depth/height. However, did you know that each of the array-joining functions (np.hstack, np.vstack, and np.dstack) also has an array-splitting counterpart (np.hsplit, np.vsplit, and np.dsplit)?

Let’s demonstrate the np.hsplit function first, which splits your array along rows (horizontally):

numpy16 = np.array([40, 80, 120, 160, 200, 240])
numpy17 = np.hsplit(numpy16, 2)
print(numpy17)

[array([ 40,  80, 120]), array([160, 200, 240])]

In this example, I split numpy16 in two along rows; as you can see, both split arrays are displayed along a single row. Also, even though np.hsplit is the counterpart to np.hstack, np.hsplit doesn’t require a tuple as a parameter since you’re splitting a single array rather than stacking multiple arrays on top of each other.
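With a 2-D array, np.hsplit cuts each row at the same place, so you get a left block and a right block. Here’s a rough sketch using a made-up 2×4 array (grid, left_half, and right_half are hypothetical names). It also shows that np.hsplit, unlike np.array_split, needs an even division:

```python
import numpy as np

# Hypothetical 2x4 array, used only for illustration.
grid = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

left_half, right_half = np.hsplit(grid, 2)
print(left_half)
print(right_half)

# Four columns can't be divided into three equal sections.
try:
    np.hsplit(grid, 3)
    uneven_failed = False
except ValueError as error:
    uneven_failed = True
    print(error)
```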

Now let’s check out the np.vsplit function, which splits your array along columns (vertically):

numpy18 = np.array([[2010, 2011, 2012, 2013], [2014, 2015, 2016, 2017], [2018, 2019, 2020, 2021]])
numpy19 = np.vsplit(numpy18, 3)
print(numpy19)

[array([[2010, 2011, 2012, 2013]]), array([[2014, 2015, 2016, 2017]]), array([[2018, 2019, 2020, 2021]])]

Interestingly, the output for np.vsplit is displayed in the same format as the output for np.hsplit. However, keep in mind that unlike np.hsplit, np.vsplit won’t work on 1-D arrays; you’ll need at least a 2-D array to make np.vsplit work.
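Here’s a quick sketch of both points: np.vsplit on a 2-D array gives one piece per group of rows, while np.vsplit on a 1-D array raises an error. The names row_shapes and one_d_failed are my own:

```python
import numpy as np

numpy16 = np.array([40, 80, 120, 160, 200, 240])  # 1-D
numpy18 = np.array([[2010, 2011, 2012, 2013],
                    [2014, 2015, 2016, 2017],
                    [2018, 2019, 2020, 2021]])    # 2-D

# Each piece keeps all four columns but only one of the three rows.
row_shapes = [piece.shape for piece in np.vsplit(numpy18, 3)]
print(row_shapes)

# A 1-D array has no second dimension, so np.vsplit refuses it.
try:
    np.vsplit(numpy16, 2)
    one_d_failed = False
except ValueError as error:
    one_d_failed = True
    print(error)
```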

Finally, let’s demonstrate the np.dsplit function, which splits your array along depth/height:

numpy20 = np.array([[[1996, 1997, 1998, 1999], [2000, 2001, 2002, 2003], [2004, 2005, 2006, 2007]]])
numpy21 = np.dsplit(numpy20, 4)
print(numpy21)

[array([[[1996],
        [2000],
        [2004]]]), array([[[1997],
        [2001],
        [2005]]]), array([[[1998],
        [2002],
        [2006]]]), array([[[1999],
        [2003],
        [2007]]])]

Unlike the output for np.hsplit and np.vsplit, the output for np.dsplit is displayed stacked: each of the four split arrays has a shape of (1, 3, 1) and holds one element from each of the three 1-D arrays. Also, for np.dsplit to work, you’ll need at least a 3-D array; 2-D arrays won’t work with np.dsplit.
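Since np.dsplit is the counterpart to np.dstack, stacking the split pieces back together along depth should rebuild the original array. Here’s a sketch of that, plus the error you get with a 2-D array (piece_shapes, rebuilt, and two_d_failed are names I made up):

```python
import numpy as np

numpy20 = np.array([[[1996, 1997, 1998, 1999],
                     [2000, 2001, 2002, 2003],
                     [2004, 2005, 2006, 2007]]])  # shape (1, 3, 4)

numpy21 = np.dsplit(numpy20, 4)
piece_shapes = [piece.shape for piece in numpy21]
print(piece_shapes)

# Stacking the pieces along depth restores the original array.
rebuilt = np.dstack(numpy21)
print(np.array_equal(rebuilt, numpy20))

# A 2-D array has no third axis, so np.dsplit refuses it.
try:
    np.dsplit(numpy20[0], 4)
    two_d_failed = False
except ValueError as error:
    two_d_failed = True
    print(error)
```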

Thanks for reading,

Michael
