Dissecting the NumPy Indexing Mechanism
Introduction
Typically people tend to treat arrays using the intuition gained in linear algebra, where arrays of numbers are treated as matrix-like objects. However, in this deep learning era, this intuition cannot be easily generalized to higher dimensions (4 or higher), nor explain how the numbers are stored. Thus, we introduce a new graphical notation based on the hierarchy of array elements that helps:
- Understand how array elements are stored
- Explain the three operations, i.e., aggregation, reshape and stack, in NumPy
- Solve some practical problems
The passage is organized as follows: 1) in the second section, we will discuss the graphical representation and the indexing mechanisms; 2) in the third and fourth section, we explain the aggregation and reshape and stack operation respectively; 3) we give an example showing the usefulness of such representations.
A Novel Graphical Representation
In this section, we will build up a graphical notation for NumPy arrays from simple cases.
Let’s begin with some simple cases. The figure above gives examples of the graphical representation. We want to highlight some points:
-
array(1)
is a single number andarray([1])
is an array. A single number cannot be indexed. And an array is indexable, which combines objects and assigns indices to them. - For objects in every position of an array, they are called elements. An element can be a single number, as indicated in
array([1])
andarray([1,2])
. The element can also be arrays, shown inarray([[1]])
. High dimensional arrays are built upon this recursive definitions. - The dimension for an array equals to the depth of recursion. For example,
array(1)
have 0 dimension andarray([[1]])
is 2-dimensional.
We try to simplify the notation. As indicated above, we use a box to denote a number and indicates its index in the right-bottom corner.
The hierarchy or dimension is illustrated by the number of square borders and the indexes. 1-dimensional array([0])
has a single boundary while 2D array([[1]])
have two boundaries. And the index of array([[1]])
becomes a 2D tuple. Based on such simplified representation, it is easier to represent some high dimensional arrays, e.g., array([[[1]],[[2]]])
.
For array elements who contain more than one sub-elements, we can simplify them accordingly, e.g., the graph for array([1,2])
and array([[1,2],[3,4]])
shown above. Specifically, in the array([[1,2],[3,4]])
case, we arrange the elements like a 2D matrix for easier understanding. But we can also stack the elements in one row, just like array([1,2])
case.
The Aggregation Operation
In this section, we explain the aggregation operation in NumPy. Aggregations operations, like the sum
function, generate aggregated description for the input array. However, for high dimensional arrays, the aggregation operation can be processed in various levels, i.e., different axes in NumPy. This axis system is sometimes confusing based on traditional intuitions, and it can be clearly explained using the proposed notation.
We firstly explain and set up the intuition in 1D and 2D arrays and apply such understanding in 3D arrays.
Aggregation in 1D and 2D Arrays
Let’s begin with simple cases in 1D and 2D arrays. And we will use the sum
function as an example for various aggregation functions.
For a 1D array, it has only 1 axis, and the aggregation can only be operated one direction, i.e., the 0 axis
in NumPy. Shown in the graph above, the 0-axis aggregation decomposes the array and add the elements together. Note: the output value for np.sum(array([1,2]), axis=0)
is a single number but not an array, because each element is a single number.
Let’s move to 2D arrays. Similarly, the 0-axis aggregation is broken down the outer arrays and add the elements together. The addition here is the element-wise vector addition, i.e., add the elements in the corresponding position. The obtained vector, in this case, is a 1D array.
If we perform the aggregation operation in the second direction, i.e., the 1 axis
, we will notice some magic here. The operation performs 0-axis aggregation on every element of the outer array and combines the results. For example, calculating np.sum(array([1,2],[3,4]), axis=1)
is equivalent to
array([
np.sum(array([1,2]), axis=0),
np.sum(array([3,4]), axis=0)
])
. The clever trick here not only perfectly align with human intuition in 2D examples but also help explain complicated case in higher dimensions.
Aggregation in 3D and High-dimensional Arrays
When moving to 3D arrays, for 0-axis and 1-axis aggregation, we directly apply the conclusion from the previous section and get the correct results. But how about 2-axis aggregation? Can we generalize the conclusion above?
The answer is yes. Similarly to 1-axis aggregation, which transforms a 1-axis aggregation to performing 0-axis aggregation in each element, the 2-axis aggregation is equivalent to applying 1-axis aggregation to each element.
The reduction idea easily illustrates the confusing 3D aggregation computation and can be further utilized to explain the axis system in a more complex array.
The Reshape and Stack Operation
We continue our journey to understand the reshape
and stack
operation in NumPy with our representation. These operations are hard to understand when performing in higher dimensions. And we will show our representation can clearly explain these operations.
Reshape
Let’s begin with the simple 1D array array([1,2,3,4,5,6])
. The lefthand side sub graphs show how to reshape the array to 2D arrays. reshape(2,3)
and reshape(3,2)
tells how many numbers of elements in the outer and inner arrays. And subsequently, it cuts the original sequence given the new shapes. Therefore, the first element of the resultant vector res[0,...]
is a 3-element and 2-element array for reshape(2,3)
and reshape(3,2)
, respective.
The righthand side shows the results of converting the input array in the 3D arrays.
reshape(1,2,3)
is the starting case. It says transform the array into a 3D array, whose element is a 2D array. Thus,res[0,…]
is the same as the output ofreshape(2,3)
.reshape(3,1,2)
says transforming the array into a 3 element array, whose elements are 2D arrays with shape (1,2). Therefore, the outputres
isarray([[[1,2]],[[3,4]],[[5,6]]])
. Similarly, we can get the result forreshape(3,2,1)
. It isarray([[[1],[2]],[[3],[4]],[[5],[6]]])
.- Compared to the result of
reshape(3,2)
,array([[1,2],[3,4],[5,6]])
, the last two results elements are 2D arrays with shape (1,2) or (2,1) instead of 1D arrays.
Stack
When stacking arrays, we can also specify the axis
for the arrays, and it is sometimes confusing. Two common questions are, what’s the dimension it stacks and how to stack in that dimension. In this section, we will use the three arrays $K_1$, $K_2$, and $K_3$ shown above to illustrate such problems.
0-axis stacking is the simplest stacking case. The stacking dimension is a new dimension, and it generates a new array and assigns $K_1$, $K_2$, and $K_3$ for each element.
For 1-axis and 2-axis stacking, it is more complicated. Let’s begin with the 1-axis stacking. Based on the idea in 0-axis stacking, 1-axis should stack the array in the 2nd dimension. And stacking in the 2nd dimension means we enlarge the 1D 2 element arrays into a 2D arrays with (3,2) shape. For example, array([0,1])
becomes array([[0,1],[1,2],[2,3])
. The newly added elements are from the same position in other input arrays. Thus, after answering the two question, we can construct the array shown above.
For 2-axis stacking, the strategy is similar. The 3rd dimension is a single number element. Therefore, we enlarge it into a 1D array with 4 elements. In this case, we transform array(0)
into array([0,1,2])
. And, as the same, the newly added elements are from the same position in other input arrays.
One Example: Loading CIFAR-10 Data
CIFAR-10 is an important image dataset in Computer Vision and Deep Learning nowadays. The image data stored for python is a 2D array with the shape. It is sometimes hard to devise the right way to convert the data to a regular image format without a clear understanding of the NumPy. But with the help of our aforementioned representation, we can easily find the correct way to load the data. Lets’ try and see.
According to the website description, the data is stored 10000x3072 NumPy array of uint8s in row-major order. Each row is a 32x32 color image, with 3 1024 arrays stored in the RGB sequence. (Each row of the array stores a 32x32 color image. The first 1024 entries contain the red channel values, the next 1024 the green, and the final 1024 the blue. The image is stored in row-major order so that the first 32 entries of the array are the red channel values of the first row of the image.)
Thus, we can quickly build up a representation accordingly. Take a row in the input array as an example. Firstly we can use the reshape(3,-1)
to split the row into 3 different color channels. And, based on the results, we can use reshape(3,32,32)
to transform each element into a 2D array. Finally, we use np.moveaxis()
to change the output array into a channel-last representation, which is compatible with the matplotlib input requirements. The code is shown below.
import numpy as np
import matplotlib.pyplot as plt
def unpickle(file):
with open(file, 'rb') as fo:
dict = pickle.load(fo, encoding='bytes')
return dict
data_dir = "./cifar-10-batches-py/"
train_data_dirs = [data_dir+"data_batch_{}".format(i) for i in range(1,6)]
raw_data = [unpickle(path) for path in train_data_dirs]
train_data = np.vstack([batch[b'data'] for batch in raw_data])
train_data = np.moveaxis(train_data.reshape(-1,3,32,32),1,-1)
plt.imshow(train_data[0,...])
Conclusion
In this passage, we propose a novel representation for understanding NumPy arrays. We explain such representation through examples and demonstrate the efficiency through functions and concrete examples.