Anyone entering the world of data science needs to learn NumPy. NumPy stands for Numerical Python, and this article is a crash course in it.

So Why NumPy?

A NumPy array is similar to a Python list and supports most of the operations a list does, but it has advantages that make NumPy the preferred library for handling large numerical data. Some of these are:

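Unlike a Python list, a NumPy array holds elements of a single data type, so it can be packed densely into a contiguous block of memory. For numerical work this can make it 10 to 100 times faster than Python lists, depending on the operation and the array size.
NumPy operations are vectorized: a single call processes whole blocks of the array in compiled loops instead of one element at a time in the interpreter, and some routines can additionally process those blocks in parallel.
Internally, NumPy relies on compiled code written mainly in C, together with numerical libraries written in languages such as Fortran and C++, which execute much faster than pure Python.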

Getting our hands dirty with some code

import numpy as np

The line above imports NumPy and aliases it to np (the standard alias for NumPy in the data science community).

np.array: This function creates a NumPy array. We can pass in a 1-D, 2-D, or N-D (nested) list and it will be converted to a NumPy array. We can also specify the data type of the elements with the dtype argument. The code below creates a 2-D NumPy array with dtype float32:

np.array([[1, 2], [3, 4]], dtype=np.float32)

np.ones: This function creates an array of ones; you just need to pass in the shape of the array. For example, a shape of (3,4) means a matrix with 3 rows and 4 columns. If you pass a single integer to np.ones, it creates a 1-D array with that many elements.
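As a minimal sketch of the two cases just described (the variable names are only illustrative):

```python
import numpy as np

# A tuple shape gives a 2-D array: 3 rows and 4 columns, all ones.
ones_2d = np.ones((3, 4))

# A single integer gives a 1-D array with that many elements.
ones_1d = np.ones(5)

print(ones_2d.shape)  # (3, 4)
print(ones_1d)        # [1. 1. 1. 1. 1.]
```

Note that the elements are floats by default; a different type can be requested with the dtype argument.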

np.zeros: Similar to np.ones, it will create an array of zeros.

>>> np.zeros(4)
array([0., 0., 0., 0.])

np.eye: This function creates an m x m identity matrix (a square matrix with ones on the diagonal and zeros everywhere else). We just need to pass in the value of m.
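For instance, a 3 x 3 identity matrix could be built like this (the name identity is just an illustrative choice):

```python
import numpy as np

# Build a 3 x 3 identity matrix: ones on the diagonal, zeros elsewhere.
identity = np.eye(3)

print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```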

np.random.rand: This function generates random elements drawn from a uniform distribution over [0, 1). The dimensions of the array are passed as separate arguments rather than as a single tuple.

For example, np.random.rand(3, 4) returns an array with 3 rows and 4 columns.
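A short sketch of that call follows. The values differ on every run, so only the shape and the value range are checked here:

```python
import numpy as np

# Each dimension is its own argument: 3 rows, 4 columns.
samples = np.random.rand(3, 4)

print(samples.shape)  # (3, 4)

# Every value lies in the half-open interval [0, 1).
print(samples.min() >= 0.0, samples.max() < 1.0)
```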

np.random.randn: This is similar to np.random.rand, but the elements are drawn from the standard normal (Gaussian) distribution, which has mean 0 and standard deviation 1.
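One way to see the difference is to draw a large sample and look at its statistics. This is only a sketch: the seed and the sample size are arbitrary choices made for reproducibility.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so that reruns give the same numbers

# Draw 10,000 values from the standard normal distribution.
normal_samples = np.random.randn(10000)

# The sample mean should be close to 0 and the standard deviation close to 1.
print(normal_samples.mean(), normal_samples.std())

# Like np.random.rand, dimensions are passed as separate arguments.
matrix = np.random.randn(3, 4)
print(matrix.shape)  # (3, 4)
```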

np.random.randint: This function returns random integers from low (inclusive) to high (exclusive), with the shape of the output array given by size.
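As an illustration, the sketch below simulates dice rolls (the dice framing and the name rolls are assumptions made for the example):

```python
import numpy as np

# low=1 is inclusive and high=7 is exclusive, so values range from 1 to 6.
rolls = np.random.randint(1, 7, size=(2, 5))

print(rolls.shape)               # (2, 5)
print(rolls.min(), rolls.max())  # both always fall between 1 and 6
```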

np.arange: This function is similar to Python's built-in range. You pass in the start value (inclusive), the end value (exclusive), and the step.
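A couple of calls, sketched with illustrative variable names:

```python
import numpy as np

# Start at 0, stop before 10, step by 2.
evens = np.arange(0, 10, 2)
print(evens)       # [0 2 4 6 8]

# With a single argument, start defaults to 0 and step to 1, as with range.
first_five = np.arange(5)
print(first_five)  # [0 1 2 3 4]
```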

Reshaping NumPy arrays

Reshaping, as the name suggests, means changing the shape of a NumPy array, provided the new shape is compatible with the old one, that is, both hold the same number of elements. For example, an array of shape (3,4) has 12 elements (3*4), so any new shape must also have a product of 12. The allowed 2-D shapes are (1,12), (2,6), (3,4), (4,3), (6,2), and (12,1).
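The rule above can be tried out with the reshape method. This is a minimal sketch: the array contents are arbitrary, and the -1 placeholder (which asks NumPy to infer one dimension) is an extra convenience not covered in the text above.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)  # 12 elements arranged as 3 rows, 4 columns

b = a.reshape(2, 6)              # 2 * 6 == 12, so this is allowed
print(b.shape)                   # (2, 6)

c = a.reshape(4, -1)             # -1 is inferred as 3, because 4 * 3 == 12
print(c.shape)                   # (4, 3)

# An incompatible shape (5 * 3 == 15) raises a ValueError.
incompatible = False
try:
    a.reshape(5, 3)
except ValueError:
    incompatible = True
print(incompatible)              # True
```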

Indexing and Selection

Indexing and slicing in NumPy work much the same as in Python lists, and NumPy arrays are 0-indexed.

Consider an array a of shape (2,3). When slicing it, the first entry in the index is for rows and the second is for columns. To select all the rows we use : and to select the columns from index 1 to the end we use 1:, so combining the two gives a[:,1:].
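The slice described above can be sketched as follows (the values stored in a are an illustrative choice):

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])   # shape (2, 3)

# A single element: row 0, column 2.
print(a[0, 2])              # 3

# All rows, columns from index 1 to the end.
sliced = a[:, 1:]
print(sliced)
# [[2 3]
#  [5 6]]
```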

To Keep in mind

Slicing in NumPy is applied dimension by dimension: dimension 1 (rows) gets its own slice, and dimension 2 (columns) gets another. Slicing an N-D array works the same way, with one slice per dimension.
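For a 3-D array the same idea extends to three positions in the index. A small sketch, assuming an arbitrary array of shape (2, 3, 4):

```python
import numpy as np

cube = np.arange(24).reshape(2, 3, 4)  # 2 blocks, each with 3 rows and 4 columns

# Dimension 1: block 0; dimension 2: rows from index 1 onward;
# dimension 3: every second column.
sub = cube[0, 1:, ::2]

print(sub)
# [[ 4  6]
#  [ 8 10]]
```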

Trick question

A slice of a NumPy array is a view: it refers to the elements of the original array instead of copying them. So any change made to the sliced array is reflected in the original array.

To avoid this, we can use the array.copy() method to make an independent copy of the array's data.
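Both behaviours can be checked in a few lines. In this sketch the array values and variable names are illustrative, and np.shares_memory is used only to confirm whether two arrays refer to the same data:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

# A slice is a view: writing through it changes the original array.
view = a[1:4]
view[0] = 99
print(a[1])                                # 99

# A copy owns its data: writing to it leaves the original untouched.
independent = a[1:4].copy()
independent[0] = -1
print(a[1])                                # still 99

print(np.shares_memory(a, view))           # True
print(np.shares_memory(a, independent))    # False
```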

Conditional Expressions in NumPy

Suppose you have an array containing both negative and positive elements, and you want only the positive elements. NumPy provides an elegant way to do this: a comparison such as a > 0 produces a boolean array, and indexing the array with it keeps only the elements where the condition is True.
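A minimal sketch of selecting the positive elements this way (the sample values are made up for illustration):

```python
import numpy as np

a = np.array([-3, 7, -1, 4, 0, 9])

# The comparison is applied to every element and yields a boolean array.
mask = a > 0
print(mask)       # [False  True False  True False  True]

# Indexing with the boolean array keeps only the elements where it is True.
positives = a[mask]
print(positives)  # [7 4 9]
```

The two steps are often combined into a single expression, a[a > 0].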

NumPy Operations

One cool thing about NumPy is the ease of performing arithmetic operations on arrays and the mathematical functions it provides.

Arithmetic operations

Arithmetic operators such as +, -, *, and / are applied to NumPy arrays element-wise.

>>> a = np.array([1, 2, 3])
>>> b = np.array([4, 5, 6])
>>> a + b
array([5, 7, 9])
>>> a * b
array([ 4, 10, 18])
>>> a / b
array([0.25, 0.4 , 0.5 ])

Universal Functions

These are functions provided by NumPy that are applied to an array element by element.

>>> a = np.array([1, 2, 3])
>>> np.sqrt(a)
array([1.        , 1.41421356, 1.73205081])
>>> np.sin(a)
array([0.84147098, 0.90929743, 0.14112001])
>>> np.exp(a)
array([ 2.71828183,  7.3890561 , 20.08553692])

The functions above are just the tip of the iceberg. The full list of mathematical functions is available at: https://numpy.org/doc/stable/reference/routines.math.html

Homework for you

We have just scratched the surface of the NumPy library. Some further topics you should get familiar with are:

Broadcasting in NumPy: https://numpy.org/doc/stable/user/basics.broadcasting.html
Universal functions in NumPy: https://numpy.org/doc/stable/reference/ufuncs.html
