Boost Your Machine Learning Skills: 20 Must-Know NumPy Methods for Data Manipulation

Introduction:

NumPy, short for Numerical Python, is a fundamental Python library for numerical computing. Its powerful array manipulation capabilities make it a valuable tool for machine learning. Built on optimized C and Fortran code, NumPy excels at array-centered operations, which are essential for data preprocessing, feature engineering, and model evaluation. In this blog, we will go through 20 essential NumPy methods that every machine learning beginner should master for effective data manipulation, from array creation to advanced mathematical operations.

Array Creation and Initialisation Methods:

NumPy offers various methods for creating and initializing arrays, providing flexibility and convenience in data handling. These include np.array() for converting Python lists, np.linspace() for generating evenly spaced values, np.identity() and np.eye() for creating identity matrices, and the np.random module for generating arrays with random values.

  • np.array()
import numpy as np

# Creating an array from a Python list
arr = np.array([1, 2, 3, 4, 5])
print(arr)

'''
Output: 
[1 2 3 4 5]
'''
  • np.random.random()
import numpy as np

# Generating arrays with random values
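arr = np.random.random((2, 2))  # 2x2 array of floats drawn uniformly from [0.0, 1.0)
print(arr)

'''
Example output (values differ on every run and always lie in [0.0, 1.0)):
[[0.36535247 0.04333894]
 [0.22124971 0.02123627]]
'''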
  • np.random.randint()
import numpy as np

# Generating arrays with random integer values
arr = np.random.randint(1, 10, size=(3, 3))  # 3x3 array of random integers from 1 to 9 (the upper bound 10 is exclusive)

print(arr)

'''
Output:
[[1 7 1]
 [8 7 9]
 [6 9 4]]
'''
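The two random examples above print different values on every run, which makes results hard to reproduce. The following is a minimal sketch of one way to make them repeatable, assuming NumPy's Generator API (np.random.default_rng, available in NumPy 1.17 and later). The seed value 42 and the variable names are arbitrary choices for illustration.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence of values
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

floats_a = rng_a.random((2, 2))              # uniform floats in [0.0, 1.0)
floats_b = rng_b.random((2, 2))
ints_a = rng_a.integers(1, 10, size=(3, 3))  # integers from 1 to 9 (10 is exclusive)

print(np.array_equal(floats_a, floats_b))    # True: identical seeds, identical draws
print(floats_a.min() >= 0.0, floats_a.max() < 1.0)
print(ints_a.min() >= 1, ints_a.max() <= 9)
```

Fixing the seed is mainly useful for experiments and tests where the same "random" data should appear on every run.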
  • np.linspace()
import numpy as np

# Creating arrays with evenly spaced values
arr = np.linspace(0, 10, num=5)  # Array of 5 evenly spaced values from 0 to 10

print(arr)

'''
Output:
[ 0.   2.5  5.   7.5 10. ]
'''
  • np.identity()
import numpy as np

# Creating an identity matrix
arr = np.identity(4)  # 4x4 identity matrix

print(arr)

'''
Output:
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]
'''
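The section introduction also mentions np.eye(), which is closely related to np.identity(). As a small sketch of the difference (variable names are illustrative only): np.eye() additionally accepts a column count and a diagonal offset k.

```python
import numpy as np

square_eye = np.eye(4)         # same result as np.identity(4)
rect_eye = np.eye(2, 3)        # 2x3 matrix with ones on the main diagonal
shifted_eye = np.eye(3, k=1)   # ones on the first diagonal above the main one

print(np.array_equal(square_eye, np.identity(4)))  # True
print(rect_eye)
print(shifted_eye)
```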

Array Manipulation Techniques:

Array manipulation techniques in NumPy let you reshape, concatenate, split, and transform arrays to fit the shape a model or pipeline expects. Methods like np.reshape() and np.concatenate() restructure data, while functions like np.flip() and np.roll() reverse or cyclically shift array elements.

  • np.reshape()
import numpy as np

# Changing the shape of arrays
arr = np.arange(6)  # 1D array from 0 to 5
reshaped_arr = arr.reshape((2, 3))  # Reshaping into a 2x3 array

print(reshaped_arr)

'''
Output:
[[0 1 2]
 [3 4 5]]
'''
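When only one of the target dimensions is known, reshape can infer the other. A short sketch, assuming a 12-element array chosen purely for illustration: passing -1 for a dimension asks NumPy to compute it from the array's total size.

```python
import numpy as np

arr = np.arange(12)                # 1D array from 0 to 11

inferred = arr.reshape((3, -1))    # NumPy infers the second dimension: 12 / 3 = 4
flattened = inferred.reshape(-1)   # back to a 1D array of 12 elements

print(inferred.shape)   # (3, 4)
print(flattened.shape)  # (12,)
```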
  • np.concatenate()
import numpy as np 

# Concatenating arrays along specified axes
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6]])
arr3 = np.concatenate((arr1, arr2), axis=0)  # Concatenating along rows

print(arr3)

'''
Output:
[[1 2]
 [3 4]
 [5 6]]
'''
  • np.split()
import numpy as np 

# Splitting arrays into multiple sub-arrays
arr = np.arange(9)  # 1D array from 0 to 8
split_arr = np.split(arr, 3)  # Splitting into 3 equal-sized sub-arrays

print(split_arr)

'''
Output:
[array([0, 1, 2]), array([3, 4, 5]), array([6, 7, 8])]
'''
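The introduction to this section mentions np.flip() and np.roll() without showing them. A brief sketch of both, using the same kind of small array as the examples above:

```python
import numpy as np

arr = np.arange(6)              # [0 1 2 3 4 5]

reversed_arr = np.flip(arr)     # reverse the order of elements
rolled_arr = np.roll(arr, 2)    # shift right by 2; elements wrap around

print(reversed_arr)  # [5 4 3 2 1 0]
print(rolled_arr)    # [4 5 0 1 2 3]

grid = arr.reshape((2, 3))
flipped_rows = np.flip(grid, axis=0)  # reverses the row order only
print(flipped_rows)
```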

Mathematical Operations:

NumPy provides a comprehensive set of mathematical operations for array computations. Functions like np.sum(), np.mean(), and np.dot() are frequently used for aggregating data and performing matrix operations, while linear algebra functions like np.linalg.norm() and np.linalg.eig() support the more advanced numerical computations that many machine learning algorithms rely on.

  • np.dot()
import numpy as np

# Dot product of arrays (for 2D inputs, np.dot performs matrix multiplication)
arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])
dot_product = np.dot(arr1, arr2)

print(dot_product)

'''
Output:
[[19 22]
 [43 50]]
'''
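A common point of confusion is the difference between matrix multiplication and element-wise multiplication. The sketch below reuses the two matrices from the example above and compares np.dot() with the @ operator and the * operator; the result variable names are illustrative.

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

matmul_result = arr1 @ arr2        # matrix product, same as np.dot for 2D arrays
elementwise_result = arr1 * arr2   # multiplies matching entries only

print(np.array_equal(matmul_result, np.dot(arr1, arr2)))  # True
print(elementwise_result)

# For 1D inputs, np.dot returns the scalar inner product
inner = np.dot(np.array([1, 2, 3]), np.array([4, 5, 6]))
print(inner)  # 32
```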
  • np.linalg.inv()
import numpy as np

# Computing matrix inverse
matrix = np.array([[1, 2], [3, 4]])
inverse_matrix = np.linalg.inv(matrix)

print(inverse_matrix)

'''
Output:
[[-2.   1. ]
 [ 1.5 -0.5]]
'''
  • np.linalg.det()
import numpy as np

# Computing determinant of matrix
matrix = np.array([[1, 2], [3, 4]])
determinant = np.linalg.det(matrix)

print(determinant)

'''
Output:
-2.0000000000000004
'''
  • np.linalg.eig()
import numpy as np

# Computing eigenvalues/eigenvectors
matrix = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

print(eigenvalues)
print(eigenvectors)

'''
Output:
[-0.37228132  5.37228132]
[[-0.82456484 -0.41597356]
 [ 0.56576746 -0.90937671]]
'''
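To read the output above correctly, it helps to know that each column of the eigenvector matrix belongs to the eigenvalue at the same index. The following sketch checks the defining relation A v = λ v for the same matrix; the loop variable names are illustrative.

```python
import numpy as np

matrix = np.array([[1, 2], [3, 4]])
eigenvalues, eigenvectors = np.linalg.eig(matrix)

# Each column of `eigenvectors` pairs with the eigenvalue at the same index
checks = []
for i in range(len(eigenvalues)):
    v = eigenvectors[:, i]
    checks.append(np.allclose(matrix @ v, eigenvalues[i] * v))

print(checks)  # [True, True]

# The returned eigenvectors are normalized to unit length
column_norms = np.linalg.norm(eigenvectors, axis=0)
print(np.allclose(column_norms, 1.0))  # True
```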
  • np.std()
import numpy as np

# Computing standard deviation
arr = np.array([1, 2, 3, 4, 5])
std_arr = np.std(arr)

print(std_arr)

'''
Output:
1.4142135623730951
'''
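By default, np.std() computes the population standard deviation (it divides by N). When the data is a sample, the ddof parameter switches to the N - 1 divisor. A short sketch using the same array as above:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

population_std = np.std(arr)       # divides by N (default, ddof=0)
sample_std = np.std(arr, ddof=1)   # divides by N - 1

print(population_std)  # square root of 2.0
print(sample_std)      # square root of 2.5
```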
  • np.argmax()
import numpy as np

# Computing the index of the maximum value
arr = np.array([1, 2, 3, 4, 5])
max_ind = np.argmax(arr)

print(max_ind)

'''
Output:
4
'''
  • np.argmin()
import numpy as np

# Computing the index of the minimum value
arr = np.array([1, 2, 3, 4, 5])
min_ind = np.argmin(arr)

print(min_ind)

'''
Output:
0
'''
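The section introduction also names np.sum(), np.mean(), and np.linalg.norm(), which are not shown above. Here is a minimal sketch of the three, using a small made-up 2x3 array; the axis argument controls whether the aggregation runs down the columns (axis=0) or across the rows (axis=1).

```python
import numpy as np

data = np.array([[1, 2, 3],
                 [4, 5, 6]])

total = np.sum(data)                        # sum of all elements
column_means = np.mean(data, axis=0)        # mean of each column
row_sums = np.sum(data, axis=1)             # sum of each row
length = np.linalg.norm(np.array([3, 4]))   # Euclidean (L2) norm of a vector

print(total)         # 21
print(column_means)  # [2.5 3.5 4.5]
print(row_sums)      # [ 6 15]
print(length)        # 5.0
```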

Advanced Indexing and Slicing:

Advanced indexing and slicing in NumPy offer powerful ways to access and modify array elements. Boolean indexing selects the elements that satisfy a condition, while fancy indexing uses arrays (or lists) of indices for more intricate selection patterns.

  • Fancy Indexing
import numpy as np

# Using arrays of indices to access elements
arr = np.array([1, 2, 3, 4, 5])
indices = [0, 2, 4]
selected_elements = arr[indices]

print(selected_elements)

'''
Output:
[1 3 5]
'''
  • np.where()
import numpy as np

# Finding indices of elements that satisfy a condition
arr = np.array([1, 2, 3, 4, 5])
indices = np.where(arr > 2)

print(indices)

'''
Output (the dtype suffix is printed only on some platforms):
(array([2, 3, 4], dtype=int64),)
'''
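Boolean indexing, mentioned in the introduction to this section, can be sketched with the same array. The example also shows the three-argument form of np.where(), which picks values from one of two sources depending on the condition; variable names are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])

mask = arr > 2                          # boolean array marking elements > 2
selected = arr[mask]                    # keeps only elements where the mask is True
replaced = np.where(arr > 2, arr, 0)    # keeps values > 2, replaces the rest with 0

print(selected)  # [3 4 5]
print(replaced)  # [0 0 3 4 5]
```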

Broadcasting and Vectorisation:

Broadcasting and vectorization are key concepts in NumPy that simplify array operations and improve computational efficiency. Broadcasting lets NumPy perform element-wise operations on arrays of different but compatible shapes: shapes are compared from the trailing dimension backward, and a dimension of size 1 (or a missing dimension) is virtually stretched to match the other array. Vectorized operations improve performance by running the element-wise loop in compiled code instead of in the Python interpreter.

  • Broadcasting
import numpy as np

# Broadcasting rules and examples
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20, 30])
broadcasted_result = arr1 + arr2  # Broadcasting arr2 to match the shape of arr1

print(broadcasted_result)

'''
Output:
[[11 22 33]
 [14 25 36]]
'''
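A typical machine learning use of broadcasting is feature standardization. The sketch below assumes a hypothetical feature matrix X with samples in rows and features in columns; the per-column mean and standard deviation have shape (2,) and are broadcast across all rows of the (3, 2) matrix.

```python
import numpy as np

# Hypothetical feature matrix: 3 samples (rows), 2 features (columns)
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])

col_mean = X.mean(axis=0)   # shape (2,)
col_std = X.std(axis=0)     # shape (2,)

# (3, 2) combined with (2,): the 1D arrays are broadcast across every row
X_scaled = (X - col_mean) / col_std

print(X_scaled.mean(axis=0))  # approximately [0. 0.]
print(X_scaled.std(axis=0))   # approximately [1. 1.]
```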
  • Vectorisation
import numpy as np

# Vectorized operations
arr = np.array([1, 2, 3, 4, 5])
squared_arr = arr ** 2  # Element-wise square

print(squared_arr)

'''
Output:
[ 1  4  9 16 25]
'''
  • np.vectorize()
import numpy as np

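# Vectorizing a Python function to operate on arrays element-wise
# (np.vectorize is a convenience wrapper; it still loops in Python, so it is not a speed-up)
def square(x):
    return x ** 2

arr = np.array([1, 2, 3, 4, 5])
vectorized_square = np.vectorize(square)
squared_arr = vectorized_square(arr)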

print(squared_arr)

'''
Output:
[ 1  4  9 16 25]
'''
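np.vectorize() is most useful when a scalar function contains logic, such as a branch, that cannot be applied to a whole array directly. When a native NumPy operation exists, it is usually the better choice. A small sketch comparing the two, where relu is a hypothetical helper written for this illustration:

```python
import numpy as np

def relu(x):
    # Scalar function with a branch: returns x if positive, else 0
    return x if x > 0 else 0

arr = np.array([-2, -1, 0, 1, 2])

via_vectorize = np.vectorize(relu)(arr)   # calls relu once per element
via_maximum = np.maximum(arr, 0)          # native element-wise operation

print(via_vectorize)  # [0 0 0 1 2]
print(np.array_equal(via_vectorize, via_maximum))  # True
```

Both approaches give the same result here; the native call avoids a Python-level function call for every element.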

Conclusion:

Learning these 20 key NumPy methods lays a strong foundation for machine learning work. By adopting these techniques in our projects, we can work more efficiently, improve our models, and uncover fresh insights in our data. So, let us continue to practise and experiment with NumPy to improve our machine learning skills!

Did you find this article valuable?

Support Jay Patel by becoming a sponsor. Any amount is appreciated!