Python is one of the most widely used and preferred programming languages for data analytics.
NumPy, SciPy (pronounced "Sigh Pie"), Pandas and Matplotlib are Python libraries used for data analysis.
What are NumPy, SciPy, Pandas and Matplotlib?
When Python was developed by Guido van Rossum, matrices, vectors, data frames, and graphs were not among the language's built-in data types or functionality. As Python became more and more popular among developers (thanks to its simplicity), this major shortcoming was identified and fixed.
NumPy, SciPy, Pandas and Matplotlib are Python libraries, and each has its own role. We will discuss each of them in detail.
What is NumPy?
NumPy is a core Python library for data analysis. It provides an n-dimensional data structure for efficient computation on arrays and matrices. NumPy arrays are homogeneous: every element has the same data type.
There are two parts of an n-dimensional array (ndarray):
- Data stored in memory
- Metadata (such as the shape and the data type)
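The two parts can be inspected from Python. The following is a minimal sketch, assuming NumPy is installed; the variable name arr and the explicit int64 data type are chosen only for illustration.

```python
import numpy as np

# Fix the dtype so the sizes below do not depend on the platform
arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int64)

# Metadata: tells NumPy how to interpret the raw buffer
print("shape:  ", arr.shape)    # (3, 3)
print("dtype:  ", arr.dtype)    # int64
print("strides:", arr.strides)  # (24, 8): bytes to step per row, per column

# Data: the raw buffer itself, 9 elements of 8 bytes each
print("nbytes: ", arr.nbytes)   # 72
```

The strides say that moving one row down skips 24 bytes (three 8-byte integers), while moving one column right skips 8 bytes.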
By default, NumPy stores an array as a C-contiguous array. A contiguous array is stored in one continuous block of memory.
A typical 2D, 3 x 3 array:
[[1, 2, 3],
 [4, 5, 6],
 [7, 8, 9]]
And how Python stores it in memory, row by row in a single block:
1 2 3 4 5 6 7 8 9
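The row-by-row layout can be checked with a short sketch. It assumes NumPy is installed; grid and flat are illustrative names.

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Arrays created with np.array are C-contiguous (row-major) by default
print(grid.flags["C_CONTIGUOUS"])  # True

# ravel() walks the elements in row-major order
flat = grid.ravel()
print(flat.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9]

# For a contiguous array, ravel() returns a view on the same memory block
print(np.shares_memory(grid, flat))  # True
```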
NumPy (Numerical Python) was developed:
- To fix the mathematical shortcomings of Python.
- To extend Python with n-dimensional arrays.
- Unlike Python's built-in lists, n-dimensional arrays are homogeneous.
- These n-dimensional arrays are often referred to as ndarrays.
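To see what homogeneous means in practice, here is a small sketch comparing a built-in list with an ndarray. The names mixed_list and mixed_array are made up for this example.

```python
import numpy as np

# A built-in list keeps each element's own type
mixed_list = [1, 2.5, 3]
print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'int']

# An ndarray converts everything to one common type
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)  # float64
print(type(mixed_array).__name__)  # ndarray
```

Because one element is a float, NumPy upcasts the integers so that every element shares the same data type.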
Installation of NumPy
This assumes that pip is already installed on your local machine. If pip is not installed, you need to install it first.
From the command-line terminal, run the following command:
pip install numpy
Collecting numpy
  Downloading https://files.pythonhosted.org/packages/8e/75/7a8b7e3c073562563473f2a61bd53e75d0a1f5e2047e576ee61d44113c22/numpy-1.14.3-cp36-cp36m-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.macosx_10_10_intel.macosx_10_10_x86_64.whl (4.7MB)
    100% |████████████████████████████████| 4.7MB 2.2MB/s
Installing collected packages: numpy
Successfully installed numpy-1.14.3
NumPy is now installed on your system, and you are ready to use it in Python.
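One quick way to confirm the installation is to import the library and print its version. This is only a sketch; the version you see depends on what pip installed on your machine.

```python
import numpy as np

# If the import succeeds, NumPy is installed and usable
installed_version = np.__version__
print("NumPy version:", installed_version)
```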
Getting Started With NumPy
Now that NumPy is installed, let's get started with the library. Installation of the other Python libraries will be covered in subsequent posts.
NumPy arrays, or ndarrays, are efficient for numerical operations, and the dimensions of a NumPy array are called axes.
import numpy as np

# Create a one-dimensional array
numpy_array = np.array([1, 2, 3])
print(numpy_array)

# Create a two-dimensional array
numpy_array = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
print(numpy_array)

# Data type of the NumPy array (int64)
print("Data Type of NumPy Array:", numpy_array.dtype)
np is an alias of numpy.
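Since dimensions are called axes, here is a short sketch of how they show up in code. The array name matrix is illustrative; for a 2D array, axis 0 runs down the rows and axis 1 runs across the columns.

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

print(matrix.ndim)   # 2 -> two axes
print(matrix.shape)  # (2, 3) -> length of axis 0, length of axis 1

# Summing along axis 0 collapses the rows (one total per column)
column_sums = matrix.sum(axis=0)
print(column_sums.tolist())  # [5, 7, 9]

# Summing along axis 1 collapses the columns (one total per row)
row_sums = matrix.sum(axis=1)
print(row_sums.tolist())  # [6, 15]
```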
By default, NumPy assigns a 64-bit integer type (int64) to an array of integers on most platforms, but we can define the data type explicitly.
The following code creates an array with an explicit data type of int32.
# Create an array with an explicit data type
explicit_datatype = np.array(np.arange(5), dtype=np.int32)
print("\nExplicit Data Array:")
print(explicit_datatype)
print("Data Type:", explicit_datatype.dtype)
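One reason to pick a data type explicitly is memory use. The sketch below, with the illustrative names wide and narrow, compares the same values stored as int64 and as int32.

```python
import numpy as np

wide = np.arange(5, dtype=np.int64)
narrow = wide.astype(np.int32)  # astype returns a converted copy

# itemsize is bytes per element, nbytes is bytes for the whole array
print(wide.itemsize, wide.nbytes)      # 8 40
print(narrow.itemsize, narrow.nbytes)  # 4 20
```

The smaller type halves the memory, at the cost of a smaller range of representable values.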
Let's do some more experiments with NumPy arrays. You may rarely need this exact code in real programs, but it's fun to try.
# Create an array of ones
one_array = np.ones((3, 5))
print("\nArray of One's")
print(one_array)
Above, we asked NumPy to create an array with 3 rows and 5 columns. The output looks like this:
Array of One's
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
Similarly, an array of zeros:
# Create an array of zeros
zero_array = np.zeros((3, 5))
print("\n Array of Zero's")
print(zero_array)
Try to run it yourself and compare the output.
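The trailing dots in the printed ones (1.) show that these arrays hold floats. As a small sketch, the dtype argument changes that, and np.full fills an array with any value you choose; the variable names here are illustrative.

```python
import numpy as np

# np.ones and np.zeros produce float64 unless told otherwise
ones_float = np.ones((3, 5))
print(ones_float.dtype)  # float64

# Pass dtype to get integers instead
zeros_int = np.zeros((3, 5), dtype=np.int32)
print(zeros_int.dtype)  # int32

# np.full fills the array with an arbitrary value
sevens = np.full((2, 3), 7)
print(sevens.tolist())  # [[7, 7, 7], [7, 7, 7]]
```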
Joining two NumPy Arrays Horizontally
# Joining two arrays horizontally
array_1 = np.arange(8).reshape(2, 4)
array_2 = np.arange(4).reshape(2, 2)
print(array_1)
reshape converts a one-dimensional array into a multi-dimensional one. In the above example, a NumPy array with 8 elements is reshaped to 2 rows and 4 columns.
[[0 1 2 3]
 [4 5 6 7]]
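A reshape only works when the new shape holds exactly the same number of elements. The sketch below shows this, along with -1, which asks NumPy to work out one dimension for you. The variable names are illustrative.

```python
import numpy as np

values = np.arange(8)

# 2 x 4 = 8 elements, so this works
two_by_four = values.reshape(2, 4)
print(two_by_four.shape)  # (2, 4)

# -1 lets NumPy infer the missing dimension (8 / 2 = 4)
inferred = values.reshape(2, -1)
print(inferred.shape)  # (2, 4)

# 3 x 3 = 9 elements does not match 8, so NumPy raises ValueError
reshape_failed = False
try:
    values.reshape(3, 3)
except ValueError as error:
    reshape_failed = True
    print("reshape failed:", error)
```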
hstack joins arrays horizontally, side by side.
import numpy as np

# Joining two arrays horizontally
array_1 = np.arange(8).reshape(2, 4)
array_2 = np.arange(4).reshape(2, 2)

print("*** Array 1 ***")
print(array_1)
print("*** Array 2 ***")
print(array_2)

array_3 = np.hstack((array_1, array_2))
print("*** Array 1 + Array 2 ***")
print(array_3)
The output is:
*** Array 1 ***
[[0 1 2 3]
 [4 5 6 7]]
*** Array 2 ***
[[0 1]
 [2 3]]
*** Array 1 + Array 2 ***
[[0 1 2 3 0 1]
 [4 5 6 7 2 3]]
Remember: the number of rows must be the same when joining arrays horizontally. The number of columns may differ, as in the example above (4 columns and 2 columns).
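For 2D arrays, hstack needs every array to have the same number of rows. The following sketch shows one join that works and one that fails; the names left, right_ok and right_bad are made up for this example.

```python
import numpy as np

left = np.arange(8).reshape(2, 4)       # 2 rows
right_ok = np.arange(4).reshape(2, 2)   # 2 rows: compatible
right_bad = np.arange(6).reshape(3, 2)  # 3 rows: not compatible

joined = np.hstack((left, right_ok))
print(joined.shape)  # (2, 6)

# Mismatched row counts make hstack raise ValueError
hstack_failed = False
try:
    np.hstack((left, right_bad))
except ValueError as error:
    hstack_failed = True
    print("hstack failed:", error)
```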
Joining two NumPy Arrays Vertically
vstack joins arrays vertically, one on top of the other. Here the number of columns must be the same.
import numpy as np

# Joining two arrays vertically
array_1 = np.arange(8).reshape(2, 4)
array_2 = np.arange(4).reshape(1, 4)

print("*** Array 1 ***")
print(array_1)
print("*** Array 2 ***")
print(array_2)

array_3 = np.vstack((array_1, array_2))
print("*** Array 1 + Array 2 ***")
print(array_3)
The output is:
*** Array 1 ***
[[0 1 2 3]
 [4 5 6 7]]
*** Array 2 ***
[[0 1 2 3]]
*** Array 1 + Array 2 ***
[[0 1 2 3]
 [4 5 6 7]
 [0 1 2 3]]
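Both kinds of joining can also be written with np.concatenate and an explicit axis. For 2D arrays, axis=0 matches vstack and axis=1 matches hstack. The sketch below checks that; the names stacked_rows and stacked_cols are illustrative.

```python
import numpy as np

array_1 = np.arange(8).reshape(2, 4)
array_2 = np.arange(4).reshape(1, 4)  # same number of columns as array_1
array_3 = np.arange(4).reshape(2, 2)  # same number of rows as array_1

# axis=0 stacks along the rows (vertical join)
stacked_rows = np.concatenate((array_1, array_2), axis=0)
print(np.array_equal(stacked_rows, np.vstack((array_1, array_2))))  # True

# axis=1 stacks along the columns (horizontal join)
stacked_cols = np.concatenate((array_1, array_3), axis=1)
print(np.array_equal(stacked_cols, np.hstack((array_1, array_3))))  # True
```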
Test Your Knowledge
- Create two NumPy arrays and join them horizontally and then vertically.
- Slice the joined array up to the 3rd index and store the result in a different variable.
Hint: y[:n] starts at index 0 and goes up to index n - 1.
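To illustrate the hint without giving away the exercise, here is a sketch of slicing on unrelated arrays. The names y, first_three and grid are placeholders.

```python
import numpy as np

y = np.arange(10, 16)  # elements 10, 11, 12, 13, 14, 15

# y[:3] takes indices 0, 1 and 2
first_three = y[:3]
print(first_three.tolist())  # [10, 11, 12]

# On a 2D array, a slice applies to the first axis (rows) by default
grid = np.arange(12).reshape(3, 4)
first_two_rows = grid[:2]
print(first_two_rows.shape)  # (2, 4)

# Use a second slice after the comma to cut columns
first_two_cols = grid[:, :2]
print(first_two_cols.shape)  # (3, 2)
```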
Let me know how it goes in the comments section.