In the previous post, I discussed applying Principal Component Analysis (PCA) as a way to reduce feature space, by capturing maximum variance. So, why is variance a big deal? If there is no or little variation, it means we are not getting enough predictive information from the feature.
Here, I wanted to experiment with covariance and correlation matrices and try to derive principal components using linear algebra. I will use the London bike sharing dataset from my previous blog.
### use 4 features features = data.iloc[:,:4] features.sample(5)
So, the idea is to find eigenvalues and eigenvectors of the correlation matrix. Then, we can use eigenvalues to describe the explained variation ratio.
Let’s start with the correlation matrix. We can use either original features or standardized for correlation matrix calculation, as it gives the same result.
from sklearn.preprocessing import StandardScaler # standardize features features_scaled = StandardScaler().fit_transform(features) # get correlation matrix on scaled features np.corrcoef(features_scaled.T) # correlation matrix on unscaled data gives the same result # features.corr() e_vals, e_vects = np.linalg.eig(np.corrcoef(features_scaled.T)) # normalize eigenvalues to get variations for each PC # sorted explained variation # use np.sort(e_val)[::-1] to sort in descending order print([x/e_vals.sum() for x in np.sort(e_vals)[::-1]])
We could do the same using the covariance matrix, but it requires a few more extra matrix manipulations before getting eigenvalues. You can find the code here.
Let's check if we get the same result using sklearn’s PCA. I would like to mention that per sklearn documentation, it uses SVD decomposition instead of eigendecomposition as we did above and we also need to standardize the data.
# Use sklearn's PCA from sklearn.decomposition import PCA pca = PCA() #fit and transform scaled features pca.fit_transform(features_scaled) # get explained varion ratio print(pca.explained_variance_ratio_)
As we can see, we got the same variance ratio. Based on that analysis, we can further decide how many principal components we would use and accordingly apply the projection matrix or sklearn to transform features into principal components.