Manipulation
Let's say you need to see only one column of your dataframe. To see the 'fixed acidity' column of our dataset, you need to write:
df['fixed acidity']
You can add a condition on this column. For example, to see the rows that have a fixed acidity higher than 9:
df[df['fixed acidity']>9]
Sometimes you might need rows that satisfy conditions on multiple columns:
df[(df['fixed acidity']>9) & (df['citric acid']>0.5)]
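The & operator requires both conditions to hold; if either one being true is enough, you can use | instead, for example:
df[(df['fixed acidity']>9) | (df['citric acid']>0.5)]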
If you need to select specific columns:
df.loc[:,['volatile acidity', 'chlorides']]
You can combine this with conditions too. For example, to see the 'volatile acidity' and 'chlorides' content of the rows that have a 'fixed acidity' of 9.2:
df.loc[df['fixed acidity'] == 9.2, ['fixed acidity','volatile acidity', 'chlorides']]
You can also view the rows for a specific range of index labels (as discussed in the previous chapter); note that, unlike iloc, a loc slice such as 0:3 includes both endpoints, so this returns four rows:
df.loc[0:3, ['volatile acidity', 'chlorides']]
Now, if you want to locate a specific value, for example, the alcohol content of the wine in the 0th row:
df['alcohol'].loc[0]
and you will get a value of 9.4.
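If you only need a single scalar like this, pandas also has the at accessor, which should return the same value:
df.at[0, 'alcohol']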
You can locate a row using its index too:
df.iloc[100]
Now, if you want to pinpoint a value within this row, for example the column at position 1 (volatile acidity in this case) of the 100th row, pass both positions to iloc:
df.iloc[100, 1]
and you will get 0.61 as expected.
You can select consecutive rows and columns with iloc too, for example, the first three columns of rows 3 through 7:
df.iloc[3:8, 0:3]
and non-consecutive rows and columns too:
df.iloc[[71, 122, 400], [0, 2]]
What if you want to add a new column to your dataframe? Let's add a 'new column' containing the word 'hi' for all rows:
df['new column'] = 'hi'
df.head()
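A new column does not have to be a constant; you can also derive it from existing columns. As a small illustration (the 'acid ratio' name is made up here), you could write:
df['acid ratio'] = df['citric acid'] / df['fixed acidity']
df.head()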
Let's try changing the value of 'new column' in the row at index 0 from 'hi' to 'bye' using iloc:
df.iloc[0, df.columns.get_loc('new column')] = 'bye'
df.head()
Now let's find the word that starts with 'by' (the one we just added) and replace it with 'hello'. Assigning through df.loc in a single step avoids pandas' chained-assignment warning:
df.loc[df['new column'].str.startswith('by'), 'new column'] = 'hello'
df.head()
You can also replace null values of your data using pandas. We do not have any null values here, so let's introduce a null value first. Let's replace the string 'hello' with null. To do so, we would need the numpy library.
import numpy as np
df.loc[df['new column'].str.startswith('hel'), 'new column'] = np.nan
df.head()
To check the number of null values, you can use the isna() method like this:
df.isna().sum()
The isna() method can also be used to locate the null values, returning a boolean mask like this:
pd.isna(df.head())
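You can also use this boolean mask to pull out just the rows that contain a null value:
df[df['new column'].isna()]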
Let's replace the null value with 'hey'.
df.fillna(value='hey', inplace=True)
df.head()
If you want to drop null values, use the dropna() method.
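As a minimal sketch (note that we have already filled our null value with 'hey', so at this point there is nothing left to drop), this returns a copy of the dataframe without any rows where 'new column' is null:
df.dropna(subset=['new column'])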
Now we will create a new dataframe using a loop, where one column of the new dataframe will look the same as the 'new column' of our dataframe df.
rows = []
for i in range(df.shape[0]):
    rows.append(['hi', 'bye'])
df_new = pd.DataFrame(rows, columns=["new column 2", "new column 3"])
df_new.iloc[0, df_new.columns.get_loc('new column 2')] = 'hey'
df_new.head()
You can merge these two dataframes using their common attributes:
df_merged = df.merge(df_new, left_on='new column', right_on='new column 2')
df_merged.head()
You can adjust the merge as needed, for example by dropping the redundant key column afterwards, or by giving both dataframes a common column name and merging on that, as sketched below.
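One way to do the latter, just as a sketch (the df_renamed and df_merged_on names are made up here): rename 'new column 2' so both dataframes share a column name, then merge with the on parameter:
df_renamed = df_new.rename(columns={'new column 2': 'new column'})
df_merged_on = df.merge(df_renamed, on='new column')
df_merged_on.head()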
You can also group your dataframes:
df.groupby(['volatile acidity', 'chlorides']).count().head()
You can also aggregate the groups with other functions, such as sum().
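For example, to group by 'volatile acidity' and sum the 'alcohol' content within each group:
df.groupby('volatile acidity')['alcohol'].sum().head()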
When you are done with manipulation of your dataframes, you are ready to visualize your data.