This post was originally published on Stack Vidhya - Add column to dataframe.
A pandas DataFrame is a two-dimensional data structure that stores data in rows and columns. After a dataframe is created, you may need to add a column to it for various reasons.
In this tutorial, you'll see the different methods available to add a column to a dataframe in pandas.
If You're in a Hurry...
You can use the below code snippet to add a new column to a pandas dataframe.
To add a column with empty values:
df["new_Column"] = pd.NaT
df
where
- df["new_Column"] - The new column in the dataframe.
- pd.NaT - Sets every row of this column to NaT (Not a Time), the pandas missing-value marker for datetime-like data. You can use it when you don't know the values upfront; for numeric columns, np.nan is the more common placeholder.
To add a column with values:
new_column_values = ['val1','val2','val3','val4','val5']
df["new_Column"] = new_column_values
df
where
- new_column_values = ['val1','val2','val3','val4','val5'] - A list holding the values for the cells in the new column. Its length must equal the number of rows in the dataframe; otherwise, a ValueError is raised.
- df["new_Column"] = new_column_values - Creates a new column in the dataframe and assigns the list of values to it.
This is how you can add a new column to a dataframe in pandas.
If You Want to Understand Details, Read on…
In this tutorial, you'll learn the different methods available to add a column to a dataframe. You can add a column using:
- Assignment operator or subscript notation - Use the assignment operator = to create a column in the dataframe and assign a list of values.
- dataframe.insert() method - Use the insert() method when you want to insert a column at a specific index position of the dataframe.
- dataframe.assign() method - Use the assign() method when you want to create a new dataframe that includes the new column, rather than inserting the column into the same dataframe.
Let's look at each scenario of adding a new column to an existing dataframe in detail.
Sample Dataframe
This is the sample dataframe used throughout the tutorial.
import pandas as pd
data = {"product_name":["Keyboard","Mouse", "Monitor", "CPU", "Speakers"],
"Unit_Price":[500,200, 5000, 10000, 250],
"No_Of_Units":[5,5, 10, 20, 8],
}
df = pd.DataFrame(data)
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units |
---|---|---|---|
0 | Keyboard | 500 | 5 |
1 | Mouse | 200 | 5 |
2 | Monitor | 5000 | 10 |
3 | CPU | 10000 | 20 |
4 | Speakers | 250 | 8 |
Let's see the different ways of adding a column to a pandas dataframe.
Add Empty Column to Dataframe
In this section, you'll learn how to add an empty column to a dataframe.
You can add a column by using the = operator with the value pd.NaT.
Snippet
df["new_column"] = pd.NaT
pd.NaT denotes a missing value for datetime-like data in pandas. When you assign it to a new column, the column is added to the dataframe with NaT in every row, which effectively means a null value.
When you execute the below lines, a new column called Total_Price will be added to the dataframe with NaT values.
df["Total_Price"] = pd.NaT
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Total_Price |
---|---|---|---|---|
0 | Keyboard | 500 | 5 | NaT |
1 | Mouse | 200 | 5 | NaT |
2 | Monitor | 5000 | 10 | NaT |
3 | CPU | 10000 | 20 | NaT |
4 | Speakers | 250 | 8 | NaT |
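As a small sketch of how the choice of placeholder affects the new column, the example below adds one column filled with pd.NaT and one filled with np.nan. The column name Launch_Date is a hypothetical name chosen for illustration and is not part of the tutorial's sample data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse", "Monitor"],
    "Unit_Price": [500, 200, 5000],
})

# pd.NaT is the missing-value marker for datetime-like data
df["Launch_Date"] = pd.NaT  # hypothetical column name
# np.nan is the usual missing-value marker for numeric data
df["Total_Price"] = np.nan

# Both new columns are reported as missing by isna()
print(df.isna().sum())
# Inspect which dtype pandas chose for each new column
print(df.dtypes)
```

If the column will later hold numbers, starting from np.nan may avoid a dtype change when the real values are filled in.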
You've learnt how to add a column with empty values to a pandas dataframe.
Next, you'll learn how to add a column with values.
Add Column With Value
In this section, you'll learn how to add a column with values to a pandas dataframe.
Using Subscript Notation or Assignment Operator
You can add a column by using the = operator with a list of values. The length of the list must equal the number of rows in the dataframe. Otherwise, an error will be raised.
values = ['val1','val2','val3','val4','val5']
df["new_column"] = values
where,
- values = ['val1','val2','val3','val4','val5'] - creates a list with the values. The name values is used instead of list so that Python's built-in list type is not shadowed.
- df["new_column"] = values - assigns the list to the dataframe column called "new_column".
When you execute the below code snippet, a new column called Tax_new % will be added to the dataframe with the values from the list called tax.
tax = [10,15,12,10,11]
df['Tax_new %'] = tax
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Total_Price | Tax_new % |
---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | NaT | 10 |
1 | Mouse | 200 | 5 | NaT | 15 |
2 | Monitor | 5000 | 10 | NaT | 12 |
3 | CPU | 10000 | 20 | NaT | 10 |
4 | Speakers | 250 | 8 | NaT | 11 |
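The length requirement mentioned above can be seen in a minimal sketch like the following, which uses a trimmed-down version of the sample dataframe. The variable names too_short and mismatch_error are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"product_name": ["Keyboard", "Mouse", "Monitor", "CPU", "Speakers"]}
)

too_short = [10, 15, 12]  # only 3 values for 5 rows

try:
    df["Tax_new %"] = too_short
    mismatch_error = None
except ValueError as err:
    # pandas refuses a list whose length differs from the row count
    mismatch_error = err
    print("Could not add column:", err)

# A list with exactly one value per row is accepted
df["Tax_new %"] = [10, 15, 12, 10, 11]
print(df)
```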
Using the insert() method
You can add a column to a pandas dataframe using the insert() method available on the dataframe.
Usage
- When you want to insert a column at a specific position.
- When you want control over duplicate column names. By default, insert() raises a ValueError if a column with the same name already exists; set the allow_duplicates flag to True to permit it.
Below is the code snippet to add a column using the insert() method.
# Using DataFrame.insert() to add a column
df.insert(3, "Tax%", [5,10,10,5,10], True)
df
where,
- 3 - Position where the new column needs to be inserted.
- Tax% - Name of the new column.
- [5,10,10,5,10] - List of values to be assigned to the new column.
- True - The allow_duplicates argument, which permits duplicate column names. If False, insert() raises a ValueError when a column named Tax% already exists.
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% | Total_Price | Tax_new % |
---|---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | NaT | 10 |
1 | Mouse | 200 | 5 | 10 | NaT | 15 |
2 | Monitor | 5000 | 10 | 10 | NaT | 12 |
3 | CPU | 10000 | 20 | 5 | NaT | 10 |
4 | Speakers | 250 | 8 | 10 | NaT | 11 |
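To illustrate the allow_duplicates behaviour described above, here is a short sketch on a two-row dataframe. The values [7, 8] and the variable name duplicate_error are arbitrary choices for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Tax%": [5, 10],
})

try:
    # Without allow_duplicates=True, an existing column name is rejected
    df.insert(1, "Tax%", [7, 8])
    duplicate_error = None
except ValueError as err:
    duplicate_error = err
    print("Insert rejected:", err)

# With allow_duplicates=True, a second "Tax%" column is inserted
df.insert(1, "Tax%", [7, 8], allow_duplicates=True)
print(list(df.columns))
```

Duplicate column names tend to make later selection by name ambiguous, so leaving the flag at its default is usually the safer choice.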
Using the assign() method
You can add a column to a pandas dataframe using the assign() method available on the dataframe.
Usage
- When you want to create a new dataframe from the existing dataframe with additional columns inserted.
- When you want to avoid modifying the original dataframe.
Below is the code snippet to add a column using the assign() method.
df2 = df.assign(Remarks = pd.NaT)
df2
where,
- Remarks = pd.NaT - Remarks is the name of the column to be inserted, and pd.NaT is the value assigned to the new column. Note that the column name is passed as a keyword argument, so it is not enclosed in single or double quotes.
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% | Total_Price | Tax_new % | Remarks |
---|---|---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | NaT | 10 | NaT |
1 | Mouse | 200 | 5 | 10 | NaT | 15 | NaT |
2 | Monitor | 5000 | 10 | 10 | NaT | 12 | NaT |
3 | CPU | 10000 | 20 | 5 | NaT | 10 | NaT |
4 | Speakers | 250 | 8 | 10 | NaT | 11 | NaT |
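The following sketch shows that assign() leaves the original dataframe untouched, and that it can also take a callable to compute a column from existing ones. The "n/a" placeholder value is an assumption made for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Unit_Price": [500, 200],
    "No_Of_Units": [5, 5],
})

# assign() returns a new dataframe; several columns can be added at once
df2 = df.assign(
    Remarks="n/a",  # constant placeholder value (assumption)
    Total_Price=lambda d: d["Unit_Price"] * d["No_Of_Units"],
)

print(list(df.columns))   # original dataframe keeps its columns
print(list(df2.columns))  # new dataframe has the extra columns
```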
These are the three methods available in pandas to add a column with values to a dataframe.
Next, you'll add a column at a specific index.
Add Column At Specific Index
In this section, you'll add a column at a specific position.
You'll use the insert() method available on the dataframe to add a column at a specific index.
Use the below snippet to add a column at a specific index.
# Using DataFrame.insert() to add a column
df.insert(3, "State Tax", [5,10,10,5,10], True)
df
where,
- 3 - Position where the new column needs to be inserted.
- State Tax - Name of the new column.
- [5,10,10,5,10] - List of values to be assigned to the new column.
- True - The allow_duplicates argument, which permits duplicate column names. If False, insert() raises a ValueError when a column named State Tax already exists.
The index is zero-based. Hence, you'll see the new column State Tax added in the fourth position of the dataframe.
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | State Tax | Tax% | Total_Price | Tax_new % |
---|---|---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | 5 | NaT | 10 |
1 | Mouse | 200 | 5 | 10 | 10 | NaT | 15 |
2 | Monitor | 5000 | 10 | 10 | 10 | NaT | 12 |
3 | CPU | 10000 | 20 | 5 | 5 | NaT | 10 |
4 | Speakers | 250 | 8 | 10 | 10 | NaT | 11 |
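When the target position depends on another column rather than on a fixed number, one possible approach is to look the position up with df.columns.get_loc(). The sketch below assumes you want the new column placed directly after Unit_Price; that placement is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500, 200],
    "No_Of_Units": [5, 5],
})

# Find the zero-based position of Unit_Price and insert right after it
position = int(df.columns.get_loc("Unit_Price")) + 1
df.insert(position, "State Tax", [5, 10])

print(list(df.columns))
```

This keeps the code working even if columns are added or removed before Unit_Price.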
The State Tax and Tax_new % columns were added for demonstration. Now let's delete these columns. Refer to how to drop column in pandas dataframe to learn more about deleting columns in a pandas dataframe.
Now, use the below snippet to delete the columns at positions 3 and 6.
# Dropping the columns added for demonstration (positions 3 and 6)
df.drop(df.columns[[3,6]], axis=1, inplace=True)
df
Where,
- df.columns[[3,6]] - Specifies the positions of the columns to be deleted. Note that the column numbers are enclosed in double square brackets, [[ and ]]. This is necessary when you select more than one column at once.
- axis=1 - Specifies that the drop is performed along the column axis. axis=0 would perform the drop along the row axis, which means rows would be deleted.
- inplace=True - Specifies that the drop is performed on the same dataframe rather than returning a modified copy.
Now the columns at index 3 and 6 are deleted.
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% | Total_Price |
---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | NaT |
1 | Mouse | 200 | 5 | 10 | NaT |
2 | Monitor | 5000 | 10 | 10 | NaT |
3 | CPU | 10000 | 20 | 5 | NaT |
4 | Speakers | 250 | 8 | 10 | NaT |
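Dropping by position can remove the wrong columns if the column order changes. As an alternative sketch, the same cleanup can be written with column names through the columns argument of drop(), here without inplace so the original stays intact. The variable name trimmed is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "State Tax": [5, 10],
    "Tax%": [5, 10],
    "Tax_new %": [10, 15],
})

# Drop by name; drop() returns a new dataframe when inplace is not set
trimmed = df.drop(columns=["State Tax", "Tax_new %"])

print(list(trimmed.columns))
print(list(df.columns))  # the original dataframe is unchanged
```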
You've learnt how to add a column at a specific index.
Next, you'll learn how to add a column with a constant value.
Add Column to Dataframe With Constant Value
In this section, you'll learn how to add a column to a dataframe with a constant value. This means all the cells in the newly added column will have the same value.
You can do this by assigning a single value using the assignment operator, as shown below.
df["Price_Increase_Col"] = 200
df
Where,
- df["Price_Increase_Col"] - Specifies the new column in the dataframe.
- 200 - The constant value to be added to all the cells in the new column.
Now, a new column called Price_Increase_Col will be added to the dataframe with the value 200 in all the cells.
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% | Total_Price | Price_Increase_Col |
---|---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | NaT | 200 |
1 | Mouse | 200 | 5 | 10 | NaT | 200 |
2 | Monitor | 5000 | 10 | 10 | NaT | 200 |
3 | CPU | 10000 | 20 | 5 | NaT | 200 |
4 | Speakers | 250 | 8 | 10 | NaT | 200 |
You've learnt how to add a column to a dataframe in various cases.
Next, you'll learn how to add multiple columns to a dataframe at once.
Add Multiple Columns to Dataframe
In this section, you'll learn how to add multiple columns to a dataframe in pandas.
You can also do this by using the assignment operator.
Syntax
df['new_column_1'], df['new_column_2'] = [constant_value_for_Col_1, constant_value_for_Col_2]
df
With this syntax, each new column is filled with its own constant value in every cell.
In the below example, you're adding two columns, Product_Category and Available_Units, to the dataframe df.
df['Product_Category'], df['Available_Units'] = [pd.NaT, 3]
df
Where,
- df['Product_Category'], df['Available_Units'] - The new columns to be added, separated by commas.
- [pd.NaT, 3] - The constant values for the new columns, in the same order. Python unpacks the list, so pd.NaT goes to Product_Category and 3 goes to Available_Units.
Now, two new columns are added to the dataframe.
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% | Total_Price | Price_Increase_Col | Product_Category | Available_Units |
---|---|---|---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | NaT | 200 | NaT | 3 |
1 | Mouse | 200 | 5 | 10 | NaT | 200 | NaT | 3 |
2 | Monitor | 5000 | 10 | 10 | NaT | 200 | NaT | 3 |
3 | CPU | 10000 | 20 | 5 | NaT | 200 | NaT | 3 |
4 | Speakers | 250 | 8 | 10 | NaT | 200 | NaT | 3 |
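Another way to add several constant columns in one step is to pass them to assign() as a dictionary. This is a sketch of that alternative, not the method used above; the "Hardware" value and the new_columns name are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"product_name": ["Keyboard", "Mouse", "Monitor"]})

# Map each new column name to its constant value
new_columns = {"Product_Category": "Hardware", "Available_Units": 3}

# Unpack the dictionary into keyword arguments for assign()
df = df.assign(**new_columns)

print(df)
```

The dictionary form also works for column names that are not valid Python identifiers, such as names containing spaces.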
Now, you've learnt how to add multiple columns to a dataframe at once.
Next, you'll drop the added columns to clean up the dataframe so that it can be reused for the upcoming use cases.
The four added columns are Total_Price, Price_Increase_Col, Product_Category, and Available_Units, at index 4, 5, 6, and 7 respectively.
Use the below snippet to drop these columns.
# Dropping the columns added for demonstration (positions 4 to 7)
df.drop(df.columns[[4,5,6,7]], axis=1, inplace=True)
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% |
---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 |
1 | Mouse | 200 | 5 | 10 |
2 | Monitor | 5000 | 10 | 10 |
3 | CPU | 10000 | 20 | 5 |
4 | Speakers | 250 | 8 | 10 |
This is how you can add multiple columns at once to an existing dataframe and remove them again afterwards.
Next, you'll learn how to add a column with values based on other columns in the dataframe.
Add New Column to Dataframe Based on Other Columns
In this section, you'll learn how to add a new column to a dataframe based on other columns in the dataframe.
This is useful when you want to perform data manipulations using values in the existing columns and store the result in a new column.
You can use the apply() method for this. With axis=1, it applies a function to each row of the dataframe.
In the below example, you calculate the total price by multiplying the Unit_Price and No_Of_Units columns.
A lambda function multiplies these two columns for each row, and the result is assigned to the dataframe column Total_Price.
df['Total_Price'] = df.apply(lambda row: row.Unit_Price * row.No_Of_Units, axis=1)
df
Where,
- df['Total_Price'] - The new column to be created.
- lambda row: row.Unit_Price * row.No_Of_Units - Lambda function that multiplies the Unit_Price and No_Of_Units values of each row.
- axis=1 - Specifies that the function is applied to each row (that is, across the columns) rather than to each column.
Once the apply function is executed, the total price is calculated and added to the dataframe.
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% | Total_Price |
---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | 2500 |
1 | Mouse | 200 | 5 | 10 | 1000 |
2 | Monitor | 5000 | 10 | 10 | 50000 |
3 | CPU | 10000 | 20 | 5 | 200000 |
4 | Speakers | 250 | 8 | 10 | 2000 |
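For simple arithmetic like this, multiplying the two columns directly is a common alternative to apply(). The sketch below computes the column both ways on the sample values and compares the results; the row_wise name is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "Unit_Price": [500, 200, 5000, 10000, 250],
    "No_Of_Units": [5, 5, 10, 20, 8],
})

# Column-wise multiplication, no Python-level loop over rows
df["Total_Price"] = df["Unit_Price"] * df["No_Of_Units"]

# The row-wise apply() version from the tutorial, for comparison
row_wise = df.apply(lambda row: row.Unit_Price * row.No_Of_Units, axis=1)

print((df["Total_Price"] == row_wise).all())
```

The column-wise form is typically faster on large dataframes, while apply() remains useful when the per-row logic cannot be expressed with column operations.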
This is how you can add a new column to a dataframe based on other columns in the dataframe.
Next, you'll learn how to add just a column name to a dataframe.
Add Column Names to Dataframe
In this section, you'll add a column name to a dataframe. This means adding a column with a name and no values in its cells. It is similar to adding a column of pd.NaT values.
You can do this by assigning None to the Dummy_Column as shown below. None is Python's built-in null object; pandas treats it as missing data in columns with the object dtype, such as string columns.
df["Dummy_Column"] = None
df
Dataframe Looks Like
| | product_name | Unit_Price | No_Of_Units | Tax% | Total_Price | Dummy_Column |
---|---|---|---|---|---|---|
0 | Keyboard | 500 | 5 | 5 | 2500 | None |
1 | Mouse | 200 | 5 | 10 | 1000 | None |
2 | Monitor | 5000 | 10 | 10 | 50000 | None |
3 | CPU | 10000 | 20 | 5 | 200000 | None |
4 | Speakers | 250 | 8 | 10 | 2000 | None |
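If you need to add several empty column names at once, reindex() with a list of column labels is one option. This is a sketch of that alternative; the extra column name Notes and the variable wanted are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "product_name": ["Keyboard", "Mouse"],
    "Unit_Price": [500, 200],
})

# Existing columns plus the new, still-empty ones
wanted = list(df.columns) + ["Dummy_Column", "Notes"]

# Labels that do not exist yet are created and filled with NaN
df = df.reindex(columns=wanted)

print(df)
```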
Conclusion
To summarize, you've learnt how to add a column to a pandas dataframe. You've seen the different methods pandas provides for adding a new column to an existing dataframe, along with the use cases for each.
If you have any questions or feedback, feel free to comment below.