Here I'll put down the basics of working with Pandas DataFrames.
DataFrame is the primary Pandas data structure; it lets us work with data tables that have labeled rows and columns.
A data frame can be constructed from a dict:
import pandas as pd
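# dict keys become column names; 'chars' is listed first so the column order
# is the same whether pandas sorts the keys (old versions) or keeps insertion order
frame = pd.DataFrame({'chars': ['a'] * 3, 'numbers': range(3)})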
This gives us the following frame:
| | chars | numbers |
|---|---|---|
| 0 | a | 0 |
| 1 | a | 1 |
| 2 | a | 2 |
A DataFrame can also be loaded from a .csv file:
frame = pd.read_csv('file.csv', header=0, sep='\t')
The first argument is the file name. The second, header, is the number of the row that holds the column names (an int, or a list of ints for multi-row headers). The third, sep, is the field separator: a tab here, while '\s' matches a single whitespace character and '\s+' a run of whitespace (both are interpreted as regular expressions).
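To make the header and sep arguments concrete, here is a minimal sketch that reads from in-memory text rather than a real file. The sample contents and the variable names (tsv_text, spaced_text) are made up for illustration; only read_csv and its header and sep parameters come from the text above.

```python
import io

import pandas as pd

# Hypothetical tab-separated content standing in for 'file.csv'.
tsv_text = "chars\tnumbers\na\t0\na\t1\na\t2\n"

# header=0 means the first line holds the column names.
tab_frame = pd.read_csv(io.StringIO(tsv_text), header=0, sep="\t")

# The same data separated by runs of spaces; r'\s+' is a regular expression.
spaced_text = "chars   numbers\na   0\na      1\na 2\n"
spaced_frame = pd.read_csv(io.StringIO(spaced_text), header=0, sep=r"\s+")

print(tab_frame)
print(spaced_frame)
```

Both calls should produce the same three-row frame, since only the separator differs between the two inputs.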
Column headers can be retrieved with the columns attribute:
frame.columns
# Index(['chars', 'numbers'], dtype='object')
Another useful attribute returns the frame size as (rows, columns):
frame.shape
# (3, 2)
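Let's add a row:
new_line = {'chars': 'b', 'numbers': 8}
# DataFrame.append never accepted inplace and was removed in pandas 2.0,
# so wrap the new row in a one-row frame and concatenate it instead
frame = pd.concat([frame, pd.DataFrame([new_line])], ignore_index=True)
frame
This gives us the following output:
| | chars | numbers |
|---|---|---|
| 0 | a | 0 |
| 1 | a | 1 |
| 2 | a | 2 |
| 3 | b | 8 |
pd.concat returns a new frame rather than changing the original, so the result is assigned back to frame. Methods that accept the inplace keyword, such as drop, can instead write the result into the original variable when inplace=True is passed.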
Adding a column is easier:
frame['bools'] = [False] * 3 + [True]
frame
| | chars | numbers | bools |
|---|---|---|---|
| 0 | a | 0 | False |
| 1 | a | 1 | False |
| 2 | a | 2 | False |
| 3 | b | 8 | True |
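One detail worth spelling out: a list assigned as a new column has to supply exactly one value per row. The sketch below rebuilds the four-row frame from scratch so it runs on its own; the 'source' and 'broken' column names are hypothetical and exist only for this illustration.

```python
import pandas as pd

frame = pd.DataFrame({'chars': ['a', 'a', 'a', 'b'], 'numbers': [0, 1, 2, 8]})

# A list must contain one value for each of the four rows.
frame['bools'] = [False] * 3 + [True]

# A scalar is broadcast to every row ('source' is a made-up column name).
frame['source'] = 'manual'

# A list of the wrong length is rejected with a ValueError.
try:
    frame['broken'] = [1, 2]
    length_error = None
except ValueError as exc:
    length_error = exc

print(frame.columns.tolist())
print(type(length_error).__name__)
```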
Rows and columns can be dropped:
# the first argument is a list of index labels (row labels or column names),
# the second is the axis (0 is for rows, 1 is for columns)
frame.drop([0,1], axis=0, inplace=True)
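| | chars | numbers | bools |
|---|---|---|---|
| 2 | a | 2 | False |
| 3 | b | 8 | True |
Note that the remaining rows keep their original index labels (2 and 3); drop does not renumber them.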
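The text mentions dropping columns as well as rows, so here is a small sketch covering both, plus what happens without inplace=True. It rebuilds the four-row frame so it can run standalone; the variable names (without_bools, tail, renumbered) are just illustrative.

```python
import pandas as pd

frame = pd.DataFrame({
    'chars': ['a', 'a', 'a', 'b'],
    'numbers': [0, 1, 2, 8],
    'bools': [False, False, False, True],
})

# Without inplace=True, drop returns a new frame and leaves the original alone.
without_bools = frame.drop(['bools'], axis=1)

# Dropping rows keeps the index labels of the surviving rows.
tail = frame.drop([0, 1], axis=0)

# reset_index(drop=True) renumbers the rows from 0 and discards the old labels.
renumbered = tail.reset_index(drop=True)

print(without_bools.columns.tolist())
print(tail.index.tolist())
print(renumbered.index.tolist())
```

Returning a new frame is often the safer choice while exploring data, because the original stays available if a drop turns out to be a mistake.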
The result can be saved into a .csv file:
# sep sets the field separator; index=False leaves the row labels out of the file
frame.to_csv('updated.csv', sep=',', header=True, index=False)
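To check what actually gets written, the following sketch sends the output to an in-memory buffer instead of 'updated.csv' and reads it back. The buffer and the small two-row frame are assumptions made for the example; the to_csv arguments mirror the ones used above.

```python
import io

import pandas as pd

frame = pd.DataFrame({'chars': ['a', 'b'], 'numbers': [2, 8], 'bools': [False, True]})

# Write to an in-memory buffer so the sketch leaves no file behind.
buffer = io.StringIO()
frame.to_csv(buffer, sep=',', header=True, index=False)
csv_text = buffer.getvalue()

# Reading the text back should reproduce the same columns and values.
restored = pd.read_csv(io.StringIO(csv_text), sep=',', header=0)

print(csv_text)
print(restored)
```

With index=False the first column of the file is 'chars' rather than the row labels, which is why the round trip does not pick up an extra unnamed column.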