
Anderson Braz

Originally published at andersonbraz.com

Data Science in Python: Pandas Introduction

This post covers basic concepts and notes for data science beginners. It also includes a link to a Jupyter notebook with the code and its execution.

Pandas Basics

Pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language.

Use the following import convention:

import pandas as pd


Pandas Data Structure

Series

A one-dimensional labeled array capable of holding any data type.

s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])
s

Output:
a 23
b 55
c -7
d 2
dtype: int64
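As a minimal sketch of what "labeled" means here, the same Series can be read either by label or by integer position. The variable names (`labels`, `by_label`, `by_position`, `mixed`) are only illustrative, and the `mixed` Series is an added example, not part of the original notebook.

```python
import pandas as pd

# Same Series as in the example above.
s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])

labels = list(s.index)     # the index labels
values = s.to_numpy()      # the underlying values as a NumPy array
by_label = s['c']          # lookup by label
by_position = s.iloc[2]    # lookup by integer position (third element)

print(labels)                  # ['a', 'b', 'c', 'd']
print(by_label, by_position)   # -7 -7

# A Series can also hold mixed Python objects; pandas then falls back
# to the generic "object" dtype (illustrative example).
mixed = pd.Series(['text', 3.14, True])
print(mixed.dtype)
```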


DataFrame

A two-dimensional labeled data structure with columns of potentially different types

data = {
    'Country': ['China', 'India', 'United States', 'Indonesia', 'Pakistan', 'Brazil', 'Nigeria', 'Bangladesh', 'Russia', 'Mexico'],
    'Population': [1406371640, 1372574449, 331058112, 270203917, 225200000, 212656200, 211401000, 170054094, 146748590, 126014024],
}
df = pd.DataFrame(data, columns=['Country', 'Population'])
df

Output:
Country Population
0 China 1406371640
1 India 1372574449
2 United States 331058112
3 Indonesia 270203917
4 Pakistan 225200000
5 Brazil 212656200
6 Nigeria 211401000
7 Bangladesh 170054094
8 Russia 146748590
9 Mexico 126014024
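To get a feel for the structure of a DataFrame, it can help to inspect its shape, column names and column types. The sketch below uses only the first three rows of the data above to stay short; the variable names are illustrative.

```python
import pandas as pd

# First three rows of the example data, to keep the sketch short.
data = {
    'Country': ['China', 'India', 'United States'],
    'Population': [1406371640, 1372574449, 331058112],
}
df = pd.DataFrame(data)

shape = df.shape                  # (number of rows, number of columns)
column_names = list(df.columns)   # column labels
population = df['Population']     # a single column is itself a Series

print(shape)          # (3, 2)
print(column_names)   # ['Country', 'Population']
print(df.dtypes)      # one dtype per column
```

Each column keeps its own type, which is what "columns of potentially different types" refers to.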


Selection

Also see NumPy Arrays

Getting

s['b']

Output: 55


And a subset of the DataFrame rows:

df[6:]

Output:
Country Population
6 Nigeria 211401000
7 Bangladesh 170054094
8 Russia 146748590
9 Mexico 126014024
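Row slices on a DataFrame follow the usual Python slice rules (start included, stop excluded). The following is a small sketch using the first five rows of the data above; `middle`, `last_two` and `first_two` are illustrative names.

```python
import pandas as pd

# First five rows of the example data.
df = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia', 'Pakistan'],
    'Population': [1406371640, 1372574449, 331058112, 270203917, 225200000],
})

middle = df[1:3]        # rows at positions 1 and 2 (stop is excluded)
last_two = df[-2:]      # negative positions count from the end
first_two = df.head(2)  # head() is a readable alternative to df[:2]

print(list(middle['Country']))     # ['India', 'United States']
print(list(last_two.index))        # [3, 4]
print(list(first_two['Country']))  # ['China', 'India']
```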


Selecting, Boolean Indexing & Setting

By Position

df.iloc[3, 0]

Output: 'Indonesia'


By Label

df.loc[[6], 'Country']

Output:
6 Nigeria
Name: Country, dtype: object
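Two details of `loc` and `iloc` are easy to miss, so here is a short sketch on a four-row subset of the data: passing a list of labels returns a Series while a single label returns a scalar, and label slices include their end point while position slices do not. The variable names are illustrative.

```python
import pandas as pd

# First four rows of the example data.
df = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia'],
    'Population': [1406371640, 1372574449, 331058112, 270203917],
})

scalar = df.loc[3, 'Country']     # single label -> a plain value
series = df.loc[[3], 'Country']   # list of labels -> a one-element Series

by_position = df.iloc[0:2]  # positions 0 and 1 (stop excluded)
by_label = df.loc[0:2]      # labels 0, 1 and 2 (stop included)

print(scalar)            # Indonesia
print(len(by_position))  # 2
print(len(by_label))     # 3
```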


Boolean Indexing

result = df[df['Population'] > 270203917]
result

Output:
Country Population
0 China 1406371640
1 India 1372574449
2 United States 331058112
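Boolean conditions can also be combined. A minimal sketch, assuming illustrative thresholds that do not come from the original notebook: use `&` (and), `|` (or) and `~` (not), and wrap each condition in parentheses.

```python
import pandas as pd

# First five rows of the example data.
df = pd.DataFrame({
    'Country': ['China', 'India', 'United States', 'Indonesia', 'Pakistan'],
    'Population': [1406371640, 1372574449, 331058112, 270203917, 225200000],
})

# Illustrative thresholds: more than 200 million and fewer than 1 billion.
mask = (df['Population'] > 200_000_000) & (df['Population'] < 1_000_000_000)
mid_sized = df[mask]

# isin() tests membership in a list; ~ negates a boolean mask.
selected = df[df['Country'].isin(['India', 'Pakistan'])]
others = df[~df['Country'].isin(['India', 'Pakistan'])]

print(list(mid_sized['Country']))  # ['United States', 'Indonesia', 'Pakistan']
print(list(selected['Country']))   # ['India', 'Pakistan']
print(list(others['Country']))     # ['China', 'United States', 'Indonesia']
```

The parentheses matter because `&` and `|` bind more tightly than comparison operators in Python.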


Setting

s['a'] = 777
s['d'] = 999
s

Output:
a 777
b 55
c -7
d 999
dtype: int64
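Setting works on a DataFrame too. The sketch below, on a three-row subset, uses `loc` with a row condition and a column label in a single step; the new population figure and the label `'e'` are made-up values for illustration only.

```python
import pandas as pd

s = pd.Series([23, 55, -7, 2], index=['a', 'b', 'c', 'd'])
s['e'] = 10  # assigning to a label that does not exist yet adds a new entry

df = pd.DataFrame({
    'Country': ['China', 'India', 'United States'],
    'Population': [1406371640, 1372574449, 331058112],
})

# Select rows by condition and the target column in one loc call,
# then assign. The value below is hypothetical.
df.loc[df['Country'] == 'India', 'Population'] = 1_400_000_000

print(len(s))                      # 5
print(df.loc[1, 'Population'])     # 1400000000
```

Doing the row and column selection in one `loc` call, rather than chaining two separate selections, makes sure the assignment reaches the original DataFrame.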


Conclusion

Pandas is a flexible, easy-to-use tool for data analysis and manipulation.

See it in practice: Code and Execution

Credits

Photo by fabio on Unsplash
