Awesome Data Science with Python


I have created a list of useful Python packages for data science.

r0f1 / datascience

Curated list of Python packages and tutorials for data science.

Data Science Awesome


pandas | Data structures built on top of numpy.
scikit-learn | Core ML library.
matplotlib | Plotting library.
seaborn | Python data visualization library based on matplotlib.
pandas_summary | Basic statistics using DataFrameSummary(df).summary().
pandas_profiling | Descriptive statistics using ProfileReport.
sklearn_pandas | Helpful DataFrameMapper class.
janitor | Clean messy column names.
missingno | Missing data visualization.
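To make the "clean messy column names" entry concrete, here is a minimal sketch of the idea in plain Python. It is not how the janitor package is implemented; the helper names `clean_column_name` and `clean_columns` are made up for illustration, and the exact rules (lowercase, underscores for everything that is not a letter or digit) are an assumption about what "clean" usually means.

```python
import re


def clean_column_name(name):
    """Normalize one column label to lowercase snake_case (hypothetical helper)."""
    # Drop surrounding whitespace and lowercase the label.
    name = name.strip().lower()
    # Collapse every run of characters that is not a-z or 0-9 into one underscore.
    name = re.sub(r"[^0-9a-z]+", "_", name)
    # Remove underscores left over at the start or end.
    return name.strip("_")


def clean_columns(columns):
    """Apply clean_column_name to every label in a list of column names."""
    return [clean_column_name(column) for column in columns]


messy = [" First Name", "Total ($)", "e-mail Address "]
print(clean_columns(messy))
```

With a pandas DataFrame, the same function could be applied through `df.rename(columns=clean_column_name)`.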

Pandas and Jupyter

General tricks: link
nteract | Open Jupyter Notebooks with a double-click.
modin | Parallelization library for faster pandas DataFrame.
xarray | Extends pandas to n-dimensional arrays.
blackcellmagic | Code formatting for Jupyter notebooks.
pivottablejs | Drag-and-drop pivot tables and charts for Jupyter notebooks.
qgrid | Pandas DataFrame sorting.
nbdime | Diff two notebook files. Alternative GitHub app: ReviewNB.
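Notebook files are JSON documents, which is why plain text diffs of them are noisy and why a notebook-aware diff is useful. The sketch below illustrates the basic idea with only the standard library: pull the source lines out of each cell and diff those. It is a simplified illustration, not nbdime's algorithm, and the function names `load_notebook`, `cell_sources`, and `diff_notebooks` are hypothetical. It assumes the nbformat 4 layout, where a notebook has a top-level `cells` list and each cell stores its `source` as a string or a list of strings.

```python
import difflib
import json


def load_notebook(path):
    """Read an .ipynb file from disk as a plain dict (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def cell_sources(notebook):
    """Return the source lines of all cells, ignoring outputs and metadata."""
    lines = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        # The notebook format allows either one string or a list of strings.
        if isinstance(source, list):
            source = "".join(source)
        lines.extend(source.splitlines())
    return lines


def diff_notebooks(old, new):
    """Unified diff of the cell sources of two notebook dicts."""
    return list(
        difflib.unified_diff(
            cell_sources(old),
            cell_sources(new),
            fromfile="old.ipynb",
            tofile="new.ipynb",
            lineterm="",
        )
    )


# Two tiny in-memory notebooks that differ in a single line.
old_nb = {"cells": [{"cell_type": "code", "source": ["import math\n", "x = 1"]}]}
new_nb = {"cells": [{"cell_type": "code", "source": ["import math\n", "x = 2"]}]}

for line in diff_notebooks(old_nb, new_nb):
    print(line)
```

For files on disk, `diff_notebooks(load_notebook("a.ipynb"), load_notebook("b.ipynb"))` would work the same way; the file names here are placeholders.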


textract | Extract text from any document.

Big Data

spark | DataFrame for big data.
spark cheatsheet
dask | Pandas DataFrame for big data…

Where helpful, I have also linked to YouTube talks and to other GitHub repos that contain short examples.

Want to contribute? Let me know.


Short examples are great in this space. Appreciate the list.
