Pandas is a Python library for data manipulation and analysis. The name derives from "panel data" (PANel DAta), meaning multidimensional time series and cross-sectional data sets commonly found in statistics, experimental science, econometrics, and finance.
Pandas is built primarily on NumPy and Cython and is designed to integrate easily with NumPy-based scientific libraries such as statsmodels.
Pandas is one of the main data science libraries in Python.
Pandas allows importing data from various file formats such as comma-separated values, JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
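As a minimal sketch of the import step, the snippet below reads the same small table from CSV and from JSON text. The data and the column names (`city`, `temp_c`) are made up for illustration, and in-memory strings stand in for real files so the example is self-contained.

```python
import io

import pandas as pd

# Hypothetical in-memory data standing in for files on disk.
csv_text = "city,temp_c\nOslo,4.5\nCairo,29.0\n"
json_text = '[{"city": "Oslo", "temp_c": 4.5}, {"city": "Cairo", "temp_c": 29.0}]'

# Each reader returns a DataFrame with one column per field.
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

The other formats have analogous readers (`pd.read_parquet`, `pd.read_sql`, `pd.read_excel`), which typically need an extra engine or driver package installed.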
Pandas supports data manipulation operations such as merging, reshaping, and selecting, as well as data cleaning and data wrangling.
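The operations named above can be illustrated on two small tables. This is a sketch on invented data (the `sales` and `stores` frames and their columns are hypothetical), showing one common way to do each step rather than the only one.

```python
import pandas as pd

# Hypothetical example data.
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [100.0, None, 80.0, 95.0],
})
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Cleaning: replace the missing revenue with 0.
cleaned = sales.fillna({"revenue": 0.0})

# Merging: attach each store's region (a SQL-style left join on "store").
merged = cleaned.merge(stores, on="store", how="left")

# Selecting: keep only the rows with revenue above 90, and three columns.
high = merged.loc[merged["revenue"] > 90, ["store", "quarter", "revenue"]]

# Reshaping: one row per store, one column per quarter.
wide = merged.pivot(index="store", columns="quarter", values="revenue")

print(high)
print(wide)
```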
- Data structures: for one- and two-dimensional labeled datasets (Series and DataFrames, respectively). Their main features include:
  - Automatic data alignment and interpolation
  - Handling of missing observations in calculations
  - Convenient slicing and reshaping ("reindexing") functions
  - Categorical data types
  - "Group by" aggregation and transformation
  - Tools for merging and joining data sets
  - Simple Matplotlib integration for plotting and graphing
  - Multi-indexing, which structures an index into multiple levels so that data with an arbitrary number of dimensions can be represented
- Date tools: objects for expressing date offsets or generating date ranges. Dates can be aligned to a specific time zone and converted or compared at will
- Statistical models: early releases included convenient ordinary least squares and panel OLS implementations for in-sample or rolling time series and cross-sectional regressions; these were later removed from pandas, and statsmodels is the usual place for such models
- Cython offloading: performance-critical computations run in compiled Cython code rather than in interpreted Python
- Static and moving statistical tools: mean, standard deviation, correlation, and covariance
- Rich user documentation, built with Sphinx
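Several of the listed features can be sketched together on a tiny data set: label alignment, missing-value handling, date ranges with time zones, group-by aggregation, a moving statistic, and a multi-level index. All values, labels, and the `sensor` and `value` column names are invented for illustration.

```python
import pandas as pd

# Label alignment: values are matched by index label, not by position.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
total = a + b            # "x" has no partner in b, so the result is NaN there

# Missing observations are skipped by default in reductions.
total_sum = total.sum()  # 12.0 + 23.0

# Date tools: a daily range in UTC, converted to another time zone.
days = pd.date_range("2024-01-01", periods=6, freq="D", tz="UTC")
local_days = days.tz_convert("Europe/Berlin")

# A small time series with two hypothetical sensors.
df = pd.DataFrame(
    {"sensor": ["s1", "s2"] * 3, "value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=days,
)

# Group-by aggregation: mean value per sensor.
mean_by_sensor = df.groupby("sensor")["value"].mean()

# Moving statistic: 3-observation rolling mean (the first two entries are NaN).
rolling_mean = df["value"].rolling(window=3).mean()

# Multi-indexing: (sensor, timestamp) as a two-level row index.
multi = df.set_index("sensor", append=True).swaplevel().sort_index()
s1_values = multi.xs("s1", level="sensor")["value"]

print(total)
print(mean_by_sensor)
print(multi)
```

Selecting with `xs` on a named level is one of several ways to slice a multi-level index; it is used here because it makes the chosen level explicit.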