What is Data analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
Tools used in Data Analysis:
Auto-managed closed tools ->
Qlik, Tableau, Looker, Zoho Analytics
Programming languages used: Python, R, Julia
Why Python for Data Analysis?
- Very simple and intuitive to learn
- Free and open source
- Amazing community, docs, and conferences
**When to choose the R language?
- When RStudio is needed
- When dealing with advanced statistical methods
- When extreme performance is needed
**Data analysis Process:
1:Data extraction ->
SQL, scraping, file formats (CSV, JSON, XML), consulting APIs, buying data, distributed databases
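As a rough illustration of the file-format sources above, the sketch below loads CSV and JSON data with pandas. The sample data, column names, and variable names are hypothetical; they are inlined as strings so the example runs without external files. In practice the same two functions also accept file paths.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the sketch needs no external files.
csv_text = "store,sales\nnorth,120\nsouth,95\n"
json_text = '[{"store": "north", "manager": "Ana"}, {"store": "south", "manager": "Luis"}]'

# read_csv and read_json accept file paths or file-like objects.
sales = pd.read_csv(io.StringIO(csv_text))
managers = pd.read_json(io.StringIO(json_text))

print(sales)
print(managers)
```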
2:Data cleaning ->
• Missing values
• Data imputation
• Incorrect types
• Incorrect or invalid values
• Outliers
• Statistical sanitization
• Indexing data
• Joining data
• Building statistical models
• Visualization
• Correlation vs causation
• Hypothesis testing
• Statistical analysis
• Building machine learning models
• Moving ML into production
• Building ETL pipelines
• Live dashboards
• Decision making and real-life tests
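A few of the steps listed above (fixing incorrect types, imputing missing values, filtering outliers, joining data, and checking a correlation) can be sketched with pandas. This is a minimal sketch under stated assumptions: the tables, column names, the median imputation, and the 1.5 x IQR outlier rule are illustrative choices, not methods prescribed by these notes.

```python
import pandas as pd

# Hypothetical raw data: prices stored as strings, one missing, one extreme.
raw = pd.DataFrame({
    "product": ["a", "b", "c", "d", "e"],
    "price": ["10", "12", None, "11", "900"],
    "units": [5, 6, 5, 7, 6],
})

# Incorrect types: convert the string column to numbers (bad entries become NaN).
raw["price"] = pd.to_numeric(raw["price"], errors="coerce")

# Data imputation: fill the missing price with the median of the observed prices.
median_price = raw["price"].median()
clean = raw.fillna({"price": median_price})

# Outliers: keep rows whose price lies within 1.5 * IQR of the quartiles.
q1, q3 = clean["price"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[clean["price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Joining data: attach a category to each product from a second (hypothetical) table.
categories = pd.DataFrame({
    "product": ["a", "b", "c", "d"],
    "category": ["x", "x", "y", "y"],
})
merged = clean.merge(categories, on="product", how="left")

# Correlation: Pearson correlation between price and units sold.
corr = merged["price"].corr(merged["units"])
print(merged)
print(corr)
```

A correlation computed this way only describes how the two columns move together; it says nothing on its own about causation.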
**The libraries we can use ...
pandas: The cornerstone of our data analysis work with Python.
matplotlib: The foundational library for visualizations. Other visualization libraries we'll use are built on top of matplotlib.
numpy: The numeric library that serves as the foundation for most numerical calculations in Python.
seaborn: A statistical visualization tool built on top of matplotlib.
statsmodels: A library with many advanced statistical functions.
scipy: Advanced scientific computing, including functions for optimization, linear algebra, image processing, and much more.
scikit-learn: The most popular machine learning library for Python (not deep learning).
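To show how the first two of these libraries fit together, here is a small sketch in which a numpy array holds the raw numbers and pandas wraps it for labeled analysis. The values and the name "x" are hypothetical.

```python
import numpy as np
import pandas as pd

# numpy: a plain numeric array with vectorized operations.
values = np.array([1.0, 2.0, 3.0, 4.0])
mean_value = values.mean()

# pandas: the same numbers wrapped in a labeled Series.
series = pd.Series(values, name="x")
summary = series.describe()  # count, mean, std, min, quartiles, max

# The Series can hand its data back as a numpy array.
backing = series.to_numpy()

print(mean_value)
print(summary)
```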