Md Manawar Iqbal

# Introduction to Data Analysis

**What is data analysis?**

Data analysis is the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.

**Tools used in data analysis**

- Auto-managed closed tools: Qwiklabs, Tableau, Looker, Zoho Analytics
- Programming languages: Python, R, Julia

**Why Python for data analysis?**

- Very simple and intuitive to learn
- Correct language
- Powerful libraries
- Free and open source
- Amazing community, docs, and conferences

**When to choose R?**

- When RStudio is needed
- When dealing with advanced statistical methods
- When extreme performance is needed

**Data analysis process**

**1. Data extraction**

- SQL
- Scraping
- File formats (CSV, JSON, XML)
- Consulting APIs
- Buying data
- Distributed databases
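
As a minimal sketch of the extraction step, the snippet below loads the same small table from CSV text, from JSON text, and from an in-memory SQLite database using pandas. The table name `sales`, the column names, and the sample values are made up for illustration; real sources would be files, URLs, or a production database.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data; in practice this would be a file path or URL.
csv_text = "city,units\nDelhi,10\nPatna,7\n"
json_text = '[{"city": "Delhi", "units": 10}, {"city": "Patna", "units": 7}]'

# File formats: CSV and JSON
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

# SQL: write the table to an in-memory SQLite database, then query it back
conn = sqlite3.connect(":memory:")
from_csv.to_sql("sales", conn, index=False)
from_sql = pd.read_sql_query("SELECT city, units FROM sales WHERE units > 8", conn)
conn.close()

print(from_csv)
print(from_json)
print(from_sql)
```

Whatever the source, the result is a pandas `DataFrame`, so the later steps do not need to know where the data came from.
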

**2. Data cleaning**

- Missing values and empty data
- Data imputation
- Incorrect types
- Incorrect or invalid values
- Outliers and non-relevant data
- Statistical sanitization
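
The cleaning tasks above might look roughly like this in pandas. This is only a sketch: the sample records, the assumed valid age range of 0 to 120, and the choice of median imputation are illustrative assumptions, not rules from any particular dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a missing age, ages stored as strings,
# an impossible value (-5) and an extreme value (300).
raw = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay"],
    "age": ["25", "31", None, "-5", "300", "28"],
})

df = raw.copy()

# Incorrect types: convert strings to numbers (unparseable values become NaN)
df["age"] = pd.to_numeric(df["age"], errors="coerce")

# Invalid values and outliers: treat ages outside the assumed 0-120 range as missing
df.loc[~df["age"].between(0, 120), "age"] = np.nan

# Missing values: count them, then impute with the median of the valid ages
n_missing = int(df["age"].isna().sum())
df["age"] = df["age"].fillna(df["age"].median())

print(n_missing)
print(df)
```

Whether to impute, drop, or flag such rows depends on the analysis; the median is used here only because it is not pulled around by extreme values.
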

**3. Data wrangling**

- Hierarchical data
- Handling categorical data
- Reshaping and transforming structures
- Indexing data for quick access
- Merging, combining, and joining data
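
A short, hedged sketch of these wrangling operations with pandas follows. The `orders` and `customers` tables and all their values are hypothetical.

```python
import pandas as pd

# Hypothetical tables for illustration
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "size": ["S", "L", "M", "S"],
    "amount": [5.0, 9.0, 7.0, 4.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "city": ["Delhi", "Patna", "Delhi"],
})

# Handling categorical data: an ordered categorical makes sizes comparable
orders["size"] = pd.Categorical(orders["size"], categories=["S", "M", "L"], ordered=True)

# Merging and joining: attach each customer's city to their orders
merged = orders.merge(customers, on="customer_id", how="left")

# Reshaping: total amount per city (rows) and size (columns)
pivot = merged.pivot_table(index="city", columns="size", values="amount",
                           aggfunc="sum", fill_value=0, observed=False)

# Indexing for quick access, including a hierarchical (multi-level) index
by_order = merged.set_index("order_id")
hierarchical = merged.set_index(["city", "order_id"]).sort_index()

print(pivot)
print(by_order.loc[3, "amount"])
print(hierarchical.loc["Delhi"])
```
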

**4. Analysis**

- Exploration
- Building statistical models
- Visualization and representations
- Correlation vs. causation analysis
- Hypothesis testing
- Statistical analysis
- Reporting
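
To make a few of these analysis tasks concrete, here is a small sketch covering exploration, correlation, and hypothesis testing. The data is invented, and the hypothesis test shown is a simple permutation test written with numpy, chosen here for transparency; it is one of several possible tests, not a prescribed method.

```python
import numpy as np
import pandas as pd

# Hypothetical data: study hours and exam scores for two groups of students
df = pd.DataFrame({
    "group": ["A"] * 5 + ["B"] * 5,
    "hours": [1, 2, 3, 4, 5, 3, 4, 5, 6, 7],
    "score": [52, 55, 61, 64, 70, 63, 66, 71, 75, 80],
})

# Exploration: summary statistics per group
summary = df.groupby("group")["score"].agg(["mean", "std", "count"])

# Correlation (which by itself says nothing about causation)
corr = df["hours"].corr(df["score"])

# Hypothesis testing: a permutation test of the difference in mean scores.
# Null hypothesis: group labels are unrelated to the score.
scores = df["score"].to_numpy()
is_b = (df["group"] == "B").to_numpy()
observed = scores[is_b].mean() - scores[~is_b].mean()

rng = np.random.default_rng(0)
n_perm = 5000
diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(is_b)  # randomly reassign group labels
    diffs[i] = scores[shuffled].mean() - scores[~shuffled].mean()

# Share of shuffles with a difference at least as large as the observed one
p_value = float(np.mean(np.abs(diffs) >= abs(observed)))

print(summary)
print(f"correlation: {corr:.3f}")
print(f"observed difference: {observed:.1f}, p-value: {p_value:.3f}")
```

A high correlation between hours and score does not show that extra hours cause higher scores; group B may differ from group A in other ways, which is the point of the correlation vs. causation item above.
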

**5. Actions**

- Building machine learning models
- Feature engineering
- Moving ML into production
- Building ETL pipelines
- Live dashboards and reporting
- Decision making and real-life tests
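
One of these actions, building an ETL (extract, transform, load) pipeline, can be sketched in a few lines. The function names `extract`, `transform`, and `load`, the table name `orders_clean`, and the sample data are all hypothetical; a production pipeline would add scheduling, logging, and error handling.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical raw export; a real pipeline would read from a file, API, or database
RAW_CSV = "order_id,price,quantity\n1,2.50,4\n2,10.00,1\n3,,2\n"


def extract(text: str) -> pd.DataFrame:
    """Extract: parse the raw CSV text into a DataFrame."""
    return pd.read_csv(io.StringIO(text))


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Transform: drop rows without a price and derive a new feature."""
    out = df.dropna(subset=["price"]).copy()
    out["revenue"] = out["price"] * out["quantity"]  # simple feature engineering
    return out


def load(df: pd.DataFrame, conn: sqlite3.Connection, table: str) -> None:
    """Load: write the cleaned table to the target database."""
    df.to_sql(table, conn, index=False, if_exists="replace")


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn, "orders_clean")
report = pd.read_sql_query("SELECT SUM(revenue) AS total_revenue FROM orders_clean", conn)
conn.close()

print(report)
```

Keeping the three stages as separate functions makes each one easy to test and replace on its own.
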

**Python ecosystem**

The libraries we can use:

- pandas: The cornerstone of our data analysis work with Python.
- matplotlib: The foundational library for visualizations. Other visualization libraries we'll use are built on top of matplotlib.
- numpy: The numeric library that serves as the foundation for most numerical computation in Python.
- seaborn: A statistical visualization tool built on top of matplotlib.
- statsmodels: A library with many advanced statistical functions.
- scipy: Advanced scientific computing, including functions for optimization, linear algebra, image processing, and much more.
- scikit-learn: The most popular machine learning library for Python (not deep learning).
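
As a small illustration of how the first two building blocks fit together, the sketch below does array math in numpy and then wraps the result in a labeled pandas table. The temperature readings and day labels are invented for the example.

```python
import numpy as np
import pandas as pd

# numpy: fast element-wise math on whole arrays, no explicit loops
celsius = np.array([21.5, 23.0, 19.5, 25.0])
fahrenheit = celsius * 9 / 5 + 32

# pandas: labeled, tabular data built on top of numpy arrays
df = pd.DataFrame(
    {"celsius": celsius, "fahrenheit": fahrenheit},
    index=pd.Index(["Mon", "Tue", "Wed", "Thu"], name="day"),
)

warmest_day = df["celsius"].idxmax()
mean_f = df["fahrenheit"].mean()

# The underlying data can be handed back to numpy at any time
as_array = df.to_numpy()

print(df)
print(warmest_day, mean_f)
print(as_array.shape)
```
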