Md Manawar Iqbal

Introduction to Data Analysis

What is Data Analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
Tools used in Data Analysis:
Auto-managed closed tools -> Qwiklabs, Tableau, Looker, Zoho Analytics
Programming languages -> Python, R, Julia
Why Python for Data Analysis?
- Very simple and intuitive to learn
- Correct language
- Powerful libraries
- Free and open source
- Amazing community, docs, and conferences
When to choose R?
- When RStudio is needed
- When dealing with advanced statistical methods
- When extreme performance is needed
Data analysis process:

1: Data extraction ->
- SQL
- Scraping
- File formats (CSV, JSON, XML)
- Consulting APIs
- Buying data
- Distributed databases
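
As a rough illustration of the extraction step, here is a minimal sketch that loads the same kind of data from CSV, JSON, and SQL using pandas. The sample strings, table name, and column names are made up for the example, and an in-memory SQLite database stands in for a real database server.

```python
import io
import json
import sqlite3

import pandas as pd

# Hypothetical sample data; in practice these would be real files or API responses.
csv_text = "city,sales\nParis,120\nBerlin,95\n"
json_text = '[{"city": "Paris", "population": 2100000}, {"city": "Berlin", "population": 3600000}]'

# CSV: read_csv accepts a path or any file-like object.
sales = pd.read_csv(io.StringIO(csv_text))

# JSON: parse with the standard library, then build a DataFrame from the records.
cities = pd.DataFrame(json.loads(json_text))

# SQL: an in-memory SQLite database stands in for a real database server.
conn = sqlite3.connect(":memory:")
sales.to_sql("sales", conn, index=False)
big_sales = pd.read_sql_query("SELECT city, sales FROM sales WHERE sales > 100", conn)
conn.close()

print(big_sales)
```

Whatever the source, the result is a DataFrame, so the later steps do not need to care where the data came from.
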
2: Data cleaning ->
- Missing values and empty data
- Data imputation
- Incorrect types
- Incorrect or invalid values
- Outliers and non-relevant data
- Statistical sanitization
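
The cleaning tasks above could look something like the following in pandas. This is only a sketch on made-up data: the column names, the 0 to 120 plausible age range, and the choice of median imputation are assumptions for the example, not rules.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with common problems: missing values,
# numbers stored as strings, and one implausible outlier.
raw = pd.DataFrame({
    "age": ["25", "31", None, "29", "290"],
    "city": ["Paris", None, "Berlin", "Paris", "Berlin"],
})

# Incorrect types: convert text to numbers; unparseable entries become NaN.
clean = raw.copy()
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# Outliers: treat ages outside an assumed plausible range as missing.
clean.loc[~clean["age"].between(0, 120), "age"] = np.nan

# Data imputation: fill missing ages with the median of the valid ones.
clean["age"] = clean["age"].fillna(clean["age"].median())

# Missing values: drop rows that still lack a city.
clean = clean.dropna(subset=["city"])

print(clean)
```

Whether to drop, fill, or flag a bad value depends on the dataset, so each of these choices should be revisited for real data.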

3: Data wrangling ->
- Hierarchical data
- Handling categorical data
- Reshaping data
- Indexing data for quick access
- Joining data
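
To make the wrangling step more concrete, here is a small sketch in pandas covering joining, categorical data, reshaping, and indexing. The two tables and all of their column names are hypothetical.

```python
import pandas as pd

# Hypothetical tables used only for illustration.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer": ["ana", "bob", "ana", "cy"],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "amount": [100, 250, 80, 40],
})
customers = pd.DataFrame({
    "customer": ["ana", "bob", "cy"],
    "segment": ["retail", "corporate", "retail"],
})

# Joining data: add each customer's segment to their orders.
merged = orders.merge(customers, on="customer", how="left")

# Handling categorical data: store repeated labels as a categorical type.
merged["segment"] = merged["segment"].astype("category")

# Reshaping: one row per customer, one column per quarter.
wide = merged.pivot_table(index="customer", columns="quarter",
                          values="amount", aggfunc="sum", fill_value=0)

# Indexing for quick access: look up an order by its id.
by_id = merged.set_index("order_id")

print(wide)
print(by_id.loc[3, "amount"])
```
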
4: Analysis ->
- Exploration
- Building statistical models
- Visualization
- Correlation vs. causation analysis
- Hypothesis testing
- Statistical analysis
- Reporting
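
A few of these analysis tasks can be sketched with pandas and scipy. The numbers below are invented, and Welch's t-test is just one possible choice of hypothesis test, picked here because it does not assume equal variances.

```python
import pandas as pd
from scipy import stats

# Hypothetical measurements for two groups (made-up numbers).
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "hours": [1, 2, 3, 4, 5, 1, 2, 3, 4, 5],
    "score": [52, 55, 61, 64, 70, 60, 66, 69, 75, 80],
})

# Exploration: summary statistics per group.
summary = df.groupby("group")["score"].agg(["mean", "std"])

# Correlation: Pearson correlation between hours and score.
# A high value shows association only; it does not prove causation.
corr = df["hours"].corr(df["score"])

# Hypothesis testing: Welch's t-test for a difference in group means.
a = df.loc[df["group"] == "a", "score"]
b = df.loc[df["group"] == "b", "score"]
t_stat, p_value = stats.ttest_ind(a, b, equal_var=False)

print(summary)
print(f"correlation={corr:.2f}, t={t_stat:.2f}, p={p_value:.3f}")
```
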
5: Action ->
- Building machine learning models
- Feature engineering
- Moving ML into production
- Building ETL pipelines
- Live dashboards
- Decision making and real-life tests
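
As one possible illustration of feature engineering and model building, the sketch below uses scikit-learn on synthetic data. The housing columns and the price formula are assumptions invented so that the effect of the engineered `area` feature is easy to see.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical data: price is generated as exactly 3 * area + 10 * rooms,
# so a linear model should recover it almost perfectly.
rng = np.random.default_rng(0)
houses = pd.DataFrame({
    "width": rng.uniform(5, 15, 200),
    "length": rng.uniform(5, 15, 200),
    "rooms": rng.integers(1, 6, 200),
})
houses["price"] = 3 * houses["width"] * houses["length"] + 10 * houses["rooms"]

# Feature engineering: derive area from the raw width and length columns.
houses["area"] = houses["width"] * houses["length"]

X = houses[["area", "rooms"]]
y = houses["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Build the model on the training split and score it on held-out rows.
model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(f"R^2 on held-out data: {r2:.3f}")
```

Real data is never this clean, so a held-out score this close to 1 would usually be a reason to check for leakage rather than to celebrate.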
The libraries we can use:
- pandas: The cornerstone of our data analysis work with Python.
- matplotlib: The foundational library for visualizations. Other libraries we'll use are built on top of matplotlib.
- numpy: The numeric library that serves as the foundation for most numerical computing in Python.
- seaborn: A statistical visualization library built on top of matplotlib.
- statsmodels: A library with many advanced statistical functions.
- scipy: Advanced scientific computing, including functions for optimization, linear algebra, image processing, and much more.
- scikit-learn: The most popular machine learning library for Python (not deep learning).