
Md Meraj Kausar


An introductory view of Data Analysis

What is Data analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
Tools used in data analysis:
Auto-managed closed tools -> Qwiklabs, Tableau, Looker, Zoho Analytics
Programming languages -> Python, R, Julia
Why Python for data analysis?
- Very simple and intuitive to learn
- Correct language
- Powerful libraries
- Free and open source
- Amazing community, docs, and conferences
When to choose the R language?
- When RStudio is needed
- When dealing with advanced statistical methods
- When extreme performance is needed
Data analysis process:
1: Data extraction ->
SQL, scraping, file formats (CSV, JSON, XML), consulting APIs, buying data, distributed databases
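
As a rough illustration of the extraction step, the sketch below pulls records from three of the sources listed above (a CSV file, a JSON document, and a SQL database) using only Python's standard library. The sample data, the `readings` table, and the `rows_from_*` helper names are hypothetical, made up for this example.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample data standing in for real files and a real database.
CSV_TEXT = "city,temp_c\nOslo,4.5\nCairo,29.0\n"
JSON_TEXT = '[{"city": "Lima", "temp_c": 19.5}]'


def rows_from_csv(text):
    """Parse CSV text into a list of dicts, converting temp_c to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"city": r["city"], "temp_c": float(r["temp_c"])} for r in reader]


def rows_from_json(text):
    """Parse a JSON array of objects into a list of dicts."""
    return json.loads(text)


def rows_from_sql(conn):
    """Run a SQL query and return the result rows as dicts."""
    cur = conn.execute("SELECT city, temp_c FROM readings ORDER BY city")
    return [{"city": city, "temp_c": temp} for city, temp in cur.fetchall()]


# An in-memory SQLite database, so the example needs no external server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (city TEXT, temp_c REAL)")
conn.execute("INSERT INTO readings VALUES ('Quito', 14.0)")

# Combine all three sources into one list of records with the same shape.
records = rows_from_csv(CSV_TEXT) + rows_from_json(JSON_TEXT) + rows_from_sql(conn)
print(records)
```

In practice the CSV and JSON text would come from files or an API response, and the connection would point at a real database; the idea of normalizing every source into the same record shape stays the same.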
2: Data cleaning ->
• Missing values and empty data
• Data imputation
• Incorrect types
• Incorrect or invalid values
• Outliers and non-relevant data
• Statistical sanitization
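
A minimal sketch of a few of these cleaning tasks with pandas, assuming a small made-up table of names and ages. The column names, the 0 to 120 plausibility range, and the choice of median imputation are assumptions for illustration, not the only reasonable options.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: ages arrive as strings, one is missing,
# one is not a number, and one is an implausible outlier.
raw = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy", "Di", "Ed", "Flo"],
    "age": ["34", "29", None, "abc", "31", "420"],
})

clean = raw.copy()

# Incorrect types and invalid values: coerce to numbers; bad entries become NaN.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")

# Outliers: treat values outside an assumed plausible range as missing.
clean.loc[~clean["age"].between(0, 120), "age"] = np.nan

# Data imputation: fill missing ages with the median of the valid ones.
median_age = clean["age"].median()
clean["age"] = clean["age"].fillna(median_age)

print(clean)
```

Whether to impute, drop, or flag a bad value depends on the analysis; the median is used here only because it is less sensitive to extreme values than the mean.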

3: Data wrangling ->
• Hierarchical data
• Handling categorical data
• Reshaping and transforming structures
• Indexing data for quick access
• Merging, combining and joining data
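
Several of these wrangling operations can be sketched with pandas as follows. The `orders` and `customers` tables and their columns are hypothetical, invented for this example.

```python
import pandas as pd

# Hypothetical tables: orders and the customers who placed them.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4],
    "customer_id": [10, 11, 10, 12],
    "quarter": ["Q1", "Q1", "Q2", "Q2"],
    "amount": [50.0, 20.0, 30.0, 40.0],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "segment": ["retail", "wholesale", "retail"],
})

# Merging: attach each customer's segment to their orders.
merged = orders.merge(customers, on="customer_id", how="left")

# Categorical data: store the repeated labels as a category dtype.
merged["segment"] = merged["segment"].astype("category")

# Reshaping: one row per segment, one column per quarter, summed amounts.
pivot = merged.pivot_table(index="segment", columns="quarter",
                           values="amount", aggfunc="sum", observed=True)

# Indexing: make order_id the index so rows can be looked up by label.
by_order = merged.set_index("order_id")

print(pivot)
print(by_order.loc[3, "amount"])
```

The pivoted table has a missing value where a segment placed no orders in a quarter, which is a reminder that wrangling and cleaning often feed back into each other.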
4: Analysis ->
• Exploration
• Building statistical models
• Visualization and representations
• Correlation vs causation analysis
• Hypothesis testing
• Statistical analysis
• Reporting
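
To make the exploration, correlation, and hypothesis-testing items a little more concrete, here is a small sketch using pandas and NumPy. The study-hours data is made up, and the permutation test is just one simple way to test for an association; it is an assumption of this example rather than a method the steps above prescribe.

```python
import numpy as np
import pandas as pd

# Hypothetical data: hours studied and exam scores for eight students.
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6, 7, 8],
    "score": [52, 55, 61, 60, 68, 71, 75, 80],
})

# Exploration: summary statistics for each column.
summary = df.describe()

# Correlation: Pearson's r between the two columns.
r = df["hours"].corr(df["score"])

# Hypothesis testing with a permutation test. Under the null hypothesis
# of no association, shuffling the scores should yield correlations at
# least as strong as r reasonably often.
rng = np.random.default_rng(0)
n_perm = 2000
hours = df["hours"].to_numpy()
score = df["score"].to_numpy()
perm_r = np.array([
    np.corrcoef(hours, rng.permutation(score))[0, 1]
    for _ in range(n_perm)
])

# Add 1 to numerator and denominator so the estimate is never exactly zero.
extreme = int(np.sum(np.abs(perm_r) >= abs(r)))
p_value = (extreme + 1) / (n_perm + 1)

print(summary)
print(f"r = {r:.3f}, permutation p-value = {p_value:.4f}")
```

A small p-value here only says the association is unlikely to be due to chance in this sample. It does not show that studying more causes higher scores, which is the correlation vs causation distinction listed above.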
5: Actions ->
• Building machine learning models
• Feature engineering
• Moving ML into production
• Building ETL pipelines
• Live dashboards and reporting
• Decision making and real-life tests
Python ecosystem:
The libraries we can use ...
pandas: The cornerstone of our data analysis job with Python.
matplotlib: The foundational library for visualizations. Other libraries we'll use are built on top of matplotlib.
numpy: The numeric library that serves as the foundation for numerical calculations in Python.
seaborn: A statistical visualization tool built on top of matplotlib.
statsmodels: A library with many advanced statistical functions.
scipy: Advanced scientific computing, including functions for optimization, linear algebra, image processing and much more.
scikit-learn: The most popular machine learning library for Python (not deep learning).
