Md Meraj Kausar

An introductory view of Data Analysis

What is Data analysis?
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
Tools used in data analysis:
Auto-managed, closed-source tools ->
Qwiklabs, Tableau, Looker, Zoho Analytics
Programming languages: Python, R, Julia
Why Python for data analysis?
- Very simple and intuitive to learn.
- A "correct", general-purpose programming language, not just a statistics tool.
- Powerful libraries.
- Free and open source.
- Amazing community, docs, and conferences.
When to choose R?
- When RStudio is needed.
- When dealing with advanced statistical methods.
- When extreme performance is needed.
Data analysis process:
1: Data extraction ->
SQL, scraping, file formats (CSV, JSON, XML), consulting APIs, buying data, distributed databases
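A minimal sketch of three of these extraction routes in Python, assuming pandas is installed: reading CSV and JSON, then querying with SQL through the standard-library `sqlite3` module. The inline CSV/JSON strings and the `sales` table are made-up stand-ins for real files and databases; scraping, APIs, and distributed databases are not covered.

```python
import io
import sqlite3
from contextlib import closing

import pandas as pd

# Made-up sample data standing in for real files on disk.
CSV_TEXT = """city,sales
Berlin,120
Madrid,95
"""
JSON_TEXT = '[{"city": "Paris", "sales": 130}, {"city": "Rome", "sales": 80}]'


def extract_from_files() -> pd.DataFrame:
    """Parse the CSV and JSON sources and stack them into one DataFrame."""
    csv_df = pd.read_csv(io.StringIO(CSV_TEXT))
    json_df = pd.read_json(io.StringIO(JSON_TEXT), orient="records")
    return pd.concat([csv_df, json_df], ignore_index=True)


def extract_with_sql(df: pd.DataFrame) -> pd.DataFrame:
    """Write rows to an in-memory SQLite database, then pull a subset back out with SQL."""
    with closing(sqlite3.connect(":memory:")) as conn:
        df.to_sql("sales", conn, index=False)
        query = "SELECT city, sales FROM sales WHERE sales > 90 ORDER BY sales DESC"
        return pd.read_sql_query(query, conn)


files_df = extract_from_files()
sql_df = extract_with_sql(files_df)
print(sql_df)
```

In a real project the `io.StringIO` wrappers would simply be replaced by file paths, and the SQLite connection by a connection to your actual database.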
2: Data cleaning ->
• Missing values and empty data
• Data imputation
• Incorrect types
• Incorrect or invalid values
• Outliers and non-relevant data
• Statistical sanitization
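The cleaning steps above might look like this in pandas. This is a sketch on a small, made-up table; median imputation and the 1.5 × IQR outlier rule (Tukey's fences) are common choices picked for illustration, not the only correct ones.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data showing the problems listed above.
raw = pd.DataFrame(
    {
        "age": ["34", "29", None, "41", "abc", "38"],  # numbers stored as text, one missing, one invalid
        "income": [52000, 48000, 51000, np.nan, 50000, 990000],  # one missing, one extreme value
        "city": ["Berlin", "", "Madrid", "Berlin", "Paris", "Madrid"],  # one empty string
    }
)


def clean(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Empty data: treat empty strings as missing values.
    df["city"] = df["city"].replace("", np.nan)
    # Incorrect types / invalid values: parse text as numbers; "abc" becomes NaN.
    df["age"] = pd.to_numeric(df["age"], errors="coerce")
    # Imputation: fill missing numbers with each column's median.
    df["age"] = df["age"].fillna(df["age"].median())
    df["income"] = df["income"].fillna(df["income"].median())
    # Rows without a city cannot be imputed sensibly here, so drop them.
    df = df.dropna(subset=["city"])
    # Outliers: keep incomes inside Tukey's fences (1.5 * IQR beyond the quartiles).
    q1, q3 = df["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = df["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[inside].reset_index(drop=True)


cleaned = clean(raw)
print(cleaned)
```

Working on a copy keeps the raw data untouched, which makes it easy to re-run the cleaning with different rules.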

3: Data wrangling ->
• Hierarchical data
• Handling categorical data
• Reshaping and transforming data
• Indexing data for quick access
• Merging and joining data
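A small pandas sketch of these wrangling operations on hypothetical `sales` and `stores` tables: an ordered categorical column, a left join, a grouped sum that produces a hierarchical (MultiIndex) result, a long-to-wide reshape with `unstack`, and label-based lookup with `.loc`.

```python
import pandas as pd

# Hypothetical sales records in "long" format (one row per sale).
sales = pd.DataFrame(
    {
        "store": ["A", "A", "B", "B", "A", "B"],
        "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q2"],
        "size": ["small", "small", "large", "large", "small", "large"],
        "revenue": [100, 150, 200, 250, 50, 30],
    }
)
stores = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})

# Handling categorical data: an ordered category saves memory and sorts in the declared order.
sales["size"] = pd.Categorical(sales["size"], categories=["small", "large"], ordered=True)

# Merging/joining: attach each store's region (like a SQL LEFT JOIN).
merged = sales.merge(stores, on="store", how="left")

# Hierarchical data: total revenue per (region, store, quarter) -> a MultiIndex Series.
totals = merged.groupby(["region", "store", "quarter"])["revenue"].sum()

# Reshaping: move quarters from rows to columns (long -> wide).
wide = totals.unstack("quarter")

# Indexing for quick access: look up a value by its index labels.
north_a_q1 = totals.loc[("North", "A", "Q1")]

print(wide)
print(north_a_q1)
```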
4: Analysis ->
• Exploration
• Building statistical models
• Visualization and representation
• Correlation vs. causation analysis
• Hypothesis testing
• Statistical analysis
• Reporting
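Some of these analysis steps, sketched on synthetic data with numpy and pandas: summary statistics with `describe`, a Pearson correlation, and a hypothesis test. For the test we use a permutation test for a difference in means, written by hand so the example only needs numpy; `scipy.stats` offers ready-made alternatives such as `ttest_ind`. The data, the effect sizes, and the `permutation_test` helper are illustrative assumptions.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)

# Synthetic data (an assumption for illustration): sales rise with ad spend plus noise.
ad_spend = rng.uniform(0, 100, size=200)
sales = 50 + 2.0 * ad_spend + rng.normal(0, 15, size=200)
df = pd.DataFrame({"ad_spend": ad_spend, "sales": sales})

# Exploration: count, mean, std and quartiles for every numeric column.
summary = df.describe()

# Correlation: Pearson r between the two columns. A high r alone does not prove causation.
r = df["ad_spend"].corr(df["sales"])

# Synthetic A/B test: the treatment group's true mean is 10 units higher.
control = rng.normal(100, 10, size=100)
treatment = rng.normal(110, 10, size=100)


def permutation_test(a: np.ndarray, b: np.ndarray, n_perm: int = 5000, seed: int = 1) -> float:
    """Two-sided permutation test for a difference in means; returns a p-value.

    Under the null hypothesis the group labels are exchangeable, so we shuffle the
    pooled values and count how often a random split is at least as extreme.
    """
    perm_rng = np.random.default_rng(seed)
    observed = abs(np.mean(a) - np.mean(b))
    pooled = np.concatenate([a, b])
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        perm_rng.shuffle(pooled)
        if abs(pooled[:n_a].mean() - pooled[n_a:].mean()) >= observed:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (extreme + 1) / (n_perm + 1)


p_value = permutation_test(control, treatment)
print(summary)
print(f"Pearson r = {r:.2f}, permutation p-value = {p_value:.4f}")
```

The permutation test makes no normality assumption, at the cost of a loop over random shuffles; for large data a closed-form test is usually faster.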
5: Action ->
• Building machine learning models
• Feature engineering
• Moving ML into production
• Building ETL pipelines
• Live dashboards and reporting
• Decision making and real-life tests
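As a rough illustration of the first two action items (feature engineering and building a model), the sketch below one-hot encodes a categorical column with `pd.get_dummies` and fits a scikit-learn `LinearRegression`, scoring it on a held-out split. The apartment data and its price formula are invented for the example, so the high score only reflects how the data was generated.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=42)

# Synthetic (hypothetical) apartment data: price depends on area and on the city.
n = 300
df = pd.DataFrame(
    {
        "area_m2": rng.uniform(30, 150, size=n),
        "city": rng.choice(["Berlin", "Madrid", "Paris"], size=n),
    }
)
city_premium = df["city"].map({"Berlin": 0.0, "Madrid": -20_000.0, "Paris": 50_000.0})
df["price"] = 3_000 * df["area_m2"] + city_premium + rng.normal(0, 10_000, size=n)

# Feature engineering: one-hot encode the categorical city column (Berlin is the baseline).
X = pd.get_dummies(df[["area_m2", "city"]], columns=["city"], drop_first=True, dtype=float)
y = df["price"]

# Hold out 25% of rows to check the model on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = LinearRegression().fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination on the test split
area_coef = model.coef_[list(X.columns).index("area_m2")]
print(f"R^2 on test set: {r2:.3f}, estimated price per m2: {area_coef:.0f}")
```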
The libraries we can use:
pandas: the cornerstone of our data analysis work with Python.
matplotlib: the foundational library for visualizations. Other visualization libraries we'll use, such as seaborn, are built on top of matplotlib.
numpy: the numeric library that serves as the foundation of numerical computing in Python; pandas itself is built on it.
seaborn: a statistical visualization library built on top of matplotlib.
statsmodels: a library with many advanced statistical models and tests.
scipy: advanced scientific computing, including functions for optimization, linear algebra, image processing and much more.
scikit-learn: the most popular machine learning library for Python (not for deep learning).
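To see how the foundational libraries connect, here is a minimal sketch with made-up data: numpy generates the arrays, pandas wraps them in a labelled table, and pandas' `.plot()` draws through matplotlib. It uses the conventional `np`, `pd`, and `plt` aliases and the non-interactive `Agg` backend so it runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files/buffers, no window needed

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# numpy: fast vectorised arithmetic on arrays.
x = np.linspace(0, 2 * np.pi, 50)
values = np.sin(x)

# pandas: labelled, tabular data built on top of numpy arrays.
df = pd.DataFrame({"x": x, "sin_x": values})
stats = df["sin_x"].agg(["min", "max", "mean"])

# matplotlib: pandas' .plot() delegates drawing to matplotlib and returns an Axes.
ax = df.plot(x="x", y="sin_x", title="sin(x)")
buffer = io.BytesIO()
ax.figure.savefig(buffer, format="png")
plt.close(ax.figure)
print(stats)
```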

Thank you.