**Introduction**
Data analytics is the process of converting and evaluating raw data to produce practical insights that optimize company performance and decision making, telling a story with the data. It comprises five major steps: data collection, data storage, data processing, data cleansing, and data analysis.
**Data Collection Processes**
Once the objectives and aims of the analysis are set, a data analyst has to identify and collect data from reputable sources; it is therefore crucial that the collected data is an accurate representation of the facts. This process can be categorized as either ETL or ELT.
Extract Transform Load (ETL)
In ETL, the first step is to extract data from files, databases, APIs, etc. The data then undergoes transformation: removing duplicates, identifying missing data, dropping null values, computing aggregates, and so on. After transformation, the data is loaded into a database or a warehouse for storage.
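As a minimal sketch, the whole ETL flow can be expressed in a few lines of pandas and SQLAlchemy; the file, table, and column names here are hypothetical placeholders, not part of any real pipeline:

```python
# A minimal ETL sketch using pandas and SQLAlchemy (names are hypothetical).
import pandas as pd
from sqlalchemy import create_engine

# Extract: read raw records from a source file.
raw = pd.read_csv("sales_raw.csv")

# Transform: remove duplicates, identify missing data, drop nulls, aggregate.
clean = raw.drop_duplicates()
print("missing values per column:\n", clean.isna().sum())
clean = clean.dropna(subset=["order_id", "amount"])
monthly_totals = clean.groupby("month", as_index=False)["amount"].sum()

# Load: write the transformed data into a database for storage.
engine = create_engine("sqlite:///warehouse.db")
clean.to_sql("sales", engine, if_exists="replace", index=False)
monthly_totals.to_sql("monthly_sales", engine, if_exists="replace", index=False)
```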
Extract Load Transform (ELT)
In ELT, the first step is to extract data just as in ETL, but the collected data is then loaded into a data repository and only transformed afterwards.
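Reordered as ELT, the same pipeline might look like the following sketch: the raw extract lands in the repository untouched, and the transformation runs later as a query against it (again, all names are hypothetical):

```python
# A minimal ELT sketch: load the raw data first, transform inside the database.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///lake.db")

# Extract and Load: land the raw data untouched in the repository.
raw = pd.read_csv("sales_raw.csv")
raw.to_sql("sales_raw", engine, if_exists="replace", index=False)

# Transform: run the cleanup as a query over the loaded table, pulling
# only the slice the analysis needs from the larger dataset.
query = """
    SELECT month, SUM(amount) AS total_amount
    FROM sales_raw
    WHERE amount IS NOT NULL
    GROUP BY month
"""
monthly_totals = pd.read_sql(query, engine)
```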
Some of the Python libraries crucial to these two processes include pandas, NumPy, Apache Airflow, and SQLAlchemy. The two approaches offer different benefits. Because ETL transforms data before loading it, it ensures high-quality data and promotes flexibility and agility; it also addresses security concerns where data is sensitive, since it allows data to be masked and provides the option of encryption before loading. Where data volumes are extremely large, as with big data, ELT is a good option because it reduces the risk of data loss and gives analysts the option of extracting only the data they need from a larger dataset.
**Data Analysis and Visualization Types**
After data has been processed, organized, and securely stored, it can be interpreted and visualized. Data analytics can be categorized into four types: descriptive, prescriptive, predictive, and diagnostic.
Descriptive Analytics
This summarizes what has happened and is used to identify trends.
Prescriptive Analytics
This provides recommendations, using machine learning and industry knowledge to suggest the steps that should be taken.
Predictive Analytics
This type forecasts what may or may not happen in the future, and often involves the application of machine learning.
Diagnostic Analytics
This type focuses on diagnosing the why: analysts examine patterns from past events to understand why trends are the way they are.
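To make two of these four types concrete, the sketch below runs a descriptive summary and then a simple predictive forecast over a small made-up revenue series; the data and column names are invented for the example:

```python
# Descriptive vs. predictive analytics on a tiny synthetic revenue series.
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

df = pd.DataFrame({
    "month": np.arange(1, 13),
    "revenue": [10, 12, 13, 15, 14, 17, 19, 18, 21, 22, 24, 25],
})

# Descriptive: summarize what happened and surface the trend.
print(df["revenue"].describe())           # mean, spread, quartiles
print(df["revenue"].pct_change().mean())  # average month-over-month growth

# Predictive: fit a model on past months to forecast the next quarter.
model = LinearRegression().fit(df[["month"]], df["revenue"])
future = pd.DataFrame({"month": [13, 14, 15]})
print(model.predict(future))
```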
**Data Analytics Techniques**
There are many different analytics techniques; these are some of the ones most often applied, two of which are sketched in code after the list.
- Regression analysis
- Cluster analysis
- Time series analysis
- Classification analysis
- Text analysis (NLP, natural language processing)
- Principal component analysis
- Descriptive statistics
- Inferential statistics
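Here is a small sketch of two of these techniques, regression analysis and cluster analysis, run with scikit-learn on synthetic data generated just for the illustration:

```python
# Regression and clustering on synthetic data with scikit-learn.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)

# Regression analysis: recover a linear relationship from noisy points.
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 1, size=100)
reg = LinearRegression().fit(X, y)
print("slope:", reg.coef_[0], "intercept:", reg.intercept_)

# Cluster analysis: group unlabeled points into k clusters.
points = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print("cluster centers:\n", kmeans.cluster_centers_)
```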
**Fundamental Tools for Data Analytics**
These are some of the tools essential to every data analyst.
1. Microsoft Excel
This is a tool used to clean and transform data with formulas, as well as to visualize it.
2. Python
This programming language is a staple, with libraries such as NumPy, pandas, Matplotlib, seaborn, scikit-learn, and Beautiful Soup. Each of these libraries plays a different role in numerical analysis and data manipulation; a short plotting sketch follows this list.
3. R
This is another programming language, specifically designed for statistical computing.
4. Tableau and Power BI
These are two popular visualization tools that create interactive dashboards to visualize data and reveal patterns.
5. Structured Query Language (SQL)
This tool is essential for examining and managing relational databases; it is useful for extracting, filtering, and joining tables, as sketched after this list. Alongside SQL, NoSQL databases store and retrieve data in a more flexible way.
6. Jupyter Notebooks
This is a free, open-source web application for creating and sharing documents that contain live code. It runs in any browser or on the desktop, and reports can be created from the notebooks themselves.
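As a small taste of the Python stack from item 2, the sketch below uses NumPy and pandas for a quick numerical summary and seaborn with Matplotlib for a trend plot; the sales figures are synthetic and the output filename is arbitrary:

```python
# Numerical summary and a trend plot with the libraries named in item 2.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic daily sales data (illustrative only).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "day": np.arange(30),
    "sales": rng.normal(loc=100, scale=10, size=30).cumsum(),
})

print(df["sales"].describe())  # NumPy/pandas: quick numerical summary

sns.lineplot(data=df, x="day", y="sales")  # seaborn: trend line over time
plt.title("Sales over 30 days (synthetic)")
plt.savefig("sales_trend.png")
```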
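And as a minimal sketch of the SQL operations from item 5 (extracting, filtering, and joining tables), run here through Python's built-in sqlite3 module against a throwaway in-memory database; the tables and rows are made up for the example:

```python
# Extracting, filtering, and joining tables with SQL via sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 120.0), (2, 1, 35.5), (3, 2, 99.9);
""")

# Extract and filter with WHERE, then join the two tables on the key column.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    WHERE o.amount > 50
    GROUP BY c.name
""").fetchall()
print(rows)  # e.g. [('Ada', 120.0), ('Grace', 99.9)]
conn.close()
```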
**Conclusion**
It is crucial to understand that the choice of tools in data analytics is subjective and depends on the intended outcome. Data analytics is a wide field, and any data analyst, data scientist, or machine learning expert should familiarize themselves with these tools, and many more, to enhance their productivity and efficiency in the data-driven ecosystem.