DEV Community

Ashutosh

Data Analysis Introduction

This article was originally posted at 99mentor.com.

Data Analysis

Data analysis is a buzzword heard a lot these days. Data is not new to this world; it has been around for centuries. So why is there such a hue and cry now? Digitalization has connected the world to such an extent that terabytes of data are generated every hour, which adds up to hundreds of terabytes a day. Hidden in this wealth of information are inferences that need to be drawn out and interpreted. With the relevant inferences, a company or an individual can obtain meaningful insight that can be turned into a business opportunity. That is why data analysis is gaining so much traction.

What is Data Analysis

Data analysis is a method of applying statistical and logical techniques to describe, illustrate, and represent data. In other words, it is about making sense of data. Data can be very deceptive and, presented carelessly, can tell a completely untrue story.

A saying popularized by Mark Twain puts it this way: "There are three kinds of lies: lies, damned lies, and statistics."

One should aim to represent the data in the best possible way. The need of the hour is to manage the data and draw inferences from the statistics that serve the business requirements. This raises several questions: Why do we analyze data? What are the different types of data analysis? What issues do we face while analyzing data? And last but not least, what are the steps involved in data analysis?

Why Data Analysis?

Making sense of data is precisely what analysis does for you; without it, the data remains a pile of untapped information. Data analysis sets an individual apart from the rest of the population because it keeps them from making largely unsubstantiated claims or remarks. It is not done for the sake of a publication or a social networking post, nor is it only about statistical significance: most of the time a distinction has to be made between statistical significance and practical or business significance. After all, statistical significance can be achieved for even a tiny effect if the sample size is large enough.
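
The point about sample size can be checked with a few lines of arithmetic. The sketch below is only an illustration under stated assumptions: it uses a two-sample z-test with equal group sizes and a known standard deviation, and the effect size (1% of a standard deviation) and the sample sizes are hypothetical numbers chosen for the example.

```python
import math


def two_sided_p_value(mean_diff, std_dev, n_per_group):
    """Two-sided p-value of a two-sample z-test with equal group sizes."""
    standard_error = std_dev * math.sqrt(2.0 / n_per_group)
    z = mean_diff / standard_error
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical effect: the two groups differ by 1% of one standard deviation.
for n in (100, 10_000, 1_000_000):
    p = two_sided_p_value(mean_diff=0.01, std_dev=1.0, n_per_group=n)
    print(f"n per group = {n:>9,}: p-value = {p:.3g}")
```

The difference between the groups is the same in every row; only the sample size changes. With the smaller samples the p-value stays well above the usual 0.05 threshold, while with a million observations per group it falls far below it, even though a 1% shift may have no business relevance at all.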

From a quality standpoint, analysis provides a basis for interpreting the data in ways that lead to action. It gives a decision-maker the building blocks to substantiate an argument about the findings.

Data analysis also vets the results of our actions: is the claim we are making reproducible and beyond question? Businesses in particular are not interested in statistical significance or insignificance as such, but we as individuals must do the data analysis to back up our claims. You can simply report to stakeholders whether the result of an action 'passed' or 'failed', but behind that verdict there is data analysis, which does not necessarily have to be reported.

Types of Data Analysis

Several types of data analysis techniques are in practice:

  • Descriptive Analysis
  • Predictive Analysis
  • Prescriptive Analysis
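
As a rough illustration of how the three types differ, here is a minimal sketch. It assumes the common reading of the terms (descriptive: what happened, predictive: what is likely to happen, prescriptive: what to do about it), which the article itself does not spell out. The sales figures, the stock level, and the ordering rule are all hypothetical.

```python
import statistics

# Hypothetical monthly sales figures, used purely for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]

# Descriptive: summarize what has already happened.
average_sales = statistics.mean(monthly_sales)

# Predictive: fit a least-squares line through the history and extend it one month.
months = list(range(len(monthly_sales)))
mean_x = statistics.mean(months)
mean_y = statistics.mean(monthly_sales)
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, monthly_sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x
next_month_forecast = intercept + slope * len(monthly_sales)

# Prescriptive: turn the forecast into a recommended action (hypothetical rule).
current_stock = 145
units_to_order = max(0, round(next_month_forecast - current_stock))

print(f"Descriptive:  average monthly sales = {average_sales}")
print(f"Predictive:   forecast for next month = {next_month_forecast:.1f}")
print(f"Prescriptive: order {units_to_order} more units")
```

A straight-line fit is only one of many possible predictive models; it is used here because it needs nothing beyond the standard library.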

Issues in Data Analysis

There are several issues that anyone doing data analysis should be cognizant of. Key issues include (but are not limited to):

  • Individual skills to analyze the data
  • Data collection plan and completeness
  • Biased inference
  • Subgroup analysis missing
  • No statistical significance
  • Data Capturing method
  • Improper Data cleaning
  • Outlier Treatment Missing

How to Analyze the Data

The data analysis process lets you use the available tools to explore the data and find patterns, which in turn helps in making decisions and reaching conclusions.

Data Analysis consists of the following phases:

  • Data Gathering
  • Data Collection Method
  • Data Preparation (ETL)
  • Data Crunching (Analysing)
  • Data Interpreting
  • Data Visualizing

Data Gathering

First of all, you need to establish why the data should be analyzed. The way to find that purpose is to connect with a stakeholder who can tell you the business problem. At this point you need a problem statement in place and an idea of what data is available to investigate the matter.

Data Collection Method

After data gathering, you will have a fair idea of what needs to be measured to get the desired output. The focus should now be on detailing the data requirements and reaching out to the person or department who can help you get the data. Data is collected from various sources, so maintain a log that you can go back to if something goes wrong or you need additional information.

Data Preparation (ETL)

This is the most critical step, yet the most often forgotten one. In no business do you get the data the way you want it; everywhere, data needs to be transformed to get the desired result. Common issues are duplicate records, missing values, outliers, missing subgroups, etc. All of this needs to be taken care of before loading any data into the model. The aim should be to get the data as clean and error-free as possible. This phase is a must before analysis and will keep you closer to your expected outcome.
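
The cleaning steps named above can be sketched in a few lines. This is a minimal example under assumptions, not a complete ETL pipeline: the records are hypothetical, missing values are filled with the median (one choice among several), and outliers are flagged with the common 1.5 × IQR rule.

```python
import statistics

# Hypothetical raw records: (customer_id, order_value). None marks a missing value.
raw_records = [
    (1, 120.0), (2, 95.0), (2, 95.0), (3, None),
    (4, 110.0), (5, 105.0), (6, 5000.0), (7, 100.0),
]

# 1. Drop exact duplicate records while keeping the original order.
deduplicated = list(dict.fromkeys(raw_records))

# 2. Fill missing values with the median of the known values.
known_values = [value for _, value in deduplicated if value is not None]
median_value = statistics.median(known_values)
filled = [
    (customer_id, median_value if value is None else value)
    for customer_id, value in deduplicated
]

# 3. Flag outliers that fall outside 1.5 * IQR beyond the quartiles.
q1, _, q3 = statistics.quantiles([value for _, value in filled], n=4)
iqr = q3 - q1
lower_bound, upper_bound = q1 - 1.5 * iqr, q3 + 1.5 * iqr
clean = [r for r in filled if lower_bound <= r[1] <= upper_bound]
outliers = [r for r in filled if not lower_bound <= r[1] <= upper_bound]

print(f"{len(raw_records)} raw records -> {len(clean)} clean records")
print(f"Flagged as outliers: {outliers}")
```

Whether a flagged outlier should be dropped, capped, or kept is a judgment call that depends on the business problem, so the sketch only separates the outliers rather than discarding them silently.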

Data Crunching

Once the data is prepared, it is ready for analysis. As you crunch the data, you may get what you were looking for, or the results may lead you to further data requirements. This is where powerful tools such as Python, R, SAS, and Julia come in handy: they help you understand, interpret, and derive conclusions for the business problem.
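
As a small taste of what crunching can look like in Python, the sketch below groups hypothetical order records by region and computes a summary per group using only the standard library. The records and field names are assumptions made for the example.

```python
from collections import defaultdict
import statistics

# Hypothetical cleaned records: (region, order_value).
orders = [
    ("North", 120.0), ("South", 95.0), ("North", 110.0),
    ("South", 105.0), ("North", 100.0),
]

# Group the order values by region.
values_by_region = defaultdict(list)
for region, value in orders:
    values_by_region[region].append(value)

# Summarize each group: number of orders, mean order value, total value.
summary = {
    region: {
        "orders": len(values),
        "mean": statistics.mean(values),
        "total": sum(values),
    }
    for region, values in values_by_region.items()
}

for region, stats in summary.items():
    print(f"{region}: {stats['orders']} orders, "
          f"mean {stats['mean']:.1f}, total {stats['total']:.1f}")
```

For larger datasets a dedicated data analysis library would typically replace the manual grouping, but the idea stays the same: split the data into subgroups, then aggregate each one.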

Data Interpreting

This is the most critical part; it feels like the nervousness before exam results. In this phase, statistical and business significance come into the picture, as they are required to verify the claim you are making against the data. If the significance is too low, you may have to repeat all the steps to justify your claim, or the claim will be refuted. You can choose how to communicate your analysis: in plain words, or with a table or chart.

Data Visualizing

Visualization makes data easier to interpret and understand, and that is where this phase gets its significance. There are many ways to visualize: some may do it with tables, some with charts, and some with other methods. Which one to choose depends on the individual; there is no rule of thumb. Hidden or unknown facts are often better understood when we visualize the data. By observing relationships and comparing datasets, one can unearth meaningful information.
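
Even a very simple chart makes comparisons easier than a list of numbers. The sketch below draws a text bar chart with the standard library only; in practice a plotting library would normally be used instead. The function name `text_bar_chart` and the region totals are hypothetical.

```python
def text_bar_chart(totals, width=40):
    """Return one line per category, with bar length proportional to its value."""
    largest = max(totals.values())
    label_width = max(len(label) for label in totals)
    lines = []
    for label, value in totals.items():
        bar = "#" * round(width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value:g}")
    return lines


# Hypothetical totals per region.
region_totals = {"North": 330.0, "South": 200.0, "West": 45.0}
chart_lines = text_bar_chart(region_totals)
print("\n".join(chart_lines))
```

The relative sizes of the three regions are visible at a glance from the bar lengths, which is harder to judge from the raw totals alone.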

Summary

  • Data analysis is a process of exploring, transforming, and loading data to unearth useful information for better decision-making
  • The types of data analysis are descriptive, predictive, and prescriptive
  • Data analysis consists of data gathering, data collection, data preparation (ETL), data crunching, data interpretation, and data visualization

Top comments (2)

Adam

That Twain quote had me rolling lol.

Thanks for putting this overview together. Any suggested further reading?

Ashutosh

Thanks Adam.
I am writing the further articles on DA. Quite busy in some damn projects. It'll be released soon.