Something that's so exciting when it comes to data science is getting your hands dirty! Not literally, but you get the drill, right? I hope so. Exploratory Data Analysis (EDA) sounds like such a big, scary term, but it's not: it literally means exploring the data, analysing the dataset that you have. But you see, as easy as it sounds, that doesn't mean you can skip it. Because if you do, you'll be shown "shege"; in short, you will cry till the end.
Whether you're a data scientist, analyst, or engineer, this is the part that brings us all together, because we all have to pass through it before we can move on to any next step. In short, EDA basically includes data preprocessing and data visualization. Before you can get to visualization, you need to do the preprocessing first. Don't be scared, it's actually very fun.
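To make "preprocessing" a bit more concrete, here's a minimal sketch of two common cleaning steps: dropping exact duplicate rows and dropping rows with missing values. It's written in plain Python so it runs anywhere, and the records are made up purely for illustration. In pandas you'd typically reach for `df.drop_duplicates()` and `df.dropna()` instead.

```python
# A minimal preprocessing sketch using only the standard library.
# The rows below are made-up sample data, not from a real dataset.
rows = [
    {"name": "Ada", "age": 34, "city": "Lagos"},
    {"name": "Ben", "age": None, "city": "Abuja"},
    {"name": "Ada", "age": 34, "city": "Lagos"},  # exact duplicate of the first row
    {"name": "Chi", "age": 29, "city": None},
]

def drop_duplicates(records):
    """Keep only the first occurrence of each identical record."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable fingerprint of the whole row
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

def drop_missing(records):
    """Drop any record that has a missing (None) value."""
    return [rec for rec in records if all(v is not None for v in rec.values())]

clean = drop_missing(drop_duplicates(rows))
print(clean)  # [{'name': 'Ada', 'age': 34, 'city': 'Lagos'}]
```

Dropping rows is the bluntest option; depending on the dataset you might fill missing values instead, but the idea of checking before you plot stays the same.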
Now let's dive into it, shall we?
Why EDA matters to a data scientist: it helps you understand the data you're working with, and it helps you spot any outliers.
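A quick numeric summary is often the very first step toward both goals. Below is a small plain-Python sketch, assuming a hypothetical column of house prices (made-up numbers, with `None` marking a missing value). In pandas, `df.describe()` and `df.isna().sum()` give you a similar first look.

```python
import statistics

# Hypothetical column of house prices (in thousands); None marks a missing value.
prices = [120, 135, 128, 142, None, 131, 890]

def summarize(values):
    """Return a quick first-look summary of a numeric column."""
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
    }

summary = summarize(prices)
print(summary)
# The mean sits far above the median, a hint that something (890) is unusual.
```

Nothing fancy here, but a mean that's far away from the median, or a max that looks nothing like the rest, is exactly the kind of thing EDA is meant to catch early.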
Bar Plots: These are among the most common plots used when visualizing data. I know you've probably come across bar plots in Microsoft Excel; they work the same way here. Their main purpose is to compare categorical values within a dataset, for example how many records fall into each category. Categorical values are commonly used to describe attributes in a dataset.
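Under the hood, a bar plot of a categorical column is usually just a count per category. Here's a minimal sketch, assuming a made-up column of payment methods, that computes those counts and prints them as text bars. With matplotlib you'd pass the same keys and counts to `plt.bar` to draw the real thing.

```python
from collections import Counter

# Hypothetical categorical column: each customer's preferred payment method.
payments = ["card", "cash", "card", "transfer", "card", "cash"]

# A bar plot draws one bar per category, with height equal to its count.
counts = Counter(payments)
for category, n in counts.most_common():
    print(f"{category:<10} {'#' * n} {n}")
```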
Histograms: These are also commonly used in data visualization, and they help you see how the numerical values in a dataset are distributed, for example whether most values cluster together or are spread out.
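A histogram splits a numeric range into equal-width bins and counts how many values land in each one. This sketch assumes a made-up list of integer ages, a bin width of 10, and that every value is at least `start`; the helper name `histogram` is just for illustration. Empty bins are kept as zero so gaps in the distribution stay visible, just like on a real histogram.

```python
# Hypothetical numeric column: ages of survey respondents (integers).
ages = [22, 25, 27, 31, 33, 34, 38, 41, 45, 52, 58, 63]

def histogram(values, bin_width, start):
    """Count integer values into bins [start, start+w), [start+w, start+2w), ...

    Assumes every value is >= start.
    """
    counts = {}
    for v in values:
        left = start + ((v - start) // bin_width) * bin_width  # left edge of v's bin
        counts[left] = counts.get(left, 0) + 1
    last = max(counts)
    # Include empty bins so gaps in the distribution remain visible.
    return {left: counts.get(left, 0) for left in range(start, last + 1, bin_width)}

bins = histogram(ages, bin_width=10, start=20)
for left, n in bins.items():
    print(f"{left}-{left + 9}: {'#' * n}")
```

The bin width is a choice you make, and changing it can change how the shape looks, so it's worth trying a couple of values.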
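Box Plots: These are very important in data visualization because they help identify outliers in a dataset. A box plot summarizes a numerical column by its median and quartiles, so points that fall far outside the box stand out immediately. For an easy way to look for anomalies in your dataset, always count on box plots to come through for you.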
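The usual rule behind those "far outside" points is the 1.5 × IQR rule: anything below Q1 − 1.5·IQR or above Q3 + 1.5·IQR gets drawn as an individual dot (matplotlib's `boxplot` uses `whis=1.5` by default). Here's a minimal sketch of that rule on made-up delivery times. Note that libraries compute quartiles in slightly different ways; this uses Python's `statistics.quantiles` with `method="inclusive"`.

```python
import statistics

# Hypothetical delivery times in minutes, with one suspicious value.
times = [28, 30, 31, 32, 33, 35, 36, 38, 40, 95]

def iqr_outliers(values):
    """Return points outside the 1.5 * IQR fences, like box-plot outlier dots."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

print(iqr_outliers(times))  # [95]
```

An outlier flagged this way isn't automatically an error; it's a prompt to go and check whether that 95-minute delivery is real.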
Scatter Plots: As the name suggests, the points on them are scattered across the chart, and they are commonly used to identify correlations between two numerical variables in a dataset. They also help identify outliers. Count on them to do a good job.
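What a scatter plot shows visually, the Pearson correlation coefficient puts into a single number between −1 and +1. Here's a small plain-Python sketch with made-up "hours studied vs. exam score" pairs; points that line up closely along a rising line give a value close to +1.

```python
import math

# Hypothetical paired data: hours studied vs. exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 64, 70, 74]

def pearson(xs, ys):
    """Pearson correlation: +1 perfect positive, -1 perfect negative, 0 no linear relation."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(round(pearson(hours, scores), 3))
```

Pearson only measures straight-line relationships, which is one more reason to actually look at the scatter plot instead of trusting the number alone.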
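Heatmaps: These color each cell of a grid according to its value, and they're often used to show the correlations between the numerical columns of a dataset. They don't actually contain heat, as the name suggests, but the color pattern kinda looks like heat, you gerrit? If not, here is an example from a dataset I'm working on at the moment: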
Now you get it?
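A correlation heatmap is basically a grid of pairwise correlations with color added on top. Here's a minimal sketch, assuming three made-up numeric columns, that builds that grid and prints it as numbers. A common real-world version is seaborn's `sns.heatmap(df.corr(), annot=True)` on a pandas DataFrame.

```python
import math

# Hypothetical numeric columns from a small dataset.
data = {
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 55, 61, 64, 70, 74],
    "absences": [9, 7, 6, 4, 2, 1],
}

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# A correlation heatmap colors each cell of this matrix; here we just print the numbers.
cols = list(data)
matrix = {(a, b): pearson(data[a], data[b]) for a in cols for b in cols}
print(" " * 9 + "".join(f"{c:>10}" for c in cols))
for a in cols:
    print(f"{a:<9}" + "".join(f"{matrix[(a, b)]:>10.2f}" for b in cols))
```

The diagonal is always 1 (every column correlates perfectly with itself), and the grid is symmetric, which is why many heatmaps only show one triangle.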
Word Clouds: Going by Google's definition, a word cloud is an image composed of words used in a particular text or subject, in which the size of each word indicates its frequency or importance.
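The "frequency" part is easy to compute yourself. This sketch counts words in a made-up snippet of text, skipping a tiny hand-picked stopword list (real projects use a much bigger one). The third-party `wordcloud` package can then draw an image from frequencies like these with `WordCloud().generate_from_frequencies(...)`.

```python
import re
from collections import Counter

# Hypothetical text; a word cloud sizes each word by how often it appears.
text = """Data is messy. Cleaning data takes time, but clean data tells a clearer story.
Explore the data before you model the data."""

# Tiny illustrative stopword list; common filler words would otherwise dominate.
STOPWORDS = {"is", "but", "a", "the", "before", "you", "takes"}

words = re.findall(r"[a-z']+", text.lower())
freq = Counter(w for w in words if w not in STOPWORDS)

# The largest words in the cloud would be the most frequent ones.
for word, n in freq.most_common(3):
    print(word, n)
```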
In my opinion, these are the types of plots you'll most often encounter or use when visualizing data. There are others too, including line plots, pie charts, and pair plots. You'll definitely find a use for each of them as you navigate the field of data science. My intention here is to give you some guidance on ways to visualize data as you progress in this space.
Remember, exploratory data analysis (EDA) is not just about visualizing data; it's about telling a story, bringing data to life, and gaining insights from your analysis. Keep that in mind. Anyway, that's all from me for now. I hope you found my article on EDA techniques interesting. If you did, feel free to leave a like or a comment as feedback. If there's anything I may have missed, or if you have any suggestions for improvement, please let me know! It helps me grow and learn. Alright then, bye!