DEV Community

Kedar Kodgire


I did exploratory data analysis on Medium blogs, which can answer: on what day to post a blog, etc.

Note: The results you see below may not be 100% correct. This analysis is based on a dataset of approximately 6,500 blog posts.

Hello, Welcome to this post.

I have been blogging here on DEV for quite some time now. I planned this analysis of Medium blogs because it can help answer questions like the ones in the sections below.

Here is the link to the Kaggle version of this notebook; you can find the dataset and everything else there. Feel free to fork it and use or modify the code with your own dataset to see interesting results. This dataset contains information about randomly chosen Medium articles published in 2019 from these 7 publications: Towards Data Science, UX Collective, The Startup, The Writing Cooperative, Data Driven Investor, Better Humans, and Better Marketing.

What day is good to post the blog

[Figure: heatmap of the number of articles published by weekday and month]

From this heatmap we can say that most of the blogs were posted on Monday, and across all the months the peak was around 500 articles published on Mondays in October. If we take the two weekdays with the highest number of published articles, we get

  • Monday
  • Thursday

So the next time you publish an article, try to do it on a Monday or Thursday; hopefully it will be seen by a larger audience.
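For readers who want to reproduce this kind of count on their own data, here is a minimal sketch in plain Python. The dates below are made up for illustration, and the real dataset may store its publication date under a different name or format, so treat the input as an assumption. It counts articles per (month, weekday) cell, which is the information a heatmap like the one above displays, and then picks the busiest weekdays.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# Made-up publication dates for illustration only (not the real dataset).
published = [
    date(2019, 10, 7),   # Monday
    date(2019, 10, 14),  # Monday
    date(2019, 10, 10),  # Thursday
    date(2019, 11, 21),  # Thursday
    date(2019, 3, 5),    # Tuesday
]

def weekday_month_counts(dates):
    """Count articles per (month number, weekday name) cell."""
    return Counter((d.month, WEEKDAYS[d.weekday()]) for d in dates)

def top_weekdays(dates, n=2):
    """Return the n weekdays with the most published articles."""
    per_day = Counter(WEEKDAYS[d.weekday()] for d in dates)
    return [day for day, _ in per_day.most_common(n)]

counts = weekday_month_counts(published)
print(counts[(10, "Monday")])   # articles published on Mondays in October
print(top_weekdays(published))  # the two busiest weekdays
```

Keep in mind that this measures when authors publish, not how readers respond, so it is best read alongside the claps and comments analysis.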

Now that we understand which day is best for posting an article, let's try to understand how long the article should be.


What should be the ideal reading time

[Figure: scatterplot of claps vs. reading time]

We have plotted the number of claps/likes against the reading time of the article. The assumption is that a person will clap for an article only when they read it completely and find it useful or entertaining.

Along with claps/likes, comments are also an important factor because they help us understand the quality of the blog, i.e. whether it is well written, plagiarised, helpful, a conversation starter, etc. You can also do further sentiment analysis on the comments to understand whether the feedback is good or bad. Let's plot another scatterplot for the comments.

[Figure: scatterplot of comments vs. reading time]

So we can say that the ideal reading time for an article is 5-10 minutes. If the article is too long, there is a possibility that readers will not finish it because it takes too much of their time, and hence will not comment on it or clap for it.
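A scatterplot is easy to misread by eye, so one way to double-check a conclusion like this is to group articles into reading-time buckets and compare a typical clap count per bucket. The sketch below is only an illustration: the (reading time, claps) pairs are invented, and the bucket boundaries are an assumption chosen to match the 5-10 minute range discussed above.

```python
from collections import defaultdict
from statistics import median

# Invented (reading_time_minutes, claps) pairs, for illustration only.
articles = [(3, 40), (6, 900), (7, 1200), (9, 700), (14, 150), (22, 60)]

def bucket(minutes):
    """Map a reading time to a coarse bucket (boundaries are an assumption)."""
    if minutes < 5:
        return "under 5 min"
    if minutes <= 10:
        return "5-10 min"
    return "over 10 min"

def median_claps_by_bucket(rows):
    """Return the median clap count for each reading-time bucket."""
    grouped = defaultdict(list)
    for minutes, claps in rows:
        grouped[bucket(minutes)].append(claps)
    return {name: median(values) for name, values in grouped.items()}

summary = median_claps_by_bucket(articles)
for name, value in summary.items():
    print(name, value)
```

The median is used instead of the mean because a handful of viral articles can dominate an average. The same function works for comments if you pass comment counts instead of claps.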


Which image extensions to use in the blogs

[Figure: share of image file types used in the articles]

As we can see, most of the images are of type jpeg (about 50%), followed by jpg and png. There must be a reason behind this, so let's see what it is.

JPEG - JPEG is a lossy compression method used to make digital images as small as possible so that they load quickly when someone wants to view them. Here are some important points about it:

  • The file size of the image being compressed is permanently reduced by eliminating unnecessary (redundant) information from the image.
  • Image quality does suffer, though it’s often so slight the average site visitor can’t tell.

JPG - Well, when it comes to .jpeg vs .jpg, the truth is there is no difference between the two except the number of characters.

The term JPG exists because of earlier versions of the Windows operating system. Specifically, the MS-DOS 8.3 file naming convention used on FAT16 file systems limited file extensions to a maximum of 3 letters, unlike UNIX-like operating systems such as macOS or Linux, which didn’t have this limit.

I read these points in this article; you can check it out if you want to learn more.

Now, coming to PNGs:

PNG - Portable Network Graphics (PNGs) are just as popular as JPEGs on websites. They also support millions of colors, although you’re much better off using PNGs for images that contain less color data. Otherwise, your image is going to be ‘heavier’ than the same image saved as a JPEG.

So we can conclude that

  • JPEG/JPG: This is an ideal image format for all types of photographs.
  • PNG: This format is perfect for screenshots and other types of imagery where there’s not a lot of color data.
  • GIF: If you want to show off animated graphics on your site, this is the best image format for you.
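The extension breakdown at the start of this section can be approximated with a few lines of Python. This is a sketch under stated assumptions: the URLs are made up, and the real dataset may record image information differently. Because .jpg and .jpeg are the same format, the sketch counts them together, which differs from the chart above where they appear separately.

```python
from collections import Counter
from os.path import splitext
from urllib.parse import urlparse

# Made-up image URLs for illustration; not taken from the dataset.
image_urls = [
    "https://example.com/cover.jpeg",
    "https://example.com/chart.JPEG?width=700",
    "https://example.com/photo.jpg",
    "https://example.com/screenshot.png",
]

# .jpg and .jpeg name the same format, so map one onto the other.
ALIASES = {"jpg": "jpeg"}

def extension_of(url):
    """Return the lowercase file extension of a URL path, with aliases merged."""
    path = urlparse(url).path  # drops the query string
    ext = splitext(path)[1].lower().lstrip(".")
    return ALIASES.get(ext, ext)

def extension_shares(urls):
    """Return the fraction of images per extension, skipping URLs without one."""
    counts = Counter(ext for ext in map(extension_of, urls) if ext)
    total = sum(counts.values())
    return {ext: count / total for ext, count in counts.items()}

shares = extension_shares(image_urls)
print(shares)
```

If you prefer to keep jpg and jpeg separate, as in the chart, set ALIASES to an empty dictionary.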

Publication with the most blogs

[Figure: number of articles per publication]

The Startup has the most articles, around 3,000, and Better Humans has the fewest articles published on Medium, around 10.

We can also infer that most people read articles from The Startup, Towards Data Science, and Data Driven Investor, assuming these publications post a large number of articles in response to demand from their audience.

(Note: we cannot be sure about this inference because the dataset we have might not be complete.)

The Startup's mission is to help readers get smarter at building their things.

So we can fairly assume that helping readers get smarter at building their things may be a hot topic for blogs.


Average reading time of articles according to the publications

Now that we know which publications have the most blogs, let's look at the average reading time of articles for each publication.

[Figure: average reading time per publication]

The Startup, which published the most articles, has an average reading time of 5.9 minutes. This again supports our earlier finding that the ideal reading time is 5-10 minutes.

Similarly, we can look at the reading time of the article with the highest number of claps in each publication. Here are the results.
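For anyone who wants to compute a per-publication average like this themselves, here is a small sketch in plain Python. The rows are invented and the field layout (publication name, reading time in minutes) is an assumption about how the data might be arranged, not the notebook's actual code.

```python
from collections import defaultdict
from statistics import mean

# Invented (publication, reading_time_minutes) rows, for illustration only.
rows = [
    ("The Startup", 5),
    ("The Startup", 7),
    ("Towards Data Science", 9),
    ("Towards Data Science", 11),
    ("Towards Data Science", 13),
]

def reading_time_summary(rows):
    """Return article count and average reading time for each publication."""
    grouped = defaultdict(list)
    for publication, minutes in rows:
        grouped[publication].append(minutes)
    return {
        publication: {
            "articles": len(times),
            "avg_minutes": round(mean(times), 1),
        }
        for publication, times in grouped.items()
    }

per_publication = reading_time_summary(rows)
for publication, stats in per_publication.items():
    print(publication, stats)
```

Reporting the article count next to the average is a deliberate choice: an average over around 10 articles is much less reliable than one over around 3,000.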

Publication               Reading time
Better Marketing          7 min
Towards Data Science      11 min
UX Collective             5 min
Data Driven Investor      6 min
The Startup               5 min
The Writing Cooperative   5 min
Better Humans             5 min
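A table like the one above can be built by finding, for each publication, the article with the most claps and reading off its reading time. The sketch below assumes each record is a (publication, reading time, claps) tuple; the values are invented and do not reproduce the table.

```python
# Invented (publication, reading_time_minutes, claps) records, for illustration only.
records = [
    ("Better Marketing", 4, 300),
    ("Better Marketing", 7, 5200),
    ("UX Collective", 5, 8100),
    ("UX Collective", 12, 900),
]

def top_clapped_reading_time(rows):
    """Return the reading time of the most-clapped article per publication.

    On a tie in claps, the article that appears first is kept.
    """
    best = {}  # publication -> (reading_time, claps) of the best article so far
    for publication, minutes, claps in rows:
        if publication not in best or claps > best[publication][1]:
            best[publication] = (minutes, claps)
    return {publication: minutes for publication, (minutes, _) in best.items()}

top_times = top_clapped_reading_time(records)
for publication, minutes in top_times.items():
    print(f"{publication}: {minutes} min")
```

Since this looks at a single article per publication, it is more of a spot check than a statistic.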

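Even here, the most common reading time is between 5 and 10 minutes; only Towards Data Science, at 11 minutes, falls slightly outside that range.

The things we explored here are just a starting point; you can do a lot more with this analysis and with even better datasets. Let me know in the comments if you plan to do something similar, and let's collaborate.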

Well, that's all for this post, folks.
Hope you enjoyed it, and hopefully some of these insights will be useful for your next blog.
