
Param Mehrotra

IMDB movie scraper - my first project

I started my programming journey with a project that involved a lot of things I had never done before. I was exposed to many new Python libraries and also learnt how programming projects are structured.

I started by reading about the Beautiful Soup library. It parses the HTML of a website so that we can extract data from it. In this case, I was extracting data on the top 250 English and top 250 Hindi movies from IMDB (links below).

After that, I read up a bit on HTML to understand the markup of the website. Here, I found out that all titles and their years were given in `td` elements with

```python
{"class": "titleColumn"}
```

and all the ratings were given with `{"class": "ratingColumn imdbRating"}` in the same section. So, I extracted them all using the `find_all` function in Beautiful Soup. I then separated the title into name and year by splitting the string into two. Finally, I created a class `Movie` which took in the values name, year and rating.
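Here is a minimal sketch of that extraction step, not my exact code. It runs on a small inline HTML sample instead of the live page. The sample markup, the placeholder movie data and the helper names `split_title` and `extract_movies` are assumptions made for illustration, and the real IMDB markup may differ.

```python
from dataclasses import dataclass

from bs4 import BeautifulSoup

# Hypothetical sample imitating the table cells described above.
# The movie names, years and ratings are placeholders, not real data.
SAMPLE_HTML = """
<table>
  <tr>
    <td class="titleColumn">1. <a href="#">Example Movie A</a> <span>(2001)</span></td>
    <td class="ratingColumn imdbRating"><strong>8.5</strong></td>
  </tr>
  <tr>
    <td class="titleColumn">2. <a href="#">Example Movie B</a> <span>(1994)</span></td>
    <td class="ratingColumn imdbRating"><strong>8.3</strong></td>
  </tr>
</table>
"""


@dataclass
class Movie:
    name: str
    year: int
    rating: float


def split_title(text):
    """Split text such as '1. Example Movie A (2001)' into (name, year)."""
    # Everything after the last '(' is assumed to be the year.
    head, _, tail = text.rpartition("(")
    year = int(tail.rstrip(")"))
    # Drop the leading rank ('1.') if the cell text starts with one.
    name = head.split(".", 1)[-1].strip()
    return name, year


def extract_movies(html):
    soup = BeautifulSoup(html, "html.parser")
    title_cells = soup.find_all("td", {"class": "titleColumn"})
    rating_cells = soup.find_all("td", {"class": "ratingColumn imdbRating"})
    movies = []
    # Pair each title cell with the rating cell at the same position.
    for title_cell, rating_cell in zip(title_cells, rating_cells):
        name, year = split_title(title_cell.get_text(" ", strip=True))
        rating = float(rating_cell.get_text(strip=True))
        movies.append(Movie(name, year, rating))
    return movies


movies = extract_movies(SAMPLE_HTML)
for movie in movies:
    print(movie)
```

To run this against a real page, the HTML string would come from an HTTP request rather than from `SAMPLE_HTML`, and the class names would need to match whatever the page uses at that time.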

Now that I had extracted the data, I had to sort it based on a custom criterion: movie name (alphabetically), rating or release year.
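One way to sketch this sorting step is with Python's built-in `sorted` and a key function per criterion. The `SORT_KEYS` table, the `sort_movies` helper, the `reverse` option and the placeholder movies are all assumptions for illustration. The post does not say whether the order was ascending or descending.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    name: str
    year: int
    rating: float


# Map each supported criterion to a function that returns the sort key.
SORT_KEYS = {
    "name": lambda movie: movie.name.lower(),  # case-insensitive A-Z
    "year": lambda movie: movie.year,
    "rating": lambda movie: movie.rating,
}


def sort_movies(movies, criterion, reverse=False):
    """Return a new list of movies sorted by the chosen criterion."""
    if criterion not in SORT_KEYS:
        raise ValueError(f"Unknown criterion: {criterion!r}")
    return sorted(movies, key=SORT_KEYS[criterion], reverse=reverse)


# Placeholder data, not real IMDB entries.
movies = [
    Movie("Example Movie B", 1999, 8.1),
    Movie("example movie A", 2010, 7.4),
    Movie("Example Movie C", 1985, 9.0),
]

by_name = sort_movies(movies, "name")
by_year = sort_movies(movies, "year")
by_rating = sort_movies(movies, "rating", reverse=True)  # highest first

for movie in by_rating:
    print(movie.name, movie.year, movie.rating)
```

Keeping the key functions in a dictionary means a new criterion can be added in one line, and `sorted` leaves the original list unchanged.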

Links:
1) IMDB top 250 English movies:
https://www.imdb.com/chart/top/?ref_=nv_mv_250
2) IMDB top 250 Hindi movies:
https://www.imdb.com/india/top-rated-indian-movies/?pf_rd_m=A2FGELUUNOQJNL&pf_rd_p=4da9d9a5-d299-43f2-9c53-f0efa18182cd&pf_rd_r=F22A2RC934X0Q4NDWHBA&pf_rd_s=right-4&pf_rd_t=15506&pf_rd_i=top&ref_=chttp_ql_7
