DEV Community

AlixaProDev

Web Scraping with Python

In simple words, web scraping is the art of extracting data from websites. With it, you can grab just the data you are interested in from a web page.

There are many ways to do web scraping, but as a programmer you should know how to do it with your favorite programming language.

No matter what programming language you use, there is almost certainly a way to do web scraping with it. Unless you are using the HTML "programming language" 😂.

I love Python for its simplicity and versatility. You can do almost anything with Python, and web scraping is no exception.

Python offers several modules and libraries that help with web scraping. Among them, requests, Beautiful Soup, and Scrapy are the popular ones.

But I am not here to talk about those modules and libraries. Instead, I will introduce you to what I consider the best Python module for web scraping: requests-html.

Beautiful Soup and requests do the job, but with the requests-html library things become much simpler. You can even scrape web pages that use JavaScript to render their HTML.

Enough discussion, let's get our hands dirty.

Install the requests-html library

Before installing the requests-html library, set up your Python installation.

Once Python is installed, open your favorite terminal and run the following command to install the requests-html library.

python -m pip install requests-html 

If you run into an error during installation, check the complete guide to the requests-html library.
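A quick way to confirm the install worked is to ask Python which version of the package it can see. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8). The helper name `installed_version` is my own and is not part of requests-html.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# "requests-html" is the name used with pip, not the import name.
found = installed_version("requests-html")
if found is None:
    print("requests-html is not installed in this environment")
else:
    print(f"requests-html {found} is installed")
```

If this reports that the package is missing, the pip command most likely ran against a different Python interpreter than the one you are using here.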

Get questions from Stack Overflow with requests-html

This is an interesting use case for requests-html, though we could also do it with the requests and Beautiful Soup libraries.

How to get started?
There are a few steps to follow to get all the questions related to a given topic.

Step No 1: Find the keyword
For example, let's say you want to grab all questions related to 'python' or 'javascript'.
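The keyword ends up inside a Stack Overflow tag URL, so tags with special characters (such as 'c++' or 'c#') need to be percent-encoded first. Here is a minimal sketch of that step. The helper name `build_tag_url` and the choice to lowercase and trim the keyword are my own assumptions, not something the tutorial's code does.

```python
from urllib.parse import quote

BASE_URL = "https://stackoverflow.com/questions/tagged/"


def build_tag_url(keyword):
    """Build the tag page URL for a keyword, percent-encoding unsafe characters."""
    # Assumption: tags are treated as lowercase and surrounding spaces are noise.
    tag = keyword.strip().lower()
    # safe="" makes quote() encode every reserved character, including "+" and "#".
    return BASE_URL + quote(tag, safe="")


for keyword in ["python", "javascript", "c++", "c#"]:
    print(build_tag_url(keyword))
```

For plain keywords like 'python' the result is the same URL the f-string approach would give; the encoding only matters for tags containing characters such as "+" or "#".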

Step No 2: Open your favorite IDE
I am using VS Code and I am kind of addicted to it. You can use any IDE you like.

Step No 3: Write the following Python code in your IDE

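from requests_html import HTMLSession

# Create a session that can fetch pages and render their JavaScript.
session = HTMLSession()
keyword = 'python'
url = f"https://stackoverflow.com/questions/tagged/{keyword}"
response = session.get(url)

# Render the page in headless Chromium (downloaded on first use),
# wait one second, and scroll down twice.
response.html.render(sleep=1, keep_page=True, scrolldown=2)

# Select the question links by CSS selector and print each title.
question_elements = response.html.find('a.s-link')
for question_e in question_elements:
    print(question_e.text)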

The output of the code is all the questions related to python that appear on the first page.
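To see what a class selector like 'a.s-link' actually matches, it can help to try the idea offline on a tiny piece of markup. The sketch below is a stand-in built on the standard library's `html.parser`, not how requests-html works internally. The sample HTML and the `LinkTextCollector` class are hypothetical, and the real Stack Overflow page is far more complex.

```python
from html.parser import HTMLParser

# Hypothetical sample markup, made up for this illustration.
SAMPLE_HTML = """
<div id="questions">
  <a class="s-link" href="/questions/1">How do I reverse a list?</a>
  <a class="s-link" href="/questions/2">What does yield do?</a>
</div>
<a class="s-link other" href="/help">Help center</a>
<a class="nav" href="/tags">Tags</a>
"""


class LinkTextCollector(HTMLParser):
    """Collect the text of every <a> tag that carries a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.titles = []
        self._inside_match = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and self.css_class in classes:
            self._inside_match = True
            self.titles.append("")

    def handle_data(self, data):
        # Text can arrive in chunks, so append to the current title.
        if self._inside_match:
            self.titles[-1] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_match = False


collector = LinkTextCollector("s-link")
collector.feed(SAMPLE_HTML)
titles = [title.strip() for title in collector.titles]
for title in titles:
    print(title)
```

Note that the "Help center" link is collected too, because it shares the class even though it sits outside the question list. If the real page has links like that, a narrower selector scoped to the question container would filter them out.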

What is next?

You should follow me, because I will come up with other interesting Python tutorials very soon. Stay connected with me on YouTube as well. Link to YouTube channel: https://www.youtube.com/codewithaliyt.

Top comments (2)

hsukang • Edited

It seems that the line "response.html.render(sleep=1, keep_page = True, scrolldown = 2)" is completely unnecessary.

hsukang • Edited

'a.s-link' should be changed to '#questions .s-link'