Seth Bang

Web Scraping Tutorial with Python and Beautiful Soup

In this tutorial, we will use Python and a popular web scraping library called Beautiful Soup to scrape a website. We will cover the basics of web scraping: making HTTP requests, parsing HTML, extracting data, and saving the results.

Prerequisites

  1. Basic understanding of Python.
  2. Familiarity with HTML.

Tools and Libraries

  1. Python 3.x
  2. Beautiful Soup 4
  3. Requests

Step 1: Install Required Libraries

First, you need to install the Beautiful Soup and Requests libraries. You can do this with pip:

pip install beautifulsoup4
pip install requests

Step 2: Import Required Libraries

In your Python script, import the required libraries:

import requests
from bs4 import BeautifulSoup

Step 3: Make an HTTP Request

To scrape a website, you first need to download its HTML content. You can use the Requests library to do this:

url = 'https://example.com'  # Replace this with the website you want to scrape
response = requests.get(url)

# Check if the request was successful (status code 200)
if response.status_code == 200:
    html_content = response.text
else:
    print(f"Failed to fetch the webpage. Status code: {response.status_code}")
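
If a request hangs or gets blocked, it often helps to pass a timeout and a descriptive User-Agent header. Both timeout and headers are standard parameters of requests.get, but the specific values in this sketch are only illustrative:

# Optional: set a timeout and a descriptive User-Agent header.
# The header value and the 10-second timeout are illustrative choices.
headers = {'User-Agent': 'my-scraper/0.1 (contact: you@example.com)'}
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()  # Raises an HTTPError for 4xx/5xx responses
html_content = response.text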

Step 4: Parse the HTML Content

Now that you have the HTML content, you can parse it using Beautiful Soup:

soup = BeautifulSoup(html_content, 'html.parser')
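
To confirm that the page was parsed as expected, you can print a formatted snippet of the tree; prettify() returns the parsed document as an indented string:

# Print the first 500 characters of the formatted HTML to inspect its structure
print(soup.prettify()[:500])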

Step 5: Extract Data

With the parsed HTML, you can now extract specific data using Beautiful Soup's methods:

# Find a single element by its tag
title_tag = soup.find('title')

# Extract the text from the tag
title_text = title_tag.text
print(f"The title of the webpage is: {title_text}")

# Find all the links on the webpage
links = soup.find_all('a')
for link in links:
    href = link.get('href')
    link_text = link.text
    print(f"{link_text}: {href}")
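
Beautiful Soup also supports CSS selectors through select() and select_one(). As a small sketch, the class name 'article-title' and the id 'main' below are hypothetical; substitute selectors that actually exist on the page you are scraping:

# Find elements by CSS selector. 'h2.article-title' is a hypothetical
# selector -- replace it with one that matches your target page.
headings = soup.select('h2.article-title')
for heading in headings:
    print(heading.get_text(strip=True))

# Find a single element by id (also hypothetical)
main_content = soup.select_one('#main')
if main_content is not None:
    paragraphs = main_content.find_all('p')
    print(f"Found {len(paragraphs)} paragraphs inside #main")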

Step 6: Save Extracted Data

You can save the extracted data in any format you prefer, such as a CSV or JSON file. Here's an example of saving it to a CSV file:

import csv

# Assuming you have a list of dictionaries with the extracted data
data = [{'text': 'Link 1', 'url': 'https://example.com/link1'},
        {'text': 'Link 2', 'url': 'https://example.com/link2'}]

with open('extracted_data.csv', 'w', newline='') as csvfile:
    fieldnames = ['text', 'url']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in data:
        writer.writerow(row)
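
Since JSON was mentioned as well, here is a minimal sketch that writes the same list of dictionaries to a JSON file using the standard library (the file name is arbitrary):

import json

# Write the same list of dictionaries to a JSON file
with open('extracted_data.json', 'w', encoding='utf-8') as jsonfile:
    json.dump(data, jsonfile, indent=2, ensure_ascii=False)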

And that's it! This basic tutorial should help you get started with web scraping using Python and Beautiful Soup. Remember to always respect the website's terms of service and robots.txt file, and avoid overloading the server with too many requests in a short period of time.
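
As a minimal sketch of both points, you can check the site's robots.txt with the standard library's urllib.robotparser and pause between requests with time.sleep. The URLs and the 2-second delay are placeholders, and requests is assumed to be imported as in Step 2:

import time
from urllib.robotparser import RobotFileParser

# Check whether the site's robots.txt allows fetching a given URL
robot_parser = RobotFileParser()
robot_parser.set_url('https://example.com/robots.txt')
robot_parser.read()

url = 'https://example.com/some-page'
if robot_parser.can_fetch('*', url):
    response = requests.get(url)
    # ... process the response as in the earlier steps ...
    time.sleep(2)  # Pause between requests; the 2-second delay is illustrative
else:
    print(f"robots.txt disallows fetching {url}")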

Top comments (1)

oto

Is there any code to extract the image here?