DEV Community

Decomposition
Robin Kiplangat



In the world of data science, web scraping is a common method for gathering data from the internet. However, it can be a complex task, especially for those new to programming.
This is where decomposition comes into play. It helps you understand the overall structure and logic of a program by presenting it as a series of simple, step-by-step tasks, which makes it accessible to technical and non-technical readers alike.

In this post, we'll walk you through an example of how decomposition can be used to scrape data from a website.

What is decomposition?

Decomposition refers to the process of breaking down a large, complex task into smaller, more manageable subtasks. This approach simplifies problem-solving, making solutions easier to understand, design, and implement.

It involves creating a detailed plan, executing each step individually, and constantly reviewing and adjusting the plan as needed.

This method is particularly useful in programming, because each subtask can be tested, debugged, and given its own error handling in isolation.
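To make the error-handling point concrete, here is a minimal sketch, assuming each subtask is a plain Python function. Because fetching and parsing are separate steps, a failure can be caught per page and labeled with the step that caused it, instead of aborting the whole run. `run_step`, `process_all`, and `fake_fetch` are hypothetical names used only for this illustration.

```python
from typing import Any, Callable


def run_step(name: str, func: Callable[..., Any], *args: Any) -> Any:
    """Run one subtask and tag any failure with the name of that step."""
    try:
        return func(*args)
    except Exception as exc:
        # Re-raise with the step name so the failing subtask is obvious when debugging
        raise RuntimeError(f"step '{name}' failed: {exc}") from exc


def process_all(urls, fetch, parse):
    """Process every URL; one bad page does not stop the others."""
    results, errors = {}, {}
    for url in urls:
        try:
            html = run_step("fetch", fetch, url)
            results[url] = run_step("parse", parse, html)
        except RuntimeError as exc:
            errors[url] = str(exc)
    return results, errors


# Hypothetical stand-in for a real fetch step: fails for one URL
def fake_fetch(url):
    if "broken" in url:
        raise ConnectionError("timed out")
    return f"<h1>{url}</h1>"


results, errors = process_all(["a", "broken"], fake_fetch, str.upper)
print(results)
print(errors)
```

The good page is parsed, and the bad one is reported with the step that failed, which is exactly the kind of isolation decomposition buys you.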

The Task at Hand

Our goal is to scrape data from pages on a website. We want to extract information such as the title, description, contact information, and images of each page.

Step 1: Planning

The first step is to create a detailed plan. For our task, the plan might look something like this:

  1. Fetch the HTML content of the webpage.
  2. Parse the HTML content to extract the required data.
  3. Save the extracted data in a structured format.
  4. Download the image of the initiative.
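The four steps of the plan can map one-to-one onto functions, with a small driver that calls them in order. In this hypothetical sketch, `scrape_page` and the `image_url` key are illustrative assumptions, and the steps are passed in as arguments so each one can be swapped out or replaced by a stand-in during testing.

```python
from typing import Callable


def scrape_page(
    url: str,
    fetch: Callable[[str], str],           # step 1: URL -> HTML
    parse: Callable[[str], dict],          # step 2: HTML -> data dict
    save: Callable[[dict], None],          # step 3: persist the data
    download: Callable[[str, str], None],  # step 4: (image URL, title) -> file
) -> dict:
    """Run the plan for a single page, one subtask at a time."""
    html = fetch(url)
    data = parse(html)
    save(data)
    # Only download an image if the parser found one
    if data.get("image_url"):
        download(data["image_url"], data.get("title", "untitled"))
    return data


# Usage with stand-in steps, so the wiring can be checked without a network
saved, downloaded = [], []
data = scrape_page(
    "https://example.com/page",
    fetch=lambda url: "<html>...</html>",
    parse=lambda html: {"title": "Demo", "image_url": "https://example.com/demo.png"},
    save=saved.append,
    download=lambda img_url, title: downloaded.append((img_url, title)),
)
print(saved, downloaded)
```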

Step 2: Fetching the Webpage Content

We'll use the requests library in Python to fetch the HTML content of the webpage. Here's a simple function that does this:

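import requests

def fetch_page_content(url):
    # timeout (in seconds) is an assumed value; without one, requests can wait indefinitely
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors such as 404 or 500
    return response.text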
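Because fetching is its own subtask, it can be hardened without touching the parsing or saving code. A minimal sketch, assuming transient network errors are worth retrying: `fetch_with_retries` and its `retries`, `backoff`, and `get` parameters are hypothetical, while `requests.get`, `Response.raise_for_status()`, and `requests.RequestException` are part of the requests library.

```python
import time

import requests


def fetch_with_retries(url, retries=3, backoff=1.0, timeout=10, get=requests.get):
    """Fetch a page's HTML, retrying with exponential backoff on request errors."""
    for attempt in range(1, retries + 1):
        try:
            response = get(url, timeout=timeout)
            response.raise_for_status()  # treat 4xx/5xx responses as errors
            return response.text
        except requests.RequestException:
            if attempt == retries:
                raise  # out of attempts: let the caller decide what to do
            # Wait backoff * 1, 2, 4, ... seconds before the next attempt
            time.sleep(backoff * 2 ** (attempt - 1))
```

The `get` parameter exists only so the function can be exercised with a fake in tests; in normal use you would call `fetch_with_retries(url)`.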


Step 3: Parsing the HTML Content

Next, we'll use the BeautifulSoup library to parse the HTML content and extract the required data. We'll create a function called extract_data that takes the HTML content as input and returns a dictionary with the extracted data.

from bs4 import BeautifulSoup

def extract_data(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    data = {}
    # Extract the data… (the selectors depend on the target site's markup)
    return data
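What goes inside `extract_data` depends entirely on the target site's markup, which the post doesn't show. As a sketch only, here is one way to fill it in against a small made-up page; the tag names and CSS classes (`h1.title`, `div.description`, `img.cover`) and the dictionary keys are hypothetical and would need to match the real pages. `select_one` and `get_text` are standard BeautifulSoup methods.

```python
from bs4 import BeautifulSoup

# A tiny made-up page; real pages will have different markup.
SAMPLE_HTML = """
<html><body>
  <h1 class="title">Example Initiative</h1>
  <div class="description"><p>A short description of the initiative.</p></div>
  <a href="mailto:info@example.org">Contact us</a>
  <img class="cover" src="/images/example.png">
</body></html>
"""


def extract_data(html_content):
    soup = BeautifulSoup(html_content, "html.parser")

    def text_of(selector):
        # Return the stripped text of the first match, or None if it is absent
        tag = soup.select_one(selector)
        return tag.get_text(strip=True) if tag else None

    email_link = soup.select_one('a[href^="mailto:"]')
    image = soup.select_one("img.cover")
    return {
        "title": text_of("h1.title"),
        "description": text_of("div.description"),
        "contact": email_link["href"].removeprefix("mailto:") if email_link else None,
        "image_url": image.get("src") if image else None,
    }


data = extract_data(SAMPLE_HTML)
print(data)
```

Returning `None` for missing fields keeps one odd page from crashing the run. Image `src` values are often relative (like `/images/example.png` above), so they would need to be combined with the page URL, for example with `urllib.parse.urljoin`, before downloading. `str.removeprefix` requires Python 3.9 or later.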

Step 4: Saving the Data

Once we have the data, we can save it in a structured format. For simplicity, we'll just print the data for now.

for url in urls:  # urls: the list of page URLs to scrape
    html_content = fetch_page_content(url)
    data = extract_data(html_content)
    print(data)
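Printing is fine while developing, but the plan calls for a structured format. A minimal standard-library sketch, assuming each page yields a flat dictionary like the one returned by `extract_data`; `save_records` and the file names are hypothetical. JSON preserves the records as they are, while CSV opens directly in a spreadsheet.

```python
import csv
import json
import tempfile
from pathlib import Path


def save_records(records, json_path, csv_path):
    """Write a list of flat dicts to both JSON and CSV."""
    # JSON keeps the records exactly as they are, including None values
    Path(json_path).write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # CSV needs a fixed set of columns; use every key, in first-seen order
    fieldnames = list(dict.fromkeys(key for record in records for key in record))
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)  # None values are written as empty cells


records = [
    {"title": "Page A", "contact": "a@example.org"},
    {"title": "Page B", "contact": None},
]
out_dir = Path(tempfile.mkdtemp())
save_records(records, out_dir / "pages.json", out_dir / "pages.csv")
```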


Step 5: Downloading the images

Finally, we'll create a function to download the image of each initiative. We'll use the requests library again to fetch the image, and then save it to a file.

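def download_image(url, title):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    # Save the raw image bytes in a file named after the page title
    with open(f'{title}.png', 'wb') as f:
        f.write(response.content)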
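Page titles can contain characters such as `/` or `:` that are not allowed in file names, and not every image is a PNG. This hedged variant of `download_image` adds a hypothetical `safe_filename` helper and takes the extension from the URL path, falling back to `.png` (an assumption, not something the original code does). `stream=True` with `iter_content` is the requests pattern for writing a response to disk in chunks.

```python
import re
from pathlib import Path
from urllib.parse import urlparse

import requests


def safe_filename(title, max_length=100):
    """Turn a page title into a conservative file-name stem (hypothetical helper)."""
    stem = re.sub(r"[^\w\-]+", "-", title).strip("-")
    return stem[:max_length] or "untitled"


def download_image(url, title, out_dir="images", timeout=10):
    """Stream an image to disk and return the path it was saved to."""
    suffix = Path(urlparse(url).path).suffix or ".png"  # fall back to .png
    path = Path(out_dir) / f"{safe_filename(title)}{suffix}"
    path.parent.mkdir(parents=True, exist_ok=True)
    with requests.get(url, stream=True, timeout=timeout) as response:
        response.raise_for_status()
        with open(path, "wb") as f:
            # Write in chunks so large images are never held fully in memory
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
    return path
```

For example, `safe_filename("Clean Water: 2024/25")` yields `Clean-Water-2024-25`, so the title can no longer be misread as a directory path.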

So Now...

And that's it! With decomposition, we've broken a complex task into manageable steps, making it easier to understand and execute. The method is useful not only for web scraping but for almost any programming task, so the next time you face a complex problem, give it a try!
