HARUN NZAI

Posted on Dec 14

Real time E commerce price tracker and Deal finder

#brightdatachallenge #devchallenge #webdev #api

Project Title

Real-Time E-commerce Price Tracker and Deal Finder

Introduction

In today’s competitive online shopping landscape, finding the best deals can be overwhelming for consumers. This project, Real-Time E-commerce Price Tracker and Deal Finder, is designed to scrape real-time product prices and discounts from e-commerce websites to help users identify the best deals.

Using Bright Data Scraping Browser, the project effectively handles complex, dynamic web elements like infinite scrolling, AJAX requests, and user authentication pages to extract structured data.

Objective

To create a tool that:

Scrapes real-time prices and discounts from dynamic e-commerce websites.
Filters and organizes data by categories like electronics, clothing, and home appliances.
Provides a user-friendly dashboard to search and explore the best deals.

Technical Setup

Tools and Technologies

Bright Data Scraping Browser: To handle dynamic content and simulate user interactions.

Python: For scripting and automation.

Selenium: To control the browser and extract web data.

Pandas: To process and structure the scraped data.

Streamlit: For creating an interactive dashboard.

Website Targeted

For this project, we focused on websites with infinite scrolling and AJAX-based content. An example is an e-commerce website such as Amazon, which requires handling dynamic elements to extract information about product prices, discounts, and ratings.

Challenges and Solutions

Challenge 1: Infinite Scrolling

Issue: Dynamic websites often load new content as users scroll down.

Solution: Implemented scrolling logic in Selenium to trigger additional data loads and used Bright Data’s infrastructure to maintain session stability.

Challenge 2: Dynamic Content Rendering

Issue: Many sites rely on JavaScript to load elements.

Solution: Bright Data Scraping Browser processed JavaScript-heavy pages to retrieve fully loaded content.

Challenge 3: Anti-Bot Measures

Issue: CAPTCHAs and rate-limiting blocked traditional scrapers.

Solution: Bright Data handled these using its CAPTCHA bypass and rotating IPs.

Implementation

Code Snippets

Step 1: Scraping Dynamic Data

The following Python script handles infinite scrolling, dynamic content, and data extraction:

from selenium import webdriver
from selenium.webdriver.common.by import By
import pandas as pd
import time

Configure WebDriver

options = webdriver.ChromeOptions()
options.add_argument('--headless') # Run browser in background
driver = webdriver.Chrome(options=options)

def scrape_ecommerce_data(url):
driver.get(url)
time.sleep(5) # Allow page to load

products = []

  
  
  Scroll and extract data


for _ in range(5):  # Adjust the range for more scrolling

    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    time.sleep(2)  # Wait for new content to load

items = driver.find_elements(By.CLASS_NAME, "product-class")  # Update class names for the site

for item in items:
    try:
        name = item.find_element(By.CLASS_NAME, "name-class").text  # Product name
        price = item.find_element(By.CLASS_NAME, "price-class").text  # Product price
        rating = item.find_element(By.CLASS_NAME, "rating-class").text  # Product rating

        products.append({
            "Name": name,
            "Price": price,
            "Rating": rating
        })
    except:
        continue  # Handle missing data gracefully



    

    




return pd.DataFrame(products)

Example usage

url = "https://www.example-ecommerce.com"
data = scrape_ecommerce_data(url)
data.to_csv("ecommerce_data.csv", index=False)
driver.quit()

Step 2: Building the Dashboard

Using Streamlit, I created an interactive dashboard to showcase the data:

import streamlit as st
import pandas as pd

Load scraped data

data = pd.read_csv("ecommerce_data.csv")

Dashboard design

st.title("Real-Time E-commerce Price Tracker")
st.write("Explore the best deals in various categories!")

Filters

category = st.selectbox("Select Category", options=["All", "Electronics", "Clothing", "Home Appliances"])
price_range = st.slider("Price Range", 0, 5000, (0, 5000))

Filter data

filtered_data = data[(data["Price"] >= price_range[0]) & (data["Price"] <= price_range[1])]

st.write("### Filtered Results")
st.write(filtered_data)

Results

Sample Output
The following table shows the extracted data (mock sample for illustration):
Interactive Dashboard
The Streamlit dashboard allowed users to filter deals by price and categories in real time.

Conclusion

This project demonstrates how Bright Data’s Scraping Browser simplifies the process of extracting data from complex websites. By leveraging its tools, the challenges of infinite scrolling, dynamic content, and CAPTCHA protection were effectively addressed.

Future Enhancements

Include multiple e-commerce websites.

Add alerts for price drops and better deals.

Expand to regional websites for localized deal tracking.

Files and Submission

The following files are ready for submission:

Scraper Code: ecommerce_scraper.py
Dashboard Code: dashboard.py
Sample Data: ecommerce_data.csv
Documentation: This file as README.md.

DEV Community

Real time E commerce price tracker and Deal finder

Configure WebDriver

Scroll and extract data

Example usage

Load scraped data

Dashboard design

Filters

Filter data

Top comments (0)

Read next

Build a Weather App with React Native, OpenCageData, and OpenWeatherMap

Implementing Auth in .NET WebApi & SPAs: Why is it still so painful?

¡Hola Wagtail!

React Performance: Boost it with Tree Shaking