drruvari

Building a Web Scraper with React.js, Express and TailwindCSS: A Journey into Data Collection

Introduction

Hello, fellow developers! 👋 I've embarked on an exciting project journey, inspired by an amazing list of 53 project ideas. The first stop on this adventure? A web scraper built with React.js, Express, and TailwindCSS. In this post, I'll share my experiences, challenges, and the lessons I learned along the way. Whether you're a beginner or an advanced developer, I hope you find this journey insightful!

Why a Web Scraper?

Data is everywhere, but not always in the format we need. A web scraper automates the process of extracting data from websites, turning a tedious task into a breeze. For this project, I wanted to develop a tool that could help users collect data efficiently, with a clean and responsive UI.

Backend Setup

Our server will handle the scraping logic. We'll use Express for the server framework, Axios for HTTP requests, Cheerio for parsing HTML, and CORS to allow cross-origin requests.

Setup Your Project

Start by creating a new folder for your project and navigating into it:

mkdir web-scraper
cd web-scraper

Initialize a new Node.js project:

npm init -y

Install Dependencies

Install the necessary packages:

npm install express axios cheerio cors

Creating the Server

Create a file named server.js and set up your Express server:

const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");
const cors = require("cors");

const app = express();
app.use(cors());

const PORT = process.env.PORT || 5000;
app.listen(PORT, () => console.log(`Server running on port ${PORT}`));

app.get("/scrape", async (req, res) => {
  const url = req.query.url;
  try {
    const response = await axios.get(url);
    const html = response.data;
    const $ = cheerio.load(html);
    const data = [];
    $("a").each((index, element) => {
      data.push({
        text: $(element).text(),
        href: $(element).attr("href"),
      });
    });
    res.json(data);
  } catch (error) {
    res.status(500).json({ message: "Error accessing the URL" });
  }
});

This server listens on port 5000 by default (or on the port set in the PORT environment variable) and provides a /scrape endpoint that takes a URL as a query parameter, fetches its content, and returns the text and href of every link found on the page.
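The handler above passes req.query.url straight to Axios, so a missing or malformed value only shows up as the generic 500 response. Here is a minimal sketch of a validation step you could run first. It assumes a hypothetical helper named parseTargetUrl (not part of the original server.js) that accepts only absolute http and https URLs:

```javascript
// Hypothetical helper (not in the original server.js): returns a URL object
// for absolute http(s) URLs and null for anything else.
function parseTargetUrl(raw) {
  // req.query values can be missing or arrays, so accept non-empty strings only
  if (typeof raw !== "string" || raw.trim() === "") return null;
  try {
    const parsed = new URL(raw.trim());
    const isWeb = parsed.protocol === "http:" || parsed.protocol === "https:";
    return isWeb ? parsed : null;
  } catch {
    // new URL() throws a TypeError on input it cannot parse
    return null;
  }
}

console.log(parseTargetUrl("https://example.com/docs") !== null); // true
console.log(parseTargetUrl("ftp://example.com")); // null
console.log(parseTargetUrl("not a url")); // null
```

Inside the /scrape handler, one option is to call this before axios.get and respond with res.status(400) when it returns null, which keeps bad input separate from genuine fetch failures.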

Code Explanation

  • Dependencies: The server uses express for creating the server, axios for making HTTP requests, cheerio for parsing the HTML, and cors to handle cross-origin requests.
  • Server Setup: We set up the Express server to listen on port 5000 and include the necessary middleware.
  • Scraping Logic: In the /scrape endpoint, we use Axios to fetch the HTML from the provided URL, parse it with Cheerio, and extract all the anchor (<a>) elements. The data is then sent back as a JSON response.
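One thing the scraping logic above doesn't handle: href attributes often hold relative paths (such as /about), some anchors have no href at all, and the same target can appear many times. Here is a rough sketch of a clean-up pass over the collected links. The normalizeLinks name and the sample data are assumptions for illustration, not part of the original server:

```javascript
// Hypothetical post-processing step (not in the original server.js).
// `links` has the same shape the /scrape endpoint builds: { text, href }.
function normalizeLinks(links, baseUrl) {
  const seen = new Set();
  const result = [];
  for (const { text, href } of links) {
    if (!href) continue; // <a> elements without an href attribute
    let absolute;
    try {
      // Resolves relative paths such as "/about" against the scraped page
      absolute = new URL(href, baseUrl).href;
    } catch {
      continue; // skip values the URL parser rejects
    }
    if (seen.has(absolute)) continue; // drop duplicate targets
    seen.add(absolute);
    result.push({ text: (text || "").trim(), href: absolute });
  }
  return result;
}

// Made-up sample input, shaped like the array the endpoint returns
const sample = [
  { text: "  About ", href: "/about" },
  { text: "About again", href: "https://example.com/about" },
  { text: "No target", href: undefined },
];
console.log(normalizeLinks(sample, "https://example.com/blog/"));
```

In the endpoint, this could run on the data array just before res.json, with the requested URL as baseUrl. Note that mailto: and javascript: links parse as absolute URLs, so this sketch passes them through unchanged.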

Frontend Setup

The frontend will be a simple React application using Tailwind CSS for styling.

Create React App

Set up the React app:

npx create-react-app client
cd client

Add TailwindCSS

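Install TailwindCSS and initialize it. The init command used here belongs to the Tailwind CSS v3 CLI, so the install pins that major version:

npm install -D tailwindcss@3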
npx tailwindcss init

Configure your template paths by adding the paths to all of your template files in your tailwind.config.js file:

/** @type {import('tailwindcss').Config} */
module.exports = {
  content: ["./src/**/*.{js,jsx,ts,tsx}"],
  theme: {
    extend: {
      colors: {
        brand: "#0f172a",
      },
    },
  },
  plugins: [],
}

Add the Tailwind directives to your CSS. Open src/index.css (Create React App already generates this file) and replace its contents with the following:

@tailwind base;
@tailwind components;
@tailwind utilities;

Building the React Interface

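The component below imports axios, so install it in the client project first with npm install axios. Then navigate to the src folder and replace the contents of App.js with: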

import React, { useState } from "react";
import axios from "axios";

function App() {
  const [url, setUrl] = useState("");
  const [data, setData] = useState([]);
  const [isLoading, setIsLoading] = useState(false);

  const fetchData = async () => {
    setIsLoading(true);
    try {
      const result = await axios(`http://localhost:5000/scrape?url=${encodeURIComponent(url)}`);
      setData(result.data);
    } catch (error) {
      console.error("Error fetching data:", error);
      setData([]);
    } finally {
      setIsLoading(false);
    }
  };

  return (
    <div className="p-8">
      <h1 className="text-2xl font-bold text-blue-800 mb-4">Web Scraper</h1>
      <div className="mb-4 flex items-center justify-center">
        <input
          type="text"
          value={url}
          onChange={(e) => setUrl(e.target.value)}
          placeholder="Enter URL to scrape"
          className="p-2 border border-gray-300 rounded w-full"
        />
        <button
          onClick={fetchData}
          className={`bg-brand text-white px-4 py-2 rounded ml-2 ${isLoading ? "opacity-50 cursor-not-allowed" : ""}`}
          disabled={isLoading}
        >
          {isLoading ? "Loading..." : "Scrape!"}
        </button>
      </div>
      <ul>
        {data.map((item, index) => (
          <li
            key={index}
            className="list-disc mb-1 p-2 border-b border-gray-200 hover:bg-gray-100"
          >
            <strong>{item.text}</strong> -
            <a
              href={item.href}
              className="text-blue-500 hover:text-blue-700 break-all"
            >
              {item.href}
            </a>
          </li>
        ))}
      </ul>
    </div>
  );
}

export default App;

Code Explanation

  • State Management: We use React's useState hook to manage the URL input, fetched data, and loading state.
  • Fetching Data: When the "Scrape!" button is clicked, the fetchData function sends a request to the backend server with the URL to scrape. The response is stored in the data state.
  • Rendering Data: The data is displayed in a list, with each item showing the text and hyperlink extracted from the scraped page.
  • Styling: TailwindCSS classes are used to style the components, providing a clean and responsive UI.
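The request URL in fetchData is assembled by hand with a template string and encodeURIComponent. As an alternative, here is a small sketch that builds it with the standard URL API. The buildScrapeUrl helper and the API_BASE constant are hypothetical names, not part of the original App.js:

```javascript
// Hypothetical helper (not in the original App.js). API_BASE mirrors the
// address hard-coded in fetchData; adjust it if the server runs elsewhere.
const API_BASE = "http://localhost:5000";

function buildScrapeUrl(targetUrl, apiBase = API_BASE) {
  const endpoint = new URL("/scrape", apiBase);
  // searchParams.set percent-encodes the value, so no manual
  // encodeURIComponent call is needed
  endpoint.searchParams.set("url", targetUrl.trim());
  return endpoint.toString();
}

// Leading and trailing whitespace from the input box is trimmed away
console.log(buildScrapeUrl(" https://example.com/?q=a&b=1 "));
```

With this in place, fetchData could call axios(buildScrapeUrl(url)). Keeping the base address in one constant also makes it easier to point the client at a deployed server later.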

Conclusion

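Congratulations! You've just built a working web scraper that extracts the links from a webpage's HTML and displays them in a clean list. Because it parses the HTML exactly as the server returns it, links that a page adds later with client-side JavaScript won't show up. This project not only enhances your skills in handling APIs and user interfaces but also adds a valuable tool to your developer toolkit.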

Feel free to extend the functionality or improve the UI to suit your needs. Happy coding! 🚀
