Introduction
Hello, fellow developers! 👋 I've embarked on an exciting project journey, inspired by an amazing list of 53 project ideas. The first stop on this adventure? A Web Scraper built with React.js, Express, and TailwindCSS. In this post, I'll share my experiences, challenges, and the lessons I learned along the way. Whether you're a beginner or an advanced developer, I hope you find this journey insightful!
Why a Web Scraper?
Data is everywhere, but not always in the format we need. A web scraper automates the process of extracting data from websites, turning a tedious task into a breeze. For this project, I wanted to develop a tool that could help users collect data efficiently, with a clean and responsive UI.
Backend Setup
Our server will handle the scraping logic. We'll use Express for the server framework, Axios for HTTP requests, Cheerio for parsing HTML, and CORS to allow cross-origin requests.
Setup Your Project
Start by creating a new folder for your project and navigate into it:
mkdir web-scraper
cd web-scraper
Initialize a new Node.js project:
npm init -y
Install Dependencies
Install the necessary packages:
npm install express axios cheerio cors
Creating the Server
Create a file named server.js and set up your Express server:
const express = require("express");
const axios = require("axios");
const cheerio = require("cheerio");
const cors = require("cors");

const app = express();
app.use(cors());

const PORT = process.env.PORT || 5000;

app.get("/scrape", async (req, res) => {
  const url = req.query.url;
  if (!url) {
    return res.status(400).json({ message: "Missing url query parameter" });
  }
  try {
    // Fetch the page and load its HTML into Cheerio
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);

    // Collect the text and href of every anchor element
    const data = [];
    $("a").each((index, element) => {
      data.push({
        text: $(element).text(),
        href: $(element).attr("href"),
      });
    });

    res.json(data);
  } catch (error) {
    res.status(500).json({ message: "Error accessing the URL" });
  }
});

app.listen(PORT, () => console.log(`Server running on port ${PORT}`));
This server listens on port 5000 and provides a /scrape
endpoint that takes a URL as a query parameter, fetches its content, and returns links found on the page.
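Before wiring up a frontend, it helps to see how the request URL should be assembled. This is a small sketch using Node's built-in URL class; the target address is just a placeholder, not part of the project's code:

```javascript
// Build the /scrape request URL with a safely encoded query parameter.
// Using the WHATWG URL API (built into Node) avoids manual escaping bugs.
const endpoint = new URL("http://localhost:5000/scrape");
endpoint.searchParams.set("url", "https://example.com/page?id=1&lang=en");

// The & and = inside the target URL are percent-encoded, so Express
// sees the whole thing as a single `url` query parameter.
console.log(endpoint.toString());
```

Without that encoding, everything after the first & in the target URL would be parsed as a separate query parameter by Express.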
Code Explanation
- Dependencies: The server uses express for creating the server, axios for making HTTP requests, cheerio for parsing the HTML, and cors to handle cross-origin requests.
- Server Setup: We set up the Express server with the CORS middleware and have it listen on port 5000.
- Scraping Logic: In the /scrape endpoint, we use Axios to fetch the HTML from the provided URL, parse it with Cheerio, and extract all the anchor (<a>) elements. The data is then sent back as a JSON response.
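One wrinkle the endpoint glosses over: many href attributes on real pages are relative ("/about", "../archive"), so the raw values aren't directly clickable in a frontend. A small sketch of resolving them against the scraped page's address using Node's built-in URL class; the sample paths are made up for illustration:

```javascript
// Resolve scraped hrefs against the page they came from, so relative
// links become absolute URLs the frontend can render directly.
const base = "https://example.com/blog/post-1";
const scraped = ["/about", "../archive", "https://other.site/page", "#top"];

const resolved = scraped.map((href) => new URL(href, base).toString());
console.log(resolved);
```

Absolute hrefs pass through unchanged, while relative paths and fragments are resolved against the base. You could apply this in the /scrape handler before sending the JSON response.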
Frontend Setup
The frontend will be a simple React application using Tailwind CSS for styling.
Create React App
Set up the React app:
npx create-react-app client
cd client
Add TailwindCSS
Install TailwindCSS and initialize it:
npm install -D tailwindcss
npx tailwindcss init
Configure your template paths by adding the paths to all of your template files in your tailwind.config.js file:
/** @type {import('tailwindcss').Config} */
module.exports = {
  content: ["./src/**/*.{html,js}"],
  theme: {
    extend: {
      colors: {
        brand: "#0f172a",
      },
    },
  },
  plugins: [],
};
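A caveat worth knowing: if you later rename components to .jsx or adopt TypeScript, the content glob needs to cover those extensions too, or Tailwind will purge classes that only appear in those files. A possible variant of the config above, with the extra extensions being an assumption about your future file layout:

```javascript
/** @type {import('tailwindcss').Config} */
module.exports = {
  // wider glob so Tailwind also scans jsx/ts/tsx components
  content: ["./src/**/*.{html,js,jsx,ts,tsx}"],
  theme: {
    extend: {
      colors: {
        brand: "#0f172a",
      },
    },
  },
  plugins: [],
};
```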
Add the Tailwind directives to your CSS. Create a file named src/index.css and add the following:
@tailwind base;
@tailwind components;
@tailwind utilities;
Building the React Interface
Navigate to the src folder and update App.js:
import React, { useState } from "react";
import axios from "axios";

function App() {
  const [url, setUrl] = useState("");
  const [data, setData] = useState([]);
  const [isLoading, setIsLoading] = useState(false);

  // Ask the backend to scrape the given URL and store the returned links
  const fetchData = async () => {
    setIsLoading(true);
    try {
      const result = await axios.get(
        `http://localhost:5000/scrape?url=${encodeURIComponent(url)}`
      );
      setData(result.data);
    } catch (error) {
      console.error("Error fetching data:", error);
      setData([]);
    } finally {
      setIsLoading(false);
    }
  };

  return (
    <div className="p-8">
      <h1 className="text-2xl font-bold text-blue-800 mb-4">Web Scraper</h1>
      <div className="mb-4 flex items-center justify-center">
        <input
          type="text"
          value={url}
          onChange={(e) => setUrl(e.target.value)}
          placeholder="Enter URL to scrape"
          className="p-2 border border-gray-300 rounded w-full"
        />
        <button
          onClick={fetchData}
          className={`bg-brand text-white px-4 py-2 rounded ml-2 ${isLoading ? "opacity-50 cursor-not-allowed" : ""}`}
          disabled={isLoading}
        >
          {isLoading ? "Loading..." : "Scrape!"}
        </button>
      </div>
      <ul>
        {data.map((item, index) => (
          <li
            key={index}
            className="list-disc mb-1 p-2 border-b border-gray-200 hover:bg-gray-100"
          >
            <strong>{item.text}</strong> -{" "}
            <a
              href={item.href}
              className="text-blue-500 hover:text-blue-700 break-all"
            >
              {item.href}
            </a>
          </li>
        ))}
      </ul>
    </div>
  );
}

export default App;
Code Explanation
- State Management: We use React's useState hook to manage the URL input, the fetched data, and the loading state.
- Fetching Data: When the "Scrape!" button is clicked, the fetchData function sends a request to the backend server with the URL to scrape. The response is stored in the data state.
- Rendering Data: The data is displayed in a list, with each item showing the text and hyperlink extracted from the scraped page.
- Styling: TailwindCSS classes are used to style the components, providing a clean and responsive UI.
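Scraped pages often yield duplicate or empty links, which makes the rendered list noisy. A small helper you could run over the response before calling setData; the name cleanLinks is my own, not part of the post's code:

```javascript
// Drop links with no href and collapse duplicates by href,
// keeping the first occurrence's text.
function cleanLinks(items) {
  const seen = new Set();
  return items.filter((item) => {
    if (!item.href || seen.has(item.href)) return false;
    seen.add(item.href);
    return true;
  });
}

const sample = [
  { text: "Home", href: "/" },
  { text: "Home (footer)", href: "/" },
  { text: "Broken", href: undefined },
  { text: "Docs", href: "/docs" },
];
const cleaned = cleanLinks(sample); // keeps "/" once, drops the broken entry
console.log(cleaned);
```

Deduplicating by href also makes index-based React keys less fragile, since the list no longer contains identical rows.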
Conclusion
Congratulations! You've just built a fully functional web scraper that extracts the links from a given webpage and displays them elegantly. This project not only sharpens your skills in working with APIs and building user interfaces, but also adds a valuable tool to your developer toolkit.
Feel free to extend the functionality or improve the UI to suit your needs. Happy coding! 🚀