Welcome to the ultimate tutorial that will guide you through the exhilarating journey of building a complete website from scratch. With a pinch of coding wizardry and a dash of creativity, you'll learn the art of crafting a captivating frontend using Next.js. But we won't stop there! We'll delve into the realm of backend development, where you'll master the server-side sorcery with Python and a powerful framework like FastAPI. From integrating databases to creating dynamic content, every aspect of website development will be unveiled.
What you'll learn
Consider this tutorial your all-access pass to the world of cutting-edge technologies mentioned in job descriptions. And even if they're not explicitly listed, you'll be equipped with the skills to tackle any challenge that shares a similar architecture. You'll learn how to:
- create an API using FastAPI
- scrape data using Scrapy
- insert data into a cloud database
- create lambda functions to trigger the scraper
- create a cloud run service to deploy your API
Pre-requisites
Before you begin, make sure you have the following tools installed in your working environment:
- Docker
- Python (≥ 3.8) and pip
- npm & npx (≥ 9.x.x)
Acquiring Data with Scrapy
Our first step on this exciting journey will be to acquire data, and we'll do so by leveraging the dynamic capabilities of Scrapy, a reliable companion to Python. While static websites may lull you into a slumber (unless, of course, you're creating a portfolio for sheep enthusiasts), we're aiming for a more engaging experience. Scrapy's refined design architecture, akin to a debonair secret agent's tuxedo, exudes both elegance and efficiency. Of course, alternative methods are available too for the adventurous souls among us.
Without further ado, fire up your shell and type the following to install Scrapy:
pip install scrapy
Check if the installation was a success by typing:
scrapy --version
You should see the following output:
Before you start scraping, you will have to set up a new Scrapy project. Enter a directory where you'd like to store your code and run the following (this setup follows the official Scrapy tutorial: https://docs.scrapy.org/en/latest/intro/tutorial.html):
scrapy startproject tutorial
This will create a tutorial directory with the following contents:
tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py
This is the code for our first Spider. Save it in a file named quotes_spider.py under the tutorial/spiders directory in your project:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
    ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }

        # Follow the pagination link, if there is one
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
Now, from your tutorial/spiders directory, run the following command to check if your spider is working well:
scrapy crawl quotes
You should see a couple of debug logs in your terminal along with the scraped data and a summary at the end with the stats:
See the output as above? Good, now let's set up our database for storing all the scraped data. I'm choosing the serverless XataDB for this. There are a couple of other serverless databases that are just as good, if not better. However, for our $0 architecture and ease of use, I'll be using XataDB.
Open up https://xata.io/ and create a free account. Set up your database name, location and click next.
Create a new table and click Next through the remaining steps.
You'll see the following dashboard. This is your table where the data will reside.
We'll modify the schema to add a few columns to our table. Go to the Schema tab and create three columns: author (string), quote (text), and date (datetime).
Click on your database name and go to its settings.
Note the endpoint of your database; you'll need it for the connection string.
Go back and click on your profile, then open Account settings.
In the Personal API keys section, create a new API key. You'll be using this to access your table and store the data. Do not share your API key. Click Save and note the key.
We'll need the Xata Python SDK to use it in our Scrapy project. Open up your shell and type:
pip install xata
This will install a 1.x version in your environment. If, for some reason, you end up with a 0.x version, the functionality hasn't changed drastically and you can still perform the same transactions. Refer to https://xata.io/docs/sdk/python/examples for more information.
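As a quick sanity check, here's a minimal sketch of querying your table directly from a Python shell (assuming you named your table zero, as the pipeline below does, and using the API key and endpoint you noted in the dashboard):

from xata.client import XataClient

# Placeholders: use the API key and endpoint you noted in the Xata dashboard
client = XataClient(api_key="<YOUR API KEY>", db_url="<YOUR ENDPOINT>")

# Query the "zero" table; an empty record list is fine, we only care that the call succeeds
resp = client.data().query("zero", {"columns": ["author", "quote"]})
print(resp.is_success())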
Before we connect to our database, we have to modify our spider a bit. First, create a new Item in the tutorial/items.py file:
import scrapy
from itemloaders.processors import TakeFirst


class QuotesItem(scrapy.Item):
    # TakeFirst returns the first extracted value instead of a list
    text = scrapy.Field(output_processor=TakeFirst())
    date = scrapy.Field(output_processor=TakeFirst())
    author = scrapy.Field(output_processor=TakeFirst())
Now, open up your quotes_spider.py file and update your spider:
import scrapy
from datetime import datetime
from tutorial.items import QuotesItem
from scrapy.loader import ItemLoader


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
    ]

    def parse(self, response):
        # Timestamp for this scrape, in the format Xata expects for datetime columns
        date = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
        for quote in response.css("div.quote"):
            loader = ItemLoader(item=QuotesItem())
            loader.add_value("text", quote.css("span.text::text").get())
            loader.add_value("author", quote.css("small.author::text").get())
            loader.add_value("date", date)
            yield loader.load_item()

        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
Create a new class, or modify the existing one, in your tutorial/pipelines.py file:
from xata.client import XataClient
from itemadapter import ItemAdapter

API_KEY = "<YOUR API KEY>"
DB_URL = "<YOUR ENDPOINT>"


class ZeroPipeline:
    def __init__(self):
        self.client = XataClient(api_key=API_KEY, db_url=DB_URL)

    def process_item(self, item, spider):
        text = item["text"]
        date = item["date"]
        author = item["author"]
        record = {
            "author": author,
            "quote": text,
            "date": date
        }
        # Insert the scraped record into the "zero" table
        resp = self.client.records().insert("zero", record)
        spider.logger.info("Record id is: %s" % resp.json()["id"])
        return item
💡 I don't recommend hard-coding your secret keys as plain strings in the same file like this, but for this tutorial I'll let it slide.
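A slightly safer pattern is sketched below: read the credentials from environment variables instead of hard-coding them. The variable names XATA_API_KEY and XATA_DB_URL are my own choice, not something Xata or Scrapy requires.

import os

from xata.client import XataClient

# Hypothetical variable names; export them in your shell before running the spider,
# e.g. export XATA_API_KEY=... and export XATA_DB_URL=...
API_KEY = os.environ["XATA_API_KEY"]
DB_URL = os.environ["XATA_DB_URL"]

client = XataClient(api_key=API_KEY, db_url=DB_URL)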
Finally, in your tutorial/settings.py file, enable the pipeline. You'll see the following setting commented out; un-comment it. If you used a custom class name, replace "ZeroPipeline" with the name you specified.
ITEM_PIPELINES = {
    "tutorial.pipelines.ZeroPipeline": 300,
}
In your terminal, run the following command to start your spider:
scrapy crawl quotes
Check the debug logs and make sure there aren't any 400 errors. If successful, you'll see the data populated in your table.
Congratulations, you're halfway through.
Now it's time to make an API to fetch that data.
Creating an API using FastAPI
Being the buzzword in the backend world, FastAPI is what I'm using to leverage its speed. One thing I find impressive about FastAPI is its use of strongly typed models (pydantic) to validate the requests and responses you build; I'll show a tiny example of this right after the install commands. You could also use Flask for this task, but considering how everyone now demands 10 years of experience in FastAPI, it should be lucrative to get to know it.
Launch your shell and type the following command
pip install fastapi
You will also need an ASGI server for production, such as Uvicorn or Hypercorn.
pip install "uvicorn[standard]"
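To illustrate what pydantic validation buys you, here's a toy sketch, separate from the API we'll build below (the Quote model and /echo/ route are made up for demonstration): FastAPI validates the request body against the model before your handler runs and rejects malformed payloads with a 422 automatically.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Quote(BaseModel):
    author: str
    text: str


@app.post("/echo/")
def echo(quote: Quote):
    # If the JSON body doesn't match the Quote model, FastAPI returns a 422 before we get here
    return quote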
Create a new directory called api next to your web scraper. Then create a new file api/main.py and enter the following content:
from typing import Any, Union
from datetime import datetime
from pydantic import BaseModel
from xata.client import XataClient
from fastapi import FastAPI, HTTPException, Query, Body

app = FastAPI()

API_KEY = "<YOUR API KEY>"
DB_URL = "<YOUR ENDPOINT>"

xata = XataClient(api_key=API_KEY, db_url=DB_URL)


class FilterParams(BaseModel):
    """Base Model class to pass into the FastAPI routes for the Xata client."""
    table_name: str
    columns: list
    filter: Union[dict, None] = None
    sort: Union[dict, None] = None


@app.get("/")
def read_root():
    return {"message": "All OK"}


@app.post("/read-records/")
def read_records(filter_params: FilterParams):
    """Reads the records from XataDB."""
    try:
        resp = xata.data().query(filter_params.table_name, {
            "columns": filter_params.columns,
            "filter": filter_params.filter,
            "sort": filter_params.sort
        })
        if resp.is_success():
            records = resp.json()["records"]
            # Add a human-readable date alongside the raw timestamp
            for record in records:
                date_str = record["date"]
                date_obj = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
                formatted_date = date_obj.strftime("%A, %B %d, %Y")
                record["formatted_date"] = formatted_date
            return records
        else:
            raise HTTPException(status_code=resp.status_code, detail="Error reading records")
    except Exception as error:
        raise HTTPException(status_code=500, detail=str(error)) from error
Save the file and run the following in your shell:
uvicorn main:app --reload
This will start your server on port 8000 on your localhost.
Click on the link or go to localhost:8000 and you'll be greeted with the message:
Let's go to the docs and interact with our API. Append /docs to the end of your link, e.g. localhost:8000/docs, and FastAPI will generate the UI using Swagger.
Click on the POST method /read-records/ to test whether our API fetches the required data. Modify the body to reflect the changes below. Change the date accordingly.
{
    "table_name": "zero",
    "columns": [
        "date", "author", "quote"
    ],
    "filter": {"date": "2023-09-18T11:34:27Z"},
    "sort": {"date": "asc"}
}
Click on Execute and check the response.
You should see the following output.
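You don't have to stay in the Swagger UI; here's a minimal sketch of calling the endpoint from Python with the requests library (assuming pip install requests, the server running locally on port 8000, and a date filter that actually exists in your table):

import requests

payload = {
    "table_name": "zero",
    "columns": ["date", "author", "quote"],
    "filter": {"date": "2023-09-18T11:34:27Z"},  # change to a date present in your table
    "sort": {"date": "asc"},
}

resp = requests.post("http://localhost:8000/read-records/", json=payload)
resp.raise_for_status()
for record in resp.json():
    print(record["formatted_date"], "-", record["author"])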
If you've made it this far, good job.
Dockerizing your API
It's time to flex your Docker skills. Actually, not much is needed here except a few lines of code.
First, create an api/requirements.txt file and enter the following requirements:
annotated-types==0.5.0
click==8.1.7
deprecation==2.1.0
fastapi==0.101.1
h11==0.14.0
httptools==0.6.0
orjson==3.9.5
pydantic==2.2.1
pydantic_core==2.6.1
starlette==0.27.0
uvicorn==0.23.2
uvloop==0.17.0
watchfiles==0.19.0
websockets==11.0.3
xata==1.0.1
Create a Dockerfile and copy the following content into it:
FROM python:3.9-slim@sha256:980b778550c0d938574f1b556362b27601ea5c620130a572feb63ac1df03eda5
ENV PYTHONUNBUFFERED True
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
ENV PORT 1234
RUN pip install --no-cache-dir -r requirements.txt
CMD exec uvicorn main:app --host 0.0.0.0 --port ${PORT} --workers 1
Build your image by running the following command:
docker build -t api .
Check if the container is working correctly
docker run -dp 8000:1234 -e PORT=1234 api
Go to localhost:8000 and you should see a response of {"message": "All OK"}. You can skip the next part if you intend to work only locally. However, I would suggest not to.
Deploying your API on Google Cloud Run
Register for a Google Cloud account if you haven't already. Once you're in, create a new project.
Enter the project name and click Create.
Select the project to start working on it.
Make sure your billing account is linked to this project; if not, link it. Note the project ID, as you'll need it later. In this case, it is zerodollar-399414.
I'll be using the gcloud CLI to manage and deploy my service. Refer to https://cloud.google.com/sdk/docs/install to install it on your system.
Link your project ID with the gcloud CLI:
gcloud config set project <YOUR PROJECT ID>
Authenticate your account by running the following command. This will prompt you for your Google credentials in a browser window.
gcloud auth login
After logging in, check whether the Artifact Registry and Cloud Run services are enabled. If not, enable them with: gcloud services enable <servicename>.googleapis.com (e.g. artifactregistry.googleapis.com and run.googleapis.com).
Create a new repository in Artifact Registry. Google provides 0.5 GB of free storage; make sure you do not exceed this to stay within the $0 budget.
gcloud artifacts repositories create zero --repository-format=docker --location=us-east1 --description="Zero Dollar API"
Before pushing images to your repository, you first have to configure Docker authentication. Type the following (change the region accordingly):
gcloud auth configure-docker us-east1-docker.pkg.dev
Tag your image by entering the following command:
docker tag api:latest us-east1-docker.pkg.dev/<YOUR PROJECT ID>/zero/api
Finally, push your image to the Artifact registry:
docker push us-east1-docker.pkg.dev/<YOUR PROJECT ID>/zero/api
Deploying your Image to Cloud Run
It's time to deploy your image as a service using Cloud Run. Run the following command:
gcloud run deploy zero-dollar-api --image us-east1-docker.pkg.dev/zerodollar-399414/zero/api --allow-unauthenticated
After a while, you will see the following message on success.
Go to your Google Cloud console, click on the hamburger icon on the left of the screen, and find Cloud Run. Click on it.
You'll see your service running. Click on it.
Go to the URL specified in the above service. Hopefully, you'll see the OK message indicating the success of your deployed API.
Wonderful! Now your API is publicly available. How does it feel? Amazing, right?
💡 Given that this has already exceeded the word count of a typical tutorial article, I'll be skipping the scheduling process (triggering the scraper with a Lambda function via EventBridge), as well as deploying the Scrapy spiders.
Front-end using Next.js
Next.js is our choice for the frontend because of what it offers out of the box: server-side rendering and static generation for faster page loads, file-based routing and API routes, and a development and deployment workflow that stays simple as your site grows. Create a new project by typing:
npx create-next-app@latest
Select the default values and press Enter.
cd into your project and create new folders: pages/api/
Your folder structure now should look like this.
Inside your /api folder, create a new file called fetch-quotes.ts and enter the following code:
import type { NextApiRequest, NextApiResponse } from "next";

type ResponseData = {
  date: string,
  author: string,
  quote: string,
}

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  try {
    // Local Docker API from the previous section
    const URL: string = "http://localhost:8000/read-records/"
    const response = await fetch(URL, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        "table_name": "zero",
        "columns": ["date", "author", "quote"],
        "filter": {"date": "2023-09-18T11:34:27Z"},
        "sort": {"date": "asc"}
      })
    });
    const data: ResponseData[] = await response.json();
    res.status(200).json(data);
  } catch (error) {
    console.error("ERROR: fetching data", error);
    res.status(500).json({error: "Error fetching data"});
  }
}
💡 The reason we're creating a Next.js API route is to mask our Google Cloud API URL from the client.
We'll first be using our locally running Docker API to fetch the data. Make sure that your container is running on port 8000, as mapped in the docker run command above.
Create a new file Table.tsx in your components folder and enter the following content:
'use client'

import { useEffect, useState } from 'react';

interface QuoteData {
  date: string;
  author: string;
  quote: string;
  formatted_date: string;
}

const Table = () => {
  const [quoteData, setQuoteData] = useState<QuoteData[]>([]);

  useEffect(() => {
    const fetchData = async () => {
      try {
        const response = await fetch('/api/fetch-quotes');
        if (response.ok) {
          const data: QuoteData[] = await response.json();
          setQuoteData(data);
        } else {
          console.error('Error fetching data:', response.statusText);
        }
      } catch (error) {
        console.error('Error fetching data:', error);
      }
    };
    fetchData();
  }, []);

  return (
    <table className="w-full border-collapse">
      <thead>
        <tr>
          <th className="border p-2">Date</th>
          <th className="border p-2">Author</th>
          <th className="border p-2">Quote</th>
        </tr>
      </thead>
      <tbody>
        {quoteData.map((quote, index) => (
          <tr key={index}>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.formatted_date}</td>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.author}</td>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.quote}</td>
          </tr>
        ))}
      </tbody>
    </table>
  );
};

export default Table;
Finally, use your component in your page.tsx
import Table from "@/components/Table"

export default function Home() {
  return (
    <div>
      <Table />
    </div>
  )
}
Run npm run dev to start your Next.js app. Hopefully, you'll see the following table.
Now edit your fetch-quotes.ts and replace the URL with your Cloud Run API URL:
import type { NextApiRequest, NextApiResponse } from "next";

type ResponseData = {
  date: string,
  author: string,
  quote: string,
}

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  try {
    // Cloud Run service URL from the previous section
    const URL: string = "<Your API>/read-records/"
    const response = await fetch(URL, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        "table_name": "zero",
        "columns": ["date", "author", "quote"],
        "filter": {"date": "2023-09-18T11:34:27Z"},
        "sort": {"date": "asc"}
      })
    });
    const data: ResponseData[] = await response.json();
    res.status(200).json(data);
  } catch (error) {
    console.error("ERROR: fetching data", error);
    res.status(500).json({error: "Error fetching data"});
  }
}
The page will refresh and populate the data from the API. You can look at the metrics for verification.
Congratulations! You can now build end-to-end applications incorporating serverless cloud services.
For the complete tutorial, including Lambda functions and spider deployment, go here.
You can find all the source code here.