Welcome to the ultimate tutorial that will guide you through the exhilarating journey of building a complete website from scratch. With a pinch of coding wizardry and a dash of creativity, you'll learn the art of crafting a captivating frontend using Next.js. But we won't stop there! We'll delve into the realm of backend development, where you'll master the server-side sorcery with Python and a powerful framework like FastAPI. From integrating databases to creating dynamic content, every aspect of website development will be unveiled.
What you'll learn
Consider this tutorial your all-access pass to the world of cutting-edge technologies mentioned in job descriptions. And even if they're not explicitly listed, you'll be equipped with the skills to tackle any challenge that shares a similar architecture. You'll learn how to:
- create an API using FastAPI
- scrape data using Scrapy
- insert data into a cloud database
- create lambda functions to trigger the scraper
- create a cloud run service to deploy your API
Pre-requisites
Before you begin, make sure you have the following tools installed in your working environment:
- Docker
- Python (≥ 3.8) and pip
- npm & npx (≥ 9.x.x)
Acquiring Data with Scrapy
Our first step on this exciting journey will be to acquire data, and we'll do so by leveraging the dynamic capabilities of Scrapy, a reliable companion to Python. While static websites may lull you into a slumber (unless, of course, you're creating a portfolio for sheep enthusiasts), we're aiming for a more engaging experience. Scrapy's refined design architecture, akin to a debonair secret agent's tuxedo, exudes both elegance and efficiency. Of course, alternative methods are available too for the adventurous souls among us.
Without further ado, fire up your shell and type the following to install Scrapy:
pip install scrapy
Check if the installation was a success by typing:
scrapy --version
You should see the following output:
Before you start scraping, you will have to set up a new Scrapy project. Enter a directory where you'd like to store your code and run the following (this setup follows the official Scrapy tutorial: https://docs.scrapy.org/en/latest/intro/tutorial.html):
scrapy startproject tutorial
This will create a tutorial directory with the following contents:
tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py
This is the code for our first Spider. Save it in a file named quotes_spider.py under the tutorial/spiders directory in your project:
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
    ]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }

        # Follow the pagination link, if there is one
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
Now, from your tutorial/spiders directory, run the following command to check if your spider is working well:
scrapy crawl quotes
You should see a couple of debug logs in your terminal along with the scraped data and a summary at the end with the stats:
See the output as above? Good, now let's set up our database for storing all the scraped data. I'm choosing the serverless XataDB for this. There are a couple of other serverless databases that are just as good, if not better. However, for our $0 architecture and ease of use, I'll be using XataDB.
Open up https://xata.io/ and create a free account. Set up your database name, location and click next.
Create a new table and click Next through the remaining steps.
You'll see the following dashboard. This is your table where the data will reside.
We'll modify the schema to add a few columns to our table. Go to the Schema tab and create three columns: author (string), quote (text), and date (datetime).
Click on your database name and go to its settings.
Note the endpoint of your database; you'll need it for the connection string.
Go back and click on your profile, then open Account settings.
In the Personal API keys section, create a new API key. You'll be using this to access your table and store the data. Do not share your API key. Click Save and note the key.
We'll need the Xata Python SDK to use it in our Scrapy project. Open up your shell and type:
pip install xata
This will install a 1.x version in your environment. If, for some reason, you end up with a 0.x version, the functionality hasn't changed drastically and you can still perform the same transactions. Refer to https://xata.io/docs/sdk/python/examples for more information.
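As a quick sanity check, here's a minimal sketch of querying your table directly from a Python shell (assuming you named your table zero, as the pipeline below does, and using the API key and endpoint you noted in the dashboard):

from xata.client import XataClient

# Placeholders: use the API key and endpoint you noted in the Xata dashboard
client = XataClient(api_key="<YOUR API KEY>", db_url="<YOUR ENDPOINT>")

# Query the "zero" table; an empty record list is fine, we only care that the call succeeds
resp = client.data().query("zero", {"columns": ["author", "quote"]})
print(resp.is_success())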
Before we connect to our database, we have to modify our spider a bit. First, create a new Item in the tutorial/items.py file:
import scrapy
from itemloaders.processors import TakeFirst


class QuotesItem(scrapy.Item):
    # TakeFirst returns the first extracted value instead of a list
    text = scrapy.Field(output_processor=TakeFirst())
    date = scrapy.Field(output_processor=TakeFirst())
    author = scrapy.Field(output_processor=TakeFirst())
Now, open up your quotes_spider.py file and update your spider:
import scrapy
from datetime import datetime
from tutorial.items import QuotesItem
from scrapy.loader import ItemLoader


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "https://quotes.toscrape.com/page/1/",
    ]

    def parse(self, response):
        # Timestamp for this scrape, in the format Xata expects for datetime columns
        date = datetime.now().strftime("%Y-%m-%dT%H:%M:%SZ")
        for quote in response.css("div.quote"):
            loader = ItemLoader(item=QuotesItem())
            loader.add_value("text", quote.css("span.text::text").get())
            loader.add_value("author", quote.css("small.author::text").get())
            loader.add_value("date", date)
            yield loader.load_item()

        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
Create a new class, or modify the existing one, in your tutorial/pipelines.py file:
from xata.client import XataClient
from itemadapter import ItemAdapter

API_KEY = "<YOUR API KEY>"
DB_URL = "<YOUR ENDPOINT>"


class ZeroPipeline:
    def __init__(self):
        self.client = XataClient(api_key=API_KEY, db_url=DB_URL)

    def process_item(self, item, spider):
        text = item["text"]
        date = item["date"]
        author = item["author"]
        record = {
            "author": author,
            "quote": text,
            "date": date
        }
        # Insert the scraped record into the "zero" table
        resp = self.client.records().insert("zero", record)
        spider.logger.info("Record id is: %s" % resp.json()["id"])
        return item
💡 I don't recommend hard-coding your secret keys as plain strings in the same file like this, but for this tutorial I'll let it slide.
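A slightly safer pattern is sketched below: read the credentials from environment variables instead of hard-coding them. The variable names XATA_API_KEY and XATA_DB_URL are my own choice, not something Xata or Scrapy requires.

import os

from xata.client import XataClient

# Hypothetical variable names; export them in your shell before running the spider,
# e.g. export XATA_API_KEY=... and export XATA_DB_URL=...
API_KEY = os.environ["XATA_API_KEY"]
DB_URL = os.environ["XATA_DB_URL"]

client = XataClient(api_key=API_KEY, db_url=DB_URL)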
Finally, in your tutorial/settings.py file, enable the pipeline. You'll see the following setting commented out; un-comment it. If you used a custom class name, replace "ZeroPipeline" with the name you specified.
ITEM_PIPELINES = {
    "tutorial.pipelines.ZeroPipeline": 300,
}
In your terminal, run the following command to start your spider:
scrapy crawl quotes
Check the debug logs and make sure there aren't any 400 errors. If successful, you'll see the data populated in your table.
Congratulations, you're halfway through.
Now it's time to make an API to fetch that data.
Creating an API using FastAPI
Being the buzzword in the backend world, FastAPI is what I'm using to leverage its speed. One thing I find impressive about FastAPI is its use of strongly typed models (pydantic) to validate the requests and responses you build; I'll show a tiny example of this right after the install commands. You could also use Flask for this task, but considering how everyone now demands 10 years of experience in FastAPI, it should be lucrative to get to know it.
Launch your shell and type the following command
pip install fastapi
You will also need an ASGI server for production, such as Uvicorn or Hypercorn.
pip install "uvicorn[standard]"
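To illustrate what pydantic validation buys you, here's a toy sketch, separate from the API we'll build below (the Quote model and /echo/ route are made up for demonstration): FastAPI validates the request body against the model before your handler runs and rejects malformed payloads with a 422 automatically.

from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class Quote(BaseModel):
    author: str
    text: str


@app.post("/echo/")
def echo(quote: Quote):
    # If the JSON body doesn't match the Quote model, FastAPI returns a 422 before we get here
    return quote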
Create a new directory called api next to your web scraper. Then create a new file api/main.py and enter the following content:
from typing import Any, Union
from datetime import datetime
from pydantic import BaseModel
from xata.client import XataClient
from fastapi import FastAPI, HTTPException, Query, Body

app = FastAPI()

API_KEY = "<YOUR API KEY>"
DB_URL = "<YOUR ENDPOINT>"

xata = XataClient(api_key=API_KEY, db_url=DB_URL)


class FilterParams(BaseModel):
    """Base Model class to pass into the FastAPI routes for the Xata client."""
    table_name: str
    columns: list
    filter: Union[dict, None] = None
    sort: Union[dict, None] = None


@app.get("/")
def read_root():
    return {"message": "All OK"}


@app.post("/read-records/")
def read_records(filter_params: FilterParams):
    """Reads the records from XataDB."""
    try:
        resp = xata.data().query(filter_params.table_name, {
            "columns": filter_params.columns,
            "filter": filter_params.filter,
            "sort": filter_params.sort
        })
        if resp.is_success():
            records = resp.json()["records"]
            # Add a human-readable date alongside the raw timestamp
            for record in records:
                date_str = record["date"]
                date_obj = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
                formatted_date = date_obj.strftime("%A, %B %d, %Y")
                record["formatted_date"] = formatted_date
            return records
        else:
            raise HTTPException(status_code=resp.status_code, detail="Error reading records")
    except Exception as error:
        raise HTTPException(status_code=500, detail=str(error)) from error
Save the file and run the following in your shell:
uvicorn main:app --reload
This will start your server on port 8000 on your localhost.
Click on the link or go to localhost:8000 and you'll be greeted with the message:
Let's go to the docs and interact with our API. Append /docs to the end of your link, e.g. localhost:8000/docs, and FastAPI will generate the UI using Swagger.
Click on the POST method /read-records/ to test whether our API fetches the required data. Modify the body to reflect the changes below. Change the date accordingly.
{
    "table_name": "zero",
    "columns": [
        "date", "author", "quote"
    ],
    "filter": {"date": "2023-09-18T11:34:27Z"},
    "sort": {"date": "asc"}
}
Click on Execute and check the response.
You should see the following output.
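You don't have to stay in the Swagger UI; here's a minimal sketch of calling the endpoint from Python with the requests library (assuming pip install requests, the server running locally on port 8000, and a date filter that actually exists in your table):

import requests

payload = {
    "table_name": "zero",
    "columns": ["date", "author", "quote"],
    "filter": {"date": "2023-09-18T11:34:27Z"},  # change to a date present in your table
    "sort": {"date": "asc"},
}

resp = requests.post("http://localhost:8000/read-records/", json=payload)
resp.raise_for_status()
for record in resp.json():
    print(record["formatted_date"], "-", record["author"])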
If you've made it this far, good job.
Dockerizing your API
It's time to flex your Docker skills. Actually, not much is needed here except a few lines of code.
First, create an api/requirements.txt file and enter the following requirements:
annotated-types==0.5.0
click==8.1.7
deprecation==2.1.0
fastapi==0.101.1
h11==0.14.0
httptools==0.6.0
orjson==3.9.5
pydantic==2.2.1
pydantic_core==2.6.1
starlette==0.27.0
uvicorn==0.23.2
uvloop==0.17.0
watchfiles==0.19.0
websockets==11.0.3
xata==1.0.1
Create a Dockerfile and copy the following content into it:
FROM python:3.9-slim@sha256:980b778550c0d938574f1b556362b27601ea5c620130a572feb63ac1df03eda5
ENV PYTHONUNBUFFERED True
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
ENV PORT 1234
RUN pip install --no-cache-dir -r requirements.txt
CMD exec uvicorn main:app --host 0.0.0.0 --port ${PORT} --workers 1
Build your image by running the following command:
docker build -t api .
Check if the container is working correctly
docker run -dp 8000:1234 -e PORT=1234 api
Go to localhost:8000 and you should see a response of {"message": "All OK"}. You can skip the next part if you intend to work only locally. However, I would suggest not to.
Deploying your API on Google Cloud Run
Register for a Google Cloud account if you haven't already. Once you're in, create a new project.
Enter the project name and click Create.
Select the project to start working on it.
Make sure your billing account is linked to this project; if not, link it. Note the project ID, as you'll need it later. In this case, it is zerodollar-399414.
I'll be using the gcloud CLI to manage and deploy my service. Refer to https://cloud.google.com/sdk/docs/install to install it on your system.
Link your project ID with the gcloud CLI:
gcloud config set project <YOUR PROJECT ID>
Authenticate your account by running the following command. This will prompt you for your Google credentials in a browser window.
gcloud auth login
After logging in, check whether the Artifact Registry and Cloud Run services are enabled. If not, enable them with: gcloud services enable <servicename>.googleapis.com (e.g. artifactregistry.googleapis.com and run.googleapis.com).
Create a new repository in Artifact Registry. Google provides 0.5 GB of free storage; make sure you do not exceed this to stay within the $0 budget.
gcloud artifacts repositories create zero --repository-format=docker --location=us-east1 --description="Zero Dollar API"
Before pushing images to your repository, you first have to configure Docker authentication. Type the following (change the region accordingly):
gcloud auth configure-docker us-east1-docker.pkg.dev
Tag your image by entering the following command:
docker tag api:latest us-east1-docker.pkg.dev/<YOUR PROJECT ID>/zero/api
Finally, push your image to the Artifact registry:
docker push us-east1-docker.pkg.dev/<YOUR PROJECT ID>/zero/api
Deploying your Image to Cloud Run
It's time to deploy your image as a service using Cloud Run. Run the following command:
gcloud run deploy zero-dollar-api --image us-east1-docker.pkg.dev/zerodollar-399414/zero/api --allow-unauthenticated
After a while, you will see the following message on success.
Go to your Google Cloud console, click on the hamburger icon on the left of the screen, and find Cloud Run. Click on it.
You'll see your service running. Click on it.
Go to the URL specified in the above service. Hopefully, you'll see the OK message indicating the success of your deployed API.
Wonderful! Now your API is publicly available. How does it feel? Amazing, right?
💡 Given that this has already exceeded the word count of a typical tutorial article, I'll be skipping the scheduling process (triggering the scraper with a Lambda function via EventBridge), as well as deploying the Scrapy spiders.
Front-end using Next.js
Next.js is our choice for the frontend because of what it offers out of the box: server-side rendering and static generation for faster page loads, file-based routing and API routes, and a development and deployment workflow that stays simple as your site grows. Create a new project by typing:
npx create-next-app@latest
Select the default values and press Enter.
cd into your project and create new folders: pages/api/
Your folder structure now should look like this.
Inside your /api folder, create a new file called fetch-quotes.ts and enter the following code:
import type { NextApiRequest, NextApiResponse } from "next";

type ResponseData = {
  date: string,
  author: string,
  quote: string,
}

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  try {
    // Local Docker API from the previous section
    const URL: string = "http://localhost:8000/read-records/"
    const response = await fetch(URL, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        "table_name": "zero",
        "columns": ["date", "author", "quote"],
        "filter": {"date": "2023-09-18T11:34:27Z"},
        "sort": {"date": "asc"}
      })
    });
    const data: ResponseData[] = await response.json();
    res.status(200).json(data);
  } catch (error) {
    console.error("ERROR: fetching data", error);
    res.status(500).json({error: "Error fetching data"});
  }
}
💡 The reason we're creating a Next.js API route is to mask our Google Cloud API URL from the client.
We'll first be using our locally running Docker API to fetch the data. Make sure that your container is running on port 8000, as mapped in the docker run command above.
Create a new file Table.tsx in your components folder and enter the following content:
'use client'

import { useEffect, useState } from 'react';

interface QuoteData {
  date: string;
  author: string;
  quote: string;
  formatted_date: string;
}

const Table = () => {
  const [quoteData, setQuoteData] = useState<QuoteData[]>([]);

  useEffect(() => {
    const fetchData = async () => {
      try {
        const response = await fetch('/api/fetch-quotes');
        if (response.ok) {
          const data: QuoteData[] = await response.json();
          setQuoteData(data);
        } else {
          console.error('Error fetching data:', response.statusText);
        }
      } catch (error) {
        console.error('Error fetching data:', error);
      }
    };
    fetchData();
  }, []);

  return (
    <table className="w-full border-collapse">
      <thead>
        <tr>
          <th className="border p-2">Date</th>
          <th className="border p-2">Author</th>
          <th className="border p-2">Quote</th>
        </tr>
      </thead>
      <tbody>
        {quoteData.map((quote, index) => (
          <tr key={index}>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.formatted_date}</td>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.author}</td>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.quote}</td>
          </tr>
        ))}
      </tbody>
    </table>
  );
};

export default Table;
Finally, use your component in your page.tsx
import Table from "@/components/Table"

export default function Home() {
  return (
    <div>
      <Table />
    </div>
  )
}
Run npm run dev to start your Next.js app. Hopefully, you'll see the following table.
Now edit your fetch-quotes.ts and replace the URL with your Cloud Run API URL:
import type { NextApiRequest, NextApiResponse } from "next";

type ResponseData = {
  date: string,
  author: string,
  quote: string,
}

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  try {
    // Cloud Run service URL from the previous section
    const URL: string = "<Your API>/read-records/"
    const response = await fetch(URL, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        "table_name": "zero",
        "columns": ["date", "author", "quote"],
        "filter": {"date": "2023-09-18T11:34:27Z"},
        "sort": {"date": "asc"}
      })
    });
    const data: ResponseData[] = await response.json();
    res.status(200).json(data);
  } catch (error) {
    console.error("ERROR: fetching data", error);
    res.status(500).json({error: "Error fetching data"});
  }
}
The page will refresh and populate the data from the API. You can look at the metrics for verification.
Congratulations! You can now build end-to-end applications incorporating serverless cloud services.
For the complete tutorial, including Lambda functions and spider deployment, go here.
You can find all the source code here.