Hamzza K
$0 Architecture: Full-Stack application on Serverless Cloud from scratch

Welcome to the ultimate tutorial that will guide you through the exhilarating journey of building a complete website from scratch. With a pinch of coding wizardry and a dash of creativity, you'll learn the art of crafting a captivating frontend using Next.js. But we won't stop there! We'll delve into the realm of backend development, where you'll master the server-side sorcery with Python and a powerful framework like FastAPI. From integrating databases to creating dynamic content, every aspect of website development will be unveiled.

What you'll learn

Consider this tutorial your all-access pass to the cutting-edge technologies mentioned in job descriptions. Even if they're not explicitly listed, you'll be equipped to tackle any challenge that shares a similar architecture. You'll learn how to:

  • create an API using FastAPI
  • scrape data using scrapy
  • insert data into a cloud database
  • create lambda functions to trigger the scraper
  • create a cloud run service to deploy your API

Pre-requisites

Before you begin, make sure you have the following tools installed in your working environment:

  • Docker
  • Python (≥ 3.8) and pip
  • Npm & Npx (≥ 9.x.x)

Acquiring Data with Scrapy

Our first step on this exciting journey will be to acquire data, and we'll do so with Scrapy, Python's reliable scraping companion. While a static website may lull you into a slumber (unless, of course, you're creating a portfolio for sheep enthusiasts), we're aiming for a more engaging experience. Scrapy's refined architecture, akin to a debonair secret agent's tuxedo, exudes both elegance and efficiency. Of course, alternative methods are available for the adventurous souls among us.

Scrapy Architecture

Without further ado, fire up your shell and type the following to install scrapy.
pip install scrapy
Check if the installation was a success by typing:
scrapy --version
You should see the following output:

scrapy version

Before you start scraping, you will have to set up a new Scrapy project. Enter a directory where you'd like to store your code and run the following (this part is adapted from the official Scrapy tutorial: https://docs.scrapy.org/en/latest/intro/tutorial.html):
scrapy startproject tutorial
This will create a tutorial directory with the following contents:

tutorial/
    scrapy.cfg            # deploy configuration file
    tutorial/             # project's Python module, you'll import your code from here
        __init__.py
        items.py          # project items definition file
        middlewares.py    # project middlewares file
        pipelines.py      # project pipelines file
        settings.py       # project settings file
        spiders/          # a directory where you'll later put your spiders
            __init__.py

This is the code for our first Spider. Save it in a file named quotes_spider.py under the tutorial/spiders directory in your project:

import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "<https://quotes.toscrape.com/page/1/>",
    ]
    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
                "tags": quote.css("div.tags a.tag::text").getall(),
            }
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)

Now, from inside your project directory, run the following command to check that your spider is working properly:
scrapy crawl quotes
You should see a couple of debug logs in your terminal along with the scraped data and a summary at the end with the stats:
scrapy stats

See the output as above? Good, now let's set up our database for storing all the scraped data. I'm choosing the serverless XataDB for this. There are a couple of other serverless databases that are just as good, if not better; however, for our $0 architecture and ease of use, I'll be using XataDB.
Open https://xata.io/ and create a free account. Set up your database name and location, then click Next.

XataDB
Create a new table and click Next through the remaining steps.

XataDB creating a Database
You'll see the following dashboard. This is your table where the data will reside.

Table
We'll modify the schema to add a few features to our table. Go to the Schema tab and create three columns: author (string), quote (text), and date (datetime).

XataDB schema
Click on your database name and go to its settings.

Database settings
Note your database's endpoint; you'll need it for the connection string.

XataDB endpoint
Go back and click on your profile, then open Account settings.

XataDB account settings
In the Personal API keys section, create a new API key. You'll use it to access your table and store the data. Do not share this key. Click Save and note the key.

XataDB private keys
We'll need the Xata Python SDK to use the database from our Scrapy project. Open up your shell and type:
pip install xata
This will install the 1.x version in your environment. If, for some reason, the installed version is 0.x, the functionality hasn't changed much and you can still perform the same transactions. Refer to https://xata.io/docs/sdk/python/examples for more information.

XataDB SDK
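Before wiring the SDK into Scrapy, it's worth sanity-checking your credentials with a tiny standalone script. Here's a minimal sketch, assuming the 1.x SDK, the zero table created above, and placeholder values for the key and endpoint:

from xata.client import XataClient

# Placeholders: use the API key and endpoint you noted above
API_KEY = "<YOUR API KEY>"
DB_URL  = "<YOUR ENDPOINT>"

client = XataClient(api_key=API_KEY, db_url=DB_URL)

# Insert a throwaway record into the "zero" table and confirm the call succeeded
resp = client.records().insert("zero", {
    "author": "Test Author",
    "quote": "Hello, Xata!",
    "date": "2023-09-18T11:34:27Z",
})
print("insert ok:", resp.is_success())

If it prints insert ok: True, your key and endpoint are good. You can delete the test record from the Xata dashboard afterwards.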
Before we connect our database, we have to modify our spider a bit. First, create a new Item in the tutorial/items.py file.

import scrapy
from itemloaders.processors import TakeFirst

class QuotesItem(scrapy.Item):
    text   = scrapy.Field(output_processor=TakeFirst())
    date   = scrapy.Field(output_processor=TakeFirst())
    author = scrapy.Field(output_processor=TakeFirst())

Now, open up your quotes_spider.py file and update your spider.

import scrapy
from datetime import datetime
from tutorial.items import QuotesItem
from scrapy.loader import ItemLoader

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        "<https://quotes.toscrape.com/page/1/>",
    ]
    def parse(self, response):
        date = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")  # UTC timestamp; the trailing Z marks it as UTC

        for quote in response.css("div.quote"):
            loader = ItemLoader(item=QuotesItem())
            loader.add_value("text", quote.css("span.text::text").get())
            loader.add_value("author", quote.css("small.author::text").get())
            loader.add_value("date", date)
            yield loader.load_item()
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)

Create a new class or modify the existing one in your tutorial/pipelines.py file.

from xata.client import XataClient
from itemadapter import ItemAdapter

API_KEY = "<YOUR API>"
DB_URL  = "<YOUR ENDPOINT>"

class ZeroPipeline:
    def __init__(self):
        self.client = XataClient(api_key=API_KEY, db_url=DB_URL)
    def process_item(self, item, spider):
        text   = item['text']
        date   = item['date']
        author = item['author']
        record = {
            "author": author,
            "quote": text,
            "date": date
        }

        resp = self.client.records().insert("zero", record)
        spider.logger.info("Record id is: %s" %resp.json()["id"])
        return item

💡 I don't recommend hard-coding your secret keys as plain strings in the same file as above, but for this tutorial I'll let it slide.
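If you do want to keep the keys out of the source, one simple option is to read them from environment variables in the pipeline instead. Here's a minimal sketch; the variable names XATA_API_KEY and XATA_DB_URL are just my own convention, not anything the SDK requires:

import os

from xata.client import XataClient


class ZeroPipeline:
    def __init__(self):
        # Pull the credentials from the environment instead of hard-coding them;
        # XATA_API_KEY / XATA_DB_URL are names chosen for this sketch
        self.client = XataClient(
            api_key=os.environ["XATA_API_KEY"],
            db_url=os.environ["XATA_DB_URL"],
        )

Export both variables in your shell before running scrapy crawl quotes and the rest of the pipeline stays the same.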

Finally, enable the pipeline in your tutorial/settings.py file. You'll see a similar setting commented out; un-comment it and make sure the class path matches the pipeline class you created (ZeroPipeline here).

ITEM_PIPELINES = {
   "zero.pipelines.ZeroPipeline": 300,
}

In your terminal, run the following command to start your spider:
scrapy crawl quotes
Check the debug logs and make sure there aren't any 400 errors. If successful, you'll see the data populated in your table.

XataDB Table
Congratulations, you're halfway through.
Now it's time to make an API to fetch that data.

Creating an API using FastAPI

FastAPI being the buzzword of the backend world, I'm using it to leverage its speed. One thing I find impressive about FastAPI is its use of strongly typed models (pydantic) to validate the requests and responses you build. You could also use Flask for this task, but considering how everyone now demands 10 years of experience in FastAPI, it should be lucrative to get to know it.
Launch your shell and type the following command:
pip install fastapi
You will also need an ASGI server for production, such as Uvicorn or Hypercorn.
pip install "uvicorn[standard]"
Create a new directory called api, next to your web-scraper. Create a new file api/main.py and enter the following content:

from typing import Union
from datetime import datetime
from pydantic import BaseModel
from xata.client import XataClient
from fastapi import FastAPI, HTTPException

app = FastAPI()

API_KEY = "<YOUR API>"
DB_URL = "<YOUR ENDPOINT>"

xata = XataClient(api_key=API_KEY, db_url=DB_URL)

class FilterParams(BaseModel):
    """Base Model class to pass into the FastAPI routes for xata client."""
    table_name: str
    columns: list
    filter: Union[dict, None] = None
    sort: Union[dict, None] = None

@app.get("/")
def read_root():
    return {"message": "All OK"}

@app.post("/read-records/")
def read_records(filter_params: FilterParams):
    """Reads the records from XataDB"""
    try:
        resp = xata.data().query(filter_params.table_name, {
            "columns": filter_params.columns,
            "filter": filter_params.filter,
            "sort": filter_params.sort
        })
        if resp.is_success():
            records = resp.json()["records"]
            for record in records:
                date_str = record["date"]
                date_obj = datetime.strptime(date_str, "%Y-%m-%dT%H:%M:%SZ")
                formatted_date = date_obj.strftime("%A, %B %d, %Y")
                record["formatted_date"] = formatted_date
            return records
        else:
            raise HTTPException(status_code=resp.status_code, detail="Error reading records")
    except Exception as error:
        raise HTTPException(status_code=500, detail=str(error)) from error

Save the file and run the following in your shell from the api directory:
uvicorn main:app --reload
This will start your server on port 8000 on your localhost.

FastAPI server
Click on the link or go to localhost:8000 and you'll be greeted with the message:

FastAPI response
Let's go to the docs and interact with our API. Append /docs to your URL, e.g. localhost:8000/docs, and FastAPI will generate an interactive UI using Swagger.

FastAPI docs
Click on the POST method /read-records/ to test whether our API fetches the required data. Modify the request body to match the example below, changing the date accordingly.

FastAPI response body

{
  "table_name": "zero",
  "columns": [
    "date", "author", "quote"
  ],
  "filter": {"date": "2023-09-18T11:34:27Z"},
  "sort": {"date": "asc"}
}

Click Execute and check the response.
FastAPI docs executing request
You should see the following output.

FastAPI docs response
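If you prefer the terminal over Swagger UI, you can fire the same request with a short Python script. This sketch assumes the requests package is installed (pip install requests) and that the date filter matches a timestamp that actually exists in your table:

import requests

payload = {
    "table_name": "zero",
    "columns": ["date", "author", "quote"],
    "filter": {"date": "2023-09-18T11:34:27Z"},  # change to a date present in your table
    "sort": {"date": "asc"},
}

# POST the filter to the locally running FastAPI server
resp = requests.post("http://localhost:8000/read-records/", json=payload, timeout=10)
resp.raise_for_status()

for record in resp.json():
    print(record["formatted_date"], "-", record["author"], "-", record["quote"])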
If you've made it this far, good job.

Dockerizing your API

It's time to flex your Docker skills. Actually, not much is needed here except a few lines of code.
First, create an api/requirements.txt file and enter the following requirements:

annotated-types==0.5.0
click==8.1.7
deprecation==2.1.0
fastapi==0.101.1
h11==0.14.0
httptools==0.6.0
orjson==3.9.5
pydantic==2.2.1
pydantic_core==2.6.1
starlette==0.27.0
uvicorn==0.23.2
uvloop==0.17.0
watchfiles==0.19.0
websockets==11.0.3
xata==1.0.1

Create a Dockerfile next to main.py (inside api/) and copy the following content into it:

FROM python:3.9-slim@sha256:980b778550c0d938574f1b556362b27601ea5c620130a572feb63ac1df03eda5

ENV PYTHONUNBUFFERED True
ENV APP_HOME /app
WORKDIR $APP_HOME
COPY . ./
ENV PORT 1234
RUN pip install --no-cache-dir -r requirements.txt
CMD exec uvicorn main:app --host 0.0.0.0 --port ${PORT} --workers 1

Build your image by running the following command from the api directory:
docker build -t api .
Check if the container is working correctly:
docker run -dp 8000:1234 -e PORT=1234 api
Go to localhost:8000 and you should see a response of {"message": "All OK"}. You can skip the next part if you intend to work only locally; however, I would suggest not to.

Deploying your API on Google Cloud Run

Register for a Google Cloud account if you haven't already. Once you're in, create a new project.
Google cloud run dashboard
Enter the project name and click Create.
Google cloud new project
Select the project to start working on it.
Google cloud select project
Make sure your billing account is linked to this project; if not, link it. Note the project ID for later use. In this case, it is zerodollar-399414.
Google cloud billing account
I'll be using the gcloud CLI to manage and deploy my service. Refer to https://cloud.google.com/sdk/docs/install to install it on your system.
Google cloud sdk
Link your project ID with the gcloud CLI:
gcloud config set project <YOUR PROJECT ID>

Google cloud link
Authenticate your account by running the following command. This will open a browser window prompting you for your Google credentials.
gcloud auth login
After logging in, check whether the required services (Cloud Run and Artifact Registry, i.e. run.googleapis.com and artifactregistry.googleapis.com) are enabled. If not, enable them with the following command: gcloud services enable <servicename>.googleapis.com

Google cloud services list
Create a new repository in Artifact Registry. Google provides 0.5 GB of free storage; make sure you do not exceed this to stay within the $0 budget.
gcloud artifacts repositories create zero --repository-format=docker --location=us-east1 --description="Zero Dollar API"
Before pushing images to your repository, you first have to configure Docker authentication. Type the following (change the region accordingly):
gcloud auth configure-docker us-east1-docker.pkg.dev

Google auth configure
Tag your image by entering the following command:

docker tag api:latest us-east1-docker.pkg.dev/<YOUR PROJECT ID>/zero/api

google cloud tagged image
Finally, push your image to the Artifact registry:
docker push us-east1-docker.pkg.dev/<YOUR PROJECT ID>/zero/api

Deploying your Image to Cloud Run

It's time to deploy your image as a service using Cloud Run. Run the following command:

gcloud run deploy zero-dollar-api \
  --image us-east1-docker.pkg.dev/zerodollar-399414/zero/api \
  --allow-unauthenticated

After a while, you will see the following message on success.
Google cloud deployed
Go to your Google Cloud console, click the hamburger icon on the left of the screen, and find Cloud Run. Click on it.

Google cloud run UI
You'll see your service running. Click on it.

Google cloud running services
Go to the URL shown for the service. Hopefully, you'll see the OK message indicating that your deployed API works.
Wonderful! Now your API is publicly available. How does it feel? Amazing, right?

💡 Given that this article has already run well past the typical tutorial length, I'll be skipping the scheduling of the scraper with a Lambda function and EventBridge, as well as the deployment of the Scrapy spiders.

Front-end using Next.js

Next.js is our choice for the frontend because it brings real advantages: faster page loads and better performance out of the box, easy scaling as the website grows, and a simple development and deployment workflow. Create a new project by typing:
npx create-next-app@latest
Select the default values and press Enter.

Next default values
cd into your project and create the new folders pages/api/. Your folder structure should now look like this:

Project structure
Inside your pages/api folder, create a new file called fetch-quotes.ts and enter the following code:

import type { NextApiRequest, NextApiResponse } from "next";

type ResponseData = {
    date: string,
    author: string,
    quote: string,
    formatted_date: string,
}
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
    try {
        const URL: string = "http://localhost:8080/read-records/"
        const response = await fetch(URL, {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                "table_name": "zero",
                "columns": ["date", "author", "quote"],
                "filter": {"date": "2023-09-18T11:34:27Z"},
                "sort": {"date": "asc"}
                })
        });
        const data: ResponseData[] = await response.json();
        res.status(200).json(data);
    } catch (error) {
        console.error("ERROR: fetching data", error);
        res.status(500).json({error: "Error fetching data"});
    }
}

💡 The reason we're adding a Next.js API route is to mask our Google Cloud API from the client.
We'll first use our locally running Docker API to fetch the data. Make sure your container is running and mapped to host port 8000, as in the docker run command above.
Create a new file Table.tsx in your components folder and enter the following content:

front-end structure

'use client'
import { useEffect, useState } from 'react';

interface QuoteData {
  date: string;
  author: string;
  quote: string;
  formatted_date: string;
}
const Table = () => {
  const [quoteData, setQuoteData] = useState<QuoteData[]>([]);
  useEffect(() => {
    const fetchData = async () => {
      try {
        const response = await fetch('/api/fetch-quotes');
        if (response.ok) {
          const data: QuoteData[] = await response.json();
          setQuoteData(data);
        } else {
          console.error('Error fetching data:', response.statusText);
        }
      } catch (error) {
        console.error('Error fetching data:', error);
      }
    };
    fetchData();
  }, []);
  return (
    <table className="w-full border-collapse">
      <thead>
        <tr>
          <th className="border p-2">Date</th>
          <th className="border p-2">Author</th>
          <th className="border p-2">Quote</th>
        </tr>
      </thead>
      <tbody>
        {quoteData.map((quote, index) => (
          <tr key={quote.date}>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.formatted_date}</td>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.author}</td>
            <td className={`border p-2 ${index !== quoteData.length - 1 ? 'border-b' : ''}`}>{quote.quote}</td>
          </tr>
        ))}
      </tbody>
    </table>
  );
};
export default Table;

Finally, use your component in your page.tsx

import Table from "@/components/Table"

export default function Home() {
  return (
    <div>
      <Table />
    </div>
  )
}

Run npm run dev to start your Next.js app. Hopefully, you'll see the following table.
Now edit your fetch-quotes.ts and replace the URL with your Cloud Run API URL:

import type { NextApiRequest, NextApiResponse } from "next";

type ResponseData = {
    date: string,
    author: string,
    quote: string,
    formatted_date: string,
}
export default async function handler(req: NextApiRequest, res: NextApiResponse) {
    try {
        const URL: string = "<Your API>/read-records/"
        const response = await fetch(URL, {
            method: "POST",
            headers: {
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                "table_name": "zero",
                "columns": ["date", "author", "quote"],
                "filter": {"date": "2023-09-18T11:34:27Z"},
                "sort": {"date": "asc"}
                })
        });
        const data: ResponseData[] = await response.json();
        res.status(200).json(data);
    } catch (error) {
        console.error("ERROR: fetching data", error);
        res.status(500).json({error: "Error fetching data"});
    }
}

The page will refresh and populate with the data from the API. You can look at the Cloud Run metrics for verification.

Congratulations! You can now build end-to-end applications incorporating serverless cloud services.

For the complete tutorial, including Lambda functions and spider deployment, go here.

You can find all the source code here
