Lorenz Vanthillo for AWS Community Builders

Posted on Oct 6, 2022 • Edited on Nov 17, 2022 • Originally published at lvthillo.com

How I got my new bicycle thanks to AWS!

#programming #aws #serverless #cloud

I bought my first bicycle in 2019. The pandemic had just started and during that period all team sports were stopped. The bicycle was already a few years old and I bought it for a few hundred euros. I really started to enjoy cycling and I joined a small cycling team.

During the summer of 2021, I started looking around for a new bike. It was time to trade my old one for a new one. I became interested in a Canyon Endurance CF SL 8, but unfortunately it was always sold out. In September 2021, I subscribed to the mailing list so that I would get an email when the model was back in stock.

After a few months I started to wonder why I never got an email. I saw that similar bikes were being sold and, after a little investigation on Reddit, I read that the notification system wasn't working properly. The emails were probably sent in small batches, and it didn't take more than a few minutes before all Canyon Endurance CF SL 8s were sold out again.

It was time to look for another method. I read about a tool called Distill, which could be used to check the site every few seconds. The free version worked as a browser plugin. To use it without your browser, you needed a paid plan.

So I decided to create something similar that was cheaper.
I decided to develop a Lambda function to scrape* the web page and check if the bicycle in my size was still unavailable. To do that, I checked the content of the div that holds the availability message.

I made use of the Python module BeautifulSoup to scrape the webpage and find a match for the Coming soon text.
If the text changed, I would get notified by email using SNS.

""" Scrape Canyon site."""
import requests
import os

import boto3
from bs4 import BeautifulSoup

client = boto3.client("sns")

url = "https://www.canyon.com/xxx"

def lambda_handler(event, context):
    """Main."""
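    # Fail fast on a slow site or an error status; keep the
    # timeout below the Lambda timeout.
    page = requests.get(url, timeout=5)
    page.raise_for_status()
    results = BeautifulSoup(page.content, "html.parser")

    items = []
    # find_all is the current name for the legacy findAll alias.
    for div in results.find_all(
        "div", attrs={"class": "productConfiguration__availabilityMessage"}
    ):
        text = div.text
        items.append(text.strip())

    # size small is 4th of the list
    small_item = items[3]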
    print("item: " + small_item)

    if "Soon" not in small_item:
        print("alert!")
        client.publish(
            TopicArn=os.environ["TOPIC"],
            Message="Time to buy a Canyon!",
            Subject="Time to buy a Canyon!",
        )
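
The decision logic in the handler can be exercised offline, without calling the real site or SNS. What follows is only a sketch: it swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra packages, and the names `AvailabilityParser` and `should_alert` are hypothetical, not part of the original function. Treating a missing fourth item as an alert is also an assumption; the handler above would raise an `IndexError` in that case instead.

```python
"""Offline check of the availability logic, standard library only."""
from html.parser import HTMLParser

# Class name taken from the scraper above; everything else is illustrative.
AVAILABILITY_CLASS = "productConfiguration__availabilityMessage"


class AvailabilityParser(HTMLParser):
    """Collect the stripped text of every div carrying AVAILABILITY_CLASS."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._depth = 0  # greater than 0 while inside a matching div
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        classes = (dict(attrs).get("class") or "").split()
        # Enter a matching div, or track a nested div inside one.
        if self._depth or AVAILABILITY_CLASS in classes:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.items.append("".join(self._buffer).strip())
                self._buffer = []

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def should_alert(html, index=3, marker="Soon"):
    """Return True when the message at `index` no longer contains `marker`."""
    parser = AvailabilityParser()
    parser.feed(html)
    parser.close()
    if index >= len(parser.items):
        # Assumption: a changed page layout is worth an alert too.
        return True
    return marker not in parser.items[index]
```

Feeding it a saved copy of the page, or a small hand-written snippet, shows whether the "4th item" assumption still holds before the function is deployed.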

The code (and actually the whole solution) is really basic. I didn't need advanced integrations or checks.
I used a simple EventBridge rule to trigger the Lambda every minute (except when I need to sleep!).

  Event:
    Type: AWS::Events::Rule
    Properties:
      Description: Trigger every minute
      Name: ScraperEvent
      # Run every minute when I don't sleep
      ScheduleExpression: cron(0/1 6-23 * * ? *)
      Targets:
        - Arn: !GetAtt Scraper.Arn #Lambda Arn
          Id: canyon-scraper
  LambdaPermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !GetAtt Scraper.Arn
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt Event.Arn
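
To get a feel for what this schedule means in practice, here is a small back-of-the-envelope calculation. It only restates the two relevant cron fields (minutes `0/1`, hours `6-23`); the function names are made up for illustration. Keep in mind that EventBridge rules evaluate cron expressions in UTC, so the "awake" window is expressed in UTC hours, not local time.

```python
"""Rough invocation count for cron(0/1 6-23 * * ? *)."""

ACTIVE_HOURS = range(6, 24)  # hours field 6-23, inclusive
RUNS_PER_HOUR = 60           # minutes field 0/1: every minute, starting at 0


def invocations_per_day():
    """Number of times the rule fires in one day."""
    return len(ACTIVE_HOURS) * RUNS_PER_HOUR


def is_active(hour_utc):
    """True if the rule fires during this UTC hour."""
    return hour_utc in ACTIVE_HOURS


print(invocations_per_day())       # 1080
print(invocations_per_day() * 30)  # 32400 over a 30-day month
```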

SNS was used to send notifications.

  Topic:
    Type: AWS::SNS::Topic
    Properties: 
      DisplayName: canyon-topic
      Subscription: 
        - Endpoint: me@mail.com
          Protocol: email
      TopicName: canyon-topic

I also made use of Lambda layers to make the function as fast and lightweight as possible. I used Docker to build my layer and I uploaded it to S3.

$ docker run --rm \
--volume=$(pwd):/lambda-build \
-w=/lambda-build \
lambci/lambda:build-python3.8 \
pip install -r requirements.txt --target python

$ zip -vr python.zip python/

$ aws s3 cp python.zip s3://xxx-layers/python.zip
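
For Python runtimes, Lambda expects the packages in a layer to sit under a top-level `python/` directory, which is why `pip` installs into `--target python` above. A quick way to sanity-check the archive before uploading could look like the sketch below; `layer_layout_ok` is a hypothetical helper, not part of the original project.

```python
"""Check that a layer archive keeps its packages under python/."""
import zipfile


def layer_layout_ok(zip_path):
    """Return True if the archive has files and all of them live under python/."""
    with zipfile.ZipFile(zip_path) as archive:
        # Skip directory entries, which end with a slash.
        names = [n for n in archive.namelist() if not n.endswith("/")]
    return bool(names) and all(n.startswith("python/") for n in names)
```

Calling `layer_layout_ok("python.zip")` in the build directory should return `True` for an archive created with the commands above.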

I configured my Lambda to make use of this layer.

  Scraper:
    Type: AWS::Serverless::Function
    Properties:
      FunctionName: canyon-scraper
      CodeUri: src/
      Handler: lambda.lambda_handler
      Runtime: python3.8
      Role: !GetAtt ScraperRole.Arn
      Environment:
        Variables:
          TOPIC: !Ref Topic
      Layers:
        - !Ref libs
  libs:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: python-lib
      Description: Dependencies for the canyon scraper
      ContentUri: s3://xxx-layers/python.zip
      CompatibleRuntimes:
        - python3.8

The function was fast enough to run with the minimum amount of memory and with a timeout of a few seconds.

This basic solution monitored the website every minute. It's important to note that cron expressions that lead to rates faster than 1 minute are not supported. If you need a faster solution, then check out this blog post by a fellow Community Builder! Be sure that you're allowed to scrape and that you're not flooding the server!
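
One way to check the "allowed to scrape" part is to test the target URL against the site's robots.txt. The sketch below uses Python's standard `urllib.robotparser`; the helper name `allowed_by_robots`, the example rules and the example.com URLs are all made up for illustration and say nothing about any real site's policy.

```python
"""Check a URL against robots.txt rules before scraping it."""
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if `robots_txt` (the file's text) permits fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up rules and URLs, for illustration only.
EXAMPLE_RULES = """\
User-agent: *
Disallow: /checkout/
"""

print(allowed_by_robots(EXAMPLE_RULES, "https://example.com/bikes/road"))     # True
print(allowed_by_robots(EXAMPLE_RULES, "https://example.com/checkout/cart"))  # False
```

Against a live site, `RobotFileParser.set_url()` followed by `read()` fetches the real file instead of a string. A robots.txt check only covers crawling rules; it does not replace reading the site's terms of use.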

Now I just had to wait, and after a month I got an email...

And I was able to order my favorite bicycle! It gave me great satisfaction. Like many, I'm also interested in new AWS features, fancy integrations and big setups, but sometimes you don't need the new fancy stuff to meet your needs.

All code is available here. Feel free to fork it and adapt it to your needs.

*Web scraping is legal if you follow the rules: avoid scraping personal data or intellectual property, check the copyright notices and robots.txt of the website, don't flood the servers, and so on.