In my last post I created a lambda that accepts a request, stores it in a DynamoDB table, and sends a message to an SQS queue.
Let’s now create another lambda that reads from that queue and processes the request by scraping the URL with Selenium.
Installing Selenium
Create a new file under src called “chrome-deps.txt” and copy the following into it -
acl adwaita-cursor-theme adwaita-icon-theme alsa-lib at-spi2-atk at-spi2-core
atk avahi-libs cairo cairo-gobject colord-libs cryptsetup-libs cups-libs dbus
dbus-libs dconf desktop-file-utils device-mapper device-mapper-libs elfutils-default-yama-scope
elfutils-libs emacs-filesystem fribidi gdk-pixbuf2 glib-networking gnutls graphite2
gsettings-desktop-schemas gtk-update-icon-cache gtk3 harfbuzz hicolor-icon-theme hwdata jasper-libs
jbigkit-libs json-glib kmod kmod-libs lcms2 libX11 libX11-common libXau libXcomposite libXcursor libXdamage
libXext libXfixes libXft libXi libXinerama libXrandr libXrender libXtst libXxf86vm libdrm libepoxy
liberation-fonts liberation-fonts-common liberation-mono-fonts liberation-narrow-fonts liberation-sans-fonts
liberation-serif-fonts libfdisk libglvnd libglvnd-egl libglvnd-glx libgusb libidn libjpeg-turbo libmodman
libpciaccess libproxy libsemanage libsmartcols libsoup libthai libtiff libusbx libutempter libwayland-client
libwayland-cursor libwayland-egl libwayland-server libxcb libxkbcommon libxshmfence lz4 mesa-libEGL mesa-libGL
mesa-libgbm mesa-libglapi nettle pango pixman qrencode-libs rest shadow-utils systemd systemd-libs trousers ustr
util-linux vulkan vulkan-filesystem wget which xdg-utils xkeyboard-config
Create another file called “install-browser.sh” and copy the following -
#!/bin/bash
echo "Downloading Chromium..."
curl "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F$CHROMIUM_VERSION%2Fchrome-linux.zip?generation=1652397748160413&alt=media" > /tmp/chromium.zip
unzip /tmp/chromium.zip -d /tmp/
mv /tmp/chrome-linux/ /opt/chrome
curl "https://www.googleapis.com/download/storage/v1/b/chromium-browser-snapshots/o/Linux_x64%2F$CHROMIUM_VERSION%2Fchromedriver_linux64.zip?generation=1652397753719852&alt=media" > /tmp/chromedriver_linux64.zip
unzip /tmp/chromedriver_linux64.zip -d /tmp/
mv /tmp/chromedriver_linux64/chromedriver /opt/chromedriver
Update the Dockerfile to look like this -
FROM public.ecr.aws/lambda/python:3.9 as stage
# Hack to install chromium dependencies
RUN yum install -y -q sudo unzip
# Current stable version of Chromium
ENV CHROMIUM_VERSION=1002910
# Install Chromium
COPY install-browser.sh /tmp/
RUN /usr/bin/bash /tmp/install-browser.sh
FROM public.ecr.aws/lambda/python:3.9 as base
COPY chrome-deps.txt /tmp/
RUN yum install -y $(cat /tmp/chrome-deps.txt)
COPY --from=stage /opt/chrome /opt/chrome
COPY --from=stage /opt/chromedriver /opt/chromedriver
COPY create.py ${LAMBDA_TASK_ROOT}
COPY process.py ${LAMBDA_TASK_ROOT}
COPY requirements.txt ${LAMBDA_TASK_ROOT}
COPY db/ ${LAMBDA_TASK_ROOT}/db/
RUN python3.9 -m pip install -r requirements.txt -t .
Update the requirements.txt file and add
selenium==4.4.2
And install the dependency
pip install -r src/requirements.txt
Process the request
Create a new file under src called “process.py” for the new lambda function -
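The handler below is a minimal sketch of what process.py can contain. It assumes the SQS message body is the JSON document written by the create lambda (an id and a url), that the table name is exposed through a TABLE_NAME environment variable, and that the item is keyed on id — adjust those details (or reuse the db/ helpers from the last post) to match your actual schema.

# process.py — minimal sketch, assuming the message shape and table schema described above
import json
import os

import boto3
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

dynamodb = boto3.resource("dynamodb")


def get_driver():
    # Point Selenium at the Chromium binary and chromedriver baked into the image
    options = Options()
    options.binary_location = "/opt/chrome/chrome"
    options.add_argument("--headless")
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    options.add_argument("--disable-gpu")
    options.add_argument("--single-process")
    return webdriver.Chrome(service=Service("/opt/chromedriver"), options=options)


def handler(event, context):
    # TABLE_NAME and the key/attribute names below are assumptions — use your own schema
    table = dynamodb.Table(os.environ["TABLE_NAME"])
    driver = get_driver()
    try:
        for record in event["Records"]:
            message = json.loads(record["body"])
            driver.get(message["url"])
            # Store the page title and mark the request complete
            table.update_item(
                Key={"id": message["id"]},
                UpdateExpression="SET #s = :s, title = :t",
                ExpressionAttributeNames={"#s": "status"},
                ExpressionAttributeValues={":s": "COMPLETE", ":t": driver.title},
            )
    finally:
        driver.quit()
    return {"statusCode": 200}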
Finally, modify the template.yaml file to tell SAM about the new lambda -
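The exact resource depends on how the queue and table were declared in the earlier posts, but a minimal sketch of the new entry under Resources could look like this (RequestQueue, RequestTable, the memory and timeout values, and the DockerTag are placeholders — adjust them to match your existing template):

  ProcessFunction:
    Type: AWS::Serverless::Function
    Properties:
      PackageType: Image
      ImageConfig:
        Command: ["process.handler"]   # handler defined in process.py
      MemorySize: 2048                 # Chromium needs a generous amount of memory
      Timeout: 120
      Environment:
        Variables:
          TABLE_NAME: !Ref RequestTable
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref RequestTable
      Events:
        SQSEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt RequestQueue.Arn
            BatchSize: 1
    Metadata:
      Dockerfile: Dockerfile
      DockerContext: ./src
      DockerTag: python3.9-v1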
Since we created a new lambda function, we need to tell AWS where to grab the image from. Modify the samconfig.toml file and add another entry to the image_repositories array for ProcessFunction with the exact same value as that of CreateFunction. So if the row looked like this before -
image_repositories = ["CreateFunction=541434768954.dkr.ecr.us-east-2.amazonaws.com/serverlessarchexample8b9687a4/createfunction286a02c8repo"]
It should now look like this -
image_repositories = ["CreateFunction=541434768954.dkr.ecr.us-east-2.amazonaws.com/serverlessarchexample8b9687a4/createfunction286a02c8repo",
"ProcessFunction=541434768954.dkr.ecr.us-east-2.amazonaws.com/serverlessarchexample8b9687a4/createfunction286a02c8repo"]
Test the changes
Build the app -
sam build
To mimic receiving an event from the queue, we invoke the lambda by passing it a sample payload.
Under the events directory, update the contents of the event.json file -
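A standard SQS test event works here; the body below (an id and a url) is an assumption matching whatever the create lambda puts on the queue, so swap it for your actual message shape:

{
  "Records": [
    {
      "messageId": "059f36b4-87a3-44ab-83d2-661975830a7d",
      "receiptHandle": "MessageReceiptHandle",
      "body": "{\"id\": \"test-id-1\", \"url\": \"https://example.com\"}",
      "attributes": {
        "ApproximateReceiveCount": "1",
        "SentTimestamp": "1523232000000",
        "SenderId": "123456789012",
        "ApproximateFirstReceiveTimestamp": "1523232000001"
      },
      "messageAttributes": {},
      "md5OfBody": "{{{md5_of_body}}}",
      "eventSource": "aws:sqs",
      "eventSourceARN": "arn:aws:sqs:us-east-2:123456789012:MyQueue",
      "awsRegion": "us-east-2"
    }
  ]
}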
Now we run the app locally with the following command -
sam local invoke --env-vars ./tests/env.json -e ./events/event.json ProcessFunction
The output should look like -
Check the local DynamoDB table to verify that the request was marked complete -
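If you’re running DynamoDB Local as in the earlier posts, a quick scan from the CLI is enough to spot the updated item (the table name and port are assumptions — substitute your own):

aws dynamodb scan --table-name <your-table-name> --endpoint-url http://localhost:8000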
Deploying the changes
Deploy the changes to aws with the following command -
sam deploy
The output should look like this -
Just like before, test the changes by triggering a request for postman & validating the data in the dynamodb table -
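If you prefer the command line over Postman, a curl call against the deployed endpoint does the same job (the path and request body below are assumptions based on the create lambda from the last post — use the actual API URL from your deploy output):

curl -X POST "https://<api-id>.execute-api.us-east-2.amazonaws.com/Prod/" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com"}'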
You’ll notice that the message from the last test was also processed successfully.
Source Code
Here is the source code for the project built in this series.
Next: Part 5: Writing a CSV to S3 from AWS Lambda