The purpose of this post is to show how to use the Beautiful Soup module in AWS Lambda with the Python runtimes. Keep in mind that the Lambda Python runtimes do not ship with every module available for Python. A straightforward way to use a missing module is to install it in an isolated environment and bundle the Lambda function together with its dependencies.
The idea is similar to containers: we create an isolated environment, install only the dependencies our application needs, write our code, bundle everything, and upload it to the cloud.
1) Install Pipenv
Pipenv is the tool that will let us create an isolated environment for our Lambda function.
On Debian/Ubuntu:
sudo apt update -y
sudo apt upgrade -y
sudo apt install python3-pip
pip3 install pipenv
On yum-based distributions:
sudo yum update -y
sudo yum install python3-pip
pip3 install pipenv
2) Create a Python 3.8 environment
At the time of writing, the documentation for Beautiful Soup is written for Python 3.8: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ . As a result, we need a Python 3.8 isolated environment. Let's create it:
pipenv --python 3.8
3) Install our bs4 dependency
pipenv install bs4
4) Write the Lambda function
Create a file named lambda_function.py. The name matters: the default Lambda handler setting is lambda_function.lambda_handler, so with any other file name the handler will not be found.
Copy the code from the following link, paste it into lambda_function.py, and save the file: https://github.com/aissa-laribi/bs4-in-lambda/blob/main/lambda_function.py
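For readers who want to see the general shape of such a handler before opening the link, here is a minimal sketch. It is not the code from the repository: the `extract_links` helper, the `url` key in the event, and the default URL are all hypothetical names chosen for illustration, and it assumes bs4 is bundled next to the file as described in this post.

```python
import json
import urllib.request

from bs4 import BeautifulSoup  # available because it is bundled in the zip


def extract_links(html):
    """Return the href of every <a> tag that has one (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return [a["href"] for a in soup.find_all("a", href=True)]


def lambda_handler(event, context):
    # Assumed event shape for this sketch: {"url": "https://example.com"}.
    url = event.get("url", "https://example.com")
    with urllib.request.urlopen(url, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")
    links = extract_links(html)
    # Anything printed here ends up in the function's log stream.
    print(links)
    return {"statusCode": 200, "body": json.dumps(links)}
```

The function has to be called lambda_handler and live in lambda_function.py so that the default handler setting can find it.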
5) Bundle up the lambda function and the dependencies
Now it's time to copy the Lambda function alongside the dependencies of our environment.
cp lambda_function.py ~/.local/share/virtualenvs/<yourenvname>/lib/python3.8/site-packages
cd ~/.local/share/virtualenvs/<yourenvname>/lib/python3.8/site-packages
ls
You should now see your Lambda function alongside the dependencies.
Now we need to zip the whole directory:
zip -r9 bs4_in_lambda.zip *
cp bs4_in_lambda.zip ~/Desktop
cd ~/Desktop
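If the zip command is not available, the same bundle can be built with Python's standard library. The following is a rough sketch under that assumption; `build_bundle` is a hypothetical helper name, not part of any tool used in this post. The key point it illustrates is that the handler file and the packages must all sit at the root of the archive.

```python
import zipfile
from pathlib import Path


def build_bundle(site_packages, handler_file, zip_path):
    """Zip a site-packages directory plus the handler file at the archive root."""
    site_packages = Path(site_packages)
    handler_file = Path(handler_file)
    zip_path = Path(zip_path).resolve()
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED, compresslevel=9) as bundle:
        for path in sorted(site_packages.rglob("*")):
            if not path.is_file() or path.resolve() == zip_path:
                continue  # skip directories and the archive itself
            arcname = path.relative_to(site_packages).as_posix()
            if arcname == handler_file.name:
                continue  # the handler is added once, below
            bundle.write(path, arcname)
        bundle.write(handler_file, handler_file.name)
    return zip_path
```

It would be called with the site-packages path of your own environment, the path to lambda_function.py, and the destination of the zip file.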
6) Upload the Zip file to your Lambda function
Then create a Lambda function, and make sure that the Python 3.8 runtime is selected.
Go to Code and, in the top right corner, click on Upload from.
Select the Zip file we have created.
The Lambda function will show up and we can see on the left side all the pip packages stored in folders.
Then go to Configuration and increase the timeout (the default is 3 seconds), because there are 10 pages to be scraped.
Let's set a 5-minute timeout to make sure it will scrape all the pages.
Then, return to Code > Test.
Leave the test event content at its default, enter any name under Event Name, and save.
Click on Test.
Sometimes the test will fail with an error message.
The trick is to switch between http and https in our function.
Press Ctrl + F, scroll down, and replace all "https" with "http". Then deploy and test the function again.
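Instead of editing the URLs by hand, the switch between http and https could be automated. The sketch below is only one possible approach and makes several assumptions: that the failure surfaces as a urllib.error.URLError, that the pages are fetched with urllib, and that the helper names `swap_scheme` and `fetch_with_fallback` (both hypothetical) suit your function. The function in the linked repository may fetch pages differently.

```python
import urllib.error
import urllib.request


def swap_scheme(url):
    """Return the same URL with http and https exchanged."""
    if url.startswith("https://"):
        return "http://" + url[len("https://"):]
    if url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


def fetch_with_fallback(url, opener=urllib.request.urlopen, timeout=10):
    """Fetch the URL as given; on a URL error, retry once with the other scheme."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError:
        # Assumption: the error is scheme related, so one retry is worth trying.
        with opener(swap_scheme(url), timeout=timeout) as response:
            return response.read()
```

The `opener` parameter is there only so the function can be exercised without network access; in the Lambda function the default would be used.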
We now have a list of websites.
Then go to Monitor > Logs and click on the LogStream of the first invocation. A new window will open, and you will get access to the full list.