Ryan Nazareth for AWS Community Builders

Posted on Sep 17, 2022 • Originally published at ryannazareth.com

Setting up AWS CodePipeline to automate deployment of a tweet-streaming application

#aws #devops #git #testing

Introduction

In this tutorial, we will configure AWS CodePipeline to build a Docker image, push it to Amazon ECR, and deploy the latest version as a Lambda container image. The application code streams tweets using Tweepy, a Python library for accessing the Twitter API. First, we need to set up CodePipeline and the stages that deploy the application code to the Lambda container image, which streams tweets when invoked. The intended DevOps architecture is shown below.

Typically, a CodePipeline job contains the following stages:

Source: In this step, the latest version of our source code is fetched from the repository and uploaded to an S3 bucket. The application source code is maintained in a repository configured as a GitHub source action in the pipeline. When developers push commits to the repository, CodePipeline detects the change and starts a pipeline execution from the Source stage. The GitHub source action completes successfully once the latest changes have been downloaded and stored in the artifact bucket unique to that execution. The output artifacts produced by the GitHub source action (the application files from the repository) are then used as the input artifacts for the actions in the next stage. This is described in more detail in the AWS docs.
Build: During this step we will use this uploaded source code and automate our manual packaging step using a CodeBuild project. The build task pulls a build environment image and builds the application in a virtual container.
Unit Test: The next action can be a unit test project created in CodeBuild and configured as a test action in the pipeline.
Deploy to Dev/Test Environment: This deploys the application to a dev/test environment using CodeDeploy or another action provider such as CloudFormation.
Integration Test: This runs an end-to-end integration test project created in CodeBuild and configured as a test action in the pipeline.
Deploy to Production Environment: This deploys the application to a production environment. The pipeline could be configured so that this stage requires manual approval before it executes.

The stages described above are just an example; there could be fewer or more depending on the application. For example, we could have more environments for testing before deploying to production. For the rest of the post, we will describe how some of these stages can be configured for deploying the Twitter streaming application. All source code referenced in the rest of the article can be accessed here.

The Application code for Streaming Tweets

First, you will need to sign up for a Twitter Developer account and create a project and associated developer application. Then generate the following credentials:

API Key and Secret: Username and password for the App
Access Token and Secret: Represent the user that owns the app and will be used for authenticating requests

These steps are explained in more detail here. We can then define a Python function that takes these credentials as parameters and makes requests to the API with OAuth 1.0a authentication. This requires the latest version of tweepy to be installed (pip install tweepy). The event parameter is the payload with the keys keyword and duration, which determine the keyword to search for and how long the stream should last. For example, to stream tweets containing the keyword machine learning for 30 seconds, the payload would be {"keyword": "machine learning", "duration": 30}. The code also skips retweets (tweets whose text starts with "RT") to reduce noise.

import time

import tweepy


def tweepy_search_api(
    event, consumer_key, consumer_secret, access_token, access_secret
):
    # Authenticate requests with OAuth 1.0a (user context)
    auth = tweepy.OAuth1UserHandler(
        consumer_key, consumer_secret, access_token, access_secret
    )
    start_time = time.time()
    time_limit = event["duration"]  # how long to stream for, in seconds
    api = tweepy.API(auth, wait_on_rate_limit=True)
    counter = 0
    # Page through search results for the keyword, up to 100 tweets per request
    for tweet in tweepy.Cursor(
        api.search_tweets, event.get("keyword"), count=100, tweet_mode="extended"
    ).items():
        # Stop once the time limit has elapsed
        if time.time() - start_time > time_limit:
            api.session.close()
            print(f"\n {time_limit} seconds time limit reached, so disconnecting stream")
            print(f" {counter} tweets streamed ! \n")
            return
        # Skip retweets to reduce noise
        if not tweet.full_text.startswith("RT"):
            counter += 1
            dt = tweet.created_at
            payload = {
                "day": dt.day,
                "month": dt.month,
                "year": dt.year,
                "time": dt.time().strftime("%H:%M:%S"),
                "handle": tweet.user.screen_name,
                "text": tweet.full_text,
                # Number of tweets this user has liked (not likes on this tweet)
                "favourite_count": tweet.user.favourites_count,
                "retweet_count": tweet.retweet_count,
                "retweeted": tweet.retweeted,
                "followers_count": tweet.user.followers_count,
                "friends_count": tweet.user.friends_count,
                "location": tweet.user.location,
                "lang": tweet.user.lang,
            }
            print(payload)
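tweepy_search_api reads event["duration"] directly and passes event.get("keyword") straight to the search, so a malformed payload only fails once the function is already running. A minimal, hypothetical helper such as payload_errors below (the name and the messages are illustrative, not part of the original code) could check the payload up front and report every problem at once.

```python
def payload_errors(event: dict) -> list:
    """Return a list of problems with a {"keyword", "duration"} payload (hypothetical helper)."""
    errors = []
    keyword = event.get("keyword")
    duration = event.get("duration")
    if not isinstance(keyword, str) or not keyword.strip():
        errors.append("'keyword' must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly
    if isinstance(duration, bool) or not isinstance(duration, (int, float)) or duration <= 0:
        errors.append("'duration' must be a positive number of seconds")
    return errors


# A valid payload produces no errors
print(payload_errors({"keyword": "machine learning", "duration": 30}))  # []
# A payload without a duration is reported instead of failing mid-stream
print(payload_errors({"keyword": "machine learning"}))
```

The handler could call this before tweepy_search_api and raise (or report a job failure) when the list is non-empty.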

Since we are using AWS, we can store the credentials in AWS Secrets Manager for greater security, rather than defining them in the code or passing them as environment variables. We can use the boto3 SDK to create a Secrets Manager client and retrieve the secrets.

import json

import boto3


def get_secrets():
    session = boto3.session.Session()
    client = session.client(service_name="secretsmanager")
    # Look up the secret whose description is "twitterkeys"
    response = client.list_secrets(
        Filters=[{"Key": "description", "Values": ["twitterkeys"]}]
    )
    arn = response["SecretList"][0]["ARN"]
    get_secret_value_response = client.get_secret_value(SecretId=arn)
    # The secret is stored as a JSON string of key/value pairs
    secret = json.loads(get_secret_value_response["SecretString"])
    print("Successfully retrieved secrets !")
    return secret

We will also invoke Lambda to call the Twitter API and stream tweets, so we need another script that calls the tweepy_search_api and get_secrets functions defined above from a Lambda handler, as shown below. This assumes the tweepy_search_api and get_secrets functions are defined in the modules tweets_api.py and secrets.py respectively, in the same directory as the Lambda function module. The handler parses the user parameters from the payload depending on whether the Lambda function is invoked by a CodePipeline action or directly from the Lambda console or a local machine (for testing purposes). When it is invoked by a CodePipeline action, the UserParameters key holds the parameters as a string value, which needs to be converted to a dictionary programmatically.


def handler(event, context):
    from tweets_api import tweepy_search_api
    from secrets import get_secrets
    import itertools
    import boto3
    import json

    print(f"Event payload type: {type(event)}")
    print(f"Event:{event}")
    if event.get("CodePipeline.job") is not None:
        mode = "cloud"
    else:
        mode = "local"
    print(f"Mode: {mode}")
    if mode == "cloud":
        data = event["CodePipeline.job"]["data"]["actionConfiguration"][
            "configuration"
        ]["UserParameters"]
        params = json.loads(data)
        job_id = event["CodePipeline.job"]["id"]
        print(f"Params:{params}, JobID: {job_id}")
    elif mode == "local":
        params = event.copy()
        print(f"Params:{params}")
    code_pipeline = boto3.client("codepipeline")
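    response = get_secrets()
    # Assumes the first four values stored in the secret are, in order, the
    # consumer key, consumer secret, access token and access secret
    api_keys = list(itertools.islice(response.values(), 4))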
    print("Searching and delivering Tweets with Tweepy API: \n")

    try:
        tweepy_search_api(params, *api_keys)
        if mode == "cloud":
            code_pipeline.put_job_success_result(jobId=job_id)
    except Exception as e:
        print(f"Exception:{str(e)}")
        if mode == "cloud":
            code_pipeline.put_job_failure_result(
                jobId=job_id, failureDetails={"message": str(e), "type": "JobFailed"}
            )
        raise
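The handler takes the first four values of the secret with itertools.islice, which only works if the secret's keys were stored in exactly the order tweepy_search_api expects. A sketch that looks the credentials up by name instead is shown here; the key names consumer_key, consumer_secret, access_token and access_secret are assumptions and would have to match whatever names were used when the secret was created in Secrets Manager.

```python
# Assumed key names inside the Secrets Manager secret
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_secret")


def twitter_keys_from_secret(secret: dict) -> tuple:
    """Return the credentials in the order tweepy_search_api expects (hypothetical helper)."""
    missing = [key for key in REQUIRED_KEYS if key not in secret]
    if missing:
        raise KeyError(f"secret is missing keys: {missing}")
    return tuple(secret[key] for key in REQUIRED_KEYS)


# The order in which the keys were stored no longer matters
example_secret = {
    "access_secret": "d",
    "consumer_key": "a",
    "access_token": "c",
    "consumer_secret": "b",
}
print(twitter_keys_from_secret(example_secret))  # ('a', 'b', 'c', 'd')
```

In the handler, api_keys = twitter_keys_from_secret(get_secrets()) would then replace the islice call.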

The following folder contains all the code described above, in addition to a few more configuration files: buildspec.yml and a Dockerfile, which are used to build the Docker image (containing the application code) and push it to ECR in the CodeBuild stage. These are explained in more detail in the next sections.

Creating the resources with CloudFormation

First, we need to create the following resources with CloudFormation; they will be referenced when configuring the pipeline. Note: this assumes you already have an ECR repository named tweepy-stream-deploy (referenced as a parameter in the CloudFormation template, whose default can be overridden) that already contains an image tagged latest, since Lambda can only create a container-image function from an image that exists.

Lambda Function with URI reference to ECR repository
Roles for Lambda, CodePipeline and CloudFormation

Parameters:
  ECRRepoName:
    Default: "tweepy-stream-deploy"
    Description: ECR repository for tweets application
    Type: String
Resources:
  LambdaImageStaging:
    Type: 'AWS::Lambda::Function'
    Properties:
      PackageType: Image
      FunctionName: "codedeploy-staging"
      Code:
        ImageUri: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${ECRRepoName}:latest'
      Role:
        Fn::ImportValue: RoleLambdaImage-TwitterArn
      Timeout: 302
      MemorySize: 1024

$ aws cloudformation create-stack --stack-name CodeDeployLambdaTweets --template-body file://cf-templates/CodeDeployLambdaTweepy.yaml

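Create all the role resources using the templates here. The Lambda template imports the role ARN through the export RoleLambdaImage-TwitterArn (presumably exported by the RoleLambdaImage stack, given its name), so that stack has to exist before the CodeDeployLambdaTweets stack above is created. Because these templates create IAM roles, create-stack also needs the --capabilities CAPABILITY_NAMED_IAM acknowledgement.

aws cloudformation create-stack --stack-name RoleCloudFormationforCodeDeploy --template-body file://cf-templates/roles/CloudFormationRole.yaml --capabilities CAPABILITY_NAMED_IAM

aws cloudformation create-stack --stack-name RoleCodePipeline --template-body file://cf-templates/roles/CodepipelineRole.yaml --capabilities CAPABILITY_NAMED_IAM

aws cloudformation create-stack --stack-name RoleLambdaImage --template-body file://cf-templates/roles/RoleLambdaImageStaging.yaml --capabilities CAPABILITY_NAMED_IAM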

We can validate the CloudFormation templates before deploying by using validate-template to check the template files for syntax errors. During validation, AWS CloudFormation first checks if the template is valid JSON. If it isn't, CloudFormation checks if the template is valid YAML. If both checks fail, CloudFormation returns a template validation error. Note that the aws cloudformation validate-template command is designed to check only the syntax of your template. It does not ensure that the property values that you have specified for a resource are valid for that resource, nor does it determine the number of resources that will exist when the stack is created.
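The same check can be scripted over every template before running create-stack. The sketch below assumes a boto3 CloudFormation client is passed in (for example boto3.client("cloudformation")); validate_template returns the template's parameters and, where relevant, the Capabilities it requires, which is a quick way to spot the templates that need an IAM capability. The function name validate_templates is hypothetical.

```python
from pathlib import Path


def validate_templates(cf_client, template_paths):
    """Validate each template's syntax and collect the capabilities it requires."""
    required = {}
    for path in template_paths:
        body = Path(path).read_text()
        # Raises a botocore ClientError if the template is not valid JSON/YAML
        response = cf_client.validate_template(TemplateBody=body)
        required[str(path)] = response.get("Capabilities", [])
    return required
```

For example, validate_templates(boto3.client("cloudformation"), ["cf-templates/CodeDeployLambdaTweepy.yaml", "cf-templates/roles/CloudFormationRole.yaml"]) would return a mapping from each path to its required capabilities.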

Creating the CodePipeline Stages

We will use a CodePipeline definition file to create the Source, Build and Deploy stages via the AWS CLI. Alternatively, this could be done with CloudFormation, but since I had initially created the pipeline in the AWS console, I found it easier to export its structure with get-pipeline from the CLI and reuse the definition file to create the pipeline again in the future, as described in the next steps.

CodePipeline also supports a number of actions, as listed here. An action is part of a sequence in a stage and is a task performed on an artifact. CodePipeline can integrate with a number of action providers, such as CodeCommit, S3, GitHub, CodeBuild, Jenkins, CodeDeploy, CloudFormation and ECS, in the Source, Build, Test and Deploy stages. The full list of providers can be found in the docs. We will be using a CodeCommit source action, an S3 source action, a CodeBuild build action to build and push the image, CloudFormation deploy actions and a Lambda invoke action.

Next, we zip the cf-templates folder. This is required for the Deploy stage in CodePipeline, which uses CloudFormation actions to update the roles for CloudFormation, CodePipeline and Lambda if their templates have changed, and to update the Lambda resource with the latest image tag so that the updated application code is deployed. The templates need to be available as an output artifact of the Source stage, so we copy the zipped folder to S3 and configure the pipeline definition file so that a source action reads the templates from that S3 location.

$ cd cf-templates 
$ zip template-source-artifacts.zip CodeDeployLambdaTweepy.yaml roles/*
$ aws s3 cp template-source-artifacts.zip s3://codepipeline-us-east-1-49345350114/lambda-image-deploy/template-source-artifacts.zip
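If the zip utility is not available (on Windows, for example), the same archive can be built with Python's standard zipfile module. The sketch below mirrors the zip command above: it stores CodeDeployLambdaTweepy.yaml and every file under roles/ with paths relative to cf-templates, so that a TemplatePath such as CFTemplatesArtifact::CodeDeployLambdaTweepy.yaml resolves at the root of the artifact. The function name build_template_artifact is hypothetical.

```python
import zipfile
from pathlib import Path


def build_template_artifact(template_dir, zip_path):
    """Zip the main template plus roles/* with paths relative to template_dir."""
    base = Path(template_dir)
    files = [base / "CodeDeployLambdaTweepy.yaml"]
    files += sorted(p for p in (base / "roles").glob("*") if p.is_file())
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # arcname keeps e.g. "roles/CloudFormationRole.yaml" rather than an absolute path
            zf.write(path, arcname=path.relative_to(base).as_posix())
    return [path.relative_to(base).as_posix() for path in files]
```

Calling build_template_artifact("cf-templates", "template-source-artifacts.zip") produces the archive that is then uploaded with aws s3 cp as shown above.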

The definition JSON assumes the CodePipeline role has been created as described above. It is worth looking at the file contents to understand the settings before we create the pipeline. The first action in the Source stage is the CodeCommit source action:

{
    "name": "CodeCommitSource",
    "actionTypeId": {
        "category": "Source",
        "owner": "AWS",
        "provider": "CodeCommit",
        "version": "1"
    },
    "runOrder": 1,
    "configuration": {
        "BranchName": "master",
        "OutputArtifactFormat": "CODE_ZIP",
        "PollForSourceChanges": "false",
        "RepositoryName": "deploy-lambda-image"
    },
    "outputArtifacts": [
        {
            "name": "CodeCommitSourceArtifact"
        }
    ],
    "inputArtifacts": [],
    "region": "us-east-1",
    "namespace": "SourceVariables"
}

This configures the CodeCommit source action for the master branch of the deploy-lambda-image repository, with the output artifact CodeCommitSourceArtifact. The artifact is a ZIP file containing the contents of the configured repository and branch at the commit specified as the source revision for the pipeline execution; we will later pass it to the Build stage. The next action in the Source stage loads the CloudFormation templates zip file that we previously uploaded to the S3 bucket. This is an S3 source action; with PollForSourceChanges set to false, changes are detected through a CloudWatch Events rule that fires when a new object is uploaded to the source bucket. Note that the S3 source action requires versioning to be enabled on the source bucket. More details can be found in the AWS docs here.

{
    "name": "CFTemplatesArtifact",
    "actionTypeId": {
        "category": "Source",
        "owner": "AWS",
        "provider": "S3",
        "version": "1"
    },
    "runOrder": 1,
    "configuration": {
        "PollForSourceChanges": "false",
        "S3Bucket": "codepipeline-us-east-1-49345350114",
        "S3ObjectKey": "lambda-image-deploy/template-source-artifacts.zip"
    },
    "outputArtifacts": [
        {
            "name": "CFTemplatesArtifact"
        }
    ],
    "inputArtifacts": [],
    "region": "us-east-1"
}

This action also produces an output artifact, CFTemplatesArtifact, which we pass to the downstream Deploy stage. The Build stage runs the CodeBuild project Build-Twitter-Stream. A CodeBuild project holds the information about how to run a build, including where to get the source code, which build environment to use, which build commands to run and where to store the build output. The build commands come from the buildspec.yml file, which is included with the source code copied into the repository in a later section. This action takes the CodeCommitSourceArtifact, containing the application code to be built, as its input.

{
    "name": "Build",
    "actions": [
        {
            "name": "Build-Tweepy-Stream",
            "actionTypeId": {
                "category": "Build",
                "owner": "AWS",
                "provider": "CodeBuild",
                "version": "1"
            },
            "runOrder": 1,
            "configuration": {
                "BatchEnabled": "false",
                "EnvironmentVariables": "[{\"name\":\"IMAGE_REPO_NAME\",\"value\":\"tweepy-stream-deploy\",\"type\":\"PLAINTEXT\"},{\"name\":\"IMAGE_TAG\",\"value\":\"latest\",\"type\":\"PLAINTEXT\"},{\"name\":\"AWS_DEFAULT_REGION\",\"value\":\"us-east-1\",\"type\":\"PLAINTEXT\"},{\"name\":\"AWS_ACCOUNT_ID\",\"value\":\"[ACCT_ID]\",\"type\":\"PLAINTEXT\"}]",
                "ProjectName": "Build-Twitter-Stream"
            },
            "outputArtifacts": [],
            "inputArtifacts": [
                {
                    "name": "CodeCommitSourceArtifact"
                }
            ],
            "region": "us-east-1"
        }
    ]
}

We will be building a Docker image to push to ECR, so we set the following environment variables, which are referenced in buildspec.yml:

AWS_DEFAULT_REGION: us-east-1
AWS_ACCOUNT_ID: your AWS account ID
IMAGE_TAG: latest
IMAGE_REPO_NAME: tweepy-stream-deploy
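The EnvironmentVariables value in the Build action above is itself a JSON array serialised into a string, which is why every quote inside it is escaped. Rather than hand-editing that string, it can be generated with json.dumps; the sketch below is one way to do that, keeping the [ACCT_ID] placeholder from the definition file. The helper name codebuild_env_vars is hypothetical.

```python
import json


def codebuild_env_vars(variables: dict) -> str:
    """Serialise name/value pairs into the string format used by EnvironmentVariables."""
    return json.dumps(
        [{"name": name, "value": value, "type": "PLAINTEXT"} for name, value in variables.items()],
        separators=(",", ":"),  # compact form, matching the definition file
    )


env_string = codebuild_env_vars(
    {
        "IMAGE_REPO_NAME": "tweepy-stream-deploy",
        "IMAGE_TAG": "latest",
        "AWS_DEFAULT_REGION": "us-east-1",
        "AWS_ACCOUNT_ID": "[ACCT_ID]",  # placeholder: replace with your account ID
    }
)
print(env_string)
```

When the whole pipeline definition is written out with json.dumps, this string is escaped automatically, producing the backslash-quoted form seen in the definition file.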

The buildspec.yml file is similar to the example in the AWS docs for pushing a Docker image to ECR. In the pre-build phase, we use the get-login-password CLI command, which retrieves an authentication token using the GetAuthorizationToken API. The token is passed to the login command of the Docker CLI to authenticate to the ECR registry, which allows Docker to push and pull images from the registry until the authorization token expires (after 12 hours). The build phase then runs the steps in the Dockerfile shown in the snippet below to build the image.

FROM public.ecr.aws/lambda/python:3.9.2022.03.23.16

# Copy function code
COPY main_twitter.py ${LAMBDA_TASK_ROOT}
COPY secrets.py ${LAMBDA_TASK_ROOT}
COPY tweets_api.py ${LAMBDA_TASK_ROOT}

COPY requirements.txt  .
RUN  pip3 install -r requirements.txt --target "${LAMBDA_TASK_ROOT}"

# Set the CMD to your handler
CMD [ "main_twitter.handler" ]

The steps in the Dockerfile include:

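Pulling the Python 3.9 Lambda base image from the Amazon ECR Public Gallery (public.ecr.aws)
Copying the module main_twitter.py containing the Lambda handler, and the modules it imports from (secrets.py and tweets_api.py)
Copying the requirements.txt file and installing the Python dependencies
Finally, setting CMD to the Lambda handler (main_twitter.handler), which the base image's runtime uses as the function handler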

Once the image is successfully built in the build phase, it is tagged with the latest tag. Finally, the post-build phase pushes the tagged image to the private ECR repository URI.
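The buildspec.yml itself is not reproduced in this post, so the following is an assumption modelled on the standard AWS example for pushing to ECR. The sketch only assembles the registry URI and the command strings for each phase from the four environment variables and prints them; it does not run Docker. The helper name ecr_push_commands and the account ID 123456789012 are placeholders.

```python
def ecr_push_commands(env: dict) -> dict:
    """Build buildspec-style commands from the CodeBuild environment variables."""
    registry = f"{env['AWS_ACCOUNT_ID']}.dkr.ecr.{env['AWS_DEFAULT_REGION']}.amazonaws.com"
    local_image = f"{env['IMAGE_REPO_NAME']}:{env['IMAGE_TAG']}"
    remote_image = f"{registry}/{local_image}"
    return {
        # Authenticate Docker to the private registry
        "pre_build": [
            f"aws ecr get-login-password --region {env['AWS_DEFAULT_REGION']}"
            f" | docker login --username AWS --password-stdin {registry}"
        ],
        # Build from the Dockerfile, then tag with the full repository URI
        "build": [
            f"docker build -t {local_image} .",
            f"docker tag {local_image} {remote_image}",
        ],
        # Push the tagged image to ECR
        "post_build": [f"docker push {remote_image}"],
    }


if __name__ == "__main__":
    example_env = {
        "AWS_ACCOUNT_ID": "123456789012",  # placeholder account ID
        "AWS_DEFAULT_REGION": "us-east-1",
        "IMAGE_REPO_NAME": "tweepy-stream-deploy",
        "IMAGE_TAG": "latest",
    }
    for phase, commands in ecr_push_commands(example_env).items():
        print(phase, commands)
```

The pushed URI has the same shape as the ImageUri that the CloudFormation template builds with !Sub, which is how the Lambda function ends up pointing at the freshly pushed latest image.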

The next stage is the Deploy stage named DeployLambda, which uses CloudFormation as the action provider for a number of actions: updating the role resources, deleting the existing Lambda resource and deploying the latest image to Lambda. All these actions take the CFTemplatesArtifact from the Source stage as their input artifact, and the actions that create or update stacks reference the path to the CloudFormation template (in the TemplatePath property) relative to the root of that artifact. We provide the stack name and CloudFormation role in the configuration. The ActionMode depends on whether we need to create, update or delete the stack.

The first three actions update the roles for CloudFormation, CodePipeline and Lambda if the CloudFormation templates have changed. Their runOrder property is set to 1 so that they run in parallel. The next action deletes the existing Lambda stack, if there is one; its runOrder is incremented to 2 so that it runs after the role updates have finished. This delete-then-create sequence is what makes Lambda pick up the newly pushed image: the template always references the same ImageUri (the latest tag), so a plain stack update would find nothing to change.

 {
    "name": "DeleteExistingLambdaImage",
    "actionTypeId": {
        "category": "Deploy",
        "owner": "AWS",
        "provider": "CloudFormation",
        "version": "1"
    },
    "runOrder": 2,
    "configuration": {
        "ActionMode": "DELETE_ONLY",
        "RoleArn": "arn:aws:iam::376337229415:role/CloudFormationRole",
        "StackName": "CodeDeployLambdaTweets"
    },
    "outputArtifacts": [],
    "inputArtifacts": [
        {
            "name": "CFTemplatesArtifact"
        }
    ],
    "region": "us-east-1"
}

Then we deploy the Lambda image using a CloudFormation action called DeployLambdaImage, with runOrder 3 so that it runs after the delete. In the configuration, we specify the OutputFileName and the outputArtifacts name, which we pass to the next stage. The TemplatePath references the required template YAML in CFTemplatesArtifact.

{
    "name": "DeployLambdaImage",
    "actionTypeId": {
        "category": "Deploy",
        "owner": "AWS",
        "provider": "CloudFormation",
        "version": "1"
    },
    "runOrder": 3,
    "configuration": {
        "ActionMode": "CREATE_UPDATE",
        "OutputFileName": "lambda-codedeploy-output",
        "RoleArn": "arn:aws:iam::376337229415:role/CloudFormationRole",
        "StackName": "CodeDeployLambdaTweets",
        "TemplatePath": "CFTemplatesArtifact::CodeDeployLambdaTweepy.yaml"
    },
    "outputArtifacts": [
        {
            "name": "LambdaDeployArtifact"
        }
    ],
    "inputArtifacts": [
        {
            "name": "CFTemplatesArtifact"
        }
    ],
    "region": "us-east-1"
}
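Within a stage, actions with the same runOrder run in parallel, and actions with a higher runOrder wait for the lower ones to finish. The sketch below groups the DeployLambda actions by runOrder to show the resulting sequence. The names of the three role-update actions are hypothetical, since their definitions are not shown in this post.

```python
from itertools import groupby


def execution_waves(actions: list) -> list:
    """Group action names by runOrder: each inner list runs in parallel, lists run in order."""
    ordered = sorted(actions, key=lambda action: action["runOrder"])  # stable sort keeps input order
    return [
        [action["name"] for action in group]
        for _, group in groupby(ordered, key=lambda action: action["runOrder"])
    ]


deploy_stage_actions = [
    {"name": "UpdateCloudFormationRole", "runOrder": 1},  # hypothetical name
    {"name": "UpdateCodePipelineRole", "runOrder": 1},  # hypothetical name
    {"name": "UpdateLambdaRole", "runOrder": 1},  # hypothetical name
    {"name": "DeleteExistingLambdaImage", "runOrder": 2},
    {"name": "DeployLambdaImage", "runOrder": 3},
]

for wave, names in enumerate(execution_waves(deploy_stage_actions), start=1):
    print(wave, names)
```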

In the final stage of the pipeline, we do a test invocation of the deployed Lambda function. This uses a Lambda invoke action (category Invoke, provider Lambda), with the artifact from the previous stage as input. The configuration sets the function name and the UserParameters values used to invoke the Lambda function, i.e. we will stream tweets with the keyword Machine Learning for a duration of 10 seconds.

{
    "name": "LambdaInvocationTest",
    "actions": [
        {
            "name": "LambdaStagingInvocation",
            "actionTypeId": {
                "category": "Invoke",
                "owner": "AWS",
                "provider": "Lambda",
                "version": "1"
            },
            "runOrder": 1,
            "configuration": {
                "FunctionName": "codedeploy-staging",
                "UserParameters": "{\"keyword\": \"Machine Learning\", \"duration\":10}"
            },
            "outputArtifacts": [
                {
                    "name": "LambdaInvocationArtifact"
                }
            ],
            "inputArtifacts": [
                {
                    "name": "LambdaDeployArtifact"
                }
            ],
            "region": "us-east-1"
        }
    ]
}
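Because UserParameters is a string, the JSON has to be escaped inside the pipeline definition, and the handler decodes it again with json.loads. The sketch below builds a minimal, hypothetical CodePipeline job event containing only the fields the handler reads (a real event carries more, such as artifact locations and credentials) and extracts the parameters the same way the handler does. The job ID is a made-up placeholder, so passing this event to the real handler would still fail at put_job_success_result; it is only suited to checking the parsing logic locally.

```python
import json

params = {"keyword": "Machine Learning", "duration": 10}

# Minimal fake event containing only the keys the handler reads
fake_event = {
    "CodePipeline.job": {
        "id": "11111111-2222-3333-4444-555555555555",  # placeholder job ID
        "data": {
            "actionConfiguration": {
                "configuration": {
                    "FunctionName": "codedeploy-staging",
                    "UserParameters": json.dumps(params),  # stored as a string
                }
            }
        },
    }
}


def extract_user_parameters(event: dict) -> dict:
    """Decode UserParameters as the handler does in 'cloud' mode."""
    config = event["CodePipeline.job"]["data"]["actionConfiguration"]["configuration"]
    return json.loads(config["UserParameters"])


print(extract_user_parameters(fake_event))  # {'keyword': 'Machine Learning', 'duration': 10}
```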

Now we can create the pipeline with the stages described above using the following CLI command:

$ aws codepipeline create-pipeline --cli-input-json file://cp-definitions/deploy-lambda-image.json

This should create the pipeline, which should then be visible in the console or via the CLI list-pipelines command:

aws codepipeline list-pipelines

The next section configures our setup so that we can pull from and push to the CodeCommit repository from our local machine. The deploy-lambda-image repository referenced by the CodeCommit source action is still empty (if it does not exist yet, it can be created with aws codecommit create-repository --repository-name deploy-lambda-image), so we also need to copy the application code into it before running CodePipeline end to end.

Setting up a local repository

In this step, you set up a local repository to connect to your remote CodeCommit repository. This assumes you are using SSH keys installed on your machine; if you have not set up SSH keys already, generate them with ssh-keygen as described in the AWS docs. Upload your SSH public key to your IAM user, then copy the SSH key ID. Edit the SSH configuration file named "config" in your local ~/.ssh directory and add the following lines, where the value for User is the SSH key ID.

Host git-codecommit.*.amazonaws.com
User Your-IAM-SSH-Key-ID-Here
IdentityFile ~/.ssh/Your-Private-Key-File-Name-Here

Once you have saved the file, make sure it has the right permissions by running the following commands:

cd ~/.ssh
chmod 600 config

Clone the CodeCommit repository to your local computer and start working on the code. You can get the SSH URL from the console under Clone URL for the CodeCommit repository. Navigate to a local directory (e.g. '/tmp') where you'd like your local repository to be stored and run the following command:

$ git clone ssh://git-codecommit.us-east-1.amazonaws.com/v1/repos/deploy-lambda-image

Now copy all the files from the following folder into the local directory you created earlier (for example, /tmp/deploy-lambda-image). Be sure to place the files directly into your local repository. The directory and file hierarchy should look like this, assuming you have cloned a repository named deploy-lambda-image into the /tmp directory:

/tmp
   └-- deploy-lambda-image
        ├── README.md
        ├── __init__.py
        ├── appspec.yaml
        ├── buildspec.yml
        ├── dockerfile
        ├── local_run.py
        ├── main_twitter.py
        ├── requirements.txt
        ├── secrets.py
        └── tweets_api.py

Run the following commands to stage all of the files, commit with a commit message and then push the files to the CodeCommit repository.

git add .
git commit -m "Add sample application files"
git push

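The files you added to your local repository have now been pushed to the deploy-lambda-image CodeCommit repository and are ready to be included in a pipeline. Note that the source action is configured with BranchName master, so the commits need to land on the master branch; depending on your Git defaults, a freshly cloned empty repository may use a different branch name, such as main.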

Code Pipeline Execution

CodePipeline can be configured to run on every push to CodeCommit by defining an EventBridge rule that triggers the pipeline when there are changes to the CodeCommit repository associated with it. This can be done from the console as detailed here, replacing the ARNs with those of your CodeCommit repository and pipeline.
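For reference, the rule used for a CodeCommit source typically matches events of detail-type "CodeCommit Repository State Change" on the configured branch. The sketch below writes that pattern as a Python dict with a placeholder repository ARN, and includes a deliberately simplified matcher (exact values only, no content filters such as prefix or numeric matching) to check a sample event against it. It is an illustration, not EventBridge's own matching engine.

```python
import json

REPO_ARN = "arn:aws:codecommit:us-east-1:123456789012:deploy-lambda-image"  # placeholder

event_pattern = {
    "source": ["aws.codecommit"],
    "detail-type": ["CodeCommit Repository State Change"],
    "resources": [REPO_ARN],
    "detail": {
        "event": ["referenceCreated", "referenceUpdated"],
        "referenceType": ["branch"],
        "referenceName": ["master"],
    },
}


def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style match: every pattern key must match the event."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif isinstance(actual, list):
            # Array fields (e.g. resources) match if any element is listed in the pattern
            if not any(item in expected for item in actual):
                return False
        elif actual not in expected:
            return False
    return True


sample_event = {
    "source": "aws.codecommit",
    "detail-type": "CodeCommit Repository State Change",
    "resources": [REPO_ARN],
    "detail": {"event": "referenceUpdated", "referenceType": "branch", "referenceName": "master"},
}

print(json.dumps(event_pattern, indent=2))  # JSON form of the rule's event pattern
print(matches(event_pattern, sample_event))  # True
```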

We can now push further commits from our local repository to CodeCommit. Each push creates a CodeCommit event, which EventBridge processes according to the configured rule and which then triggers CodePipeline to execute the different stages, as shown in the screenshots below.

To trigger the pipeline manually, choose Release change on the pipeline details page in the console. This runs the most recent revision available in each source location specified in the pipeline's source actions through the pipeline.

Once the pipeline has finished, we can check the invocation logs in the corresponding CloudWatch log stream. The handler in main_twitter.py calls the PutJobSuccessResult or PutJobFailureResult API actions to report the success or failure of the Lambda execution to the pipeline, which completes the LambdaInvocationTest stage with success or failure accordingly.