DEV Community

patrickbreen

Public Health Data Pipeline with CDK CI/CD

Introduction

Hello, I'm new to AWS, and as part of my learning I wanted to do some small DevOps projects. I thought I might as well share this one as a blog post too.

I decided to follow along with the A Cloud Guru challenge (https://acloudguru.com/blog/engineering/cloudguruchallenge-python-aws-etl). Each month there is a new challenge: a description is given, and then you're free to implement it in AWS. One of the specific things I'm trying to get familiar with is CI/CD pipelines, such as AWS CodePipeline. This month's challenge is a COVID-19 data tracker that tracks the number of infections, deaths, and recoveries in the United States. A notable aspect of this project is how much automation sits behind a seemingly simple end product. There is code that automatically keeps the data in the chart as fresh as possible, and code that keeps the entire project continuously deployed with the latest version of the code (more on that later). Here is a picture of the dashboard that the rest of this article describes:

[Image: screenshot of the COVID-19 dashboard]

CDK, Cloud Development Kit, keeping this project continuously up-to-date

In 2019, AWS launched the Cloud Development Kit (CDK) (https://aws.amazon.com/cdk/), which is the newest, and perhaps the best, way of programmatically declaring cloud components. These components can be infrastructure, code, or deployment pipelines.

This is my third small CDK development project, and it's still rough since I'm learning. There are a few things that I would have done differently, including renaming things, but I think that this project is close enough to blog-ready for my standards.

All of the code can be found in my public repository on GitHub (https://github.com/patrickbreen/ACGChallenge2). If you scroll to the bottom of that page, you'll find the directions to deploy and access this project:

[Image: deployment directions from the repository README]

At the time of publishing this article, you should be able to copy and paste the URL to load the dashboard that I'm hosting in my AWS account (https://cdk-s3-static-website-blog-pb-2.s3.amazonaws.com/dashboard.html). If it doesn't work weeks or months from now, it may be because I've removed this deployment. The dashboard uses Chart.js to display data read from my database in AWS. Fresh data is loaded into that database each day from The New York Times and Johns Hopkins University data sources, so the chart stays no more than about a day behind the latest COVID-19 data.
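To make the daily load a little more concrete, here is a minimal sketch of how two such sources could be combined by date before being stored. It is not the code from the repository. The column names (date, cases, deaths for the NYT-style file; Date, Country/Region, Recovered for the JHU-style file), the function names, and the join-on-date approach are all assumptions made for illustration, and the sample rows are made up rather than real figures.

```python
import csv
import io


def parse_nyt(text):
    """Return {date: (cases, deaths)} from NYT-style CSV text (assumed columns)."""
    return {
        row["date"]: (int(row["cases"]), int(row["deaths"]))
        for row in csv.DictReader(io.StringIO(text))
    }


def parse_jhu_recovered(text, country="US"):
    """Return {date: recovered} for one country from JHU-style CSV text (assumed columns)."""
    return {
        row["Date"]: int(row["Recovered"])
        for row in csv.DictReader(io.StringIO(text))
        if row["Country/Region"] == country
    }


def merge_by_date(nyt, jhu):
    """Keep only dates present in both sources, in chronological order."""
    return [
        {"date": d, "cases": nyt[d][0], "deaths": nyt[d][1], "recovered": jhu[d]}
        for d in sorted(nyt.keys() & jhu.keys())  # ISO dates sort correctly as strings
    ]


# Made-up sample rows, for illustration only (not real figures).
NYT_SAMPLE = "date,cases,deaths\n2020-01-01,1,0\n2020-01-02,3,1\n2020-01-03,5,1\n"
JHU_SAMPLE = (
    "Date,Country/Region,Recovered\n"
    "2020-01-02,US,0\n"
    "2020-01-02,Canada,7\n"
    "2020-01-03,US,2\n"
)

rows = merge_by_date(parse_nyt(NYT_SAMPLE), parse_jhu_recovered(JHU_SAMPLE))
for row in rows:
    print(row)
```

In a real deployment the two CSV texts would be downloaded on a schedule and the merged rows written to the database; those parts are left out here because they depend on choices the article does not describe.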

How it works

Here is a high-level architecture description of the project.

[Image: high-level architecture diagram]

Everything is configured within the code repository itself. If you were to clone this repository into a repository of your own, you would only need to change the CODECOMMIT_REPO_NAME variable in app.py to the name of your repository. Then deploy the pipeline with the following terminal command: cdk deploy PipelineDeployingInfraStack. It will run for about a minute and then prompt for confirmation; enter y to confirm that you want to create and deploy the pipeline. This is the only manual step required! After that one command, the pipeline is created, and you can watch it work through building and deploying the rest of the infrastructure and code.
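For readers who have not seen a CDK pipeline before, the following is a rough sketch of what an app.py along these lines could look like. It is written against CDK v2 (the aws-cdk-lib and constructs packages), which is an assumption: the actual repository may use a different CDK version and different constructs. Only the names CODECOMMIT_REPO_NAME and PipelineDeployingInfraStack come from this article; the repository name value, the "main" branch, and the construct IDs are hypothetical placeholders.

```python
# app.py (sketch only; assumes CDK v2, not necessarily what the repository uses)
from aws_cdk import App, Stack, pipelines
from aws_cdk import aws_codecommit as codecommit
from constructs import Construct

# The one value to change when cloning the project (placeholder value).
CODECOMMIT_REPO_NAME = "my-repo-name"


class PipelineDeployingInfraStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Reference an existing CodeCommit repository by name.
        repo = codecommit.Repository.from_repository_name(
            self, "SourceRepo", CODECOMMIT_REPO_NAME
        )

        # A self-updating pipeline: each push to the branch triggers a new
        # synth and deployment. The branch name "main" is an assumption.
        pipelines.CodePipeline(
            self,
            "Pipeline",
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.code_commit(repo, "main"),
                commands=[
                    "npm install -g aws-cdk",
                    "pip install -r requirements.txt",
                    "cdk synth",
                ],
            ),
        )


if __name__ == "__main__":
    app = App()
    PipelineDeployingInfraStack(app, "PipelineDeployingInfraStack")
    app.synth()
```

The stacks holding the actual infrastructure would be attached to the pipeline as stages (for example with add_stage), which is omitted here to keep the sketch short.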

[Image: the pipeline working through its stages]

Once everything is green, the infrastructure is deployed. Any further change to your CodeCommit repository will automatically trigger a new build and deployment.

The future of this blog

Fully explaining each of the concepts presented here would take a lot more detail, and there are many I haven't covered in any depth. This is my first blog article. I plan on writing about one per month, and I will work on balancing effective communication with conciseness as I continue.
