DEV Community

Sudha Chandran B C

AWS Data Exchange, Subscribe and create Database using Glue

Here is how we can do COVID-19 data analysis using AWS cloud services.

Today I'll write about part 1: Data Setup.

Solution overview

The following diagram illustrates the architecture of the solution.


The workflow comprises the following steps:

  1. Subscribe to a data set from AWS Data Exchange and export to Amazon S3
  2. Run an AWS Glue crawler to load product data
  3. Perform queries with Amazon Athena
  4. Visualize the queries and tables with Amazon QuickSight
  5. Run an ETL job with AWS Glue
  6. Create a time series forecast with Amazon Forecast
  7. Visualize the forecasted data with Amazon QuickSight

Step 1: AWS Data Exchange: Subscribe

  1. Go to AWS Data Exchange from the AWS Console.
  2. Browse the catalog and search for "COVID". You'll see that a few data sets are already available.
  3. I selected "Enigma U.S. & Global COVID-19 Aggregation", which includes global data.
  4. Subscribe to it; after a few minutes you'll see it listed under "My Subscriptions".
  5. On selecting it, you'll see its revisions and when each was last updated.

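If you prefer scripting the console steps, your subscriptions can be listed through the AWS Data Exchange ListDataSets API with `Origin="ENTITLED"`. The helper below is a minimal sketch that filters such a response for COVID data sets; the `Id` values in the sample are placeholders, not real identifiers.

```python
# Minimal sketch: pick out COVID data sets from an AWS Data Exchange
# ListDataSets response. With AWS credentials configured, you would get
# this response from boto3.client("dataexchange").list_data_sets(Origin="ENTITLED").
def covid_data_sets(response):
    """Return (Id, Name) pairs whose name mentions COVID."""
    return [
        (ds["Id"], ds["Name"])
        for ds in response["DataSets"]
        if "covid" in ds["Name"].lower()
    ]

# Example response fragment (the Ids here are made up for illustration):
sample = {
    "DataSets": [
        {"Id": "ds-enigma", "Name": "Enigma U.S. & Global COVID-19 Aggregation"},
        {"Id": "ds-other", "Name": "Some Other Data Product"},
    ]
}

print(covid_data_sets(sample))
```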

Step 2: AWS Data Exchange: Export to S3

  1. Click the Revision ID.
  2. Select the data set you want and choose Export to Amazon S3.
  3. A new job is created; it shows as Completed once the export is done.

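The same export can be scripted: AWS Data Exchange's CreateJob API accepts a job of type `EXPORT_REVISIONS_TO_S3`. The sketch below only builds the request body (the data set ID, revision ID, and bucket name are placeholders; copy the real ones from the console). With credentials, you would pass this to `boto3.client("dataexchange").create_job(**job)` and then call `start_job` with the returned job ID.

```python
def export_revision_job(data_set_id, revision_id, bucket):
    """Build the CreateJob request body for exporting one revision to S3."""
    return {
        "Type": "EXPORT_REVISIONS_TO_S3",
        "Details": {
            "ExportRevisionsToS3": {
                "DataSetId": data_set_id,
                "RevisionDestinations": [
                    {"RevisionId": revision_id, "Bucket": bucket},
                ],
            }
        },
    }

# Placeholder identifiers -- take the real ones from "My Subscriptions".
job = export_revision_job("my-data-set-id", "my-revision-id", "my-covid-bucket")
print(job["Type"])
```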


Step 3: Amazon S3: Verify the data

Open the Amazon S3 console and confirm that the exported data files appear in the bucket and folder you chose during the export.
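To verify the export programmatically instead of in the console, you could list the bucket with S3's ListObjectsV2 API and summarize what landed. The helper below works on the response shape that call returns; the sample keys and sizes are invented for illustration.

```python
def summarize_objects(response):
    """Return (object_count, total_bytes) from a ListObjectsV2-style response."""
    contents = response.get("Contents", [])
    return len(contents), sum(obj["Size"] for obj in contents)

# Example response fragment -- with credentials, you would get this from
# boto3.client("s3").list_objects_v2(Bucket="my-covid-bucket")
sample = {
    "Contents": [
        {"Key": "enigma/global_countries.csv", "Size": 1024},
        {"Key": "enigma/us_states.csv", "Size": 2048},
    ]
}

count, total = summarize_objects(sample)
print(count, total)
```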

Step 4: AWS Glue

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics.

Now that you have successfully exported the Enigma COVID-19 data sets into an Amazon S3 bucket, create and run an AWS Glue crawler to crawl the bucket and populate the AWS Glue Data Catalog. Complete the following steps:

  1. On the AWS Glue console, under Data Catalog, choose Crawlers.
  2. Choose Add crawler.
  3. For Crawler name, enter a name; for example, covid-data-exchange-crawler.
  4. For Crawler source type, choose Data stores.
  5. Choose Next.
  6. For Choose a data store, choose S3.
  7. For Crawl data in, select Specified path in my account.
  8. For the include path, point the crawler to the S3 location of your exported data: s3:///.
  9. Choose Next.
  10. In the Choose an IAM role section, select Create an IAM role. This is the role that the AWS Glue crawler and AWS Glue jobs use to access the Amazon S3 bucket and its content.
  11. For IAM role, enter the suffix demo-data-exchange.
  12. Choose Next.
  13. In the Schedule section, leave Frequency at the default, Run on demand.
  14. Choose Next.
  15. In the Output section, choose Add database.
  16. Enter a name for the database; for example, covid-db.
  17. Choose Next, then choose Finish.
  18. This database will contain the tables that the crawler discovers and populates. With the data sets separated into different tables, you can join and relationalize the data.
  19. In the Review all steps section, review the crawler settings and choose Finish.
  20. Under Data Catalog, choose Crawlers.
  21. Select the crawler you just created.
  22. Choose Run crawler.
  23. The AWS Glue crawler crawls the data source and populates your AWS Glue Data Catalog. This process can take a few minutes. When the crawler is finished, you can see the table it added in the crawler details.
  24. You can now view your new tables.
  25. Under Databases, choose Tables.
  26. Choose your database.
  27. Choose View tables. The table names correspond to the Amazon S3 folder directory that you pointed the AWS Glue crawler to.
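The crawler defined above in the console can also be created with the AWS Glue CreateCrawler API. The sketch below builds only the request parameters, using the example names from this post; the role name and S3 path are placeholders you would replace with your own. With credentials, you would pass these to `boto3.client("glue").create_crawler(**params)` and then run it with `start_crawler(Name=params["Name"])`.

```python
def crawler_params(name, role, database, s3_path):
    """Build CreateCrawler parameters for a single S3 target."""
    return {
        "Name": name,
        "Role": role,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }

params = crawler_params(
    name="covid-data-exchange-crawler",
    role="AWSGlueServiceRole-demo-data-exchange",  # placeholder role name
    database="covid-db",
    s3_path="s3://my-covid-bucket/enigma/",  # placeholder path
)
print(params["Name"])
```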

Next step: query the data in Athena and visualize it in QuickSight!

Thank you for Reading 😊
