Almost 2 years ago, I started experimenting with QuickSight to solve some of the BI issues of the company I was working for. I appreciated QuickSight's first-class integration with many AWS data services and low cost in comparison to other similar tools. It afforded us the ability to rapidly prototype analyses and dashboards. One glaring missing feature that left us scratching our heads was the lack of DynamoDB has a data source option.
I asked the StackOverflow hive-mind how to Visualize DynamoDB in AWS Quicksight on StackOverflow and it has become the most upvoted QuickSight question on the platform because there is a demand for this feature and there was no direct answer... until recently.
Most of the work-around solutions proposed involved exporting (i.e. duplicating) your DynamoDB data to a different repository such as S3 or RDS that could then be added as a data source in QuickSight. We ended up creating scheduled Glue jobs that would move DynamoDB data to S3. The S3 data was then crawled via AWS Glue Crawlers and exposed as AWS Athena tables which were then added as Quick Sight data sets. This worked but was more custom infrastructure than was desirable and also didn't allow for real time direct queries.
In late 2019, AWS announced you could query any datasource with Amazon Athena's new federated query feature.. This was cool and they even showed in diagrams the concept of querying DynamoDB but this required us to develop and maintain our own implementation of this connector and also this feature was only in Preview and AWS QuickSight integration was not updated to allow the usage of this new Athena feature.
In March 2020, I saw that AWS Athena announced a prebuilt Data Connector for DynamoDB. This was exciting as I was able to quickly setup a data connector in Athena that could actually view and query DynamoDB data without any custom code but this feature was still in Preview and there was no support in QuickSight yet.
Fast forward, it's a year later and Athena Data Connectors have been become GA with the Athena Engine Version 2 in a handful of regions and the integration features in QuickSight that allow you to select the required Athena work groups and data sources are present.
This should be enough to make things work until DynamoDB becomes a first-class data source in Amazon QuickSight.
Using the Athena Data Connectors as part of Athena Engine Version 2, I was able to finally visualize DynamoDB data in QuickSight without creating any bespoke resources or duplicating the data to another data source.
- You have data in DynamoDB tables that you want to visualize in QuickSight
- You have access to a user/role in your account with IAM access
- You have a bucket created that can be used to store data Athena query results and data connector spill data.
- If this going to be used in production with large queries, I'd recommend adding S3 lifecycle policies to this bucket to ensure this bucket doesn't grow uncontrollably.
- You are using a region that supports both
If you don't already have an Athena workgroup that uses Athena Engine Version 2, this needs to be created.
At the time of this writing, the default primary workgroup uses Athena Engine Version 1 but they also indicate that these workgroups will be automatically upgraded at some point in the future.
Ain't nobody got time for AWS to upgrade so we need to go ahead (create, if needed) and switch to a workgroup that uses Athena Engine Version 2.
- In the Athena console, select the workgroups tab.
- Ensure that a workgroup exists that is "Athena Engine version 2" and switch to it.
- Most likely, you will not have one and you will need to create one by clicking the "Create Workgroup" button. You can pretty much use the defaults for everything such ensure you select "Manually choose an engine version now." and ensure "Athena engine version 2 (recommended)" is selected when creating the workgroup.
Once we an Athena Engine version 2 workgroup created, let's go ahead and create our DynamoDB Athena Data Connector.
- Navigate to the Data Sources tab of the Athena console and choose "Connect data source" button.
- In first step of the Data Sources wizard, select the "Query a data source" option, then select "Amazon DynamoDB", then click the next button.
The second and final step, AWS wants you to specify the connector lambda. This does not exist yet so we need to deploy it by clicking the "Configure new AWS Lambda function" button.
- This button opens a new window to the Lambda console to deploy the prebuilt AthenaDynamoDBConnector application.
- Under Application settings, at a minimum to need to provide values for
AthenaCatalogNameparameters as they do not have any defaults.
- Click Deploy
- After the the Lambda is deployed, head back to your Athena console window.
Now that our AthenaDynamoDBConnector function is deployed, click the refresh icon next to the "Choose Lambda function" dropdown list and you should now see the newly deployment lambda function. Select your function, give catalog a name, and click the "Connect" button.
The Athena Data Connector works by invoking a Lambda to query and return DynamoDB data. Therefore we need to give QuickSight's service role permissions to invoke the Lambda function.
- Navigate to the IAM Console and select "Roles"
- Find and click the QuickSight service role. The role name should begin with something like "aws-quicksight-service-role", i.e. "aws-quicksight-service-role-v0".
- Click "Attach Policies"
- Select the "AWSLambdaRole" and click "Attach Policy"
Now, let's get this working with QuickSight!
- Open QuickSight and navigate to the Data Sets menu.
- Click "New Dataset" and select Athena as data source.
- Select the workgroup you created previously that uses the Athena Engine version 2.
- Now select the catalog that we created for the DynamoDB connector in the previous steps.
- If you get a "You don’t have sufficient permissions to connect to this dataset or run this query." error, this is most likely because you did not properly add Lambda Invoke permissions to the QuickSight role. See the step above to ensure this is setup.
- Now choose if you want to cache the data via SPICE or directly query the data. Directly querying will give you the ability to see real-time data in DynamoDB but the performance and cost may suffer since it is not cached.
- You can add additional DynamoDB table data sets and continue to reuse the Athena Engine V2 data source you created in Step 2 above.
Now that we have a DynamoDB data set (via Athena and the DynamoDB data connectors) created, we can finally visual DynamoDB data via analyses and dashboards.
I am just using a tiny sample DynamoDB dataset for this example so it's not the most interesting visualization but hopefully you get the idea!
It's been a long time coming but glad it's finally possible to get DynamoDB data into QuickSight without custom resources and duplicate data. This is just one example of the many data sources that can now be more easily be added via the Athena Engine Version 2 data connectors and the new ability to choose Athena workgroups and Athena catalogs in QuickSight.
I'm sure this article will eventually be obsolete sooner rather than later as technology changes and Amazon continues to release new features but it was a good exercise to explore some of these new features.
Please feel free to drop me a line if this was helpful or if I you have any suggestions on thing that could be improved or added.