Vinh Le

Posted on Jun 3, 2020

Scan and Query in DynamoDB

#aws #database #dynamodb

Scan vs. Query

To read data from a DynamoDB table, you can use either Scan or Query.

Query

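Query finds items by primary key, either on the table itself or on a secondary index. A primary key is either a partition key alone or a combination of a partition key and a sort key. A Query must specify the partition key value, and it can optionally narrow the results with a condition on the sort key. I explained this in greater detail in the previous part of this blog.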
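To make this concrete, here is a minimal sketch of the parameters a Query request might carry. The table name `Orders` and the attributes `customer_id` and `order_date` are made up for illustration, and the helper `build_query_params` is hypothetical; only the parameter names (`KeyConditionExpression`, `ExpressionAttributeNames`, `ExpressionAttributeValues`) come from the DynamoDB API.

```python
def build_query_params(table_name, partition_key, partition_value,
                       sort_key=None, sort_prefix=None):
    """Build keyword arguments for a DynamoDB Query (low-level client format).

    Hypothetical helper: it only assembles a dict and does not call AWS.
    """
    params = {
        "TableName": table_name,
        # The partition key must be matched with an equality condition.
        "KeyConditionExpression": "#pk = :pk",
        "ExpressionAttributeNames": {"#pk": partition_key},
        "ExpressionAttributeValues": {":pk": {"S": partition_value}},
    }
    if sort_key is not None and sort_prefix is not None:
        # The sort key condition is optional and narrows the result set.
        params["KeyConditionExpression"] += " AND begins_with(#sk, :sk)"
        params["ExpressionAttributeNames"]["#sk"] = sort_key
        params["ExpressionAttributeValues"][":sk"] = {"S": sort_prefix}
    return params


# Example: all orders of customer "c-42" placed in June 2020 (made-up data).
params = build_query_params("Orders", "customer_id", "c-42",
                            "order_date", "2020-06")

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("dynamodb").query(**params)
```

The point of the sketch is that the key condition alone decides which items are read, so DynamoDB does not have to touch the rest of the table.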

Scan

Scan, on the other hand, returns items by reading every item in the table or index. Any filter is applied only after the items have been read, so a filtered Scan still consumes read capacity for the whole table. Unlike Query, the filter can be on any attribute, not just on the key.

As a result, Scan is slower and less efficient than Query: it has to go through every item in the table, even when only a few of them match.
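The difference is easy to see in a toy model. The class below is not DynamoDB; it is a small in-memory stand-in (all names are made up) that counts how many items each operation has to examine.

```python
from collections import defaultdict


class ToyTable:
    """In-memory stand-in for a table, used only to count items examined."""

    def __init__(self, items, partition_key):
        self.items = list(items)
        self.by_partition = defaultdict(list)
        for item in self.items:
            self.by_partition[item[partition_key]].append(item)

    def query(self, partition_value):
        # Query jumps straight to the items that share the partition key.
        matched = self.by_partition.get(partition_value, [])
        return list(matched), len(matched)

    def scan(self, predicate=None):
        # Scan examines every item; the filter only trims the output.
        examined = 0
        results = []
        for item in self.items:
            examined += 1
            if predicate is None or predicate(item):
                results.append(item)
        return results, examined


# 1000 made-up orders spread evenly over 10 customers.
orders = [{"customer_id": f"c-{i % 10}", "order_id": i} for i in range(1000)]
table = ToyTable(orders, "customer_id")

query_items, query_examined = table.query("c-3")
scan_items, scan_examined = table.scan(lambda o: o["customer_id"] == "c-3")
```

Both calls return the same 100 orders, but the query examines 100 items while the scan examines all 1000.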

We could make a Scan less disruptive by reading in smaller pages, and faster by using a parallel scan. Nonetheless, it is still recommended to use Query or BatchGetItem over Scan wherever possible.
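Here is a rough sketch of both ideas. `scan_all_pages` follows `LastEvaluatedKey` / `ExclusiveStartKey` from page to page, and `parallel_scan_params` prepares one request per segment using `Segment` and `TotalSegments`. Those four parameter names are from the Scan API; the helper names and `fake_scan`, which stands in for a real client call, are assumptions for illustration.

```python
def scan_all_pages(scan_fn, **params):
    """Collect items from every page of a Scan.

    scan_fn is any callable with the shape of a client's scan method.
    """
    items = []
    start_key = None
    while True:
        page_params = dict(params)
        if start_key is not None:
            page_params["ExclusiveStartKey"] = start_key
        page = scan_fn(**page_params)
        items.extend(page.get("Items", []))
        # A missing LastEvaluatedKey means there are no more pages.
        start_key = page.get("LastEvaluatedKey")
        if start_key is None:
            return items


def parallel_scan_params(table_name, total_segments):
    """One parameter set per worker; each worker scans its own segment."""
    return [
        {"TableName": table_name, "Segment": s, "TotalSegments": total_segments}
        for s in range(total_segments)
    ]


def fake_scan(**params):
    """Hypothetical stand-in for a real scan call, serving 5 made-up items."""
    data = [{"id": i} for i in range(5)]
    start = params.get("ExclusiveStartKey", {"offset": 0})["offset"]
    end = start + params.get("Limit", 2)
    response = {"Items": data[start:end]}
    if end < len(data):
        response["LastEvaluatedKey"] = {"offset": end}
    return response


all_items = scan_all_pages(fake_scan, TableName="Orders", Limit=2)
segments = parallel_scan_params("Orders", total_segments=4)
```

With a real client, each entry of `segments` would go to its own worker or thread, and each worker would still page through its segment in the same way.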

Secondary index

A different data structure

By nature, DynamoDB queries items by their primary key. However, a secondary index allows efficient access to data through other attributes. Essentially, you specify which attributes form the key of a secondary index and then run Query or Scan against that index. End of story! 🥂

It is important to understand that a secondary index is a data structure. It is associated with exactly one base table, which is where it gets its data from.

This data is a subset of the table's attributes plus an alternate key that supports Query and Scan operations. We explicitly define the alternate key and which attributes are projected (copied) from the base table into the index.

After an index is created, we can query or scan it much like a regular table. DynamoDB maintains its secondary indexes automatically: whenever we create, update, or delete items in the base table, the change is propagated to the indexes.
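A toy model may help picture this. The class below is only an illustration with made-up names, not how DynamoDB is implemented: it keeps a base table plus one index keyed by an alternate attribute, copies only the projected attributes, and updates the index on every write.

```python
class ToyTableWithIndex:
    """Base table plus one secondary index, kept in sync on every write."""

    def __init__(self, primary_key, index_key, projected):
        self.primary_key = primary_key
        self.index_key = index_key
        # Key attributes are always carried into the index.
        self.projected = set(projected) | {primary_key, index_key}
        self.base = {}
        self.index = {}  # index key value -> {primary key value -> copy}

    def put(self, item):
        pk_value = item[self.primary_key]
        self.delete(pk_value)  # drop any stale index entry first
        self.base[pk_value] = dict(item)
        if self.index_key in item:
            copy = {k: v for k, v in item.items() if k in self.projected}
            self.index.setdefault(item[self.index_key], {})[pk_value] = copy

    def delete(self, pk_value):
        old = self.base.pop(pk_value, None)
        if old is not None and self.index_key in old:
            bucket = self.index[old[self.index_key]]
            del bucket[pk_value]
            if not bucket:
                del self.index[old[self.index_key]]

    def query_index(self, index_value):
        return list(self.index.get(index_value, {}).values())


orders = ToyTableWithIndex("order_id", "status", projected=["total"])
orders.put({"order_id": "o1", "status": "OPEN", "total": 30, "notes": "gift wrap"})
orders.put({"order_id": "o2", "status": "OPEN", "total": 12})
# Updating o1 in the base table moves it within the index as well.
orders.put({"order_id": "o1", "status": "SHIPPED", "total": 30, "notes": "gift wrap"})

open_orders = orders.query_index("OPEN")
shipped_orders = orders.query_index("SHIPPED")
```

Note that `notes` never shows up in the index results because it was not projected, while the base table still holds the full item.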

Local vs. Global

DynamoDB supports 2 types of indexes: local and global secondary indexes. A local secondary index has the same partition key as its base table but a different sort key, while a global secondary index can have a partition key and a sort key that both differ from the base table's. That's the difference in a nutshell.

Because a global secondary index has its own partition key, its data is stored in its own partitions, separate from the base table, while a local secondary index is stored alongside the base table items that share the same partition key value. A query on a global index can therefore span all of the base table's partitions, unlike a query on a local one.
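In a table definition, the difference shows up in the key schema of each index. The sketch below is a hypothetical `Orders` table (all table, index, and attribute names are made up) written in the shape that CreateTable expects; it only builds a dict and checks the key relationship, it does not create anything.

```python
table_definition = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "customer_id", "AttributeType": "S"},
        {"AttributeName": "order_id", "AttributeType": "S"},
        {"AttributeName": "order_date", "AttributeType": "S"},
        {"AttributeName": "status", "AttributeType": "S"},
    ],
    # Base table: partition key customer_id, sort key order_id.
    "KeySchema": [
        {"AttributeName": "customer_id", "KeyType": "HASH"},
        {"AttributeName": "order_id", "KeyType": "RANGE"},
    ],
    # Local index: same partition key, different sort key.
    "LocalSecondaryIndexes": [{
        "IndexName": "by_order_date",
        "KeySchema": [
            {"AttributeName": "customer_id", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "KEYS_ONLY"},
    }],
    # Global index: its own partition key (and, here, its own sort key).
    "GlobalSecondaryIndexes": [{
        "IndexName": "by_status",
        "KeySchema": [
            {"AttributeName": "status", "KeyType": "HASH"},
            {"AttributeName": "order_date", "KeyType": "RANGE"},
        ],
        "Projection": {"ProjectionType": "INCLUDE",
                       "NonKeyAttributes": ["total"]},
    }],
    "BillingMode": "PAY_PER_REQUEST",
}


def partition_key_of(key_schema):
    """Return the attribute name that plays the HASH (partition key) role."""
    return next(k["AttributeName"] for k in key_schema if k["KeyType"] == "HASH")


base_pk = partition_key_of(table_definition["KeySchema"])
local_pk = partition_key_of(
    table_definition["LocalSecondaryIndexes"][0]["KeySchema"])
global_pk = partition_key_of(
    table_definition["GlobalSecondaryIndexes"][0]["KeySchema"])
```

With boto3, a dict like this could be passed as `create_table(**table_definition)`; treat the specific projection choices here as one option among several.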

When to use what

Generally, it is recommended to use a global secondary index rather than a local one. One exception is when you need strongly consistent reads from your index. A local secondary index supports this consistency model, while a global one supports only eventually consistent reads.
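In request terms, this comes down to the `ConsistentRead` flag of a Query. The helper below is a hypothetical sketch (its name, arguments, and the index names are made up) that builds Query parameters for an index and refuses the combination DynamoDB does not support, namely a strongly consistent read on a global index.

```python
def build_index_query(table_name, index_name, index_type,
                      key_name, key_value, consistent_read=False):
    """Build Query parameters for a secondary index.

    index_type is "local" or "global"; this argument exists only in this
    sketch so the helper can check the consistency rule up front.
    """
    if index_type not in ("local", "global"):
        raise ValueError("index_type must be 'local' or 'global'")
    if consistent_read and index_type == "global":
        raise ValueError(
            "Global secondary indexes only support eventually consistent reads")
    return {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},
        "ConsistentRead": consistent_read,
    }


# Strongly consistent read through a local index: allowed.
local_query = build_index_query("Orders", "by_order_date", "local",
                                "customer_id", "c-42", consistent_read=True)

# Eventually consistent read through a global index: allowed.
global_query = build_index_query("Orders", "by_status", "global",
                                 "status", "OPEN")
```

Asking this helper for `consistent_read=True` on a global index raises a `ValueError` instead of building a request that the service would reject.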

Indexing strategy

Secondary indexes consume storage and provisioned throughput. Thus, a good practice is to keep indexes minimal, which could be done as follows:

Project only the attributes that your queries or scans really need.
Index the attributes that you expect to query frequently, and avoid indexes on attributes you rarely query, since every index adds storage and write cost.
Be aware of the item collection size limit if you are using a local secondary index. In brief, the total size of all items that share one partition key value, in the base table and its local indexes combined, cannot exceed 10 GB.
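As a rough way to reason about the last point, the sketch below adds up sizes per partition key value and flags item collections that are getting close to the limit. The helper names, the sample sizes, and the 80% threshold are assumptions for illustration, and the constant treats "10 GB" as 10 GiB.

```python
ITEM_COLLECTION_LIMIT_BYTES = 10 * 1024 ** 3  # assumes "10 GB" means 10 GiB


def item_collection_sizes(entries):
    """Sum sizes per partition key value.

    entries: (partition_key_value, size_in_bytes) pairs covering base table
    items and their copies in local secondary indexes.
    """
    totals = {}
    for partition_value, size_bytes in entries:
        totals[partition_value] = totals.get(partition_value, 0) + size_bytes
    return totals


def collections_over(entries, fraction=0.8):
    """Partition key values whose collection reached `fraction` of the limit."""
    threshold = fraction * ITEM_COLLECTION_LIMIT_BYTES
    totals = item_collection_sizes(entries)
    return sorted(pk for pk, total in totals.items() if total >= threshold)


GIB = 1024 ** 3
entries = [
    ("c-1", 5 * GIB),  # base table items for customer c-1 (made-up size)
    ("c-1", 4 * GIB),  # their copies in a local index
    ("c-2", 1 * GIB),
]
at_risk = collections_over(entries)
```

In this made-up data, only `c-1` is flagged, because its base table items and index copies together add up to 9 GiB. A partition key with many items per value, such as a very active customer, is where this limit tends to bite.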

AWS also provides a general guideline that is worth looking into.

TL;DR

To get data from a DynamoDB table, it is recommended to use Query over Scan for better performance.
A secondary index is a data structure. It copies attributes of items from its base table and handles queries just like the base table does. DynamoDB keeps the secondary index in sync by updating it whenever the table data changes.
We can use a local or a global secondary index. The former shares its partition key with the base table, while the latter can have a different partition key and sort key.
A secondary index consumes the database's storage and throughput. Therefore, keep indexes minimal to achieve better performance, and keep item collection sizes under the limit when using local indexes.

Resources

Best Practices for Querying and Scanning Data - Amazon DynamoDB

Improving Data Access with Secondary Indexes - Amazon DynamoDB

General Guidelines for Secondary Indexes in DynamoDB - Amazon DynamoDB

indexing - Difference between local and global indexes in DynamoDB - Stack Overflow

That's the end of the blog. Thanks for reading 😃💪
This post was originally published on my blog: https://blog.vinhlee.com
