To get data from a DynamoDB table, you can use either scan or query.
Query finds items by their primary key or by a secondary index. An item's primary key can be a partition key alone or a combination of a partition key and a sort key. I explained this in greater detail in the previous part of this blog.
Scan, on the other hand, returns items by going through every item in the table. It reads the entire table (or index) first and only then applies any filter you specify, so filtering does not reduce the amount of data that is read.
As a result, scan is slower and less efficient than query: it takes the extra step of reading the whole table and going through all items.
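To make the difference concrete, here is a minimal sketch in plain Python. It is only a toy model, not how DynamoDB is implemented: `ToyTable`, its methods, and the `artist`/`song` attributes are names I made up for illustration. It counts how many items each operation has to read to return the same result.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory stand-in for a table with a partition key and a sort key."""

    def __init__(self, partition_key, sort_key):
        self.partition_key = partition_key
        self.sort_key = sort_key
        # partition key value -> {sort key value: item}
        self.partitions = defaultdict(dict)

    def put(self, item):
        self.partitions[item[self.partition_key]][item[self.sort_key]] = item

    def query(self, partition_value):
        """Go straight to one partition key value; only its items are read."""
        items = list(self.partitions.get(partition_value, {}).values())
        return items, len(items)

    def scan(self, predicate=lambda item: True):
        """Read every item in the table, then filter what was read."""
        examined, matched = 0, []
        for partition in self.partitions.values():
            for item in partition.values():
                examined += 1
                if predicate(item):
                    matched.append(item)
        return matched, examined


table = ToyTable("artist", "song")
for artist, song in [("A", "s1"), ("A", "s2"), ("B", "s3"), ("C", "s4")]:
    table.put({"artist": artist, "song": song})

query_items, query_reads = table.query("A")
scan_items, scan_reads = table.scan(lambda item: item["artist"] == "A")
print(query_reads, scan_reads)  # query reads 2 items, scan reads all 4
```

Both calls return the same two items, but the scan had to touch the whole table to find them. In the toy model the gap is tiny; on a real table it grows with the table size.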
By nature, DynamoDB queries items by their primary key. However, it allows more efficient access to data through other attributes by using secondary indexes. Essentially, you specify the attributes that serve as the key of a secondary index and run query or scan against that index. End of story! 🥂
It is important to understand that a secondary index is a data structure. It is associated with exactly one base table, from which it gets its data.
This data is a subset of the base table's attributes together with an alternate key that supports query and scan operations. We explicitly define the alternate key and which attributes are projected (copied) from the base table to the index.
After an index is created, we can query or scan it much like a regular table. DynamoDB maintains its secondary indexes automatically: whenever we modify (create, update, or delete) items in the base table, the indexes are synchronized.
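The idea of projection and synchronization can be sketched with a small toy model. Again, this is plain Python for illustration only: `ToyIndexedTable` and the `user_id`, `email`, `name`, and `bio` attributes are hypothetical, and a real index is maintained by DynamoDB itself, not by your code.

```python
class ToyIndexedTable:
    """Hypothetical base table that keeps one secondary index in sync on every write."""

    def __init__(self, key, index_key, projected):
        self.key = key                # primary key attribute of the base table
        self.index_key = index_key    # alternate key attribute used by the index
        self.projected = projected    # extra attributes copied into the index
        self.items = {}               # base table: primary key -> full item
        self.index = {}               # index: alternate key -> {primary key: projected copy}

    def _index_remove(self, item):
        bucket = self.index.get(item.get(self.index_key), {})
        bucket.pop(item[self.key], None)

    def put(self, item):
        old = self.items.get(item[self.key])
        if old is not None:
            self._index_remove(old)   # drop the stale index entry first
        self.items[item[self.key]] = item
        if self.index_key in item:
            wanted = (self.key, self.index_key, *self.projected)
            copy = {k: item[k] for k in wanted if k in item}
            self.index.setdefault(item[self.index_key], {})[item[self.key]] = copy

    def delete(self, key_value):
        old = self.items.pop(key_value, None)
        if old is not None:
            self._index_remove(old)

    def query_index(self, index_value):
        return list(self.index.get(index_value, {}).values())


users = ToyIndexedTable(key="user_id", index_key="email", projected=["name"])
users.put({"user_id": "u1", "email": "a@example.com", "name": "An", "bio": "long text"})
by_email = users.query_index("a@example.com")

# Updating the item in the base table moves its entry in the index.
users.put({"user_id": "u1", "email": "b@example.com", "name": "An", "bio": "long text"})
after_update_old = users.query_index("a@example.com")
after_update_new = users.query_index("b@example.com")

users.delete("u1")
after_delete = users.query_index("b@example.com")
```

Notice that `bio` never shows up in the index because it was not projected, which is exactly why a smaller projection means a smaller index.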
DynamoDB supports 2 types of indexes: local and global secondary indexes. A local secondary index has the same partition key as its base table but a different sort key, while a global secondary index can have a partition key and a sort key that both differ from the base table's. That's how local differs from global in a nutshell.
Because a global secondary index has its own partition key, its data is stored in its own partitions, separate from the base table, while a local secondary index is scoped to the base table partition that has the same partition key value. A query on a global index can therefore span all of the base table's partitions, unlike a query on a local one.
Generally, it is recommended to use a global secondary index rather than a local one. One exception is when you need strongly consistent reads from your index. A local secondary index supports this consistency model, while a global one supports only eventually consistent reads.
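As a rough sketch, this is how the two index types could look as request parameters for the low-level boto3 client. The `Music` table, its attributes, and both index names are made up for illustration, and the helper functions are my own. The snippet only builds dictionaries, so it runs without AWS credentials.

```python
def build_table_params():
    """Parameters that could be passed to boto3.client("dynamodb").create_table(**params)."""
    return {
        "TableName": "Music",  # hypothetical table
        "BillingMode": "PAY_PER_REQUEST",
        "AttributeDefinitions": [
            {"AttributeName": "Artist", "AttributeType": "S"},
            {"AttributeName": "SongTitle", "AttributeType": "S"},
            {"AttributeName": "AlbumTitle", "AttributeType": "S"},
            {"AttributeName": "Genre", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "Artist", "KeyType": "HASH"},      # partition key
            {"AttributeName": "SongTitle", "KeyType": "RANGE"},  # sort key
        ],
        # Local index: same partition key as the table, different sort key.
        "LocalSecondaryIndexes": [{
            "IndexName": "AlbumTitleIndex",
            "KeySchema": [
                {"AttributeName": "Artist", "KeyType": "HASH"},
                {"AttributeName": "AlbumTitle", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }],
        # Global index: its own partition key (and optionally its own sort key).
        "GlobalSecondaryIndexes": [{
            "IndexName": "GenreIndex",
            "KeySchema": [
                {"AttributeName": "Genre", "KeyType": "HASH"},
                {"AttributeName": "SongTitle", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["Year"]},
        }],
    }


def build_query_params(index_name, key_name, key_value, consistent_read=False):
    """Parameters that could be passed to boto3.client("dynamodb").query(**params)."""
    return {
        "TableName": "Music",
        "IndexName": index_name,
        "KeyConditionExpression": "#k = :v",
        "ExpressionAttributeNames": {"#k": key_name},
        "ExpressionAttributeValues": {":v": {"S": key_value}},
        # Strongly consistent reads only work on the table and its local indexes;
        # DynamoDB rejects ConsistentRead=True on a global secondary index.
        "ConsistentRead": consistent_read,
    }


table_params = build_table_params()
lsi_query = build_query_params("AlbumTitleIndex", "Artist", "Some Artist", consistent_read=True)
gsi_query = build_query_params("GenreIndex", "Genre", "Rock")
```

With credentials configured, you would create the table with `create_table(**table_params)` and then run `query(**lsi_query)` or `query(**gsi_query)`; I have left those calls out so the sketch stays runnable offline.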
Secondary indexes consume storage and provisioned throughput. Thus, a good practice is to keep indexes minimal, which can be done as follows:
- Project only the attributes that your queries or scans really need.
- Project attributes that you expect to fetch frequently into the index, so queries don't have to read them from the base table. This improves performance and saves database resources.
- Be aware of the item collection size limit if you are using a local secondary index. In brief, for any one partition key value, the total size of all items in the base table and its local indexes cannot exceed 10 GB.
AWS also provides general guidelines that are worth looking into.
- To get data from a DynamoDB table, prefer query over scan for better performance.
- A secondary index is a data structure. It copies attributes of items from its base table and handles queries just like the base table does. DynamoDB keeps the secondary index in sync by updating it whenever table data changes.
- We can use a local or a global secondary index. The former shares the partition key of its base table, while the latter can have a different partition key and sort key.
- Secondary indexes consume storage and throughput. Therefore, follow good practices to achieve better performance while keeping the item collection size under the limit.
That's the end of the blog. Thanks for reading 😃💪
This blog was originally published on my blog site: https://blog.vinhlee.com