DynamoDB Basics, Part 1

#aws #dynamodb #nosql

Along my journey to the cloud, I have discovered amazing services, and one of them is DynamoDB. It is a serverless NoSQL database that stores semi-structured or unstructured data with high performance, availability, and scalability.

DynamoDB is an OLTP database focused on transaction-oriented tasks. Many of the applications we work on today automate tasks and transactional processing that need CRUD operations over a high volume of data. DynamoDB fits these business cases very well, and you can use it instead of an SQL database. I invite you to read Alex DeBrie's post, The What, Why, and When of Single-Table Design with DynamoDB, to learn in which cases you should or should not use DynamoDB.

One of the big differences between DynamoDB and a traditional SQL database is the number of tables: with a single DynamoDB table you can handle a data model with entities, relationships (1:N, N:N, etc.), and attributes. Queries are different too. DynamoDB is a key-value database, so to request information you must provide a partition key, which determines the physical partition where the data is stored. We need to learn a new way to design a data model for DynamoDB, but before seeing the methodology we can start with the basics.

When we face a new technology, we need to understand the problem it solves, its capabilities, boundaries, and elements, and the reason why it was created. I'm going to use some SQL database concepts to explain DynamoDB because many things are similar, and we learn more easily by linking new knowledge to what we already know. Let's start.

Table

Like an SQL database table, a DynamoDB table stores information. AWS replicates the table's data across multiple Availability Zones (AZs) in the region, and you can set up a global table to replicate the data to other AWS regions. You can create many DynamoDB tables, but you don't need one table per entity as in an SQL database: a single DynamoDB table can hold all the entities and relationships.

Primary Key & Attributes

The primary key must be unique; it is the value that identifies a record in the table. It can be a simple key made of only the Partition Key (PK), or a composite key made of Partition Key + Sort Key (SK). With a composite primary key, multiple items can share the same PK, but each of them must have a different SK.
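To make the uniqueness rule concrete, here is a minimal in-memory sketch. It is not DynamoDB itself: the `TinyTable` class and the key values are hypothetical, and the only behavior it imitates is that a write with an existing (PK, SK) pair replaces the previous item.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a table with a composite primary key (PK + SK).
Key = Tuple[str, str]


class TinyTable:
    def __init__(self) -> None:
        self._items: Dict[Key, dict] = {}

    def put_item(self, item: dict) -> None:
        # The (PK, SK) pair identifies the item; writing the same pair
        # again replaces the stored item instead of adding a new one.
        self._items[(item["PK"], item["SK"])] = item

    def get_item(self, pk: str, sk: str) -> Optional[dict]:
        return self._items.get((pk, sk))

    def __len__(self) -> int:
        return len(self._items)


table = TinyTable()
# Two items share the PK but have different SKs, so both are stored.
table.put_item({"PK": "CUSTOMER#1", "SK": "PROFILE", "name": "Ana"})
table.put_item({"PK": "CUSTOMER#1", "SK": "ORDER#100", "total": 25})
# Same PK and SK as the first item: this replaces it.
table.put_item({"PK": "CUSTOMER#1", "SK": "PROFILE", "name": "Ana Maria"})
```

After these three writes the table holds two items, and the profile item carries the latest name.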

An attribute is a data element with a type and a value. It can store a single value or composite information, such as a list or a map with JSON-like nested structures. DynamoDB is schema-less, so each record can have different attributes. There is no limit to the number of attributes in one record, but the size of the record cannot exceed 400 KB. The size of the record includes both attribute names and values.

Item & Item Collection

An item is a record made of a collection of attributes. It is like a row in an SQL database table, but the structure is more flexible: items in the same table can have different attributes, whereas rows in an SQL table all share the same columns. The maximum item size in DynamoDB is 400 KB, which includes the binary length of both attribute names (their UTF-8 length) and attribute values.
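Because attribute names count toward the size limit, a rough estimate can help when designing items. The sketch below is a simplification, not the official calculation: it only handles string and binary values, the function name `estimate_item_size` is made up for this post, and treating 400 KB as 400 * 1024 bytes is an assumption.

```python
# Assumption: 400 KB is interpreted as 400 * 1024 bytes.
MAX_ITEM_BYTES = 400 * 1024


def estimate_item_size(item: dict) -> int:
    """Rough size in bytes: UTF-8 length of names plus length of values.

    Only str and bytes values are supported; numbers, lists, maps, and
    other types follow different rules that are not modeled here.
    """
    total = 0
    for name, value in item.items():
        total += len(name.encode("utf-8"))
        if isinstance(value, str):
            total += len(value.encode("utf-8"))
        elif isinstance(value, bytes):
            total += len(value)
        else:
            raise TypeError(f"unsupported value type for {name!r}")
    return total


item = {"PK": "CUSTOMER#1", "name": "José"}
size = estimate_item_size(item)  # 2 + 10 + 4 + 5 = 21 ("é" takes 2 bytes)
fits = size <= MAX_ITEM_BYTES
```

One consequence is that short attribute names save space on every item, which matters more as the item count grows.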

All items with the same partition key are part of an item collection. Item collections require a composite primary key, and the sort key must be unique within the collection. In a table with one or more local secondary indexes, no item collection can exceed 10 GB, so it's possible to run out of space for a particular partition key value.

In the image, we have Customer and Order items. Both share the same partition key, so the item collection contains both entities.
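The same idea can be sketched in a few lines of plain Python. The items and the key format (`CUSTOMER#1`, `ORDER#100`) are illustrative assumptions, not values taken from the image; the point is only that grouping by PK yields one collection holding both the Customer and its Orders.

```python
from collections import defaultdict

# Illustrative items: one customer profile and two orders share a PK.
items = [
    {"PK": "CUSTOMER#1", "SK": "CUSTOMER#1", "name": "Ana"},
    {"PK": "CUSTOMER#1", "SK": "ORDER#100", "total": 25},
    {"PK": "CUSTOMER#1", "SK": "ORDER#101", "total": 40},
    {"PK": "CUSTOMER#2", "SK": "CUSTOMER#2", "name": "Luis"},
]

# An item collection is every item that has the same partition key.
item_collections = defaultdict(list)
for item in items:
    item_collections[item["PK"]].append(item)

# Within a collection, the sort key tells the entities apart.
orders_of_customer_1 = [
    item for item in item_collections["CUSTOMER#1"]
    if item["SK"].startswith("ORDER#")
]
```

Here the collection for `CUSTOMER#1` has three items (one Customer and two Orders), and filtering on the SK prefix returns only the Orders.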

Partition

DynamoDB stores data in partitions; each partition is like a node in a cluster. DynamoDB applies a hash function to the partition key value to choose the partition where the data is written and read. A partition is an allocation of storage for a table, backed by solid state drives (SSDs) and automatically replicated across multiple Availability Zones within an AWS Region. Partition management is handled entirely by DynamoDB, so you never have to manage partitions yourself. A partition can store up to 10 GB of data.

Partition Key

The partition key is an attribute that DynamoDB uses to store the data in a specific partition and to distribute the write and read load across partitions. DynamoDB applies a hash function to the partition key value to select the partition. It is a mandatory attribute: you need it to create a DynamoDB table and to run write and query operations on it.
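As a rough illustration of hash-based placement, the sketch below maps a partition key value to one of a fixed number of partitions. DynamoDB's real hash function and partition layout are internal, so SHA-256, the modulo step, and the `choose_partition` name are all assumptions made for this example.

```python
import hashlib
from collections import Counter


def choose_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key value to a partition number (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range [0, partition_count).
    return int.from_bytes(digest[:8], "big") % partition_count


# The same key always lands on the same partition.
first = choose_partition("CUSTOMER#1", 4)
second = choose_partition("CUSTOMER#1", 4)

# Many distinct key values spread the load across all partitions.
load = Counter(choose_partition(f"CUSTOMER#{i}", 4) for i in range(1000))
```

This is also why a partition key with many possible values helps: with only a handful of distinct values, most reads and writes would hit the same few partitions.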

A good practice is to choose a partition key with many possible values. For example, a personal identification number is a good choice because each person has their own number and there are many people. Another good practice is to name the partition key attribute PK, because that name is generic enough for all the entities you are going to store in a single DynamoDB table.

Sort Key

The sort key is an attribute that DynamoDB uses to order the data inside a partition. It is optional: you can create a DynamoDB table without a sort key, but it is a good practice to use one to sort the data. Use SK as the sort key attribute name, because it is generic enough for all the entities you are going to store in a single DynamoDB table.

The data type of the sort key determines the order. If it is Number, the results are returned in numeric order; otherwise, they are returned in order of UTF-8 bytes. By default, the sort order is ascending. To reverse the order, set the ScanIndexForward parameter to false.
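Byte ordering can be surprising for string sort keys that contain numbers, so here is a small sketch that imitates it locally. The helper names are hypothetical; the `scan_index_forward` argument only mirrors the meaning of the ScanIndexForward parameter and does not call DynamoDB.

```python
def sort_string_keys(keys, scan_index_forward=True):
    # String sort keys are compared by their UTF-8 bytes.
    return sorted(
        keys,
        key=lambda k: k.encode("utf-8"),
        reverse=not scan_index_forward,
    )


def sort_number_keys(keys, scan_index_forward=True):
    # Number sort keys are compared numerically.
    return sorted(keys, reverse=not scan_index_forward)


string_keys = ["ORDER#9", "ORDER#10", "order#1", "ORDER#2"]

ascending = sort_string_keys(string_keys)
# ['ORDER#10', 'ORDER#2', 'ORDER#9', 'order#1']

descending = sort_string_keys(string_keys, scan_index_forward=False)
# ['order#1', 'ORDER#9', 'ORDER#2', 'ORDER#10']

numbers = sort_number_keys([9, 10, 2])
# [2, 9, 10]
```

Note that `ORDER#10` sorts before `ORDER#2` as a string, and uppercase letters sort before lowercase ones. Zero-padding numeric parts (for example `ORDER#0010`) is one common way to make the byte order match the numeric order.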

Local Secondary Index (LSI)

A local secondary index is an extension of the base table that contains a copy of some or all of the attributes from the base table. You can run read operations on the local secondary index, but write operations go only to the base table, because DynamoDB projects the items from the base table into the index.

You can create up to 5 local secondary indexes per DynamoDB table, and you can only define them when you create the base table, which must have a composite primary key (partition key and sort key). A local secondary index keeps the partition key of the base table and uses an alternate sort key. This is helpful when you need to sort the items along more dimensions, for example, to sort CUSTOMER items by name, created date, score, and more.

In this example, the Order item has a timestamp attribute that is the sort key of the local secondary index. DynamoDB replicates the information from the base table to the index: the partition key is the same, the sort key is the timestamp, and the SK attribute (the sort key of the base table) is carried along as a regular attribute.
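Since a local secondary index has to be declared when the table is created, it may help to see what such a table definition could look like. The sketch below only builds the request parameters as a Python dictionary in the shape that boto3's `create_table` accepts; the table name `ExampleTable`, the index name `LSI1`, the string type for `timestamp`, and the on-demand billing mode are assumptions for illustration.

```python
def build_create_table_params() -> dict:
    """Parameters for a table with a composite key and one LSI (a sketch)."""
    return {
        "TableName": "ExampleTable",  # hypothetical name
        # Only attributes used in a key schema are declared up front.
        "AttributeDefinitions": [
            {"AttributeName": "PK", "AttributeType": "S"},
            {"AttributeName": "SK", "AttributeType": "S"},
            {"AttributeName": "timestamp", "AttributeType": "S"},
        ],
        # Base table: partition key (HASH) plus sort key (RANGE).
        "KeySchema": [
            {"AttributeName": "PK", "KeyType": "HASH"},
            {"AttributeName": "SK", "KeyType": "RANGE"},
        ],
        # The LSI keeps the same partition key and swaps the sort key.
        "LocalSecondaryIndexes": [
            {
                "IndexName": "LSI1",  # hypothetical name
                "KeySchema": [
                    {"AttributeName": "PK", "KeyType": "HASH"},
                    {"AttributeName": "timestamp", "KeyType": "RANGE"},
                ],
                # ALL copies every attribute, including the base table SK.
                "Projection": {"ProjectionType": "ALL"},
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


params = build_create_table_params()
```

With AWS credentials configured, these parameters could be passed to `boto3.client("dynamodb").create_table(**params)`; that call is left out here so the example runs without an AWS account.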

Global Secondary Index (GSI)

A global secondary index is similar to a local secondary index, but you can create it either together with the base table or later, and it has its own partition key and optional sort key. You can create up to 20 global secondary indexes per table (the default quota). As with local secondary indexes, you can only run read operations on the index; write operations go to the base table. DynamoDB projects an item from the base table into the global secondary index when the item contains the attributes declared as the index's partition key and, if defined, sort key.

In this example, the base table has GSI1PK and GSI1SK attributes that serve as the partition key and sort key of the global secondary index. DynamoDB replicates the Order record from the base table to the index, where the PK and SK of the base table are regular attributes.
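The projection rule can be imitated locally to see which items end up in the index. This is a simulation, not a DynamoDB call: the `project_to_gsi` function and the item values (such as `STATUS#SHIPPED`) are hypothetical, while the GSI1PK and GSI1SK attribute names follow the example above.

```python
def project_to_gsi(items, pk_attr="GSI1PK", sk_attr="GSI1SK"):
    """Group the items that carry both index key attributes by index PK."""
    index = {}
    for item in items:
        # Items without the index key attributes do not appear in the index.
        if pk_attr not in item or sk_attr not in item:
            continue
        index.setdefault(item[pk_attr], []).append(item)
    # Inside each index partition, order by the index sort key (UTF-8 bytes).
    for collection in index.values():
        collection.sort(key=lambda i: i[sk_attr].encode("utf-8"))
    return index


base_table = [
    {"PK": "CUSTOMER#1", "SK": "CUSTOMER#1", "name": "Ana"},
    {"PK": "CUSTOMER#1", "SK": "ORDER#101",
     "GSI1PK": "STATUS#SHIPPED", "GSI1SK": "2023-02-10"},
    {"PK": "CUSTOMER#2", "SK": "ORDER#100",
     "GSI1PK": "STATUS#SHIPPED", "GSI1SK": "2023-01-05"},
]

gsi1 = project_to_gsi(base_table)
shipped_order_keys = [(i["PK"], i["SK"]) for i in gsi1["STATUS#SHIPPED"]]
```

In this sketch only the two Order items reach the index, grouped under a new partition key and sorted by date, while the Customer item stays in the base table only. The base table PK and SK travel with each item as ordinary attributes.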

In this post, we covered the main concepts of DynamoDB: Table, Primary Key, Attributes, Item, Partition, and Index. In the next post, DynamoDB Basics Part 2, we are going to see read and write operations, Streams, Global Tables, and more.