DEV Community

Shaista Aman Khan

Architecting for Efficiency: Building Robust DynamoDB Models for Real-World Use Cases

Are Solutions Architects 🧑‍💻👷 tech doctors? 🩺

I usually listen to tech podcasts and events. I talk to numerous customers and give them the best possible solution which is efficient and within their budget. Just like a doctor listens to your symptoms, takes tests, and diagnoses the illness, a Solutions Architect dives deep into your business needs. I gather requirements, analyze pain points, and understand the goals.

While working on a project, I got the chance to explore AWS serverless services, mainly Lambda, API Gateway, Step Functions, DynamoDB, and more. The cloud's true power lies in customization, and to get more value you have to architect for efficiency and cost-effectiveness.

In this blog, I will be referring to

Ref Video

This video is about advanced data modeling with DynamoDB. The speaker, Alex DeBrie, is an AWS Data Hero and an expert in DynamoDB. He has written a book on DynamoDB and has spoken at several conferences about it. At re:Invent 2023, he presented advanced data modeling tips for a real-world use case.

DynamoDB Basics:

  • Key terms: tables, items, primary keys, and attributes
  • Example of an airline application using DynamoDB
  • Different types of primary keys: simple and composite
  • Item collections

Designing for DynamoDB:

  • Understanding how DynamoDB stores data under the hood
  • Knowing your access patterns: read and write patterns
  • Thinking about constraints and data distribution
  • Choosing the right item size

Advanced Data Modeling Patterns:

  • Booking a flight with DynamoDB: right patterns and constraints
  • Handling complex filtering with external systems
  • Integrating DynamoDB with other tools: Elasticsearch, S3

Let's dive into some DynamoDB basics:

DynamoDB is a NoSQL database offered by Amazon Web Services. Unlike a relational database, it doesn't enforce a fixed schema: apart from the primary key attributes, you don't have to declare attributes or their data types upfront.

A DynamoDB table is a collection of items. Each item is a collection of attributes. An attribute is a name-value pair. The primary key is a unique identifier for an item.

There are two types of primary keys: simple and composite.

A simple primary key has a single element, a partition key. A partition key is used to determine how data is partitioned across DynamoDB's servers.

A composite primary key has two elements, a partition key and a sort key. The sort key orders the items that share a partition key, so you can retrieve them in sorted order or query a range of them.

Let's use an example of an airline application. In this example, the partition key would be the customer ID and the sort key would be the flight ID.

Next, item collections. An item collection is the set of items that share the same partition key in a table with a composite primary key.

Here are the steps on how to achieve item collections in the AWS console:

  • Create a table with a composite primary key, for example Customer_ID as the partition key and Flight_ID as the sort key.
  • Use the partition key to query for items. You can run a query in AWS Console to see the results.
  • The results will include all items that have the same partition key, regardless of the sort key value.
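The same steps can also be expressed in code. The sketch below only builds the argument dictionaries in the shape that boto3's DynamoDB client methods `create_table` and `query` accept; it does not call AWS. The table name `CustomerFlights` and the customer ID are hypothetical, and on-demand billing is an assumption made for brevity.

```python
# Hypothetical table name; the key names follow the example above.
TABLE_NAME = "CustomerFlights"


def create_table_params(table_name: str) -> dict:
    """Arguments shaped for boto3's DynamoDB client.create_table."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "Customer_ID", "AttributeType": "S"},
            {"AttributeName": "Flight_ID", "AttributeType": "S"},
        ],
        "KeySchema": [
            {"AttributeName": "Customer_ID", "KeyType": "HASH"},  # partition key
            {"AttributeName": "Flight_ID", "KeyType": "RANGE"},   # sort key
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def item_collection_query(table_name: str, customer_id: str) -> dict:
    """Arguments for client.query: all items that share one partition key."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "Customer_ID = :cid",
        "ExpressionAttributeValues": {":cid": {"S": customer_id}},
    }


# With AWS credentials configured, these could be used roughly like this:
#   client = boto3.client("dynamodb")
#   client.create_table(**create_table_params(TABLE_NAME))
#   client.query(**item_collection_query(TABLE_NAME, "C-100"))
```

Because the key condition names only the partition key, the query returns the whole item collection for that customer.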

Let's deep dive into the details:

1. Understanding Data Storage Under the Hood:

Partitions: DynamoDB partitions data based on the partition key (part of the primary key). Items with the same partition key are stored together in a partition.

Distribution: Partitions are distributed across multiple servers for scalability and availability.

Read/Write Capacity Units (RCUs/WCUs): In provisioned capacity mode, you provision RCUs and WCUs to handle expected read/write traffic; in on-demand mode, you pay per request instead.

Example:

  • Table: CustomerOrders
  • Primary Key: CustomerID (partition key) and OrderID (sort key), so one customer can have many orders
  • Data Distribution: Orders for each customer are stored together in a partition.
  • RCUs/WCUs: Allocate based on expected order volume.
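To make the idea of partitioning concrete, here is a toy model of hashing a partition key to pick a partition. DynamoDB's real hash function and partition count are internal and not public, so SHA-256 and the fixed `NUM_PARTITIONS` below are assumptions chosen purely for illustration.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumption: a fixed count, for illustration only


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partition key to a partition number with a stable hash."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def place_orders(orders: list) -> dict:
    """Group orders by the partition their CustomerID hashes to."""
    partitions = defaultdict(list)
    for order in orders:
        partitions[partition_for(order["CustomerID"])].append(order)
    return dict(partitions)


orders = [
    {"CustomerID": "C-1", "OrderID": "O-1"},
    {"CustomerID": "C-2", "OrderID": "O-2"},
    {"CustomerID": "C-1", "OrderID": "O-3"},
]
placement = place_orders(orders)
```

Both orders for `C-1` always land in the same partition, which is why reading one customer's orders touches a single partition.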

2. Knowing Your Access Patterns:

Identify common queries and updates: This dictates table structure and indexing.
Consider read/write frequency and volume: Allocate RCUs/WCUs accordingly.

Example:

  • Frequent query: Retrieve all orders for a specific customer.
  • Table structure: Ensure CustomerID is the partition key.
  • Indexing: Create a Global Secondary Index (GSI) if you need to query by other attributes.
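Once the access patterns are known, a rough capacity estimate follows from DynamoDB's documented unit sizes: one RCU covers one strongly consistent read per second of an item up to 4 KB (or two eventually consistent reads), and one WCU covers one write per second of an item up to 1 KB. The sketch below assumes single-item reads and writes; the workload numbers are hypothetical.

```python
import math


def read_capacity_units(item_kb: float, reads_per_sec: float,
                        strongly_consistent: bool = True) -> int:
    """Estimate RCUs for single-item reads of a given size and rate."""
    units_per_read = math.ceil(item_kb / 4)  # item size rounds up to 4 KB steps
    total = units_per_read * reads_per_sec
    if not strongly_consistent:
        total /= 2  # eventually consistent reads cost half
    return math.ceil(total)


def write_capacity_units(item_kb: float, writes_per_sec: float) -> int:
    """Estimate WCUs for single-item writes of a given size and rate."""
    units_per_write = math.ceil(item_kb)  # item size rounds up to 1 KB steps
    return math.ceil(units_per_write * writes_per_sec)


# Hypothetical workload: 6 KB order items, 100 reads/s and 20 writes/s.
strong_rcus = read_capacity_units(6, 100)
eventual_rcus = read_capacity_units(6, 100, strongly_consistent=False)
wcus = write_capacity_units(6, 20)
```

For this workload the estimate comes to 200 RCUs with strongly consistent reads, 100 RCUs with eventually consistent reads, and 120 WCUs.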

3. Thinking About Constraints:

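Item size limit: 400 KB per item.
Attribute size: There is no separate cap; each attribute's name and value count toward the 400 KB item limit.
Partition limits: A partition holds roughly 10 GB of data and serves up to 3,000 RCUs and 1,000 WCUs per second, so distribute data and traffic evenly to avoid "hot" partitions.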

For a more in-depth understanding of this, see DynamoDB Core Concept Interview Challenge by Kobe

Example:

  • Large orders: Store order details in a separate table or S3, referencing them in the main table.
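One way to apply this is to check an order's size before writing it and, when it is too large, keep only a pointer in the table. The sketch below is an illustration under assumptions: the size estimate uses the JSON length rather than DynamoDB's exact accounting, the 80% safety margin is arbitrary, and the `Details` and `DetailsLocation` attribute names and the key layout are hypothetical. The returned object dictionary uses the `Bucket`, `Key`, and `Body` arguments of S3's `put_object`.

```python
import json

ITEM_LIMIT_BYTES = 400 * 1024  # DynamoDB's per-item size limit
SAFETY_MARGIN = 0.8            # assumption: leave headroom below the limit


def approx_size(item: dict) -> int:
    """Rough size in bytes (UTF-8 JSON length), not DynamoDB's exact accounting."""
    return len(json.dumps(item).encode("utf-8"))


def split_large_order(order: dict, bucket: str):
    """Return (table_item, s3_object); s3_object is None when the order fits."""
    if approx_size(order) <= ITEM_LIMIT_BYTES * SAFETY_MARGIN:
        return order, None
    key = f"orders/{order['CustomerID']}/{order['OrderID']}.json"
    pointer_item = {
        "CustomerID": order["CustomerID"],
        "OrderID": order["OrderID"],
        "DetailsLocation": f"s3://{bucket}/{key}",  # reference kept in the table
    }
    s3_object = {"Bucket": bucket, "Key": key, "Body": json.dumps(order["Details"])}
    return pointer_item, s3_object
```

The caller would write `s3_object` to S3 first and the pointer item to DynamoDB second, so the table never references an object that does not exist yet.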

4. Choosing the Right Item Size:

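Balance granularity and performance: Smaller items generally mean faster and cheaper reads/writes, because capacity is consumed in 4 KB increments for reads and 1 KB increments for writes.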
Denormalize data if needed: Combine related data for frequent access.

Example:

  • Customer information: Store address, contact details, etc., within the customer item for frequent retrieval.

Additional Considerations:

GSIs: Create secondary indexes for flexible querying, but be mindful of additional costs and write overhead.
Data modeling best practices:

  • Use composite primary keys (partition key + sort key) for efficient retrieval and sorting.
  • Consider single-table design for related data.
  • Use GSIs judiciously.

Monitoring and optimization: Track performance and adjust as needed.
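To illustrate composite keys and single-table design together, here is a small in-memory model. It is not DynamoDB itself: the `PK`/`SK` attribute names and the `CUSTOMER#`/`ORDER#` prefixes are a common convention assumed here, and the `query` helper only mimics a key condition with a `begins_with` filter on the sort key.

```python
def query(items: list, pk: str, sk_prefix: str = "") -> list:
    """Mimic a Query: match one partition key, filter by sort key prefix,
    and return results ordered by sort key."""
    matches = [i for i in items if i["PK"] == pk and i["SK"].startswith(sk_prefix)]
    return sorted(matches, key=lambda i: i["SK"])


# One table holds both customer profiles and their orders.
table = [
    {"PK": "CUSTOMER#42", "SK": "PROFILE", "Name": "Ada"},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2024-02-10", "Total": 120},
    {"PK": "CUSTOMER#42", "SK": "ORDER#2024-01-05", "Total": 80},
    {"PK": "CUSTOMER#7", "SK": "PROFILE", "Name": "Lin"},
]

# Only the orders for customer 42, oldest first.
orders = query(table, "CUSTOMER#42", "ORDER#")

# The profile and the orders in a single request.
everything = query(table, "CUSTOMER#42")
```

Putting the date in the sort key is what makes the orders come back in chronological order without any extra sorting step.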

Remember:

DynamoDB is a powerful, scalable NoSQL database, but careful design is crucial for optimal performance and cost-efficiency.
Understanding these key concepts will guide you in creating effective DynamoDB data models.

Moving on to the next level and applying what has been discussed so far:

Advanced Data Modeling Patterns in DynamoDB:

Here's a step-by-step breakdown of the data access patterns, using the example of booking a flight:

1. Booking a Flight with DynamoDB:

a. Right Patterns and Constraints:

Tables:

  • Flights: Stores flight details (flight ID, origin, destination, date, etc.).
  • Bookings: Records passenger bookings with references to flights and users.
  • Users: Stores user information.

Primary Keys:

  • Flights: FlightID (simple primary key).
  • Bookings: BookingID (simple primary key).
  • Users: UserID (simple primary key).

Secondary Indexes:

  • Global Secondary Index (GSI) on Bookings for FlightID to efficiently query bookings for a specific flight.
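A GSI like this could be declared when the Bookings table is created. The sketch below builds arguments in the shape boto3's `create_table` and `query` accept, without calling AWS. The index name `FlightIndex`, the string attribute types, the `ALL` projection, and on-demand billing are assumptions.

```python
INDEX_NAME = "FlightIndex"  # hypothetical index name


def bookings_table_params() -> dict:
    """create_table arguments: BookingID as primary key, GSI keyed on FlightID."""
    return {
        "TableName": "Bookings",
        "AttributeDefinitions": [
            {"AttributeName": "BookingID", "AttributeType": "S"},
            {"AttributeName": "FlightID", "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": "BookingID", "KeyType": "HASH"}],
        "GlobalSecondaryIndexes": [
            {
                "IndexName": INDEX_NAME,
                "KeySchema": [{"AttributeName": "FlightID", "KeyType": "HASH"}],
                "Projection": {"ProjectionType": "ALL"},  # copy all attributes
            }
        ],
        "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand capacity
    }


def bookings_for_flight(flight_id: str) -> dict:
    """query arguments: all bookings for one flight, read through the GSI."""
    return {
        "TableName": "Bookings",
        "IndexName": INDEX_NAME,
        "KeyConditionExpression": "FlightID = :fid",
        "ExpressionAttributeValues": {":fid": {"S": flight_id}},
    }
```

Note that reads from a GSI are eventually consistent, so a booking written a moment ago may not appear in the index immediately.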

Constraints:

  • Limit the size of each Flights item so it stays under the 400 KB item size limit.
  • Use timestamps for booking creation and modification for filtering.

b. Booking Process:

  • User searches for flights using flight details.
  • Query the Flights table by desired criteria.
  • User selects a flight and books it.
  • Create a new item in Bookings with references to user and flight.
  • Update user availability in the Users table.
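The write side of this process can be grouped into a single all-or-nothing transaction. The sketch below builds arguments in the shape boto3's `transact_write_items` accepts, without calling AWS. It makes two assumptions that are not in the steps above: a hypothetical `SeatsAvailable` counter on the Flights item stands in for the availability update, and a `CreatedAt` timestamp is stored as an ISO string.

```python
def booking_transaction(booking_id: str, user_id: str,
                        flight_id: str, created_at: str) -> dict:
    """Arguments for transact_write_items: create a booking and reserve a seat."""
    return {
        "TransactItems": [
            {
                "Put": {
                    "TableName": "Bookings",
                    "Item": {
                        "BookingID": {"S": booking_id},
                        "UserID": {"S": user_id},
                        "FlightID": {"S": flight_id},
                        "CreatedAt": {"S": created_at},
                    },
                    # Fail if a booking with this ID already exists.
                    "ConditionExpression": "attribute_not_exists(BookingID)",
                }
            },
            {
                "Update": {
                    "TableName": "Flights",
                    "Key": {"FlightID": {"S": flight_id}},
                    # SeatsAvailable is a hypothetical counter attribute.
                    "UpdateExpression": "SET SeatsAvailable = SeatsAvailable - :one",
                    # Fail if the flight is already full.
                    "ConditionExpression": "SeatsAvailable >= :one",
                    "ExpressionAttributeValues": {":one": {"N": "1"}},
                }
            },
        ]
    }


params = booking_transaction("B-1", "U-7", "F-9", "2023-12-01T10:00:00Z")
```

If either condition fails, neither write is applied, which prevents duplicate bookings and overbooking in this sketch.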

2. Handling Complex Filtering with External Systems:

Scenario: Users search for flights with complex criteria (price range, multiple connections, etc.).

Solution:

  • Store basic flight data in DynamoDB for fast querying.
  • Offload complex filtering and aggregation to an external system like Elasticsearch.
  • Use DynamoDB as the source of truth for booking data.
  • Integrate your application with Elasticsearch to retrieve filtered results and book flights in DynamoDB.

3. Integrating DynamoDB with Other Tools:

a. Elasticsearch:

  • Use it for complex search and aggregation of flight data with advanced features like faceting and geospatial searches.
  • Keep basic flight data in DynamoDB for fast lookups and booking.
  • Synchronize data between DynamoDB and Elasticsearch for consistency.
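One common way to keep the two stores in sync is to process DynamoDB Streams records, for example in a Lambda function. The sketch below only translates stream records into neutral index/delete actions; sending them to Elasticsearch is left out. It assumes the stream is configured to include new images, that the table key is `FlightID`, and that the action dictionaries are a hypothetical intermediate format rather than an Elasticsearch API.

```python
def unwrap(attr: dict):
    """Convert one DynamoDB-typed value such as {"S": "LHR"} to a plain value."""
    (type_tag, value), = attr.items()
    if type_tag == "N":
        return float(value)  # numbers arrive as strings in stream records
    return value  # S and BOOL are returned directly; other types pass through as-is


def to_index_actions(event: dict) -> list:
    """Translate DynamoDB Streams records into search-index actions."""
    actions = []
    for record in event.get("Records", []):
        doc_id = unwrap(record["dynamodb"]["Keys"]["FlightID"])
        if record["eventName"] == "REMOVE":
            actions.append({"op": "delete", "id": doc_id})
        else:  # INSERT or MODIFY
            image = record["dynamodb"]["NewImage"]
            doc = {name: unwrap(value) for name, value in image.items()}
            actions.append({"op": "index", "id": doc_id, "doc": doc})
    return actions


sample_event = {
    "Records": [
        {
            "eventName": "INSERT",
            "dynamodb": {
                "Keys": {"FlightID": {"S": "F-1"}},
                "NewImage": {"FlightID": {"S": "F-1"}, "Price": {"N": "199"}},
            },
        },
        {"eventName": "REMOVE", "dynamodb": {"Keys": {"FlightID": {"S": "F-2"}}}},
    ]
}
actions = to_index_actions(sample_event)
```

Because the index is updated asynchronously from the stream, search results can briefly lag behind DynamoDB, which is why DynamoDB stays the source of truth for bookings.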

b. S3:

  • Store large files like images, PDFs, or flight logs related to bookings.
  • Use DynamoDB to store references to files in S3.
  • Leverage S3's scalability and cost-effective storage for large data.

Remember:

  • Choose the right patterns based on your data access patterns and complexity.
  • Leverage external systems when DynamoDB alone cannot handle specific tasks.
  • Maintain data consistency and synchronization between different tools.

These patterns could be refined based on your specific requirements and application needs.

It is recommended to watch two other DynamoDB talks at re:Invent 2023, DAT329 and DAT330, for in-depth knowledge of the underlying architecture of DynamoDB.
