12 Common Misconceptions about AWS DynamoDB

#dynamodb #serverless #aws #nosql

It's NoSQL database so it's not suitable for relational data

A common misconception I often hear about DynamoDB is that because it's NoSQL and does not support JOINs like traditional relational databases, it is not suitable for relational data. Well, modeling relationship is perfectly doable in DynamoDB.

he two most common approaches use AWS Amplify, which provides all AWS resources, including tables and resolvers, and a single-table design, which uses only one table to fit all data entities into one container and smart keys composition. While I highly recommend the former to get started and for smaller projects, the latter is more "professional" and officially recommended by AWS. If you want to learn more about single-table design, you can learn about it from Alex's DeBrie book about DynamoDB.

In fact, DynamoDB is suitable for almost all types of data. It makes a perfect key value store, metadata store, relational database, event store (e.g. in Event Sourcing) and transactional data store - thanks to transactions support.

DynamoDB is slow

Anoher argument is about the speed. Many times I've heard that developers can fetch data from their relational databases in less than 1ms! And DynamoDB? Same operation takes 10ms or even 20ms, it's too slow, right?

Not really. These scenarios are often based on an oversimplified setups where the speed is measured by fetching one row by an indexed field and on a beefy machine without traffic, erratic spikes and a myriad of other factors. Reality is often a lot messier, especially at scale. As your relational database starts getting more and more traffic, you'll encounter slowdowns related to the load on the machine caused by other operations and processes, connection pool exhaustions, transaction conflicts, and so on.

What about DynamoDB in such conditions? The performance is always the same. No matter if you're sending 1 request per second, or 1,00,000 requests per second, DynamoDB (if data model has been architected correctly), behaves great, sometimes even better under heavy load.

Source: Amazon DynamoDB auto scaling: Performance and cost optimization at any scale

DynamoDB is expensive

Comparing to traditional, non-managed databases, it's much cheaper to scale. Actually, DynamoDB costs are super predictable and directly proportional to the usage. In traditional, non-managed databases, the TCO (Total Cost of Ownership) is much more non-linear and has a lot of hidden costs that might be not visible at first glance:

At the beginning, on a small scale, DynamoDB costs are close to zero since if there's no traffic, there are no costs. Moreover, AWS offers quite generous Free Tier so you might even go to the production with zero database costs. In a non-managed database, you need to provision a VM/Machine/Instance with at least 1 vCPU. That's a hard cost you cannot skip and you're paying it even if your database is completely unused.

Later on, your provisioned database might outperform DynamoDB in terms of costs but at some point you'll encounter a point where current machine will be not enough. You will find yourself in a constant struggle of bumping the specification, investigating transient and hard to debug problems, or tweaking not-so-well documented variables. At some point, you might even consider hiring a dedicated specialist with a hefty pay to deal with these problems for you.

At the same time, DynamoDB requires zero supervision or fine-tuning. It simply works. And while that sometimes it might seem more expensive, you're actually saving a lot of money (and time!) by not having to worry about operational issues, managing backups, and having constant throughput and 99.99% SLA.

Remember, always compare TCO, not just pure cost of running VMs/machines/containers.

DynamoDB is schemaless

That's actually not a misconception. Although DynamoDB is giving users freedom of items shape - they don't have to conform to any schema since DynamoDB is schemaless, lack of any validation or convention might lead to a huge mess.

However, if you're going to design your data model according to single-table design, you're going to have a really strict set of usable access patterns which essentially will become your schema.

DynamoDB is hard to use

Every technology requires some investment and DynamoDB is no exception. However, because DynamoDB API is very minimal (DocumentClient has only 11 operations), the scope of the material to become an expert might be a lot smaller compared to learning PostgreSQL or Elasticsearch. Furthermore, there are books about just running and maintaining these two databases. In DynamoDB that problem simply does not exist. It's ran by Amazon for you and you don't need to worry about it.

You might also argue that lack of SQL support is a blocker. Up until recently, without support for PartiQL, this statement was true but right now, you can use it. However, if you want to use native query language, we've prepared a visual query builder which might help you getting started.

Moreover, the ecosystem of DynamoDB is constantly growing. There are more and more tools providing useful abstractions like DynamoDB Toolbox by Jeremy Daly for working with single-table design, Dynamoose for ORM-like experience, AWS Amplify which hides the DynamoDB layer from the programmer completely, or Dynobase which helps you in navigating between profiles, regions, tables and exploring datasets.

Lastly, DynamoDB integrates with the rest of the AWS ecosystem seamlessly reducing the amount of code you'd need to create. Mechanisms like TTL, Point in Time recovery, streaming, global replication are built into it. Your responsibility is just to use them, not author them.

Cannot use SQL with DynamoDB

This statement was true until some time ago. Recently, AWS announced PartiQL support for DynamoDB. With PartiQL, an SQL-like language, you can use familiar language to interact with DynamoDB and other AWS' components like Athena.

Keep in mind that using SQL is not removing DynamoDB's technical limitations. You still need to understand that your SELECT * FROM ... queries will be translated to scans or queries, that things like UPDATE X WHERE Y probably is not going to work, or that COUNTing is also not permitted.

Only suitable for Serverless workloads

DynamoDB is very often being composed into Serverless-based architectures. In fact, it is a perfect match because just like AWS Lambda, S3, and other managed services, it is billed for the actual usage of it - per amount of read / write operations and gigabytes of disk space your tables consume.

That being said, it is not blocking you from using DynamoDB with non-Serverless workloads. Tables can be accessed from any environment, including EC2 instances, CI/CD systems, VMs, containers, on-premises or even your local machine. All you need to is to have a valid AWS identity with credentials scoped to allow accessing it. With that in place, you can use CLI or SDKs to interact with DynamoDB.

DynamoDB is hard to manage

Totally untrue. No, it's all managed for you. Once you click "Create Table" in the AWS Console, you get a highly available, redundant, scalable, encrypted both at rest and in transit, SOC, PCI and HIPAA compliant database within seconds. Imagine how hard that would be to achieve the same result using on-premise software.

But, wait there's more. With DynamoDB, you can enable cross-region replication with sub-second latency, start streaming data from it to other data sources or data processors, or enable PITR (Point in Time Recovery) with a single API call. How cool is that? And once again, how much harder that would be using non-managed software?

Essentially, DynamoDB allows you to build product on a shoulders of giants allowing you to focus on what's important for your business, on your core competency. Friends don't let friends run databases on instances/VMs/containers/on-premises.

DynamoDB is secure by default

Just like any other database or service, DynamoDB is just a tool. While it has been built with the best security principles in mind, including encryption, IAM, working backups, it's up to you how you're going to use it. Shared responsibility model is key here.

While Amazon ensures the best "security of the cloud", you're still responsible for the "security in the cloud". No one is going to prevent you from storing passwords in plaintext, using wildcard IAM policies, leaking your AWS identity credentials or violating laws and regulations.

DynamoDB actually makes the security of your database easier. Because there are no servers, containers, VMs or clusters to manage, you don't have to worry about restricting network access correctly, patching software regularily, or being up-to-date with latest vulerabilities.

DynamoDB cannot be used for serious projects

That's simply not true. There's a number of really serious companies using DynamoDB in production including Duolingo, an online learning site, uses DynamoDB to store approximately 31 billion data objects on their web server, Nike, who ditched their Cassandra cluster in favor of DynamoDB, or Disney that uses DynamoDB to store metadata of billions of customer actions.

DynamoDB is also dogfood by Amazon internally - it is a Tier-0 service powering most of Amazon. It is also highly advised for all new initiatives at Amazon to use DynamoDB as the database.

DynamoDB is suitable only for big projects with huge traffic

False! Thanks to free tier, you can start using DynamoDB completely for free. After that, the usage costs are scaling as you go. The more clients you get, the more money you earn, the bigger DynamoDB costs are and everything is proportional. Moreover, there's no initial commitment like in other non-managed databases where the entry instance starts with 10USD / month.

DynamoDB is bad for analytics

The last point is a bit more complicated. It is true that because you cannot run arbitrary queries against DynamoDB tables (technically speaking you can but it's extremely inefficient due to Scan nature), you cannot perform ad-hoc reports of your data. That might be disappointing. However, there are solutions to that:

You can stream your data from DynamoDB to Redshift or some other relational database built for reporting and analytics purposes
You can build aggregation functions on your own. These functions plugged to the DynamoDB Streams will be automatically recalculating your composite statistics or rollups providing always the most up-to-date state of some aggregate.