DEV Community

Nicola Apicella


DynamoDB or Aurora/RDS? Should we always use DynamoDB?

A few days ago, I ended up in a discussion about databases with some friends.

We discussed DynamoDB versus managed relational databases (like RDS and Aurora) and which one to pick in use cases where either could work.
One idea that bounced around the room was to always use DynamoDB, even when a relational database would be a better fit.
To my surprise, I was the only one to disagree with the statement, and my arguments were not convincing enough to give them second thoughts.

After my unsuccessful attempt to convince them, I set out to better understand their argument and to elaborate on mine.

The argument for using DynamoDB

The argument for using DynamoDB in all circumstances is based on a first principle:

If X produces good results in almost all cases, then it is beneficial to always use it.
Because the majority of use cases benefit from it and the cost of not using it is extremely high, it should be enforced by a rule or a policy.

Although I agree with the general statement, I think it does not apply to DynamoDB. Like any other database, it shines in some use cases and might be overkill in others.
Thus, the first thing to verify is whether DynamoDB really is superior to a relational database in almost all use cases, as the principle assumes.

DynamoDB use cases

I am only going to go over the main use cases.
Plenty has been written on the topic; the interested reader will find more information in the references section.

What are the use cases in which DynamoDB shines?

  • Key-value access pattern
  • High TPS
  • Access patterns are clear (or can be clarified) from the get-go. By contrast, with a relational database you can always write a new query or create new tables when you need them
  • High storage requirements (e.g., Big Data)
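To make the difference between a known key-value access pattern and an ad hoc relational query more concrete, here is a toy sketch. An in-memory SQLite database stands in for Aurora/RDS, and a dict of dicts (partition key to sort key to item) is a simplified, hypothetical stand-in for a DynamoDB table keyed by customer and order. It does not use the DynamoDB API, and the table, data and function names are illustrative only.

```python
import sqlite3

# Hypothetical sample data: (customer_id, order_id, status, total)
ROWS = [
    ("c1", "o1", "SHIPPED", 20.0),
    ("c1", "o2", "PENDING", 35.0),
    ("c2", "o3", "SHIPPED", 15.0),
]

# Relational side: in-memory SQLite standing in for Aurora/RDS.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id TEXT, order_id TEXT, status TEXT, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", ROWS)

# A new access pattern is just a new query; the schema does not change.
pending_sql = [r[0] for r in db.execute(
    "SELECT order_id FROM orders WHERE status = ? ORDER BY order_id", ("PENDING",))]

# Key-value side: partition key -> {sort key -> item}, a simplified stand-in
# for a table designed around "get the orders of a customer".
table = {}
for customer_id, order_id, status, total in ROWS:
    table.setdefault(customer_id, {})[order_id] = {"status": status, "total": total}


def query_orders(customer_id):
    """Designed-for access pattern: reads a single partition."""
    return sorted(table.get(customer_id, {}))


def scan_by_status(status):
    """Unplanned access pattern: reads every item in every partition."""
    return sorted(
        order_id
        for partition in table.values()
        for order_id, item in partition.items()
        if item["status"] == status
    )


print(query_orders("c1"))         # ['o1', 'o2']
print(scan_by_status("PENDING"))  # ['o2']
print(pending_sql)                # ['o2']
```

Both sides return the same answer, but only the relational one got the new access pattern "for free". In DynamoDB the usual answer to an unplanned pattern is a secondary index (for example a global secondary index on the status attribute), which has to be designed, provisioned and paid for on every write.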

DynamoDB certainly also works in the complementary [1] use case:

  • Relational access pattern
  • Access pattern is not well understood or might change in the future
  • Low/Medium TPS [2]
  • Low/Medium storage requirements
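Footnote [2] converts a daily request volume into an average TPS. The same arithmetic as a tiny sketch, assuming traffic is spread evenly over the day; real traffic usually has peaks well above this average, so treat the result as a lower bound on what the database has to handle.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tps(requests_per_day: float) -> float:
    """Average requests per second, assuming requests are spread evenly over 24 hours."""
    return requests_per_day / SECONDS_PER_DAY


# Illustrative daily volumes within the "low/medium TPS" range of footnote [2].
for daily in (1_000_000, 3_000_000):
    print(f"{daily:>9,} requests/day -> {average_tps(daily):.1f} requests/s on average")
```

One million requests per day comes out at roughly 11.6 requests per second, a load that a managed relational database handles comfortably.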

So it looks like DynamoDB covers all possible use cases, thus we should always use it. Case closed, or is it?

Not quite. Using DynamoDB in suboptimal use cases has a big disadvantage: higher engineering costs and slower development.
Using a database that better matches an application's needs improves programmer productivity.

In other words, the trade-off is between the potential growth of a service and speed of development: when using DynamoDB for applications with low/medium TPS and a fairly ambiguous access pattern, the argument is to favor potential growth over speed of development.

In the following section, I am going to explain why I believe this might be the wrong trade-off to make.

The cost of picking the wrong database

Overestimating the expected growth

Most new ideas fail or do not have the expected customer impact.
Based on this intuition, the probability of a service growing beyond what a relational DB like Aurora can support is very low.
It's more rewarding to be able to iterate on ideas as fast as possible, to increase the likelihood of eventually landing on the successful one.

When the potential growth is easier to determine (the system is going to be used by a well-known population, it is company-internal tooling, a niche service, etc.), deciding which solution to pick becomes straightforward.

Underestimating the expected growth

It's hard to tell if something is going to succeed or not.
In the fortunate case in which it does, a relational database might limit service growth, requiring engineers to perform a painful migration to NoSQL, which will ultimately slow down the delivery of new features.

Should we optimize for this possibility?
Does the cost of a possible migration always outweigh the upfront cost required for NoSQL, even if you might never benefit from its features?

Once again, the probability of underestimating the potential growth of a service is very low, since most services do not land where we want them to be.
Thus, starting with DynamoDB is, at best, a premature optimization.
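The question above can be framed as a rough expected-cost comparison. The sketch below is a deliberately simplified model of my own framing, not a standard formula: starting relational costs `migration_cost` only if the service outgrows the database (probability `p_outgrow`), while starting with DynamoDB always costs `upfront_nosql_cost` in extra engineering effort. All three inputs are hypothetical and expressed in arbitrary units (say, engineer-weeks). Relational first is cheaper in expectation whenever `p_outgrow * migration_cost < upfront_nosql_cost`.

```python
def expected_extra_cost_relational_first(p_outgrow: float, migration_cost: float) -> float:
    """Starting relational: the migration is paid only if the service outgrows the DB."""
    return p_outgrow * migration_cost


def break_even_probability(migration_cost: float, upfront_nosql_cost: float) -> float:
    """Probability of outgrowing the DB above which starting with DynamoDB wins in expectation."""
    return upfront_nosql_cost / migration_cost


# Hypothetical numbers, in arbitrary units: a migration costs 10x the extra upfront NoSQL effort.
MIGRATION_COST, UPFRONT_NOSQL_COST = 10.0, 1.0

p_star = break_even_probability(MIGRATION_COST, UPFRONT_NOSQL_COST)
for p in (0.01, 0.05, 0.2):
    relational_cost = expected_extra_cost_relational_first(p, MIGRATION_COST)
    winner = "relational first" if relational_cost < UPFRONT_NOSQL_COST else "DynamoDB first"
    print(f"p_outgrow={p:.2f} (break-even {p_star:.2f}): {winner}")
```

Even if a migration is ten times as expensive as the upfront effort, starting relational wins as long as fewer than one in ten services outgrow it. The model ignores the productivity gained by iterating faster on a better-fitting database; counting that would increase the upfront cost of NoSQL and push the break-even probability even higher.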

Regarding premature optimization

Migrating to a different database is certainly a very costly operation.
It is equally true that, after reaching a certain scale, the database is not the only bottleneck the system might have.

Unless the whole system architecture was prematurely optimized to sustain that growth, it will be necessary to redesign parts of the system.
Even a simple problem becomes complex at scale, and overindexing on the cost of migrating to a different database risks missing the big picture.

Operational burden and availability

This is the area in which DynamoDB is superior to Aurora/RDS in all use cases.

DynamoDB requires virtually no maintenance. Managed relational databases require minimal maintenance, and the automatic patching and updates offered by Aurora/RDS reduce it even further.

DynamoDB offers five nines of monthly uptime (99.999%, for Global Tables; standard tables are at 99.99%) versus the four nines of Aurora (99.99%).
That works out to about 26 seconds of downtime per month on average for DynamoDB, versus about 4 minutes and 23 seconds for Aurora.
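The downtime figures follow directly from the uptime percentages. A minimal sketch of the arithmetic, assuming an average month of 365.25 / 12 days (about 30.44 days); with a 30-day month the numbers come out slightly smaller, about 25.9 seconds and 4 minutes 19 seconds.

```python
# Average month length: 365.25 days / 12, expressed in seconds (2,629,800 s).
SECONDS_PER_AVERAGE_MONTH = 365.25 * 24 * 60 * 60 / 12


def monthly_downtime_budget(availability_percent: float) -> float:
    """Seconds of downtime per average month allowed by an uptime percentage."""
    return (1 - availability_percent / 100) * SECONDS_PER_AVERAGE_MONTH


for name, sla in (("DynamoDB", 99.999), ("Aurora", 99.99)):
    minutes, seconds = divmod(monthly_downtime_budget(sla), 60)
    print(f"{name}: {sla}% -> {int(minutes)} min {seconds:.0f} s per month")
```

This prints 0 min 26 s for 99.999% and 4 min 23 s for 99.99%: each extra nine divides the downtime budget by ten.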

If these numbers make a difference for your service, then DynamoDB is the way to go.

Costs

Storing data in DynamoDB can be expensive compared to other solutions like RDS or Aurora. For example, storing 1TB of data in Aurora costs about $100/month, while 1TB in DynamoDB costs about $250/month.
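The storage figures above correspond to a price of roughly $0.10 per GB-month for Aurora and $0.25 per GB-month for DynamoDB, treating 1TB as 1,000 GB. The sketch below only reproduces that arithmetic. The prices are assumptions inferred from the numbers in this section; actual prices vary by region and change over time, and the calculation leaves out I/O, request, backup and instance charges.

```python
# Assumed per-GB-month storage prices, inferred from the figures above (not a live price list).
PRICE_USD_PER_GB_MONTH = {"Aurora": 0.10, "DynamoDB": 0.25}


def monthly_storage_cost(gigabytes: float, usd_per_gb_month: float) -> float:
    """Storage-only monthly cost; excludes I/O, requests, backups and compute."""
    return gigabytes * usd_per_gb_month


for terabytes in (1, 10):
    gb = terabytes * 1000  # 1 TB treated as 1,000 GB
    costs = {name: monthly_storage_cost(gb, price) for name, price in PRICE_USD_PER_GB_MONTH.items()}
    print(f"{terabytes} TB: " + ", ".join(f"{n} ${c:,.0f}/month" for n, c in costs.items()))
```

Because storage cost is linear in size, the gap grows with the data: at 10TB it is about $1,000/month versus $2,500/month under these assumed prices.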

By using DynamoDB, most of the data manipulation (like aggregations or filters across partition keys) needs to happen in the application, which means you will incur the extra cost of transferring the unfiltered data from DynamoDB to your application.
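A toy illustration of where the aggregation happens. SQLite stands in for the relational database and computes a per-customer total with GROUP BY, so only the result rows leave the database. `scan_pages` is a hypothetical stand-in for a paginated DynamoDB Scan: every item is shipped to the application, which then computes the sum itself. DynamoDB's API has no server-side GROUP BY or SUM; a FilterExpression can trim what a Scan or Query returns, but it is applied after the items are read, so it does not reduce the read capacity consumed.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer_id, order_id, total)
ROWS = [("c1", "o1", 20.0), ("c1", "o2", 35.0), ("c2", "o3", 15.0), ("c3", "o4", 50.0)]

# Relational: the database aggregates, and only the result rows leave it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (customer_id TEXT, order_id TEXT, total REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?, ?)", ROWS)
sql_totals = dict(db.execute(
    "SELECT customer_id, SUM(total) FROM orders GROUP BY customer_id").fetchall())

# Key-value: every item is transferred, then aggregated in the application.
ITEMS = [{"customer_id": c, "order_id": o, "total": t} for c, o, t in ROWS]


def scan_pages(items, page_size=2):
    """Yield items one page at a time, mimicking a paginated scan."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


app_totals = defaultdict(float)
items_transferred = 0
for page in scan_pages(ITEMS):
    items_transferred += len(page)
    for item in page:
        app_totals[item["customer_id"]] += item["total"]  # aggregation happens in the app

print(f"SQL returned {len(sql_totals)} rows; the scan transferred {items_transferred} items")
```

With four items the difference is trivial, but it grows with every item scanned, and each of those reads is also billed as read capacity.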

Conclusion

Overall, the discussion led me to challenge my ideas and go through the process of writing down my thoughts.

I am not sure I'll manage to convince those who say "always use DynamoDB".
The conclusion I have personally reached is "when in doubt, use DynamoDB".

Unfortunately (or fortunately), there is no silver bullet. We need to keep evaluating trade-offs based on the specific problem to solve.

You can find me on Twitter and GitHub.


References


  • [1] The word "complementary" in this context is an oversimplification that does not take into account use cases better served by a special-purpose solution (e.g., graph databases, Elasticsearch, ElastiCache).
  • [2] Low/medium TPS here means a service handling anywhere between 1 and a few million requests per day. If you get 1 million requests every day, you will average about 11.6 requests per second.

Top comments (2)

Alex Carreira

Thanks Nicola for sharing your thoughts--two things above were most useful:

  1. "By using DynamoDB, most of the data manipulation (like aggregation or filters across partition keys) needs to happen in the application, which means you will incur in the extra cost required to transfer the unfiltered data from DynamoDB to your application."

  2. "Once again, the probability to underestimate the potential growth of a service is very low, since most of the service do not land where we want them to be.
    Thus, starting with DynamoDB, is at the best a premature optimization."

Klaus Nji

There is no silver bullet and I certainly agree with you on picking the best tool for the job. Kudos on this research and thanks for sharing.