DEV Community


Graph DB in Serverless Flavor: Hacking DynamoDB

Renato Byrro on October 07, 2019

In this article we'll cover: 3️⃣ 3 steps to grasp DynamoDB working as a Graph database 📺 Work with examples and illustrative data 💪 Increasing v...
 
Dennis Du Krøger

Is there really a good reason for all the duplication? It just looks like a manual index to me.

Why not just use a GSI with the keys reversed (the table's sort key as the index's partition key and the table's partition key as the index's sort key) and, depending on your needs, a few projected attributes, like the names? This also looks like the approach Amazon are using in their documentation for sort-key design for adjacency lists.
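As a rough sketch of what such an inverted index could look like, the function below builds one entry for the `GlobalSecondaryIndexes` list of a boto3 `create_table` call. The index name and the projected attribute `Name` are hypothetical; the key attributes are assumed to be literally called `PK` and `SK`.

```python
from typing import List


def inverted_gsi(index_name: str, projected: List[str]) -> dict:
    """Build one GlobalSecondaryIndexes entry for boto3's create_table:
    the table's SK becomes the index partition key (HASH) and the
    table's PK becomes the index sort key (RANGE)."""
    return {
        "IndexName": index_name,
        "KeySchema": [
            {"AttributeName": "SK", "KeyType": "HASH"},
            {"AttributeName": "PK", "KeyType": "RANGE"},
        ],
        # INCLUDE copies only the listed non-key attributes into the index,
        # so names can be read from the GSI without a second lookup.
        "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": projected},
    }


idx = inverted_gsi("inverted-index", ["Name"])
```

This assumes on-demand billing (`BillingMode="PAY_PER_REQUEST"`); a provisioned table would also need a `ProvisionedThroughput` entry per index, and `PK`/`SK` must appear in the table's `AttributeDefinitions`.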

So, for getting all tags for Post-1, query the main table for "PK = Post-1 AND begins_with(SK,Tag-)".
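A minimal sketch of that first query, assuming boto3's low-level DynamoDB client, a hypothetical table named `graph`, and key attributes literally named `PK` and `SK`. It only builds the keyword arguments for `client.query`, so it runs without AWS credentials.

```python
def tags_for_post_query(table_name: str, post_id: str) -> dict:
    """Build kwargs for a boto3 DynamoDB client `query` call that returns
    every Tag edge stored in a post's partition."""
    return {
        "TableName": table_name,
        # PK = Post-1 AND begins_with(SK, "Tag-")
        "KeyConditionExpression": "#pk = :pk AND begins_with(#sk, :prefix)",
        # Name placeholders avoid any clash with DynamoDB reserved words.
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":pk": {"S": post_id},
            ":prefix": {"S": "Tag-"},
        },
    }


params = tags_for_post_query("graph", "Post-1")
# With credentials configured: boto3.client("dynamodb").query(**params)
```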

For getting all posts for Tag-1, query the GSI for "SK = Tag-1 AND begins_with(PK, Post-)". If you want the name in the same query, and you're not tagging anything but posts (or mostly want tags for everything), you can even avoid duplicating the title column by not using begins_with (okay, probably not, you'll use them in the Post->Tag queries).
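The reverse lookup can be sketched the same way, assuming an inverted GSI (hypothetically named `inverted-index`) whose partition key is the table's `SK` attribute and whose sort key is the table's `PK` attribute. Again, this only builds the `query` arguments.

```python
def posts_for_tag_query(table_name: str, index_name: str, tag_id: str) -> dict:
    """Build kwargs for querying the inverted GSI: on the index, SK acts
    as the partition key and PK as the sort key."""
    return {
        "TableName": table_name,
        "IndexName": index_name,
        # SK = Tag-1 AND begins_with(PK, "Post-")
        "KeyConditionExpression": "#sk = :tag AND begins_with(#pk, :prefix)",
        "ExpressionAttributeNames": {"#pk": "PK", "#sk": "SK"},
        "ExpressionAttributeValues": {
            ":tag": {"S": tag_id},
            ":prefix": {"S": "Post-"},
        },
    }


params = posts_for_tag_query("graph", "inverted-index", "Tag-1")
# With credentials configured: boto3.client("dynamodb").query(**params)
```

Note that queries against a GSI are eventually consistent only; asking for `ConsistentRead=True` on an index is rejected by DynamoDB.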

Wouldn't that both be faster and less error-prone?

 
Renato Byrro

Hi Dennis, that's a good question!

GSI is a viable solution, yes. It would simplify the implementation, but it also comes with shortcomings.

GSIs can't offer strong consistency: reads from a GSI are always eventually consistent, so there will always be a delay between changes to the table and their reflection in the secondary index. For some use cases, this is unacceptable. With the approach I suggested, it's possible to wrap many item operations into one transaction, which gives you full control and consistency.
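Here is a sketch of what "wrapping item operations into one transaction" could look like for a single edge stored in both directions. It builds the `TransactItems` list for boto3's `transact_write_items`; the table name and the `PK`/`SK` attribute names are assumptions, and the duplicate-edge condition is an illustrative choice, not necessarily part of the original approach.

```python
def edge_transaction(table_name: str, source: str, target: str) -> list:
    """Build TransactItems that store one edge as two items
    (source -> target and target -> source). DynamoDB commits
    both Puts together or neither of them."""
    return [
        {
            "Put": {
                "TableName": table_name,
                "Item": {"PK": {"S": pk}, "SK": {"S": sk}},
                # Cancel the whole transaction if either copy already exists.
                "ConditionExpression": "attribute_not_exists(#pk)",
                "ExpressionAttributeNames": {"#pk": "PK"},
            }
        }
        for pk, sk in ((source, target), (target, source))
    ]


items = edge_transaction("graph", "Post-1", "Tag-1")
# With credentials configured:
#   boto3.client("dynamodb").transact_write_items(TransactItems=items)
```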

Another issue with GSI is that the developer now has two sources for the same dataset and needs to work out which one to read data from.

Consider the Friendly example. If you want to find all of Ross' siblings, do you need to query the table, the GSI, or both? If your data structure is very simple, this might not be a problem. Otherwise, it will complicate your life whenever you implement new features that need to read data, and things can easily end up in a mess. You could miss data that exists by querying the wrong source, and show incomplete information to your users. And you may never find out...

In order to minimize errors when writing data using the approach I outlined, you could have a single internal service for entering nodes and edges in the DB. You take care of this single entry point and make sure it's handling all relationships correctly. Then all other parts of your application only invoke this single service to write to the DB.
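A minimal sketch of such a single write entry point. `GraphWriter`, its allowed id prefixes, and `RecordingClient` (a stand-in so the example runs without AWS) are all hypothetical; the only real API assumed is boto3's `transact_write_items(TransactItems=...)`.

```python
from typing import Dict, Optional


class GraphWriter:
    """Hypothetical single write gateway: every edge goes through
    add_edge, which validates ids and always writes both directions
    in one transaction."""

    def __init__(self, client, table_name: str, allowed_prefixes=("Post-", "Tag-")):
        self._client = client  # e.g. boto3.client("dynamodb")
        self._table = table_name
        self._prefixes = tuple(allowed_prefixes)

    def _check(self, node_id: str) -> None:
        # Reject ids that break the key convention before touching the DB.
        if not node_id.startswith(self._prefixes):
            raise ValueError(f"unexpected node id: {node_id!r}")

    def add_edge(self, source: str, target: str, attrs: Optional[Dict] = None) -> None:
        self._check(source)
        self._check(target)
        if source == target:
            # Both Puts would hit the same item, which a transaction rejects.
            raise ValueError("self-loops are not supported in this sketch")
        items = []
        for pk, sk in ((source, target), (target, source)):
            item = {"PK": {"S": pk}, "SK": {"S": sk}}
            item.update(attrs or {})
            items.append({"Put": {"TableName": self._table, "Item": item}})
        self._client.transact_write_items(TransactItems=items)


class RecordingClient:
    """Stand-in for a boto3 DynamoDB client: records calls instead of
    sending them to AWS."""

    def __init__(self):
        self.calls = []

    def transact_write_items(self, **kwargs):
        self.calls.append(kwargs)
        return {}


client = RecordingClient()
writer = GraphWriter(client, "graph")
writer.add_edge("Post-1", "Tag-1")
```

Keeping validation and the two-direction write in one class means the rest of the application cannot forget the reverse edge.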

Does that make sense?

 
Dennis Du Krøger

Hmmm, not sure I'd use DDB at all if eventual consistency was a problem, as the transaction operations throw away a lot of performance (and are somewhat limited), but fair enough.

When it comes to connections between equals, like the siblings example, you'd need somewhat arbitrary rules (e.g. the eldest holds the relation) or you'd do the copying anyway, and then we'd have both the extra redundancy AND the GSI to worry about, so the triplestore model makes a lot of sense there.
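To make the trade-off concrete, here is an in-memory simulation (plain dicts standing in for the table and an inverted GSI, with hypothetical ids `Person-1` to `Person-3`). Under an arbitrary "store it once" rule, here "the lexicographically smaller id holds the edge", finding someone's siblings needs both sources; with the edge copied in both directions, one partition read is enough.

```python
from collections import defaultdict


def siblings_canonical(edges, person):
    """Each edge stored once, under the lexicographically smaller id.
    A complete answer needs the table AND the inverted index."""
    table = defaultdict(set)  # PK -> SKs (main table)
    gsi = defaultdict(set)    # SK -> PKs (inverted GSI)
    for a, b in edges:
        pk, sk = sorted((a, b))
        table[pk].add(sk)
        gsi[sk].add(pk)
    return table[person] | gsi[person]


def siblings_table_only(edges, person):
    """Same canonical storage, but reading only the main table:
    edges held by the other node are silently missed."""
    table = defaultdict(set)
    for a, b in edges:
        pk, sk = sorted((a, b))
        table[pk].add(sk)
    return set(table[person])


def siblings_duplicated(edges, person):
    """Each edge copied in both directions: the main table alone is complete."""
    table = defaultdict(set)
    for a, b in edges:
        table[a].add(b)
        table[b].add(a)
    return set(table[person])


edges = [("Person-1", "Person-2"), ("Person-3", "Person-2")]
```

With these edges, reading only the table for `Person-2` returns `{"Person-3"}`, while the other two functions return `{"Person-1", "Person-3"}`.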

I'm not sure I'm convinced for data with a clearer parent-child relationship, though; I'll have to let it simmer for a while. :-)

By the way, you're storing the predicates in the DB, but don't really use them for anything. I guess that's just for discovery (e.g. when using it as storage for clients that don't necessarily know which relation types exist), but are there other uses for them in a less generalised database? They seem a little bit useless to me unless you add an attribute to build a sparse index from (otherwise you'd have to do a full table scan to find them), but I've probably missed something.
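The sparse-index idea can be simulated in memory: DynamoDB only copies an item into a GSI when the item has the index's key attribute, so tagging just the predicate items with a marker attribute gives an index that contains nothing else. The attribute name `PredicateKey`, its value, and the predicate item id below are hypothetical.

```python
from collections import defaultdict


def build_sparse_index(items, key_attr):
    """Simulate a GSI partitioned on `key_attr`: items that lack the
    attribute are never copied into the index."""
    index = defaultdict(list)
    for item in items:
        if key_attr in item:
            index[item[key_attr]].append(item)
    return index


items = [
    {"PK": "Post-1", "SK": "Post-1"},
    {"PK": "Post-1", "SK": "Tag-1"},
    # Only the predicate item carries the marker attribute.
    {"PK": "Predicate-tagged_with", "SK": "Predicate-tagged_with",
     "PredicateKey": "PREDICATE"},
]

index = build_sparse_index(items, "PredicateKey")
predicates = index["PREDICATE"]  # a Query on the sparse GSI, no table scan
```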

Thanks for the clarification!

 
Johan Mynhardt 🇿🇦

Thank you for the great article :)

I think there may be typos in the Post/Tags example table, which may confuse readers. Repeating PK: Tag-1, SK: Tag-1 for each of the tag values cool, awesome and neat is invalid: the PK/SK pair is the item's composite primary key, so only one of them can exist.
The rest of the text referring to the example references Tag-2.

 
bion howard

This is a great concept, but the implementation in the article is frustratingly non-intuitive. It’s amazing that we’re in the year 2020 (13 years since the Dynamo paper was written) and there’s only one serverless graph database, Microsoft Azure CosmosDB Autopilot... which sucks because we’re using AWS. There are so many graph databases out there, yet only one serverless option.

 
Renato Byrro

Hi, are you familiar with Cloud Directory? It's not precisely a graph, but follows essential concepts of a graph database. And fully serverless! 😉

 
Rehan van der Merwe

Great examples, thanks!

 
Renato Byrro

Glad it was helpful! I struggled with examples in the AWS official docs, wanted to share something easier to grasp.

 
Rehan van der Merwe

Yes, you have to read them a few times before it actually sets in. I especially like the graph example, don't think I have seen that before.

 
James

In the many-to-many example, these all have the same key (likely a typo, but):

Tag-1 Tag-1 cool null
Tag-1 Tag-1 awesome null
Tag-1 Tag-1 neat null

is a lot different from this:
Tag-1 Tag-1 cool null
Tag-2 Tag-2 awesome null
Tag-3 Tag-3 neat null
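Why the difference matters: `PutItem` treats the (PK, SK) pair as the primary key and replaces any existing item with the same key. The in-memory sketch below mimics that behaviour with a dict; the third column is assumed to be a name attribute and is called `Name` here purely for illustration.

```python
def put_item(table: dict, item: dict) -> None:
    """Mimic DynamoDB PutItem: (PK, SK) is the primary key, and writing
    an existing key replaces the old item entirely."""
    table[(item["PK"], item["SK"])] = item


same_key = {}
for name in ("cool", "awesome", "neat"):
    put_item(same_key, {"PK": "Tag-1", "SK": "Tag-1", "Name": name})

distinct_keys = {}
for i, name in enumerate(("cool", "awesome", "neat"), start=1):
    tag = f"Tag-{i}"
    put_item(distinct_keys, {"PK": tag, "SK": tag, "Name": name})
```

With the repeated key, only the last write (`neat`) survives; with distinct keys, all three tags are stored.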

 
Phu Hoang

Great example, you saved my day. Thanks a lot!