loading...

Discussion on: Any NoSQL true believers out there?

Collapse
yaser profile image
Yaser Al-Najjar

Sometimes NoSQL is really important... but I really believe most apps should use a relational model for main stuff.

It's not white and black, it always depends.

Example from real life:

github.com/coretabs-academy/websit...

In our academy system, we have a track which consists of many workshops which has many modules, each module has many lessons.

Okay, so:
Track => Workshop => Module => lesson

Sounds like a document model right?

So our academy library is begging for NoSQL, but we use django... and django hates mongo :(
And here we improvised and used relational model, guess what we end up with?

We got 4 joins between the four tables... so each time the user opens the academy to see the lessons, we need to perform 3 sub queries (let alone the ugly long query for calculating the shown lesson percentage).

Hmm, okay... how would the documentDB solution look like: it's really simple, one query (get track document) !

Yeah, we will rewrite it in dynamodb with lambda soon. Anytime we might get our server loaded.

Collapse
stereobooster profile image
stereobooster

Ok you have 3 joins. What bad about this? I guess response is still 20ms. Did you research how to create local development environment with DynamoDB? Last time I checked AWS wasn't friendly for that case.

Collapse
klockner profile image
Gabriel Klockner

It is pretty simple to create a local environment with DynamoDB. I found a docker image a time ago with it, where you can use the javascript shell playground to learn and test some queries. It also have a jar from AWS if I'm not wrong.

Thread Thread
klockner profile image
Gabriel Klockner

I found the image that I have used (dwmkerr/dynamodb:latest), that was my docker-compose.yml

dynamodb:
image: dwmkerr/dynamodb:latest
ports:
- 8000:8000
command: -sharedDb

Thread Thread
yaser profile image
Yaser Al-Najjar

Thanks... this will help us a lot :)

Collapse
yaser profile image
Yaser Al-Najjar

I haven't put my hands dirty with dynamo, but I'm pretty sure the response time won't be 20ms in the relational model cuz from a scalability point of view, we do this fat query:

github.com/coretabs-academy/websit...

This is the with_is_shown function:

github.com/coretabs-academy/websit...

You see here that we store is_shown values of all users all in one table, and this will get slow in time the user base gets into 100,000 users where each user watched 100 lessons watched = 10,000,000 records to get the shown lessons !

I really think the models are shouting: "Please bring me the DOCUMENT model !" :D

You might mention sharding, but you see the problem isn't with the data growing bigger, the problem is within the model itself.

Thread Thread
stereobooster profile image
stereobooster

It's a shame I'm not that good with Django. If it would be ActiveRecord it would be much easier for me to understand what is behind. I will try to read it but no guarantees.

Can you get output of explain queries from the production db for those queries?

Thread Thread
yaser profile image
Yaser Al-Najjar

It would take me some time to get done right now, cuz I need to:

  1. Spin up the staging env
  2. Copy the production db into the staging env
  3. Turn into debugging mode (to run the debug toolbar)
  4. Get the generated query from there
  5. Run the explain query in DBeaver in the production db with the generated query

I will do once we do the first 3 steps these days

Thread Thread
stereobooster profile image
stereobooster

I hope you will post a blog about how the transition to new DB has gone and what decision process was. Without seeing actual DB (and hardly able to read Django) it is hard to judge, maybe you really have a good case for DocumentDB.

Thread Thread
yaser profile image
Yaser Al-Najjar

Sure... will do ;)

Collapse
hagailuger profile image
Hagai Luger

What about caching?

Collapse
yaser profile image
Yaser Al-Najjar

We do memcaching... but what's the point of caching is_shown for the lessons?

The user will have bad experience and say (I watched this lesson, why isn't shown till now)

Thread Thread
hagailuger profile image
Hagai Luger

Thanks for your reply.

  1. I wasn't asking specifically on the is_shown part, but rather about the performance issues you've talked about. you said "so each time the user opens the academy to see the lessons. we need to perform 3 sub queries" why can't you cache that?

  2. even on the is_shown part - why can't you expire the cache when you need to?

Thread Thread
yaser profile image
Yaser Al-Najjar
  1. We do caching with memcache... but caching is only an optimization, and the caching layer is gonna work after doing the cruel query, here is the caching mechanism:

github.com/coretabs-academy/websit...

  1. As you see, we preferred to give the user direct numbers, cuz the user can watch a lesson in one min, then he wanna see the is_shown True in front of him, to get the feeling of achievement, and not feel irritated

(we really get lots of responses as I watched the lesson why isn't it there, and that's just because of the frontend caching layer... cuz everyone wants the completion certificate :D ). That's why we accept the cruel query for this part.

Aside from all that, do you think optimizing with caching is really enough with all that mess... especially with the m2m ugly relationships :(

Collapse
buinauskas profile image
Evaldas

Use neo4j! A NoSQL graph database. 😉😉😉

Collapse
yaser profile image
Yaser Al-Najjar

Hmmm, I read about the graph DBs... but how does it solve our problem?

I see the problem as an aggregate root of Track (de-normalized all in one model) which is what the document model solves.

How would the graph model look like?

Thread Thread
buinauskas profile image
Evaldas

Neo4j allows you to have entities, quite similar to what a row in a table is. The key difference, subjectively, is flexibility to declare relationships between these entities in an easier manner than in a relational database. Aggregates can be easily created using their query language, Cypher, which isn't too hard and too different from SQL.

Yet again, if read speeds are critical and you can live without immediate consistency, then a key value or a document database would do the job perfectly.

Thread Thread
yaser profile image
Yaser Al-Najjar

Thanks for the elaboration, very appreciated !

Surely, we will discuss that with the team to see how things go... guess we are probably gonna use Neo (or any other suitable graphdb) with the profile model as well.

Collapse
bgadrian profile image
Adrian B.G.

Technically, NoSQL reffers mostly to non-relational databases, and a Graph DB is all about relations, so I would say a Graph is more SQL than a standard RDBMS is :))

Also Neo4J doesn't scale (main advantage of the NoSQL), some new graph databases does like DGraph and Neptune.

Thread Thread
buinauskas profile image
Evaldas

Neo4j and Amazon Neptune are slightly different breeds. They're technically triple store databases. But yeah. Other than that I agree with you.

Collapse
stereobooster profile image
stereobooster

Is it not quite graph database use-case? I thought you would need graph DB when you need to traverse graph, like give me all friends of all friends of A (wherein relational DB you would join table on table N times so eventually you will run out of RAM), but graph DB literally traverse graph, so there is no penalty in memory.

Thread Thread
buinauskas profile image
Evaldas

That's true. It really depends what kind of queries someone wants to run. Even in current example, you could end up joining same table multiple times to get a desired result and graphs would do better than a relational database.

Thread Thread
yaser profile image
Yaser Al-Najjar

Actually the document model fits more cuz we don't actually need to traverse but to compose everything into one UI.

As in the pic, we show all the track workshops on the right side, and we calculate the percentage of the shown lessons of each workshop, so we need to get everything of each workshop at once.

workshops

But for the profile we have a similar case, each profile has dozens of tasks, quizze, and projects... and we will traverse them on demand (lazy-loaded).