Help me decide: Embed subdocument or have a new document collection?

#help #database #question

I've been building a Telegram bot recently which has an element of user tracking to allow for whitelisting users within my chat. I'm now moving on to building out a reputation system. This will allow users to pass reputation to each other, maybe via a command or maybe just by replying to them and saying thanks.

I'm using neDB right now, which saves to the local file system but behaves very similarly to MongoDB, a document store database. I've modelled User, Commands and other objects so far with simple single documents for each case.

When designing the reputation system I'm hitting a bit of a fork in the road: I need to decide whether to utilise a subdocument within my User document or create a new Reputation collection, which might store a reference to my User.

Rough requirements for the reputation system are:

- Each user can give reputation to any user other than themselves
- Each user has a maximum of 10 reputation points to give per day
- Each user can only give reputation to the same person 3 times a day
- Reputation points are regenerated each day (back to 10 points)
- Users want to see who has given them the most reputation
- Users want to see who has the most reputation
- Admins can remove reputation points incrementally
- Admins can remove all reputation points
- Users want to see the number of reputation points they have
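
To make the giving rules concrete, here is a minimal sketch of the checks in Python. It assumes each recorded point is available as a `(giver_id, target_id, day)` tuple; that shape, the constant names and `can_give` are hypothetical and not part of the bot or of neDB.

```python
from collections import Counter
from datetime import date

DAILY_BUDGET = 10       # points a user may give per day
PER_TARGET_LIMIT = 3    # points a user may give to the same person per day


def can_give(giver_id, target_id, gifts, today):
    """Return (allowed, reason) for one new reputation point.

    `gifts` is an iterable of (giver_id, target_id, day) tuples,
    an assumed shape for the points recorded so far.
    """
    if giver_id == target_id:
        return False, "cannot give reputation to yourself"

    # Only the giver's own gifts from today count against the limits.
    todays_targets = Counter(
        t for g, t, d in gifts if g == giver_id and d == today
    )
    if sum(todays_targets.values()) >= DAILY_BUDGET:
        return False, "daily budget spent"
    if todays_targets[target_id] >= PER_TARGET_LIMIT:
        return False, "per-person limit reached"
    return True, "ok"
```

Because the check only counts gifts dated today, the daily regeneration back to 10 points falls out of the date filter in this sketch, with no separate reset step. It does require the day to be stored with every point, which the array-of-userIDs options would not give you on their own.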

There are some other requirements but these are the most relevant to my question. So I've been toying with the following ways of doing this:

1. Utilise an array in my User collection to store the userID of each person who gave the reputation. I can then check the array length to get total reputation and push/pop this array to add and remove reputation. My concern here is that the array could grow forever: is it then efficient to query a user and pull back an object with a large array attached?
2. Similar to the above, except I hold an array plus a reputation_count. This would mean I can filter out the large array when conducting my query and simply pull the count off the user. If I needed to answer "users want to see who has given them the most reputation" I can pull the array info as well. Once again, I'm not convinced just yet. This is a trade-off between easy querying and large data retrieval.
3. Generate a new collection for reputation. This could be 1 reputation document per user, with a similar array scenario as before.
4. Generate a new collection for reputation, but with each document being 1 reputation point containing the userID of the target and the userID of the person who gave it.
5. Same scenario as 3 or 4, but also holding a reputation_count at the User document level. This allows me to grab the count very easily but then query the Reputation documents if I need further details.
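
As a rough illustration of the last option (one document per point plus a count on the User), here is a sketch using plain Python dicts and lists as stand-ins for the two collections. The field names (`to`, `from`, `day`, `reputation_count`) and the helper functions are assumptions made for the example, not neDB API.

```python
from collections import Counter

# Hypothetical in-memory stand-ins for the two collections.
users = {
    "alice": {"_id": "alice", "reputation_count": 0},
    "bob": {"_id": "bob", "reputation_count": 0},
    "carol": {"_id": "carol", "reputation_count": 0},
}
reputation = []  # one document per point: {"to": ..., "from": ..., "day": ...}


def give(from_id, to_id, day):
    """Record one point and bump the cached count on the target user."""
    reputation.append({"to": to_id, "from": from_id, "day": day})
    users[to_id]["reputation_count"] += 1


def top_givers(to_id):
    """Who has given `to_id` the most reputation, highest first."""
    counts = Counter(r["from"] for r in reputation if r["to"] == to_id)
    return counts.most_common()


def leaderboard():
    """Users ordered by total reputation, read from the cached count only."""
    ranked = sorted(users.values(), key=lambda u: -u["reputation_count"])
    return [(u["_id"], u["reputation_count"]) for u in ranked]


give("bob", "alice", "2018-08-04")
give("bob", "alice", "2018-08-04")
give("carol", "alice", "2018-08-04")
give("alice", "bob", "2018-08-04")
```

The cheap reads (`leaderboard`, a single user's total) never touch the per-point documents, while `top_givers` does. The cost is that `give` performs two writes that have to stay in step, and any admin removal would need to update both places as well.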

All of the above, as mentioned, are really only options because I'm trying to reduce my number of queries depending on the situation. I'd rather do one query and get the data I need but, at the same time, I don't want to pull large documents if I don't need to. I think there are nuances in the requirements that might require me to carry out multiple queries whatever I do. I haven't even discussed in depth the checks around the maximum number of times a user can give another user reputation in a single day.

Any thoughts on how to approach this? Maybe even just at a general level when thinking about these problems?

Top comments (4)

Dian Fay • Aug 4 '18

All your ideas are kludges to try to mimic a relational structure.

1. As you identified, this becomes more difficult and less performant as the rep arrays grow. The real problem isn't retrieving a user with a long rep array though -- it's filtering that array, which you need to do a few different ways.
2. Materializing the total count per user only helps with one specific use case, not anything else.
3. This just moves the same problem to another collection.
4. Reinventing the junction table is probably the least-bad approach. A look at neDB doesn't turn up anything about foreign key constraints, so referential integrity is on you. Indexing will be important because this collection will be by far the biggest.
5. Might be required if indexes with #4 aren't sufficient.

Bottom line, you have relational data that can scale arbitrarily and it's always going to be an awkward fit at best in a non-relational datastore. The single best thing you could do is switch to a database that's designed to handle the information structures you're working with.
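
For a sense of what the relational version could look like, here is a small sketch using SQLite through Python's standard `sqlite3` module. SQLite is picked only because it ships with Python; the comment above does not name a database, and the table, column and index names are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked
conn.executescript("""
    CREATE TABLE users (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE reputation (
        giver_id  INTEGER NOT NULL REFERENCES users(id),
        target_id INTEGER NOT NULL REFERENCES users(id),
        given_on  TEXT    NOT NULL,      -- ISO date, e.g. 2018-08-04
        CHECK (giver_id <> target_id)    -- no self-awarded points
    );
    CREATE INDEX reputation_target ON reputation (target_id, giver_id);
    CREATE INDEX reputation_giver_day ON reputation (giver_id, given_on);
""")

conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, "carol")])
conn.executemany(
    "INSERT INTO reputation (giver_id, target_id, given_on) VALUES (?, ?, ?)",
    [(2, 1, "2018-08-04"), (2, 1, "2018-08-04"),
     (3, 1, "2018-08-04"), (1, 2, "2018-08-05")],
)


def top_givers(target_id):
    """Who has given this user the most reputation."""
    return conn.execute(
        """SELECT u.name, COUNT(*) AS points
           FROM reputation r JOIN users u ON u.id = r.giver_id
           WHERE r.target_id = ?
           GROUP BY u.id, u.name
           ORDER BY points DESC, u.name""",
        (target_id,),
    ).fetchall()


def leaderboard():
    """Every user with their total reputation, including users with none."""
    return conn.execute(
        """SELECT u.name, COUNT(r.target_id) AS points
           FROM users u LEFT JOIN reputation r ON r.target_id = u.id
           GROUP BY u.id, u.name
           ORDER BY points DESC, u.name"""
    ).fetchall()
```

In this sketch the junction table carries the referential integrity and the no-self-rep rule as constraints, and the totals are computed by the queries instead of being stored in a second place.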

ImTheDeveloper • Aug 5 '18

Thanks for the input. Filtering and managing an array is pretty simple though, and is nothing beyond map/reduce functions. I'll give a few options a try and see which works nicely.

Dian Fay • Aug 5 '18 • Edited

It is simple! The problem is it takes time. Filtering one rep array is O(n); filtering and aggregating (for example, if you wanted to see all rep you've given) is O(n²). Performance may be acceptable at first but you've chosen a structure which can grow indefinitely and does not admit any shortcuts.
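
A toy illustration of the "all rep you've given" case, assuming the embedded layout where each user document holds an array of giver ids. The `rep_from` field and both helper functions are hypothetical; the point is only that the scan has to visit every element of every array, whereas a lookup keyed by giver does not.

```python
from collections import defaultdict

# Hypothetical embedded layout: each user document holds the ids of its givers.
users = {
    "alice": {"rep_from": ["bob", "bob", "carol"]},
    "bob": {"rep_from": ["alice"]},
    "carol": {"rep_from": ["bob"]},
}


def given_by_scan(giver_id):
    """Scan every embedded array; returns ({target: points}, elements visited)."""
    given, visited = defaultdict(int), 0
    for target_id, doc in users.items():
        for g in doc["rep_from"]:
            visited += 1
            if g == giver_id:
                given[target_id] += 1
    return dict(given), visited


# A lookup keyed by giver, built here in one pass over the existing data.
# A real index would be kept up to date as points are written.
by_giver = defaultdict(lambda: defaultdict(int))
for target_id, doc in users.items():
    for g in doc["rep_from"]:
        by_giver[g][target_id] += 1


def given_by_index(giver_id):
    """Answer the same question from the giver-keyed lookup."""
    return dict(by_giver[giver_id])
```

With three users the difference is invisible, but the scan's work grows with the total number of points ever given, not with the number of points the asking user gave.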

ImTheDeveloper • Aug 5 '18

Thanks for this, will have a read.
