DEV Community

Discussion on: Trying to find Node backend framework with worker_threads / cluster / fork

simonhaisz

I'm not sure what the references to multi-process/multi-thread in the title are about, but you are running out of memory while trying to build a big string with JSON.stringify. I have seen that before with large arrays. I have no idea how you are writing these 10K records to the DB, and I don't know the size of each object in the array, so it's difficult to give advice. You are blowing past the memory limit at ~1.4 GB.

Maybe try going down to 1K or even 100 records at a time to see if that fixes the issue.
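To make that concrete, here is a minimal sketch of fixed-size batching. `insertBatch` is a placeholder for whatever actually writes one batch to the DB; only the chunking logic is the point:

```javascript
// Split a large array into fixed-size batches so no single
// JSON.stringify / insert has to handle all 10K records at once.
function* chunk(records, size) {
  for (let i = 0; i < records.length; i += size) {
    yield records.slice(i, i + size);
  }
}

// insertBatch is a hypothetical async function that writes one batch.
async function insertAll(records, insertBatch, batchSize = 100) {
  for (const batch of chunk(records, batchSize)) {
    await batch && insertBatch(batch); // each batch stays small enough to serialize safely
  }
}
```

Dropping the batch size from 10K to 100 trades a few more round-trips for a flat memory profile.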

Pacharapol Withayasakpunt • Edited

I fixed it with worker_threads (via threads.js), but some threads die due to the SQLite write lock (even with WAL mode enabled).

> Maybe try going down to 1K or even 100 records at a time to see if that fixes the issue.

I am using a web server with POST requests / WebSockets, so I assume I cannot really control concurrency?

simonhaisz

From the Node docs:

> Workers (threads) are useful for performing CPU-intensive JavaScript operations. They will not help much with I/O-intensive work. Node.js’s built-in asynchronous I/O operations are more efficient than Workers can be.

If each thread is writing, then they will block each other, regardless of what mode you use. I don't know why you started using worker threads; maybe you have some expensive request handling before writing to the DB, but they are not going to help you write to the DB faster.
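One way to act on this advice without threads is to funnel every write through a single serialized queue on the main thread, so writes never contend for the SQLite lock. A minimal sketch (not from the thread, just an illustration; `writeFn` is whatever performs one write):

```javascript
// All DB writes are chained onto one promise, so they run strictly
// one after another no matter how many request handlers call this.
let tail = Promise.resolve();

function enqueueWrite(writeFn) {
  const result = tail.then(() => writeFn());
  // Keep the chain alive even if one write fails.
  tail = result.catch(() => {});
  return result;
}
```

Each caller still gets back a promise for its own write, but there is only ever one write in flight.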

Pacharapol Withayasakpunt

> maybe you have some expensive request handling before writing to the DB

Thanks, I will try to extract those steps.

simonhaisz

The good news is that you clearly have an easy way of testing this. Great! I suggest you use it to better understand your system. My recommendation would be to remove all of the worker-thread stuff and then run your test against the system under a profiler. That should give you the truth about where the work is actually being done. Only then should you look at applying techniques, whether worker threads or otherwise, to improve performance.

Pacharapol Withayasakpunt • Edited

The real issue is PK/FK/UNIQUE constraints. How do I enforce them? My original solution was to mix reads and writes, but that is probably a bad idea. What about:

  • UUID PKs/FKs vs. INT PKs generated with SELECT MAX(PK) + 1
  • Enforcing UNIQUE in application code with new Set() / new Map(), where the keys are JSON.stringify(Array)
  • BEGIN TRANSACTION ... COMMIT with writes only, no reads at the time of writing
  • Do I need to PRAGMA foreign_keys=off temporarily, or can I avoid that?

> and then run your test against the system while under a profiler.

I'm not really sure how to do this in Node.js.
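For reference, Node ships V8's sampling profiler behind the `--prof` flag, and `--prof-process` turns the resulting log into a readable summary. Something like this (the entry-point name is a placeholder):

```shell
# Run the app under V8's sampling profiler; this writes an isolate-*.log
# file into the working directory.
node --prof server.js

# ...drive your test load, stop the server, then post-process the log:
node --prof-process isolate-*.log > profile.txt
```

The summary in profile.txt breaks down ticks by JS function, C++ and GC, which is usually enough to see whether the time goes into request handling or into the DB writes.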

> whether worker threads or otherwise, to improve performance.

better-sqlite3 is synchronous by default. I'm not really sure whether that is a good or bad idea. The only way to go async is to use a Worker or fork.

simonhaisz

Key uniqueness is an entirely different problem. I don't use SQL DBs that much, so I can't offer advice on the specific options you list. I expect that before anyone else could, you would need to share more of the use case: what you are actually trying to accomplish.

When trying to manage key uniqueness, there are three fundamental approaches you can take.

  1. Assume the data is already unique and do nothing apart from throwing errors when that assumption turns out to be wrong.
  2. Make it the writer's responsibility to ensure uniqueness. This generally involves querying the DB for the existing keys, but it doesn't have to: if you have access to all of the data outside of the DB before it is inserted, you can create unique keys from it before inserting.
  3. Define the DB schema so that uniqueness is automatic. Auto-incrementing numbers or GUIDs work here.

From the sound of it, I'm assuming your request handler is not just inserting thousands of records but inserting records into multiple tables, hence your reference to foreign keys. So the crux of the problem isn't necessarily key uniqueness but finding the keys of the records you just inserted in order to refer to them in the other records you are inserting. If you go with option #3 above, you have to query the DB to find the key values it generated for the inserted records. If you go with option #2, you can create the key values yourself before the insert, so you don't need to query afterwards, but then you are responsible for keeping them unique. GUIDs are the simplest approach; I'm not sure why you don't like that one. If you went with GUIDs, your system would already be working.