I agree with spawning concurrent workers, but how do you keep those 10,000 jobs from exhausting the DB connection pool? I started wrapping my job logic in a `with_connection` block so each job doesn't hold a database connection for its entire run.
How do you balance high concurrency against database connection handling?
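As a minimal sketch of that `with_connection` pattern (assuming the sidekiq and activerecord gems; `HeavyJob` and `Record` are hypothetical names):

```ruby
class HeavyJob
  include Sidekiq::Job if defined?(Sidekiq) # guarded so the sketch loads standalone

  def perform(record_id)
    # Check a connection out of the pool only for this block and return
    # it as soon as the block finishes, instead of holding one connection
    # for the whole life of the job.
    ActiveRecord::Base.connection_pool.with_connection do
      Record.find(record_id).process!
    end
  end
end
```

The win is that a worker thread doing non-DB work (HTTP calls, file parsing) isn't pinning a connection the whole time.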
For our team, we set the connection pool size equal to the Sidekiq concurrency, so we don't have to think about workers exhausting the DB connection pool. The trade-off is that our concurrency is capped at whatever the pool can support.
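As a sketch, that pairing might look like this (file names are the Rails/Sidekiq defaults; the numbers are illustrative):

```yaml
# config/sidekiq.yml
:concurrency: 10

# config/database.yml — give the pool at least as many connections
# as Sidekiq has threads, so the workers can never exhaust it
production:
  pool: 10
```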
There's a whole debate on the Sidekiq repository about this: github.com/mperham/sidekiq/issues/...
I need to process CSVs that are large in both size and number (several MBs each, with more than 100,000 records apiece). Each record in the CSV has to be processed sequentially.
A lot of relations come into the picture, so I use memoization heavily to reduce DB calls.
In such cases, I can't use the technique mentioned in point no. 2.
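A sketch of that sequential-plus-memoization approach, using only Ruby's stdlib CSV. `CsvImporter`, the `country` column, and `lookup_country` are made up for illustration; `lookup_country` stands in for a real DB query like `Country.find_by(code: ...)`:

```ruby
require "csv"

class CsvImporter
  def initialize
    @country_cache = {} # memoize per-import so repeated values hit the DB once
    @db_calls = 0
  end

  attr_reader :db_calls

  def process(csv_string)
    CSV.parse(csv_string, headers: true).map do |row|
      # ||= only calls lookup_country the first time a value is seen
      country = (@country_cache[row["country"]] ||= lookup_country(row["country"]))
      "#{row["name"]}: #{country}"
    end
  end

  private

  # Stand-in for a DB lookup
  def lookup_country(code)
    @db_calls += 1
    code.upcase
  end
end
```

With 100,000 rows but only a handful of distinct foreign-key values, the cache keeps the DB call count proportional to the distinct values, not the row count.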
Load raw-ish data into a work table and then process in the database?
One thing we've been doing for a little while is to encapsulate our logic in "callable POROs": plain old Ruby objects that expose a single `call` method and no other public interface. A side effect is that we don't need to add new jobs; we can have a single generic `CallablePoroJob` and pass it the name of the PORO along with its arguments. That makes it trivial to background some logic on the fly without creating new worker classes, which is especially valuable with zero-downtime deploys, where you'd otherwise need multiple deploys to make sure the job class exists before the code that enqueues it ships.
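A sketch of that pattern (`SendWelcomeEmail` is a made-up example PORO, and the string it returns stands in for real delivery):

```ruby
# A "callable PORO": one public entry point, no other public interface.
class SendWelcomeEmail
  def initialize(user_id)
    @user_id = user_id
  end

  def call
    "welcome email sent to user #{@user_id}" # stand-in for real delivery
  end
end

# One generic job backgrounds any callable PORO by name.
class CallablePoroJob
  include Sidekiq::Job if defined?(Sidekiq) # guarded so the sketch loads standalone

  def perform(poro_name, *args)
    Object.const_get(poro_name).new(*args).call
  end
end

# Enqueue without defining a new worker class:
#   CallablePoroJob.perform_async("SendWelcomeEmail", 42)
```

One caveat worth noting: `perform` arguments go through JSON, so the PORO's constructor arguments need to be simple serializable values (IDs, strings), not live objects.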
Do multiple workers really work on standalone Sidekiq? For example, I have two workers: ProccessWorker and CallbackWorker.
When I run Sidekiq with `bundle exec sidekiq -r ./workers/proccess_worker.rb -C ./config/sidekiq.yml`,
only one worker runs.
I usually just do `bundle exec sidekiq -C config/sidekiq.yml` (note that the config-file flag is uppercase `-C`; lowercase `-c` sets concurrency). This will serve the workers for whatever queues are specified in config/sidekiq.yml :D
I used to agree with #1, but I've since come to realise it was (at least in my case) an overreaction.
Now, my feeling is this: if you need to run some logic in a worker, just write it in the worker until your needs change. The one valid reason for extracting it is that it improves something – makes tests easier to write or faster to run, de-duplicates code to avoid bugs, etc.
Also consider that the cost of extracting it to another class today (when you don't know how or if you will reuse it) is probably about the same as extracting it later if the need arises. And then you will only do it if you need to, and you will be able to do a better job since you know how it's reused.
It always depends, of course. There are nuances, and I'll still extract some things, e.g. logic that is very clearly tightly coupled to a model and has the same reasons to change as other parts of that model.
For small, straightforward workers, I agree with you that the logic could stay in the worker, since we don't know how it will grow.
For workers with complex actions, though, I think #1 still applies: we already know more or less how the logic will grow, and enforcing that structure now can save other developers from having to figure out how it's supposed to be organized later.