DEV Community

[Comment from a deleted post]
Yaser Al-Najjar

Optimizing queries when you have no idea whether:

  • that's the normal speed for this kind of query,
  • you're writing the query the wrong way,
  • or you should change your data model.

I realized that's a problem most developers face 🙁

Helen Anderson

It's a problem Analysts face too.

I get a lot of help tickets for 'slow queries' that I either can't replicate, or that can't go any faster because they are trying to join a billion-row table to a billion-row table to a billion-row table.
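To see why a chain of billion-row joins can't simply "go faster": an inner join's output grows with the number of matching key pairs, so duplicate join keys multiply row counts at every step. A toy sketch in plain Python (hypothetical numbers, not anyone's real schema):

```python
from collections import Counter

def join_cardinality(left_keys, right_keys):
    """Rows produced by an inner join on a key column:
    for each shared key, matches multiply (count_left * count_right)."""
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(left[k] * right[k] for k in left if k in right)

# Two tiny "tables" whose join columns contain duplicate keys.
a = ["x", "x", "y"]        # 3 rows
b = ["x", "x", "x", "y"]   # 4 rows
print(join_cardinality(a, b))  # 2*3 + 1*1 = 7 output rows from 7 input rows
```

With a few duplicates per key across billion-row tables, the intermediate result can dwarf either input, which is usually the real bottleneck rather than the database engine.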

Yaser Al-Najjar

Billion-row table joins lol 😂

Helen Anderson

The expectation is that everything should return in seconds, and if it doesn't, the database must be the issue.

I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.
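The "filters and aggregation" fix can be sketched in plain Python (hypothetical `orders`/`payments` data, standing in for what the equivalent SQL or Spark job would do): collapse each side to one row per key *before* joining, so the join itself stays small no matter how large the raw tables are.

```python
from collections import defaultdict

def aggregate_then_join(orders, payments):
    """Sum each side per customer first, then join the small
    per-customer aggregates -- instead of joining raw rows
    and aggregating the (much larger) result afterwards."""
    order_totals = defaultdict(float)
    for customer, amount in orders:       # one pass over each table
        order_totals[customer] += amount
    payment_totals = defaultdict(float)
    for customer, amount in payments:
        payment_totals[customer] += amount
    # Join the aggregates: at most one row per customer on each side.
    return {c: (order_totals[c], payment_totals[c])
            for c in order_totals.keys() & payment_totals.keys()}

orders = [("a", 10.0), ("a", 5.0), ("b", 7.0)]
payments = [("a", 12.0), ("b", 7.0), ("b", 1.0)]
result = aggregate_then_join(orders, payments)
```

The same shape applies in SQL (aggregate in subqueries or CTEs, then join) and in Spark (`groupBy` before `join`); the engine does far less work when the join inputs are pre-shrunk.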

geraldew

Alas, I suspect you have just foretold the next few years of my working life as Spark usage progresses. I like it well enough but trying to be definitive about its actual performance is like trying to work out whether someone walking to the back of a slowly moving bus is actually going forwards or backwards as seen from the street but unsure if you are yourself sitting in a moving train that is inexplicably inside a jet aircraft. (With due apologies to Winston Churchill.)

Maxime Moreau

> I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.

I've faced many issues with this... Developers are using PySpark and blindly writing shitty code. That's a huge problem.