DEV Community

[Comment from a deleted post]
Yaser Al-Najjar

Optimizing queries when you have no idea whether:

  • that's the normal speed for this kind of query,
  • you're writing the query the wrong way,
  • or you should change your data model.

I realized that's a problem most developers face 🙁

Helen Anderson

It's a problem Analysts face too.

I get a lot of help tickets for 'slow queries' that I either can't replicate, or that can't go any faster because they are trying to join a billion-row table to a billion-row table to a billion-row table.
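To see why a chain of billion-row joins can't simply "go faster": an inner join's output grows with the number of matching key pairs, so duplicate join keys multiply row counts at every step. A toy sketch in plain Python (hypothetical numbers, not anyone's real schema):

```python
from collections import Counter

def join_cardinality(left_keys, right_keys):
    """Rows produced by an inner join on a key column:
    for each shared key, matches multiply (count_left * count_right)."""
    left, right = Counter(left_keys), Counter(right_keys)
    return sum(left[k] * right[k] for k in left if k in right)

# Two tiny "tables" whose join columns contain duplicate keys.
a = ["x", "x", "y"]        # 3 rows
b = ["x", "x", "x", "y"]   # 4 rows
print(join_cardinality(a, b))  # 2*3 + 1*1 = 7 output rows from 7 input rows
```

With a few duplicates per key across billion-row tables, the intermediate result can dwarf either input, which is usually the real bottleneck rather than the database engine.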

Yaser Al-Najjar

Billion-row table joins lol 😂

Helen Anderson

The expectation is that everything should return in seconds, and if it doesn't, the database must be the issue.

I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.
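The "filters and aggregation" fix can be sketched in plain Python (hypothetical `orders`/`payments` data, standing in for what the equivalent SQL or Spark job would do): collapse each side to one row per key *before* joining, so the join itself stays small no matter how large the raw tables are.

```python
from collections import defaultdict

def aggregate_then_join(orders, payments):
    """Sum each side per customer first, then join the small
    per-customer aggregates -- instead of joining raw rows
    and aggregating the (much larger) result afterwards."""
    order_totals = defaultdict(float)
    for customer, amount in orders:       # one pass over each table
        order_totals[customer] += amount
    payment_totals = defaultdict(float)
    for customer, amount in payments:
        payment_totals[customer] += amount
    # Join the aggregates: at most one row per customer on each side.
    return {c: (order_totals[c], payment_totals[c])
            for c in order_totals.keys() & payment_totals.keys()}

orders = [("a", 10.0), ("a", 5.0), ("b", 7.0)]
payments = [("a", 12.0), ("b", 7.0), ("b", 1.0)]
result = aggregate_then_join(orders, payments)
```

The same shape applies in SQL (aggregate in subqueries or CTEs, then join) and in Spark (`groupBy` before `join`); the engine does far less work when the join inputs are pre-shrunk.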

geraldew

Alas, I suspect you have just foretold the next few years of my working life as Spark usage progresses. I like it well enough but trying to be definitive about its actual performance is like trying to work out whether someone walking to the back of a slowly moving bus is actually going forwards or backwards as seen from the street but unsure if you are yourself sitting in a moving train that is inexplicably inside a jet aircraft. (With due apologies to Winston Churchill.)

Maxime Moreau

> I wonder if throwing more compute power like Spark at data projects will encourage these kinds of queries to continue, rather than rewriting them with filters and aggregation to perform better.

I've faced many issues with this... Developers are using PySpark and blindly writing shitty code. That's a huge problem.