
What Do You Find Difficult about SQL and Databases?


DISCUSS (69)
 

Some of the things in no particular order:

  • Cross db joins
  • Downtime-less schema changes
  • Locking levels and transactions
  • Cascade delete
  • SQL flavors
 

I'd agree with all of those too :)

SQLines has a free tool that I find myself coming back to when I need to translate between flavours - sqlines.com/online

 

Sounds like a cool tool, but it doesn't seem to work with my simple example. I will definitely keep it bookmarked to try when I have a real example.

PostgreSQL

select * from table limit 10

Should convert to

Oracle

select * from table fetch first 10 rows only

I used it when migrating from SQL Server to Postgres and found it handy in that particular use-case.

Dates were a real drag:

SQL Server

SELECT DATEADD(day, 1, GETDATE()) 

Postgres

SELECT INTERVAL '1 day' + NOW() 

The AWS Schema Conversion Tool is a slicker, although paid, version of the same if you are using their services.

In retrospect I could have just said "date handling" and walked away from the subsequent vortex of possible discussion. At my workplace I keep a wiki page with stock clippings of date literals, date constants and date expressions for the different dialects I use. I barely read or notice them now; I just trust my decades-younger self to have ensured each was right and copy-paste them into place. Younger me never worked out INTERVAL syntax, so none of them have that.
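
For anyone building a similar cheat sheet, here is a rough sketch of "tomorrow" in a few dialects (written from memory, so double-check against your own engine's docs before trusting it):

-- SQL Server
SELECT DATEADD(day, 1, GETDATE());

-- PostgreSQL
SELECT NOW() + INTERVAL '1 day';

-- Oracle
SELECT SYSDATE + INTERVAL '1' DAY FROM dual;

-- MySQL
SELECT NOW() + INTERVAL 1 DAY;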

 
 

The only thing worse than cascade delete is ending up with orphaned records. If no-one specifies the foreign key relationship in the database you can end up with order lines with no order address, or orders with no customer. I agree that RI (referential integrity) can be frustrating and perhaps engines could be smarter about checking the scope of the transaction but I'd rather have the delete fail than result in corrupted data.
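
As a rough sketch of the trade-off (table names are invented, and a customers table is assumed to exist already): declaring the relationship lets you choose whether a parent delete fails or cascades, instead of silently leaving orphans.

-- With the FK declared, deleting a customer that still has orders fails
-- (the default NO ACTION), while deleting an order removes its lines via CASCADE.
CREATE TABLE orders (
    order_id    int PRIMARY KEY,
    customer_id int NOT NULL REFERENCES customers (customer_id)
);

CREATE TABLE order_lines (
    order_line_id int PRIMARY KEY,
    order_id      int NOT NULL REFERENCES orders (order_id) ON DELETE CASCADE,
    product_id    int NOT NULL
);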

 

Not being able to use windowed values (rank, lead, lag, etc.) to filter in the base query, even though the query engine is quite happy when you wrap the query in a WITH cte AS (...) and then filter on the windowed field when you SELECT * FROM cte - as in the sketch below.
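
Something like this (names are illustrative), since the standard won't let a window function appear directly in WHERE:

WITH cte AS (
    SELECT customer_id,
           order_id,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
)
SELECT customer_id, order_id
FROM cte
WHERE rn = 1   -- latest order per customer; filtering on rn in the base query would error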

 

Downtime-less schema changes
+1

Desired-state approaches can generate unexpected/unwelcome scripts, while script-based approaches can quickly become painful to manage.

 

Migra looks very promising, at least at smaller scales: make changes locally, diff against production, tweak as needed, and ship a single migration script instead of keeping old/intermediary states around forever.

 
 

Optimizing queries where you have no idea:

  • Whether that's the normal speed for that query
  • Whether you're writing the query the wrong way
  • Or whether you should change your model

I realized that's a problem most developers face πŸ™

 

It's a problem Analysts face too.

I get a lot of help tickets for 'slow queries' that I either can't replicate, or that can't go any faster because they are trying to join a billion row table to a billion row table to a billion row table.

 

The expectation being that everything should return in seconds and if it doesn't the database must be the issue.

I wonder if throwing more compute power like Spark at data projects will encourage these kind of queries to continue, rather than rewriting them with filters and aggregation to perform better.
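
For what it's worth, a hypothetical before/after of that kind of rewrite (all names invented): aggregate and filter the big table first, then join the much smaller result.

-- Instead of joining the raw billion-row tables and aggregating at the very end...
WITH recent_sales AS (
    SELECT product_id, SUM(amount) AS total_amount
    FROM sales
    WHERE sale_date >= '2019-01-01'   -- filter early
    GROUP BY product_id               -- aggregate early
)
SELECT p.product_name, s.total_amount
FROM recent_sales s
JOIN products p ON p.product_id = s.product_id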

Alas, I suspect you have just foretold the next few years of my working life as Spark usage progresses. I like it well enough but trying to be definitive about its actual performance is like trying to work out whether someone walking to the back of a slowly moving bus is actually going forwards or backwards as seen from the street but unsure if you are yourself sitting in a moving train that is inexplicably inside a jet aircraft. (With due apologies to Winston Churchill.)

I wonder if throwing more compute power like Spark at data projects will encourage these kind of queries to continue, rather than rewriting them with filters and aggregation to perform better.

I've faced many issues with this... Developers are using PySpark and they're blindly writing shitty code. That's a huge problem.

 

There's some irony in my answer to that question.

For Databases, I think the major difficulty is that SQL is the only language consistently available for working with them.

For SQL the major difficulty is that the implementations are so inconsistent.

To be a bit less wry, I find that handling self-referential data structures is the thing where I keep re-inventing the wheel with bespoke solutions for lack of some more powerful principles built into the data engines. Such reinventions are always challenging, both to create and support. Ditto for many-to-many structures, which always seem to require a lot of extra interface building.

When using proprietary engines such work-arounds are inherently necessary. For the open source options I'll have to admit responsibility for not chipping in to help build something better.

 

Others insist that every column needs to be in the table, blowing it up to millions of rows 💥 and killing performance. Separately, each table is less than 100k rows. A simple view would solve their use case and keep queries fast when you don't need every column (see the sketch below).
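
A hypothetical version of that view (all names invented): keep the tables normalised and expose the combined shape only where it's needed, instead of materialising one huge denormalised table.

CREATE VIEW order_details AS
SELECT o.order_id,
       o.order_date,
       c.customer_name,
       l.product_id,
       l.quantity
FROM orders o
JOIN customers   c ON c.customer_id = o.customer_id
JOIN order_lines l ON l.order_id    = o.order_id;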

 

+1 for this.

You really don't need 500+ columns on a table.

 

Glad someone agrees with me.

Can't count how many conversations I have had that went something like this.

Where did you get that data?....

Oh, from the table that used to run the dashboard. Yeah, that one that hasn't been updated in two years.

Not everyone accepts that the reason we put so much work into creating relational models is so we don't end up with a giant table Frankenstein'd together with each and every attribute an entity can have. Not easy to make this clear to folks who use Excel for their data needs and can get away with it ... to a point.

I find it's the same with dashboards in BI tools. It's going to perform incredibly slowly if we throw in every column that 'might' be needed to drill down into.

I don't understand BI dashboards and the reasoning behind how they work. I have seen them take 30s to display 12 points on a line chart. A real example: I rebuilt one with vanilla JS and a REST API in about an hour... under 100ms refresh with no optimization and very little effort.

I get that they make it really easy to click together a dashboard. They are great at that, and there is a place for it. I just don't get how they can fall over so easily in some cases.

Re: 'Why is it so slow you're stealing my life, I hate this tool!'

The dashboard is only one piece of the puzzle. When you're dragging and dropping fields around, the tool is calling the database behind the scenes to execute a SQL query.

It could be slow because ...

  • All the queue slots are taken in your database and you're waiting in a queue.
  • The database is being slammed by other queries outside of the BI tool world.
  • There are limits in place to how many rows can be returned.
  • There are timeout limits on the webserver side (depending on the tool).
  • You've dragged and dropped something that doesn't have anything to join on and it's trying to do a cross join.
  • Your report hasn't been cached so each time you drag and drop it's executing the query all over again.

BI dashboards and self-service are tricky to set up since everyone wants everything 'just in case', tricky to manage and keep archiving off the old stuff, and tricky to keep everyone educated.

In some cases it would be quicker for the Analyst team just to do each report rather than empower users with self-service.

I could talk about this all day, but @alanhylands does a better job of explaining it than I do.

Very well said. I like that "self-service" "just in case". It seems like a very interesting approach. The real question is: is it worth weeks of setup from the analyst for self-service when the analyst could run it ad hoc in a few hours? I often find my organization trying to be over-prepared to click something through the dashboard for the meeting with the big boss. But in reality, most of the ones going to the big boss end up "pre-rendered" into PowerPoint anyway.

Thanks for the recommended article, that was just too well written!

Oh, and the before-mentioned example used the exact same table as the BI tool. It felt like there was too much hidden behind the BI tool curtain to really pin down what the issue was.

I can see why self-service was seen as the magic bullet: free up the analysts to do more 'value add' work and leave the easy 'give me a number' type requests to the tool. The problem is the end user needs to:

  • be able to use it from a functional point of view
  • know where to go to get the numbers in the masses of reports
  • know the filters to add to make sure it is actually the right number
  • be surrounded by data governance to make sure nothing sensitive is shared and that crazy numbers aren't shared with the wide world ... or just the big boss and that URGENT PowerPoint presentation

The reality is there's a lot of work that goes into building models and educating the user. That is if they want to be educated at all.

The tool we use at my workplace has an API; it just hasn't been switched on yet. It will be interesting to see if that speeds things up and allows our end users to hit the tables directly .. sort of.

 

Depends... but I agree it smacks of poor design in most cases. I once had a vendor write a custom view which (due to their security model) returned an error when I tried to use it, because MSSQL is limited to 260 joins in a single query. I rewrote it in about 20 joins and unsurprisingly it took 1/8 the time to execute.

 

Terrible error messages from IBM and Oracle. I wish they could produce good error messages like the open source alternatives. I generally get something that essentially says "your SQL statement has an error", with no clue as to whether it's a column name, table name, or syntax error.

 

I'm using pgAdmin as my UI and it leaves me such polite messages when I screw something up

 

That's Postgres itself! You'll see the same thing in the terminal with psql or in stack traces that include output from the database :)

Such a heartwarming thing to see some actual helpful information and not just an angry looking error code.

 
 

The COPY command. Honestly can't get Postgres to work with it on an import so have to resort to slow --inserts. 😢 (See the sketch below.)
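
In case it helps, a minimal sketch of the two variants (paths and table names are placeholders). The usual gotcha is that server-side COPY reads a file the database server can see, while psql's \copy reads the file from the client machine.

-- Server-side: the file path must be visible to the Postgres server itself
COPY staging_events (event_id, event_time, payload)
FROM '/var/lib/postgresql/import/events.csv'
WITH (FORMAT csv, HEADER true);

-- Client-side equivalent, run from psql as a single line:
-- \copy staging_events FROM 'events.csv' WITH (FORMAT csv, HEADER true)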

 
 

I think this is the most challenging.

Once you have the structure crystallized, with a lot of external references, sometimes in different languages and applications, it is very hard to change.

 
  • Creating SQL programmatically, preferably from JSON -- I had to build this myself
    • I think it is really the same topic as creating a DSL programmatically
  • Writing custom triggers and functions
  • Differences between SQLite and Postgres (I have only tried those two)
    • Also, I haven't tried querying JSON in Postgres (see the sketch after this list)
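
Since JSON querying came up, a tiny hedged sketch of what it looks like in Postgres with a jsonb column (table and key names invented):

-- ->> extracts a value as text, -> keeps it as jsonb, @> tests containment
SELECT payload ->> 'customer'             AS customer,
       payload -> 'items' -> 0 ->> 'sku'  AS first_sku
FROM events
WHERE payload @> '{"status": "shipped"}';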
 

Writing custom triggers - you don't write them ; )

 

Databases? The lack of thought that goes into their architecture.

SQL? The lack of quality of life functions the language really should have now that it’s 40 years old.

 
 

The syntax is ridiculously verbose and developer-unfriendly. It feels like it's stuck in the COBOL days and stayed that way because people who manage said databases don't know any better and aren't willing to learn NoSQL or use an ORM.

 

Dealing with massive tables, hundreds of millions or billions of rows, when adding an index might actually slow down inserts 😂

 

I'm dealing with this right now; each analyst has an opinion on indexing, and if I add all their suggestions the table will never update.

 

If you have that much data you should hopefully be on an enterprise-level product, and partitioning should be a viable option. If you are on MSSQL, get onto the sqlservercentral.com forums and reach out to GilaMonster or Grant Finchley. They should be able to help with strategy.

We have 70+ analysts on AWS Aurora (Postgres). The clustered indexes are applied by my team (BI), after that it’s up to the analysts to agree how they would like extra indexes to be built to support what they are doing
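
On the partitioning suggestion above, Postgres 10+ has declarative range partitioning; a rough sketch (names invented), which may or may not fit an Aurora setup:

CREATE TABLE events (
    event_id   bigint      NOT NULL,
    created_at timestamptz NOT NULL,
    payload    jsonb
) PARTITION BY RANGE (created_at);

-- Each partition holds one quarter, so indexes and maintenance stay manageable
CREATE TABLE events_2019_q1 PARTITION OF events
    FOR VALUES FROM ('2019-01-01') TO ('2019-04-01');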

 

For me the most difficult parts are the cloud databases and the Big Data variants that use a SQL layer for processing queries. Trying to fit a relational paradigm onto a columnar or cloud DB is very difficult. They are often described as auto-scaling and auto-optimised, but many times we do not have the control over query performance that we have with indexes and DB hints.

Hence I feel relational databases, with optimisations available to DBAs and database developers, are a much better option - at least we know how to tame that beast, unlike these hybrid solutions.

Another issue is the lack of standardisation in how engines implement the SQL standard, stored procedures, and vanilla data types like JSON, XML or variant. Once you go past simple queries into more advanced data analytics I feel SQL is not the best option, and yet you find many BI teams trying to cram their data analytics initiatives into database queries and window functions rather than using more efficient options.

 

Maybe we need to have Helen pose an echo question for "Big Data" and Hadoop instead of Databases and SQL.

 

Pivot/Unpivot. My only use for these is when I answer Stack Overflow questions, so I can never remember how to do them correctly.

 

Since moving to Postgres I've been using the crosstab function to do pivoting. Have you given it a try?
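
For anyone curious, crosstab comes from the tablefunc extension; a minimal sketch (table and column names invented):

CREATE EXTENSION IF NOT EXISTS tablefunc;

-- The inner query must return (row key, category, value), ordered by the row key
SELECT *
FROM crosstab(
    'SELECT product, quarter, revenue FROM sales ORDER BY 1, 2'
) AS ct (product text, q1 numeric, q2 numeric, q3 numeric, q4 numeric);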

 

At my workplace we're using SQL Server. I'm not familiar with Postgres, nor am I aware of the crosstab function, so the answer is no, but I will check it out...

If you're a Microsoft shop then MS Access has a "crosstab" query (and has done since v1.0 if memory serves).

 

Databases are not my expertise, so take my thoughts with a grain of salt:

Such a controversial technology. An incredible accomplishment dating back 50 years now and still perfect for most scenarios. But 50 years is a long time, and some of its most powerful features, like the transaction, are also the biggest limiters for multi-regional setups.

 

SQL-based DBs, the ancient king of data storage technologies, are showing their age.

The SQL language itself is just archaic. Try writing rollback scripts to undo cross-table joins, for example.

Data normalization comes at a huge run-time join price, especially when database sizes are in exabyte land. Ever have a query take a day, only to be cancelled by the DBAs for using too many resources?

JSON-based NoSQL DBs disrupted the SQL train. As storage costs plummeted, document-based DBs made sense. Plus they are fast and better suited to the internet.

I wonder how many Google searches use SQL?

 

Them inner and outer joins...

And also using 'order by' and 'group by' correctly...

 

In MSSQL (not sure about other dialects) you can drop the INNER from JOIN and the OUTER from LEFT and RIGHT joins; you only need OUTER on a FULL OUTER JOIN. In 25 years of working in SQL I have never come across a legitimate use for RIGHT JOIN. A right join means you don't understand your data.

 

I've used right joins a handful of times, most recently joining a table of weather stations to a CTE representing intervals during which stations were or weren't deployed (the latter being of primary interest). Left joins are certainly much more intuitive in almost all cases, but I suspect that has more to do with English-readers being used to processing left to right than anything else.

In which case it should have been

SELECT 
        CTE.column
        ,STATION.data
FROM
        CTE
    LEFT JOIN STATION ON STATION.CTE_ID = CTE.ID
 

Indexing / optimising, and weird CPU spikes when there is probably something wrong with my query but I have no idea what!!

 

Multi-level queries, especially when you want to call a query inside a query to get SUM or AVG, then GROUP BY and ORDER BY the result (see the sketch below).
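
A small illustrative example of that pattern (names invented): compute the aggregates in a derived table, then sort in the outer query.

SELECT t.customer_id,
       t.order_count,
       t.total_spend
FROM (
    SELECT customer_id,
           COUNT(*)    AS order_count,
           SUM(amount) AS total_spend
    FROM orders
    GROUP BY customer_id
) AS t
ORDER BY t.total_spend DESC;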

 

Can anyone recommend some good resources to learn database optimization?

 

Your database manual should cover something about performance. It really depends on the database you are using. My suggestion is to Google for posts or videos about "query optimization for + your_db".

If you are using a proprietary database ask the vendor, they should have some specific documentation about it, then start asking questions.

 

sqlservercentral.com
The forums and the Stairway series.

 
 

No. Because I use Codeigniter πŸ˜‚

 

ORM and Active Record patterns result in very poor performance if you have a complex data model. Using an ORM like LINQ or Eloquent results in really poor data types in the database, which waste space and impact query performance.

 

Yeah, I agree with your opinion. When I get a complex model, I use native SQL syntax 😂

It's not just complexity; code-first migrations tend to use generic and therefore wasteful data types and don't normally provide good index candidates

 
 

Depends on how you write your code. In MSSQL you can lift complex logic into common table expressions and incrementally build your query logic in individually testable steps (see the sketch below). You still can't pause the query during debugging, but that is the cost of having a set-based language.
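
A sketch of that style (invented names): each CTE is a step you can SELECT from on its own while debugging, and the final SELECT stitches them together.

WITH recent_orders AS (
    SELECT order_id, customer_id, amount
    FROM orders
    WHERE order_date >= '2019-01-01'
),
customer_totals AS (
    SELECT customer_id, SUM(amount) AS total_spend
    FROM recent_orders
    GROUP BY customer_id
)
SELECT c.customer_name, t.total_spend
FROM customer_totals t
JOIN customers c ON c.customer_id = t.customer_id;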
