dev.to staff for The DEV Team

Posted on Mar 12

What Are the Top Database Optimization Techniques?

#discuss

Which database optimization techniques do you find most effective in improving performance? Share your experiences and insights on implementing these techniques in your projects.

Follow the DEVteam for more discussions and online camaraderie!

The DEV Team

The team behind this very platform. 😄

Top comments (15)

Bala Madhusoodhanan • Mar 12

1) While data holds immense value, not all of it must occupy space in the active database. In an era of decreasing computing and storage costs, adopting a ‘less is more’ approach. Archive and retain only the essential data.
2) Indexes: Properly designed indexes can speed up data retrieval. Indexes help locate specific rows efficiently, especially when searching large datasets.
3) Utilizing Object-Relation Mapping (ORM) allows applications to handle complex queries and relationships more efficiently.
4) When fetching large result sets, unbuffered mode reduces memory consumption by fetching rows one at a time

Ask the DB administrator for help !!

Bram • Mar 17

The statement that "Utilizing Object-Relation Mapping (ORM) allows applications to handle complex queries and relationships more efficiently" is not entirely accurate. While ORM tools do simplify the interaction between object-oriented programming languages and relational databases, they can sometimes be less efficient than writing SQL directly, especially when dealing with complex queries. This is because ORM tools generate SQL queries automatically, which may not always be as optimized as those written by a skilled SQL developer. Additionally, some ORM tools may not support advanced SQL features, limiting their effectiveness for certain applications .

ORM tools provide a higher level of abstraction, allowing developers to work with objects and their relationships instead of directly writing SQL. This abstraction can make development faster and more intuitive, as developers can use the programming language's syntax to interact with the database. However, this abstraction can also introduce performance overhead, as the ORM tool needs to translate object-oriented operations into SQL queries. This translation process can result in less efficient queries and potentially more database queries than necessary, which can slow down performance .

In summary, while ORM tools can significantly simplify database interactions and reduce the amount of SQL code developers need to write, they may not always provide the same level of efficiency as hand-written SQL, especially for complex queries and operations. Developers should consider the specific requirements of their application, including performance needs, when deciding whether to use an ORM tool .

MisakaCloud • Mar 18

Yes, so I like orm like hibernate or drizzle. They also provide the manual way to construct SQL query.

Bala Madhusoodhanan • Mar 17

Absolutely .. thanks for adding more clarity ..

Bram • Mar 17

AI convinced me with this, I haven't tested it my self tho 🤷‍♂️.

Bala Madhusoodhanan • Mar 17

hence the slate statement ask the DBA :-D...

Those were from my experience.. Always had DBA and senior developers who had guided or helped me to explore the options

Bulk data operations and for fine tuning an performance based query ORM might not work.

Ryan Brown • Mar 12

Snarky (mostly) answer is to not use a database if you don't NEED one.

Check the amount of data you're dealing with, how you are accessing it, what the storage needs are, how relational the data are and how often and in what way is it changed.

It may be better to keep your data in something other than a "proper" database. If it's mostly static, structured files may be a good choice.
It may be easier to denormalize your schema so as to not need relational look-ups.
If your data is dealt with sequentially, a file-stream sounds better than an order by clause.

Even if a database is a best choice for storage, modification and retrieval... you may be able to mitigate the need for high-performance tuning by generating the output as needed to flat files that are then used directly by the portion that requires performance.

Not Real Example For Illustrative Purposes ¹

(Totally not something I'm doing myself 😉)

Context

Blog CMS (without comments for simplicity)

Writing blog posts is not frequent. Maybe a few times per day at most.

Between individual requests, it is very unlikely that content has changed.

Keep blog posts and data in a database; Let's assume you have a fine editor and way to place & retrieve blog data from the database.

Use a separate process to pull changed data from the database and render it to HTML files.

Publish only the static, rendered, HTML files.

Benefits from client side caching; Assuming that your headers are configured correctly.

The site will be fast. Compiling the site can be slow, it doesn't affect the blog reader's ability to get the content.

That being said... most often the Database performance tasks I end up actually using when I'm actually using a real Database are probably pretty close to the same as others

Indexing
Removing unneeded joins & group by clauses
Pay attention to the views you use... they may be costly due to other joins that you don't need.
- Perhaps for less actively changing data... materialize that view either using the database's built in methods or select into a table periodically and then use that instead.
Pay attention to WHERE the data is stored. What kind of storage is used... (spinning iron, RAID) is it being pulled across a long wire? (e.g. are you using a DB link across the continent?)
pre-eliminate as much data from the intermediate queries as possible if it's not needed later.
Sargable get to know it, make it a part of your intrinsic thought process when considering queries
As others have said, the balance between Normalization levels: consider various data mart structure strategies, you may need more than one depending on how your data is consumed

My real-world project doesn't use a database, but instead markdown files with structured metadata blocks. State is remembered in flat files to keep track if content has been changed and when content has been changed, which other files need re-rendering due to implied or declared dependencies. The goal is to have an easy to edit blog, just .md files, that get compiled to good quality HTML files that get published to the web-host and the readers experience it as constructed from purely static files. ↩

Fatemeh Paghar • Mar 13

Data Archiving and Purging: Archiving historical data and regularly purging obsolete data can reduce the size of the database and improve query performance. This also helps in maintaining optimal storage utilization.
Caching: Implementing caching mechanisms can reduce database load by storing frequently accessed data in memory. This can be achieved through in-memory databases, caching solutions like Redis or Memcached, or application-level caching.
Hardware Optimization: Investing in high-performance hardware, such as solid-state drives (SSDs), sufficient memory (RAM), and powerful processors, can improve database performance. Additionally, optimizing server configuration settings and ensuring proper resource allocation can enhance overall performance.
Regular Maintenance: Performing routine maintenance tasks such as index rebuilding, statistics updates, and database reorganization can help optimize database performance over time. Automated maintenance plans can be set up to ensure regular execution of these tasks.

Jack • Mar 15 • Edited

A few I didn't see:

Don't use NULL values: a query with WHERE col IS NULL or IS NOT NULL, either syntax will be very slow for lookup - best to default the values.

Do not use LEFT JOIN t ON t.col1 = t2.col1 OR t.col2 = t2.col3

Using OR during LEFT JOIN is almost always going to move at a glacial pace.

We all know about indexing, but having efficient lookups and relationships that are indexed in a reliable way that doesn't require data duplication is only half of the battle - the other half is making sure you do not write boneheaded queries - especially when running against massive databases.

As others said, segment out the data - I use rolling timers to do "garbage" cleanup. We typically track every single micro-thing both the server and the user does - 6 months from now, it doesn't really matter what time you logged in and out of the platform.

Always scan your DB for tables that are taking up a lot of space - you would be surprised how an errant column of data you don't need can balloon your database size.

Here comes the controversial advice: I found that while using PDO in PHP, binding too many parameters can be incredibly slow. If you are absolutely sure about the source of the data and have sanitized it, there doesn't seem to be a reachable "query size limit" - if I try to bind 3,000 parameters, my sqld is going to shit the bed - but a 40,000 character "query" flys through at the speed of light. YMMV and this may be the worst advice you read on the internet for the rest of the week, but if you absolutely need to juice the last bit of performance out of a query with PDO, test the speed difference between binding versus just shoving a huge long query through. :)

Ben Halpern • Mar 13

Indexes!

Jacelyn Sia • May 30 • Edited

Top database optimization techniques include indexing to speed up queries, query optimization to refine SQL statements, using appropriate data types, partitioning large tables, regular database maintenance like vacuuming and defragmentation, caching frequently accessed data, and optimizing hardware resources such as memory and CPU.

Deepak Prabhakara • Mar 13

Indexes, indexes indexes!

A few other tricks that help (as needed):-

Ensure connections to db are pooled
Query Optimization
Monitoring and Profiling
Limit your queries (pagination etc.)

Ezra Epstein • Mar 16 • Edited

Using 'where not...' for inserts to improve upsert performance. Notably under Postgres on tables with triggers this can lead to a 10x improvement vs. using the 'on conflict' clause alone for conditional inserts. Adding to the where clause also avoids generating and then discarding values from sequences. The where clause condition should aim to use an index -- since unique indices are what cause the conflict in the first place this usually means the index needed in the where clause already exists.