DEV Community

Cover image for What Are the Top Database Optimization Techniques?
dev.to staff for The DEV Team

Posted on

What Are the Top Database Optimization Techniques?

Which database optimization techniques do you find most effective in improving performance? Share your experiences and insights on implementing these techniques in your projects.

Follow the DEVteam for more discussions and online camaraderie!

Top comments (14)

Collapse
 
balagmadhu profile image
Bala Madhusoodhanan

1) While data holds immense value, not all of it must occupy space in the active database. In an era of decreasing computing and storage costs, adopting a ‘less is more’ approach. Archive and retain only the essential data.
2) Indexes: Properly designed indexes can speed up data retrieval. Indexes help locate specific rows efficiently, especially when searching large datasets.
3) Utilizing Object-Relation Mapping (ORM) allows applications to handle complex queries and relationships more efficiently.
4) When fetching large result sets, unbuffered mode reduces memory consumption by fetching rows one at a time

Ask the DB administrator for help !!
help

Collapse
 
sha_255 profile image
Bram

The statement that "Utilizing Object-Relation Mapping (ORM) allows applications to handle complex queries and relationships more efficiently" is not entirely accurate. While ORM tools do simplify the interaction between object-oriented programming languages and relational databases, they can sometimes be less efficient than writing SQL directly, especially when dealing with complex queries. This is because ORM tools generate SQL queries automatically, which may not always be as optimized as those written by a skilled SQL developer. Additionally, some ORM tools may not support advanced SQL features, limiting their effectiveness for certain applications .

ORM tools provide a higher level of abstraction, allowing developers to work with objects and their relationships instead of directly writing SQL. This abstraction can make development faster and more intuitive, as developers can use the programming language's syntax to interact with the database. However, this abstraction can also introduce performance overhead, as the ORM tool needs to translate object-oriented operations into SQL queries. This translation process can result in less efficient queries and potentially more database queries than necessary, which can slow down performance .

In summary, while ORM tools can significantly simplify database interactions and reduce the amount of SQL code developers need to write, they may not always provide the same level of efficiency as hand-written SQL, especially for complex queries and operations. Developers should consider the specific requirements of their application, including performance needs, when deciding whether to use an ORM tool .

Collapse
 
gzzchh profile image
MisakaCloud

Yes, so I like orm like hibernate or drizzle. They also provide the manual way to construct SQL query.

Collapse
 
balagmadhu profile image
Bala Madhusoodhanan

Absolutely .. thanks for adding more clarity ..

Thread Thread
 
sha_255 profile image
Bram

AI convinced me with this, I haven't tested it my self tho 🤷‍♂️.

Thread Thread
 
balagmadhu profile image
Bala Madhusoodhanan

hence the slate statement ask the DBA :-D...

Those were from my experience.. Always had DBA and senior developers who had guided or helped me to explore the options

Bulk data operations and for fine tuning an performance based query ORM might not work.

Collapse
 
ryencode profile image
Ryan Brown

Snarky (mostly) answer is to not use a database if you don't NEED one.

Check the amount of data you're dealing with, how you are accessing it, what the storage needs are, how relational the data are and how often and in what way is it changed.

It may be better to keep your data in something other than a "proper" database. If it's mostly static, structured files may be a good choice.
It may be easier to denormalize your schema so as to not need relational look-ups.
If your data is dealt with sequentially, a file-stream sounds better than an order by clause.

Even if a database is a best choice for storage, modification and retrieval... you may be able to mitigate the need for high-performance tuning by generating the output as needed to flat files that are then used directly by the portion that requires performance.

Not Real Example For Illustrative Purposes 1

(Totally not something I'm doing myself 😉)

  • Context
    • Blog CMS (without comments for simplicity)
    • Writing blog posts is not frequent. Maybe a few times per day at most.
    • Between individual requests, it is very unlikely that content has changed.
  • Keep blog posts and data in a database; Let's assume you have a fine editor and way to place & retrieve blog data from the database.
  • Use a separate process to pull changed data from the database and render it to HTML files.
  • Publish only the static, rendered, HTML files.
  • Benefits from client side caching; Assuming that your headers are configured correctly.
  • The site will be fast. Compiling the site can be slow, it doesn't affect the blog reader's ability to get the content.

That being said... most often the Database performance tasks I end up actually using when I'm actually using a real Database are probably pretty close to the same as others

  • Indexing
  • Removing unneeded joins & group by clauses
  • Pay attention to the views you use... they may be costly due to other joins that you don't need.
    • Perhaps for less actively changing data... materialize that view either using the database's built in methods or select into a table periodically and then use that instead.
  • Pay attention to WHERE the data is stored. What kind of storage is used... (spinning iron, RAID) is it being pulled across a long wire? (e.g. are you using a DB link across the continent?)
  • pre-eliminate as much data from the intermediate queries as possible if it's not needed later.
  • Sargable get to know it, make it a part of your intrinsic thought process when considering queries
  • As others have said, the balance between Normalization levels: consider various data mart structure strategies, you may need more than one depending on how your data is consumed

  1. My real-world project doesn't use a database, but instead markdown files with structured metadata blocks. State is remembered in flat files to keep track if content has been changed and when content has been changed, which other files need re-rendering due to implied or declared dependencies. The goal is to have an easy to edit blog, just .md files, that get compiled to good quality HTML files that get published to the web-host and the readers experience it as constructed from purely static files. 

Collapse
 
saintpetejackboy profile image
Jack • Edited

A few I didn't see:

Don't use NULL values: a query with WHERE col IS NULL or IS NOT NULL, either syntax will be very slow for lookup - best to default the values.

Do not use LEFT JOIN t ON t.col1 = t2.col1 OR t.col2 = t2.col3

Using OR during LEFT JOIN is almost always going to move at a glacial pace.

We all know about indexing, but having efficient lookups and relationships that are indexed in a reliable way that doesn't require data duplication is only half of the battle - the other half is making sure you do not write boneheaded queries - especially when running against massive databases.

As others said, segment out the data - I use rolling timers to do "garbage" cleanup. We typically track every single micro-thing both the server and the user does - 6 months from now, it doesn't really matter what time you logged in and out of the platform.

Always scan your DB for tables that are taking up a lot of space - you would be surprised how an errant column of data you don't need can balloon your database size.

Here comes the controversial advice: I found that while using PDO in PHP, binding too many parameters can be incredibly slow. If you are absolutely sure about the source of the data and have sanitized it, there doesn't seem to be a reachable "query size limit" - if I try to bind 3,000 parameters, my sqld is going to shit the bed - but a 40,000 character "query" flys through at the speed of light. YMMV and this may be the worst advice you read on the internet for the rest of the week, but if you absolutely need to juice the last bit of performance out of a query with PDO, test the speed difference between binding versus just shoving a huge long query through. :)

Collapse
 
fpaghar profile image
Fatemeh Paghar
  • Data Archiving and Purging: Archiving historical data and regularly purging obsolete data can reduce the size of the database and improve query performance. This also helps in maintaining optimal storage utilization.

  • Caching: Implementing caching mechanisms can reduce database load by storing frequently accessed data in memory. This can be achieved through in-memory databases, caching solutions like Redis or Memcached, or application-level caching.

  • Hardware Optimization: Investing in high-performance hardware, such as solid-state drives (SSDs), sufficient memory (RAM), and powerful processors, can improve database performance. Additionally, optimizing server configuration settings and ensuring proper resource allocation can enhance overall performance.

  • Regular Maintenance: Performing routine maintenance tasks such as index rebuilding, statistics updates, and database reorganization can help optimize database performance over time. Automated maintenance plans can be set up to ensure regular execution of these tasks.

Collapse
 
ben profile image
Ben Halpern

Indexes!

Collapse
 
eepstein profile image
Ezra Epstein • Edited

Using 'where not...' for inserts to improve upsert performance. Notably under Postgres on tables with triggers this can lead to a 10x improvement vs. using the 'on conflict' clause alone for conditional inserts. Adding to the where clause also avoids generating and then discarding values from sequences. The where clause condition should aim to use an index -- since unique indices are what cause the conflict in the first place this usually means the index needed in the where clause already exists.

Collapse
 
deepakprab profile image
Deepak Prabhakara

Indexes, indexes indexes!

A few other tricks that help (as needed):-

  • Ensure connections to db are pooled
  • Query Optimization
  • Monitoring and Profiling
  • Limit your queries (pagination etc.)
Collapse
 
natescode profile image
Nathan Hedglin

Use the database for storing and querying data. DON'T store everything in the database.

I've seen business logic, HTML and actual source code in databases far too often.

Collapse
 
avicsebooks profile image
Avinash

Some comments may only be visible to logged-in visitors. Sign in to view all comments.