MySQL is the database of the web, used by all kinds of companies from small startups to large enterprises. Out of the box, the standard configuration can give you quite a bit of mileage; however, as your tables grow into the tens of millions of rows, you will eventually start noticing performance bottlenecks.
The good news is that MySQL can easily scale to handle terabytes of data without breaking a sweat and the hardware costs are not that bad either.
In this guide, I am going to cover five tips to help you scale from a small DB with a few hundred gigabytes of data to a large DB that can grow into terabytes of data.
The first thing you want to do is make sure your hardware is up to the job. A cheap VPS is not going to cut it; at the very least you will need a dedicated server with decent disk IO speeds.
I would run MySQL on a Linux box, as this generally gives you better performance. In addition, I would use XFS as my filesystem - this will come in handy when taking backups.
The Percona Toolkit is by far the best suite of tools I have ever used with MySQL. It is an open-source toolkit that lets you perform essential operations such as backing up, altering tables and archiving data without blocking tables or stopping your master DB.
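As one example of what the toolkit can do, pt-online-schema-change alters a table without locking it; the database, table and column names below are just placeholders:

```shell
# Add an index to a large table online, without blocking writes
# (run --dry-run first to preview what the tool will do)
pt-online-schema-change \
  --alter "ADD INDEX idx_created_at (created_at)" \
  D=mydb,t=orders \
  --execute
```

Under the hood the tool copies rows into a shadow table and swaps it in, so the original table stays readable and writable throughout.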
This is a no-brainer: if you have a large DB, you should be splitting your reads and writes across servers. MySQL supports Master-Master and Master-Slave replication.
Setting these up is fairly trivial, and they generally work well autonomously without much maintenance. You just need a bit of monitoring in place to keep an eye on the replication: sometimes a bad query can interrupt the replication process and cause your slave to stop syncing entirely or lag behind.
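A simple way to monitor this is to check the replication status on the slave and alert on the fields below (on MySQL 8.0.22+ the equivalent statement is SHOW REPLICA STATUS):

```sql
-- Run on the slave from the mysql client
SHOW SLAVE STATUS\G
-- Key fields to watch:
--   Slave_IO_Running:      Yes  (slave is receiving binlog events)
--   Slave_SQL_Running:     Yes  (slave is applying them)
--   Seconds_Behind_Master: 0    (replication lag in seconds)
--   Last_SQL_Error:             (populated when a bad query broke the sync)
```

A cron job that runs this query and alerts when either thread stops or the lag climbs is usually all the monitoring you need.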
A word of caution with binlogs: if they fill up too quickly, they can bring your master to a grinding halt. I would put them on a separate partition from your main MySQL data and make sure the disk IO performance is decent.
Furthermore, I suggest having a backup script that syncs the old binlogs to S3 or another storage service and then purges the logs after a day or two.
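The separate partition and the purge window can both be set in my.cnf; the path and values here are illustrative:

```ini
[mysqld]
# Keep binlogs on their own partition, away from the data directory
log_bin          = /binlogs/mysql-bin
# Auto-purge binlogs after 2 days (MySQL 5.7 syntax;
# MySQL 8.0 prefers binlog_expire_logs_seconds = 172800)
expire_logs_days = 2
# Roll over to a new binlog file once the current one hits this size
max_binlog_size  = 512M
```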
Storing log entries or time-series data in MySQL is generally a bad idea. While MySQL can handle millions of rows just fine, this kind of data adds unnecessary overhead to your read/write performance.
Instead, I suggest storing this kind of data in a NoSQL store; Redis or MongoDB is a good option here.
For content that doesn't update often - or even content you can only cache for 15-20 minutes - caching makes a huge difference. Cache your data in Redis or another in-memory store and serve your content from the cache.
This lessens the burden on your MySQL DB and will improve your website's performance overall.
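The cache-aside pattern is the simplest way to do this; sketched with redis-cli below, where the key name and TTL are just examples:

```shell
# On a cache miss: render the content from MySQL, then store it
# with a 15-minute (900 second) expiry
redis-cli SET page:home "<rendered html>" EX 900

# On each request: try the cache first, fall back to MySQL
# only when this returns nil
redis-cli GET page:home
```

The EX argument means stale content ages out on its own, so you never have to invalidate the cache by hand.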
Tweak your MySQL configuration. Each setup is different, so I'm not going to go too in-depth here, but here are a few config areas to think about:
1) Buffer settings. Increase these to keep more data in RAM so there's less pressure on your disks.
2) Max connections. Do not set this too low; leave headroom for spikes in traffic.
3) innodb_ settings. There are a ton of these, such as innodb_io_capacity, innodb_buffer_pool_size and so on.
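Put together in my.cnf, those three areas might look like this - the values are illustrative and should be sized for your own hardware:

```ini
[mysqld]
# Common advice is roughly 70-80% of RAM on a dedicated DB server
innodb_buffer_pool_size = 24G
# Roughly match your storage's sustainable IOPS
innodb_io_capacity      = 2000
# Headroom for traffic spikes
max_connections         = 500
```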
I would tweak one setting at a time and measure the difference in performance. You can use a load-testing script or a tool like "Apache JMeter" to measure performance as you tweak settings.
Finally, I would suggest running "EXPLAIN" on all of your queries to see the rows scanned versus the rows returned, then optimizing your queries accordingly or adding indexes to reduce the number of rows scanned.
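Here is what that looks like in practice, on a hypothetical orders table:

```sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- type: ALL with a huge "rows" estimate means a full table scan -
-- MySQL is reading every row to return a handful.

-- Add an index on the filtered column:
ALTER TABLE orders ADD INDEX idx_customer_id (customer_id);

EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
-- Now type: ref and key: idx_customer_id, and the "rows" estimate
-- drops to roughly the number of matching rows.
```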
There is so much more you can do to fine-tune MySQL for better performance, but just by applying these five tweaks to your DB servers you will get a lot of mileage out of MySQL while keeping your hardware costs within budget.