I was recently tasked with performing an architectural review for one of our Azure PaaS clients using Sitecore 9.2, and part of this was to review the backups being done on Azure. The Azure SQL backups have their own world, but backing up the app services requires a storage account. Fine, no problem, let me go through the app services...CM is good to go...but CD is failing its nightly backups!
So after a brief panic, I dug into it a bit more. There's fortunately a helpful error message (for once!) saying that the backup had a 10 GB limit, and I was trying to back up 10.3 GB. Now we're not backing up the databases with the app service, so there's theoretically no way we should be getting close to that limit. Until I dug through the file system in Kudu.
The first thing I thought of was logs...we know Sitecore logs a ton, and maybe the app service was getting overwhelmed by them. So I decided to walk through the App_Data folder, and I found the answer before I even reached the logs folder. In the DeviceDetection folder, I found a heap of device databases, each one many megabytes of data...and all but one of them utterly useless. It seems Sitecore hasn't been cleaning up old copies of the device detection database!
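If clicking through folders in Kudu gets tedious, a small script can total up each folder for you. This is only a rough sketch, assuming Python is available wherever you run it (for example, against a downloaded copy of the site); the function names are my own and not part of Sitecore or Azure.

```python
import os


def folder_sizes(root):
    """Return {subfolder_name: total_bytes} for each folder directly under root."""
    sizes = {}
    for entry in os.scandir(root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        total = 0
        # Walk the whole subtree so nested files count toward the top-level folder.
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                try:
                    total += os.path.getsize(os.path.join(dirpath, name))
                except OSError:
                    # Skip files that vanish or cannot be read mid-scan.
                    pass
        sizes[entry.name] = total
    return sizes


def largest_first(sizes):
    """Sort (name, bytes) pairs so the biggest folder comes first."""
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)
```

Pointing `folder_sizes` at the App_Data folder and printing `largest_first(...)` should put the offender at the top of the list, which in my case would have been DeviceDetection.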
So knowing this now, I confirmed the older items weren't needed (they weren't). Then it was about remediation, and there are a few things we can do here:
Update to 9.3: This issue was apparently reported as a bug and fixed in Sitecore 9.3: https://kb.sitecore.net/articles/872808
Add a cleanup agent: Recommended by this blog entry, you can add a patch file to clean up device detection databases over 30 days old. Sitecore apparently won't let you totally empty the folder, so you should always have one file that's available.
<configuration>
  <sitecore>
    <scheduling>
      <agent type="Sitecore.Tasks.CleanupAgent">
        <files>
          <remove folder="/App_Data/DeviceDetection" pattern="DeviceDetectionDB-*.db" maxAge="30.00:00:00" recursive="false" />
        </files>
      </agent>
    </scheduling>
  </sitecore>
</configuration>
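Before turning the agent on, it can be reassuring to see which files a 30-day rule would catch. The sketch below is a dry run only and deletes nothing. It is not how Sitecore's CleanupAgent works internally: I'm assuming "age" means last-modified time, and always keeping the newest file is my own safety choice so the folder never ends up empty. The function name is hypothetical.

```python
import fnmatch
import os
import time


def stale_device_dbs(folder, pattern="DeviceDetectionDB-*.db", max_age_days=30, now=None):
    """List matching files older than max_age_days, always sparing the newest one."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day

    # Collect (modified_time, path) for every file matching the pattern.
    matches = [
        (entry.stat().st_mtime, entry.path)
        for entry in os.scandir(folder)
        if entry.is_file() and fnmatch.fnmatch(entry.name, pattern)
    ]
    matches.sort(reverse=True)  # newest first

    # Skip index 0 (the newest database) no matter how old it is.
    return [path for mtime, path in matches[1:] if mtime < cutoff]
```

Anything this returns is a candidate for removal; the newest database never appears in the list, even if it is older than the cutoff.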
Add a backup filter: You can add a file called _backup.filter where you can put exclusion paths that should not be backed up by the app service. There's more information on Microsoft's site. As for the paths to use, I chose the following based on reviewing all of the app services and the backup messages provided in the logs. Also, a lot of the folders under App_Data don't hold data that's critical to the app service working, or hold data that can be quickly recovered (such as Sitecore getting a fresh device detection database).
\LogFiles\
\site\wwwroot\App_Data\debug
\site\wwwroot\App_Data\DeviceDetection
\site\wwwroot\App_Data\diagnostics
\site\wwwroot\App_Data\logs
\site\wwwroot\App_Data\mediaIndexing
\site\wwwroot\App_Data\packages
\site\wwwroot\App_Data\serialization
\site\wwwroot\App_Data\Submit Queue
\site\wwwroot\App_Data\tools
\site\wwwroot\App_Data\viewstate
\site\wwwroot\logs
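If you have several app services to update, generating the filter file from a script keeps them consistent. This is a minimal sketch: the helper name is hypothetical, and I'm assuming the file belongs in the site's wwwroot folder, with one path per line, as I understand Microsoft's documentation. Check that against the current docs before relying on it.

```python
import os

# One exclusion per line, copied from the list above. A raw triple-quoted
# string keeps the backslashes (and the space in "Submit Queue") intact.
EXCLUDED_PATHS = r"""
\LogFiles\
\site\wwwroot\App_Data\debug
\site\wwwroot\App_Data\DeviceDetection
\site\wwwroot\App_Data\diagnostics
\site\wwwroot\App_Data\logs
\site\wwwroot\App_Data\mediaIndexing
\site\wwwroot\App_Data\packages
\site\wwwroot\App_Data\serialization
\site\wwwroot\App_Data\Submit Queue
\site\wwwroot\App_Data\tools
\site\wwwroot\App_Data\viewstate
\site\wwwroot\logs
""".strip().splitlines()


def write_backup_filter(wwwroot, paths=EXCLUDED_PATHS):
    """Write _backup.filter into wwwroot (assumed location) and return its path."""
    target = os.path.join(wwwroot, "_backup.filter")
    with open(target, "w", encoding="utf-8") as handle:
        for path in paths:
            handle.write(path + "\n")
    return target
```

Run it against a local folder first and upload the resulting file through Kudu, rather than writing straight to a live site.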