DEV Community

Kenneth McAndrew

Conceptual Thought: Upgrading the Media Library From File-Based to Azure Blob Storage

I'm going to start with this caveat...unlike most blog posts, I'm actively soliciting feedback on this one. I haven't tried what I'm about to describe, but I've done some investigation and basic tests that suggest it should be possible.

The Problem

I have a client that was originally on Sitecore 6.2.1, and their media library was file-based. I'm not sure if that was the norm back then or their own choice, but over time the default became storing media as blobs in the database. The media library files totaled just over 100 GB, so when upgrading to 8.2.1, rather than bloat the database by migrating those files into it, we decided to leave the media library as-is and use robocopy to sync the files from CM to CD. That all worked, and today the media library is up to 170 GB.

Of course, we now live in a world of Azure PaaS and containers, with 8.2.1 out of mainstream support, so it's time for an upgrade to 10.1.1. The client chose PaaS, which is fine, and Sitecore now offers a nice Azure Blob Storage module for keeping media files in.

But if you check the migration steps in Sitecore's documentation, they only cover moving content from database storage to blob storage. Nothing about file-based storage. I reached out to Sitecore support, and they had nothing either. So I either import the media into the database...the very thing I tried to avoid before...or find another way.

(I mean, I'm not going to blog about something in the documentation unless it's lacking, so another way it is!)

How Azure Blob Storage Is Maintained

First, a brief rundown of how the link between blob storage and the Sitecore item is maintained. If you install the blob storage module when you spin up your PaaS environment, it adds a connection string for your blob container. On a CM the connection has full access, and on a CD it has list and read access (since you won't be writing from your CD).

Nothing changes on the front end of media management, but behind the scenes, when you add media, the module adds a new blob to the container...no subfolders; everything appears to sit at the root level. The blobs are named only by GUID, with no indication of the media type; note that these GUIDs are not the Sitecore item's ID.

From an item management standpoint, this blob GUID is stored in the SharedFields table, in the row whose FieldId is "40E50ED9-BA07-4702-992E-A912738D32DC" (the media Blob field). The Value column holds the reference in the format blob://[blob GUID]. By comparison, with database storage the Value column holds a plain GUID that points to the BlobId column of the Blobs table.
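To make the two Value formats concrete, here's a small Python sketch that builds a blob:// reference and tells the two storage styles apart from a Value string. The function names are hypothetical, and the exact GUID text the module writes (lowercase, hyphenated, no braces) is an assumption; compare it against a row the module actually created before relying on it.

```python
from __future__ import annotations

import uuid

# Blob field ID quoted in the post.
BLOB_FIELD_ID = uuid.UUID("40E50ED9-BA07-4702-992E-A912738D32DC")
BLOB_PREFIX = "blob://"


def to_blob_reference(blob_id: uuid.UUID) -> str:
    """Build the Value for an Azure-backed media item: blob://[blob GUID]."""
    # Assumption: lowercase, hyphenated GUID text (Python's default str(uuid)).
    return f"{BLOB_PREFIX}{blob_id}"


def classify_media_value(value: str) -> tuple[str, uuid.UUID]:
    """Return ("azure", blob GUID) or ("database", Blobs.BlobId) for a Blob field value."""
    text = value.strip()
    if text.lower().startswith(BLOB_PREFIX):
        return "azure", uuid.UUID(text[len(BLOB_PREFIX):])
    # Database storage: a bare GUID (braces and case are tolerated by uuid.UUID).
    return "database", uuid.UUID(text)
```

Parsing through uuid.UUID rather than comparing raw strings means differences in casing or braces between rows don't cause false mismatches.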

The other thing to note is that if you use the Attach function to overwrite the media, it creates a new blob in storage. I think the clean up databases function will help prune these, so you don't accumulate so much dead media, but I'm not 100% sure of that yet. This is more for maintenance knowledge, in case you run into crazy storage costs, for example.
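If dead blobs do pile up, one way to find candidates is to compare the container's blob names against every blob:// reference in the database. The sketch below is plain Python over two lists you'd gather yourself (for example, a container listing and a SELECT of Value from SharedFields where FieldId is the Blob field); the function names are hypothetical. It assumes you include references from every database that can point into the container (master and web at least, and other field tables if your media templates are versioned), and I'd treat the result as a list to review, not something to delete automatically.

```python
from __future__ import annotations

import uuid
from collections.abc import Iterable

BLOB_PREFIX = "blob://"


def referenced_blob_ids(field_values: Iterable[str]) -> set[uuid.UUID]:
    """Collect blob GUIDs from Blob field values shaped like blob://[GUID]."""
    ids: set[uuid.UUID] = set()
    for value in field_values:
        text = value.strip()
        if text.lower().startswith(BLOB_PREFIX):
            ids.add(uuid.UUID(text[len(BLOB_PREFIX):]))
        # Values without the prefix are database-storage GUIDs; ignore them here.
    return ids


def orphaned_blobs(container_blob_names: Iterable[str], field_values: Iterable[str]) -> list[str]:
    """Return blob names in the container that no Blob field value points at."""
    referenced = referenced_blob_ids(field_values)
    orphans = []
    for name in container_blob_names:
        try:
            blob_id = uuid.UUID(name)
        except ValueError:
            continue  # not a GUID-named blob; leave it alone
        if blob_id not in referenced:
            orphans.append(name)
    return sorted(orphans)
```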

The (Possible) Solution

So now that we've found where in the database the storage information is kept, it's time to do what Sitecore doesn't like you doing...direct database writes! Also time for another caveat: I haven't written or tested code for this yet, but if the process validates, I'll share it in a follow-up post. This is conceptual, in search of feedback.

  • The first part of the process is one you've seen done with Sitecore PowerShell Extensions...find a starting-point item and get its child items recursively, making sure you only pick media items.
  • Since we're migrating from file-based storage, each media item's "File Path" field will have content, so use that to locate the media file.
  • Then upload the media to blob storage, getting back the ID of the created blob; if we have to provide a "name", I'd say use the item's ID.
  • Create a record in the SharedFields table of the master database, where ItemId is the current item, FieldId is the Blob field ID identified earlier, and Value is the "blob://" address with the appropriate blob ID.
  • Delete the SharedFields record for the item that holds the file path; its field ID is "2134867A-AC67-4DAC-836C-A9264FD9D6D6".
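The per-item steps above can be sketched as a small Python function. This is a minimal, untested sketch under several assumptions, not a working migration: MediaItem, migrate_item, media_root, and the upload callback are hypothetical names; the File Path value is assumed to be relative to the media folder on the CM; the SharedFields column list (Id, ItemId, FieldId, Value, Created, Updated) is assumed and should be checked against your schema; and the SQL uses qmark (?) placeholders as pyodbc expects. The recursive item walk (the SPE part) is left out, so it handles one item at a time, and it follows the suggestion above of naming the blob after the item ID.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from pathlib import Path
from typing import Callable

# Field IDs quoted in the post.
BLOB_FIELD_ID = uuid.UUID("40E50ED9-BA07-4702-992E-A912738D32DC")
FILE_PATH_FIELD_ID = uuid.UUID("2134867A-AC67-4DAC-836C-A9264FD9D6D6")


@dataclass
class MediaItem:
    """Hypothetical record for one media item pulled from master (e.g. via SPE)."""

    item_id: uuid.UUID
    file_path: str  # raw File Path field value, assumed relative to the media folder


def migrate_item(
    item: MediaItem,
    media_root: Path,
    upload: Callable[[str, Path], None],
) -> list[tuple[str, tuple]]:
    """Upload one file, then return the SQL (with parameters) that repoints the item."""
    if not item.file_path.strip():
        return []  # nothing on disk to migrate
    # Assumption: strip the leading slash so the path joins under media_root.
    source = media_root / item.file_path.strip().lstrip("/\\")
    blob_name = str(item.item_id)  # the post's suggestion: use the item's ID as the name
    upload(blob_name, source)  # raises before any SQL is produced if the upload fails
    # Assumed SharedFields columns; Id gets a fresh GUID for the new row.
    insert = (
        "INSERT INTO SharedFields (Id, ItemId, FieldId, Value, Created, Updated) "
        "VALUES (?, ?, ?, ?, GETUTCDATE(), GETUTCDATE())",
        (str(uuid.uuid4()), str(item.item_id), str(BLOB_FIELD_ID), f"blob://{blob_name}"),
    )
    # Remove the File Path row so the item no longer points at the file system.
    delete = (
        "DELETE FROM SharedFields WHERE ItemId = ? AND FieldId = ?",
        (str(item.item_id), str(FILE_PATH_FIELD_ID)),
    )
    return [insert, delete]
```

To wire it up, one option is an uploader built on the azure-storage-blob package: `container = BlobServiceClient.from_connection_string(conn_str).get_container_client(container_name)` and `upload=lambda name, path: container.upload_blob(name=name, data=path.read_bytes())` (for very large files you'd want to stream an open file instead of reading it into memory). Each returned statement could then run through a pyodbc cursor inside one transaction per item. If a row for the Blob field already exists for an item, an UPDATE would be needed instead of the INSERT.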

Included in the blob storage package is a pair of PowerShell scripts: one to migrate from database storage to blob storage, the other to revert that process. I haven't dug into these yet, but they may provide some of the framework for the ideas above.

If you've done this before, I'd love to hear what you did and didn't do. I don't know how many people still have file-based media libraries (I'm sure it's mostly legacy at this stage), but perhaps this will help someone else once it's fleshed out. In the meantime, feedback is most welcome!
