
MartinJ

'Flattening' your code with Google Cloud Storage Metadata

Last reviewed: Feb 2023

The word "flattening" is used here in the sense of "simplifying" and "clarifying" rather than "squashing and dumping in the can bank"! I've recently been moving some old systems onto the Google Cloud platform and made some unexpected discoveries. Here's an interesting bit of "enlightenment" that I'd like to share.

Management of a file archive - PDF documents, graphic image files and suchlike - is a common IT task. The files in an archive usually come with a supporting cast of associated data - author, title and so on. This needs to be managed in parallel with the files themselves.

In the past I've handled this by creating a database table with a row for every file supplying the 'associated data' and the web address of the file itself. I did this because the simple ISP servers I used to use offered no alternative.

But I continued to do this even after I'd moved my systems into the Google Cloud. I was aware that the GCP allows you to tag files with "metadata" but through a combination of idleness, pressure of work and fear (ooh metadata - what's that? - sounds scary - move on quickly), I did nothing about it.

But then something happened that shook me rigid and forced me to get a grip. It wasn't a pleasant experience and cost me a day or two, but the result has been a codebase that is simpler, less error-prone and much more efficient than my previous efforts.

I'd created a little Archive maintenance webapp that would upload a file to the Cloud and create an entry in a Firestore collection to record its title, the name of its owner and its cloud address. Here's a pic of the user-interface:

[Screenshot: the Archive maintenance webapp]
There aren't many files in my collections, so an individual file is selected by just picking it from a table displaying the whole lot. The interface allows me to update both the file content and its associated data. It also allows me to view the file at the cloud address and to delete the whole thing if necessary. Neat!

Everything seemed fine until I started to use the webapp to populate the archive. Then, having deleted a file, I happened to click on a browser tab that I'd previously used to display its content. The content was still there! Then I noticed that when I overwrote a cloud file with a new local source, the original content was unchanged in the browser. Whoa - not good!

To cut a long story short, having eliminated local cache as the cause of the confusion (such a thorn in the side of every system developer), I finally discovered that, along with all its other talents, the Google Cloud Platform also acts as a Content Delivery Network. In brief, for publicly readable objects, it maintains its own cache. This is enabled by default and, unless you tell the Cloud otherwise, your files will all be served from this cloud cache for 60 minutes before a request goes back to the source. Aha!

For serious applications I'm sure there are huge benefits in this arrangement but, for my application, cache of any variety was just going to create a world of pain. My files are small and accessed only rarely.

Fortunately, it was pretty easy to turn Cloud caching off. But the way you do this is through metadata, so it was time to man up and learn some new tricks.

As it turned out, manning up wasn't the least bit difficult. Metadata is just a fancy word for "data about data". Effectively, in this case, it allows you to stick yellow "Post-it" notes onto your files. Each file comes with some standard stickers just waiting for you to supply some content. One of these is the "Cache-Control" item which allows you to manage the aforementioned content-delivery behaviour. You can set this manually from the storage management page for your project in the Cloud Console - you just select a file and use the "options" button to "edit metadata" - but obviously, in practice, you'll want to do this programmatically. As you'll see in a moment, this is perfectly straightforward.

Once I'd got this straightened out, my webapp performed perfectly, but then I started to think about a feature of Cloud file metadata that suddenly seemed hugely interesting. As well as the aforementioned standard Post-it notes, Google offers a "Custom Metadata" Post-it. This allows you to add a note with an object containing whatever properties you choose to put inside it.

So suddenly I was thinking - why have I got a Firestore collection to store information about my files when I can store this directly inside the files themselves?

The code I'd written to deliver this historic arrangement used to give me a headache every time I looked at it. Every transaction involved accessing two separate stores of information and these, in turn, needed to be maintained in strict lockstep. So, I decided to ditch this and try again using metadata in the files to replace rows in the database table.

Here's some code to upload a local file into the Cloud and tag it with a "title" custom metadata field. Upstream of this you need to imagine that handleUpload() is triggered by a "Save" button somewhere and that there's also an <input type='file'> collecting the sourceFileObject linked to a stateObject. A "configuration" module is also supplying a storage variable for my Cloud Storage bucket.

import { ref, uploadBytes, updateMetadata } from 'firebase/storage';

async function handleUpload() {

    // start by acquiring a uuid to identify a new cloud file and then upload
    // into it with uploadBytes
    const uuid = crypto.randomUUID(); // just a bit of "glam" - date/time would do

    const sourceFileObject = stateObject.sourceFileObject;
    const destinationFilename = 'my_papers_folder/' + uuid + '.pdf';
    const storageRef = ref(storage, destinationFilename);
    try {
        await uploadBytes(storageRef, sourceFileObject);

        // Create a metadata object for the info you want to add
        const paperMetadata = {
            cacheControl: 'private,max-age=0,no-store',
            customMetadata: {
                title: popupState.title,
                submittingUser: user
            }
        };

        // and now add it to the file
        await updateMetadata(storageRef, paperMetadata);

        // Yay - all done

    } catch (error) {
        alert('Sorry - something is stopping the upload of ' + destinationFilename + ' : error is ' + error);
    }
}

You'll notice that the code sets the cacheControl field as well as the custom metadata items. Values for this are as follows:

  • private - the object can be cached in a requester's local cache.
  • max-age=0 - zeros the length of time an object may be cached before it's considered stale.
  • no-store - the object can't be cached at all.

I probably don't actually need both "no-store" and "max-age=0", but better safe than sorry.

The file's content and its metadata are retrieved by similarly natural and straightforward code.
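For completeness, here's a minimal sketch of what that retrieval might look like. fetchPaperDetails is a hypothetical helper name of my own; getDownloadURL and getMetadata are the SDK's standard calls, and storage is the same configuration-module variable as before.

import { ref, getMetadata, getDownloadURL } from 'firebase/storage';

// Sketch only: fetch a file's viewing address plus the custom metadata
// that replaced the old Firestore row
async function fetchPaperDetails(cloudFilename) {

    const storageRef = ref(storage, cloudFilename);

    // getDownloadURL supplies the address at which the content can be viewed
    const url = await getDownloadURL(storageRef);

    // getMetadata returns the standard fields (including cacheControl)
    // together with the customMetadata object added at upload time
    const metadata = await getMetadata(storageRef);

    return {
        url: url,
        title: metadata.customMetadata.title,
        submittingUser: metadata.customMetadata.submittingUser
    };
}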

Are there any disadvantages to this arrangement? I certainly miss the indexing and sorting benefits that come automatically with a Firestore collection. And I also wonder how efficiently Cloud folder indexes would perform on a large data set. But just now I'm more concerned about the simplicity and clarity of my code, and on this front I'm more than pleased with the new arrangement.
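To give a feel for that trade-off, here's a rough sketch of how a listing might be built straight from storage using the SDK's listAll call. This is my assumption about how you'd do it rather than code from the webapp, and note that it costs one metadata round-trip per file:

import { ref, listAll, getMetadata } from 'firebase/storage';

// Sketch only: build an in-memory index of the archive folder.
// Fine for a small collection, but each file needs its own
// getMetadata call, so this won't rival a Firestore query at scale
async function listPapers() {

    const folderRef = ref(storage, 'my_papers_folder');
    const listing = await listAll(folderRef);

    const papers = [];
    for (const itemRef of listing.items) {
        const metadata = await getMetadata(itemRef);
        papers.push({ name: itemRef.name, title: metadata.customMetadata.title });
    }

    // sorting has to be done by hand - Firestore would have given us this for free
    papers.sort((a, b) => a.title.localeCompare(b.title));
    return papers;
}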

See Google's file-metadata document for further background on metadata arrangements.

Postscript

Of course, for many applications, disabling local caching as described above will be highly undesirable. Recently I had to revisit cloud storage issues in order to provide a home for the background files that illuminate the Card components in a standard website "Cards" layout. Here I very much do want local caching for the website, but I very much don't want either this or Cloud caching to confuse operation of the little maintenance application that creates and edits background file arrangements.

Having thought about this for a while, I decided that the only way to do this was to return to the original idea of using a Firestore collection to describe my cards. I now include the name of the card background file as a property of the card document and I create Cloud storage files for the background with local caching enabled.

So that's the website sorted, but how to avoid caching confusion in the maintenance application?

In the past I might have added a version number or similar in file references to "turn off" caching, but I now realise there's a simpler way. Every time I update a background, I simply create a new file, using the crypto.randomUUID() trick to get myself a fresh, unique filename. I then record this in the Card collection record.

This still leaves the maintenance confusion that will be caused by Cloud caching, but I can now fix this by using what I've learnt about metadata to save my background files with a cacheControl: 'no-store' property.
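Putting the pieces together, here's a minimal sketch of what the maintenance app's "save background" step might look like. The names here (saveCardBackground, the 'cards' collection, the db Firestore handle) are my assumptions for illustration; storage is the configuration-module variable as before.

import { ref, uploadBytes, updateMetadata } from 'firebase/storage';
import { doc, updateDoc } from 'firebase/firestore';

// Sketch only: give the background a fresh, unique filename so the
// website can never be served a stale version under an old name, and
// set 'no-store' so Cloud caching can't confuse the maintenance app
async function saveCardBackground(cardId, backgroundFileObject) {

    const backgroundFilename = 'card_backgrounds/' + crypto.randomUUID() + '.png';
    const storageRef = ref(storage, backgroundFilename);

    await uploadBytes(storageRef, backgroundFileObject);
    await updateMetadata(storageRef, { cacheControl: 'no-store' });

    // record the new filename as a property of the card's Firestore document
    await updateDoc(doc(db, 'cards', cardId), { backgroundFile: backgroundFilename });
}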
