Justin

How to build a paginated filesystem in Laravel

I'm building a GraphQL endpoint in Laravel and migrating a legacy PHP application to use it. One of the goals we want to hit is getting off the local filesystem: the legacy system uses the filesystem for everything, so we're naturally moving that all to S3 so storage can scale across as many Lambda instances or Docker images as we want.

So one of the queries in this new GraphQL server is going to be the ability to query S3 for files. The legacy system did something similar: it used PHP's glob function to see what files were available and put them in a data table.

What's that look like for Laravel?

Pretty nice, I think. It's a WIP; I'd love to build it out into a Laravel package, but this is what I've got so far.

First off, I want to store additional data about each file and keep that in S3, then pull that information from S3 every time. In the event it's ever changed by any other API access, the metadata that lives in S3 remains the single source of truth.

So adding/updating the metadata looks like this:

// Copy the object onto itself with MetadataDirective REPLACE, which is how
// you add or update user metadata on an existing S3 object.
\Storage::getDriver()->getAdapter()->getClient()->copyObject([
  'Key' => $file,
  'Bucket' => config('filesystems.disks.s3.bucket'),
  'CopySource' => config('filesystems.disks.s3.bucket').'/'.$file,
  'MetadataDirective' => 'REPLACE',
  'Metadata' => [
      // User metadata values are stored as strings on S3.
      'name' => '<name>',
      'created' => (string) time(),
      'size' => '10MB',
      'imported' => 'false',
      'from' => '<string>',
      'preview' => '<url>',
      'version' => '<version>',
  ]
]);

This is hopefully pretty straightforward: it adds or updates metadata on an already existing file. How do you get the file in there in the first place? We're using Vapor on this project, and Lambda with GraphQL is a little bit different, but uploads work very similarly to how Vapor intended.
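For completeness, here's a minimal sketch of one way a file could land in the bucket from a plain controller. This is an assumption for illustration (the controller name, the upload field, and the folder are placeholders), not how Vapor's client-side uploads actually work:

<?php

namespace App\Http\Controllers;

use Illuminate\Http\Request;

// Hypothetical upload action: stores the uploaded file on the s3 disk under
// <folder> and returns the key Laravel generated for it.
class FileUploadController extends Controller
{
    public function store(Request $request)
    {
        $key = $request->file('upload')->store('<folder>', 's3');

        return response()->json(['key' => $key]);
    }
}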

OK, so with the file and the metadata in place, how do we get a list of files? Laravel has this lovely helper:

\Storage::files('<folder?>');

Which returns

[
 "<folder>/47b100e1-b8ea-4513-934e-313fa9eb7eae",
 "<folder>/47b100e1-b8ea-4513-934e-313fa9eb7eae",
]

Not really all the information we want.

If we wanted metadata, we could just do:

return collect(\Storage::files('<folder?>'))->map(function ($file) {
    return \Storage::getDriver()->getMetadata($file);
});

And we'd get a boatload of information back, like:

Illuminate\Support\Collection {#1406
     all: [
       [
         "path" => "<folder>/47b100e1-b8ea-4513-934e-313fa9eb7eae",
         "dirname" => "<folder>",
         "basename" => "47b100e1-b8ea-4513-934e-313fa9eb7eae",
         "filename" => "47b100e1-b8ea-4513-934e-313fa9eb7eae",
         "timestamp" => 1590329909,
         "size" => 3688327,
         "mimetype" => "binary/octet-stream",
         "metadata" => [
           "version" => "2020.19.009",
           "created" => "1590329908",
           "name" => "<string>",
           "from" => "<string>",
           "size" => "<string>",
           "preview" => "<string>",
           "imported" => "<string>",
         ],
         "storageclass" => "",
         "etag" => ""a8e674bec6fca635c5ff8819ba0a0a96"",
         "versionid" => "",
         "type" => "file",
       ],
     ],
   }

Now, if we paired that with a simple paginator, we'd be pretty much there.

use Illuminate\Pagination\LengthAwarePaginator;

$files = collect(\Storage::files('<folder?>'))->map(function ($file) {
    return \Storage::getDriver()->getMetadata($file);
});

// Hand the whole collection to a paginator, 10 items per page.
return new LengthAwarePaginator($files->all(), $files->count(), 10);

That would work great up to the first 1,000 objects; you might not get results past that, depending on how Laravel implements the \Storage::files() cursor. But we don't really want that anyway, because it also means doing 1 HTTP GET to list the first 1k results, then 1k GET requests for the metadata, and page 2 would cost another 1 + 1k.

So what we really want is to chunk them up smaller, let's say 20 at a time, and only make the next 21 calls (one listing plus 20 metadata lookups) if we get to page 2. How do we do that, you ask? By relying more on the actual S3 SDK and calling AWS's listObjectsV2 endpoint directly.

Like so:

\Storage::getDriver()->getAdapter()->getClient()->listObjectsV2([
  'Bucket'  => $bucket,
  'MaxKeys' => 20,                  // page size
  'StartAfter' => '<folder>/<key>', // last key of the previous page
  'Prefix'  => '<folder>/',
])

So you can pick up pagination with StartAfter and specify how many results you want with MaxKeys.
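To make that concrete, here's a rough sketch of fetching page 2 by feeding the last key of page 1 into StartAfter (the folder is a placeholder, and this assumes the same s3 disk configuration as above):

$client = \Storage::getDriver()->getAdapter()->getClient();
$bucket = config('filesystems.disks.s3.bucket');

// Page 1: the first 20 keys under the prefix.
$pageOne = $client->listObjectsV2([
    'Bucket'  => $bucket,
    'Prefix'  => '<folder>/',
    'MaxKeys' => 20,
]);

$lastKey = collect($pageOne['Contents'] ?? [])->pluck('Key')->last();

// Page 2: only fetched if S3 says the listing was truncated.
if (($pageOne['IsTruncated'] ?? false) && $lastKey) {
    $pageTwo = $client->listObjectsV2([
        'Bucket'     => $bucket,
        'Prefix'     => '<folder>/',
        'MaxKeys'    => 20,
        'StartAfter' => $lastKey,
    ]);
}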

You could build something with this that almost works like an Eloquent model, letting you do things like:

return S3File::limit(20)
          ->where('prefix', '<folder>/')
          ->paginate();

You'd just need to write the S3File class to extend Eloquent's model and hook its pagination up to S3's listObjectsV2.
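Here's a minimal sketch of what that wrapper could look like. It doesn't actually extend Eloquent's Model; the class and method names are assumptions, and only a prefix "column" is handled:

<?php

namespace App\Support;

use Illuminate\Pagination\Paginator;

// Hypothetical query-builder-ish wrapper around listObjectsV2 so callers can
// write S3File::limit(20)->where('prefix', '<folder>/')->paginate().
class S3File
{
    protected int $limit = 20;

    protected string $prefix = '';

    public static function limit(int $limit): self
    {
        $query = new self();
        $query->limit = $limit;

        return $query;
    }

    public function where(string $column, string $value): self
    {
        if ($column === 'prefix') {
            $this->prefix = $value;
        }

        return $this;
    }

    public function paginate(?string $startAfter = null): Paginator
    {
        $params = [
            'Bucket'  => config('filesystems.disks.s3.bucket'),
            'Prefix'  => $this->prefix,
            'MaxKeys' => $this->limit,
        ];

        // Resume after the last key of the previous page, if we have one.
        if ($startAfter !== null) {
            $params['StartAfter'] = $startAfter;
        }

        $result = \Storage::getDriver()->getAdapter()->getClient()->listObjectsV2($params);

        // A simple (not length-aware) paginator is enough here: S3 never gives
        // us a total count, only whether the listing was truncated.
        return new Paginator($result['Contents'] ?? [], $this->limit);
    }
}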

For me, this might not work because I want the extra metadata, and that means 21 API calls for a single index page. Thinking about it now, it might be better to move this action to a job that 'caches' the results to the database, so from a user's perspective they're not stuck waiting for 21 API calls to complete before they get what they were looking for.
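As a sketch of that caching idea, assuming a hypothetical s3_files table behind an S3FileRecord Eloquent model (all of these names are made up):

<?php

namespace App\Jobs;

use App\Models\S3FileRecord;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

// Hypothetical job: walk a folder on S3 and cache each file's metadata in a
// database table, so the index page paginates the table instead of S3.
class SyncS3FolderMetadata implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    protected string $folder;

    public function __construct(string $folder)
    {
        $this->folder = $folder;
    }

    public function handle(): void
    {
        foreach (\Storage::files($this->folder) as $key) {
            $meta = \Storage::getDriver()->getMetadata($key);

            // Assumes the model casts its metadata column to an array.
            S3FileRecord::updateOrCreate(
                ['path' => $key],
                [
                    'size'     => $meta['size'] ?? null,
                    'metadata' => $meta['metadata'] ?? [],
                ]
            );
        }
    }
}

You'd dispatch it with SyncS3FolderMetadata::dispatch('<folder>') and let the index page read the already-cached rows.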

I'd love to hear any feedback. Has anybody else gone down this same rabbit hole?

Top comments (2)

Phread

Curious... What if you were to store/update/delete the metadata in a database table and query the table to get the results you want? Your add/update (even a delete, if that's allowed) would update the table instead of the metadata. This would also remove the need for a job that caches the results.

Thoughts?

Justin

Yes, it's a great idea, thank you.

It's ultimately what I went with. I still store metadata on S3, but I do that via an observer on the S3 model, so I can interact with the model as normal and it eventually pushes those changes to S3.
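A minimal sketch of that observer idea (the model and column names are assumptions, and the metadata values are assumed to already be strings):

<?php

namespace App\Observers;

use App\Models\S3FileRecord;

// Hypothetical observer: whenever the local record is saved, push its
// metadata back onto the S3 object so the bucket stays in sync.
class S3FileRecordObserver
{
    public function saved(S3FileRecord $file): void
    {
        \Storage::getDriver()->getAdapter()->getClient()->copyObject([
            'Key'               => $file->path,
            'Bucket'            => config('filesystems.disks.s3.bucket'),
            'CopySource'        => config('filesystems.disks.s3.bucket').'/'.$file->path,
            'MetadataDirective' => 'REPLACE',
            'Metadata'          => $file->metadata,
        ]);
    }
}

It would get registered in a service provider with S3FileRecord::observe(S3FileRecordObserver::class).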