Preventing object replacement in Amazon S3

Goce ・4 min read

S3 bucket permissions do not distinguish between creating a new object and replacing (overwriting) an existing one, and in some cases this behavior conflicts with the business requirements of the system.

The business requirements

  • Let's imagine we have a document management service where users can upload documents;
  • Each document is uploaded to S3 with a unique key (file name) created by the application;
  • When an already uploaded document is replaced in the app, a new object with a new key is uploaded to the bucket and the old key is scheduled for deletion;
  • Uploaded objects are publicly available via a URL that does not pass a specific version as a parameter;
  • Once an object is uploaded it should never be overwritten, only deleted;
  • Object keys are generated in a way that prevents collisions, so any attempt to overwrite an existing file is considered an exception and should be reported.

Object locking cannot be applied since it prevents the deletion of the object. We could use Governance mode and a separate user with the s3:BypassGovernanceRetention permission to perform deletes, but that is another story.

Flow

The general idea is for the bucket to dispatch an "ObjectCreated:Put" event notification on each upload, which is piped to a lambda function that listens to all "ObjectCreated" events from the bucket. Once the function is triggered, it fetches all versions of the object from the S3 API and, if there is more than one, removes all but the oldest.
For this to work, versioning must be enabled on the bucket so that all changes to the file are preserved. See the S3 documentation for how to enable it.
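
Versioning can also be switched on from code rather than from the console. The following is a minimal sketch, assuming the AWS SDK for PHP is installed and the caller is allowed to perform s3:PutBucketVersioning; the helper names versioningParams and enableVersioning are made up for this illustration and are not part of the SDK.

```php
<?php
declare(strict_types=1);

use Aws\S3\S3Client;

/**
 * Build the parameters for the SDK's putBucketVersioning operation.
 * (Hypothetical helper, kept separate so it can be inspected without calling AWS.)
 */
function versioningParams(string $bucket): array
{
    return [
        'Bucket'                  => $bucket,
        'VersioningConfiguration' => ['Status' => 'Enabled'],
    ];
}

/**
 * Enable versioning on the given bucket using an already configured client.
 */
function enableVersioning(S3Client $client, string $bucket): void
{
    $client->putBucketVersioning(versioningParams($bucket));
}
```

With a client created as in index.php, the call would be enableVersioning($client, 'example-bucket-name').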

First time upload

An object is uploaded with a key "123456789.txt", the event is sent to the lambda function and since the uploaded version is the only one, the function does nothing.

Second, third, fourth... upload

Overwriting the "123456789.txt" object once it exists creates the same event and sends it to the lambda function. This time, the number of object versions fetched from the API will be greater than one. This triggers the cleanup part of the function, where all versions of the object except the oldest (the first upload) are removed.
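
The decision described in the two cases above can be sketched as a small pure function, independent of the S3 API. This is only an illustration: the function name versionsToDelete and the sample version IDs are assumptions, and each version is represented as an array with 'VersionId' and 'Timestamp' keys.

```php
<?php
declare(strict_types=1);

/**
 * Return the IDs of every version except the oldest one.
 * Each entry of $versions is ['VersionId' => string, 'Timestamp' => int].
 */
function versionsToDelete(array $versions): array
{
    // A single version (or none) means a first-time upload: nothing to clean up
    if (count($versions) <= 1) {
        return [];
    }

    // Sort ascending by timestamp so the original upload comes first
    usort(
        $versions,
        function (array $a, array $b): int {
            return $a['Timestamp'] <=> $b['Timestamp'];
        }
    );

    // Drop the oldest version from the list so that it is kept in the bucket
    array_shift($versions);

    return array_column($versions, 'VersionId');
}

// First upload: only one version exists, so nothing is deleted
$firstUpload = versionsToDelete([['VersionId' => 'v1', 'Timestamp' => 100]]); // []

// Two overwrites later: everything but the original 'v1' is queued for deletion
$afterOverwrites = versionsToDelete([
    ['VersionId' => 'v2', 'Timestamp' => 200],
    ['VersionId' => 'v1', 'Timestamp' => 100],
    ['VersionId' => 'v3', 'Timestamp' => 300],
]); // ['v2', 'v3']
```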

Creating the Lambda function

For the lambda function we are going to use Bref, which is marketed as "Everything you need to easily deploy and run serverless PHP applications". Bref relies on the Serverless framework for the deployment part. Install instructions are available in the Bref documentation. If you have the AWS CLI installed and configured, you can skip the serverless config credentials step.

Let's start with the serverless.yml file, which defines the lambda function along with the permissions it needs to work properly. Replace example-bucket-name in the custom section with the name of your bucket, and set your region in the provider section. If you need more information, the Serverless documentation lists all available properties for AWS.

The serverless.yml file:

service: app

custom:
  bucket: example-bucket-name

provider:
  name: aws
  region: eu-west-1
  runtime: provided
  memorySize: 128
  iamRoleStatements:
    - Effect: Allow
      Action:
        - s3:ListBucketVersions
        - s3:GetObjectVersion
        - s3:DeleteObjectVersion
      Resource:
      - "arn:aws:s3:::${self:custom.bucket}/*"
      - "arn:aws:s3:::${self:custom.bucket}"

plugins:
  - ./vendor/bref/bref

functions:
  function:
    handler: index.php
    environment:
      TARGET_BUCKET: ${self:custom.bucket}
    events:
      - s3:
          bucket: ${self:custom.bucket}
          event: s3:ObjectCreated:*
          existing: true
    timeout: 5
    layers:
      - ${bref:layer.php-73}

# Exclude files from deployment
package:
  exclude:
    - 'tests/**'

Next comes the actual PHP function that will handle the events.
The index.php file:

<?php
declare(strict_types=1);

use Aws\S3\S3Client;

require __DIR__ . '/vendor/autoload.php';

return function ($event) {
    $bucket = getenv('TARGET_BUCKET');
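    // S3 URL-encodes object keys in event notifications (a space arrives as "+"), so decode before use
    $targetObjectKey = urldecode($event['Records'][0]['s3']['object']['key']);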

    $client = new S3Client(
        [
            'region'  => getenv('AWS_REGION'),
            'version' => 'latest',
        ]
    );

    $objectVersions = [];

    $results = $client->getPaginator(
        'ListObjectVersions',
        [
            'Bucket' => $bucket,
            'Prefix' => $targetObjectKey
        ]
    );

    foreach ($results as $result) {
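        // 'Versions' may be missing from a page, so fall back to an empty list
        foreach ($result['Versions'] ?? [] as $object) {
            // 'Prefix' also matches longer keys that start with the same string, so compare exactly
            if ($object['Key'] === $targetObjectKey) {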
                /** @var Aws\Api\DateTimeResult $lm */
                $lm = $object['LastModified'];
                $objectVersions[] = [
                    'Key'       => $object['Key'],
                    'VersionId' => $object['VersionId'],
                    'Timestamp' => $lm->getTimestamp(),
                ];
            }
        }
    }

    /**
     * If there is more than one version of the object, something is not right
     */
    $objectVersionsCount = count($objectVersions);

    if ($objectVersionsCount > 1) {
        usort(
            $objectVersions,
            function ($a, $b) {
                return $a['Timestamp'] <=> $b['Timestamp'];
            }
        );

        /**
         * Drop the oldest version (the original upload) from the array so that it is kept
         */
        array_shift($objectVersions);

        $deleteQueue = [];
        foreach ($objectVersions as $version) {
            $deleteQueue[] = [
                'Key'       => $targetObjectKey,
                'VersionId' => $version['VersionId']
            ];
        }

        /*
         * Delete the object versions.
         */
        $result = $client->deleteObjects(
            [
                'Bucket' => $bucket,
                'Delete' => [
                    'Objects' => $deleteQueue
                ],
            ]
        );

        if ($result->hasKey('Errors')) {
            throw new Exception('Failed to delete old versions!');
        }
    }
};

To deploy the lambda function, open your terminal and type serverless deploy.

Enable events on the bucket

The last thing that remains is to enable the events on the bucket and link them to the lambda function. The s3 event with existing: true in serverless.yml is meant to attach this notification during deployment, so first check whether it is already listed on the bucket. If it is not, go to the "Properties" tab of the bucket and open the "Events" section. Click "Add notification", give the event a name, select "PUT" / "All object create events", then choose "Lambda" in the "Send to" dropdown and enter the Lambda function ARN.
That's it. Time to test it with real uploads. If everything works as it should, no matter how many times you upload, replace, or overwrite the same key, the lambda function will remove all but the first version of the file.
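
The console steps above can also be expressed with the AWS SDK for PHP. The sketch below assumes that S3 is already permitted to invoke the function and that the caller may change the bucket's notification settings; the helper names and the 'prevent-object-replacement' ID are placeholders chosen for this example. Keep in mind that putBucketNotificationConfiguration replaces the bucket's entire notification configuration, so any existing rules have to be included in the same call.

```php
<?php
declare(strict_types=1);

use Aws\S3\S3Client;

/**
 * Build the parameters that link all "ObjectCreated" events to a Lambda function.
 * (Hypothetical helper; the 'Id' value is an arbitrary label for the rule.)
 */
function lambdaNotificationParams(string $bucket, string $functionArn): array
{
    return [
        'Bucket'                    => $bucket,
        'NotificationConfiguration' => [
            'LambdaFunctionConfigurations' => [
                [
                    'Id'                => 'prevent-object-replacement',
                    'LambdaFunctionArn' => $functionArn,
                    'Events'            => ['s3:ObjectCreated:*'],
                ],
            ],
        ],
    ];
}

/**
 * Apply the configuration. This overwrites any notification rules already on the bucket.
 */
function attachNotification(S3Client $client, string $bucket, string $functionArn): void
{
    $client->putBucketNotificationConfiguration(
        lambdaNotificationParams($bucket, $functionArn)
    );
}
```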

Conclusion

While the approach shown above works, it is far from perfect, and you might want to tweak it for your specific use case.
One possible improvement is to pipe the events through an SQS queue that is configured as the lambda function trigger, instead of sending them directly from the bucket to the function. Once the lambda function has processed the event, it can send a message via an SNS topic to all interested parties. An optional SQS dead letter queue can be attached for better monitoring of the system.
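
If the events go through SQS, the handler receives them in a different shape: each SQS record carries the original S3 notification as a JSON string in its body. The following sketch shows how the object keys could be unpacked under that assumption; the function name objectKeysFromSqsEvent is hypothetical, and messages without S3 records (such as the test message S3 sends when a notification is first configured) are simply skipped.

```php
<?php
declare(strict_types=1);

/**
 * Extract the decoded S3 object keys from an SQS-triggered Lambda event.
 */
function objectKeysFromSqsEvent(array $event): array
{
    $keys = [];

    foreach ($event['Records'] ?? [] as $message) {
        // The SQS message body holds the S3 notification as a JSON document
        $s3Event = json_decode($message['body'] ?? '', true);
        if (!is_array($s3Event)) {
            continue; // not valid JSON, nothing to extract
        }

        foreach ($s3Event['Records'] ?? [] as $record) {
            if (isset($record['s3']['object']['key'])) {
                // Keys are URL-encoded in S3 notifications, so decode them
                $keys[] = urldecode($record['s3']['object']['key']);
            }
        }
    }

    return $keys;
}

// Sample SQS event wrapping one S3 notification for the key "my file.txt"
$sampleEvent = [
    'Records' => [
        [
            'body' => json_encode([
                'Records' => [
                    ['s3' => ['object' => ['key' => 'my+file.txt']]],
                ],
            ]),
        ],
    ],
];

$keys = objectKeysFromSqsEvent($sampleEvent); // ['my file.txt']
```

Each extracted key can then be passed to the same version cleanup logic used in index.php.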
