DEV Community

Chris Cook for AWS Community Builders

Originally published at zirkelc.dev

Format and Parse Amazon S3 URL

Amazon S3 URLs come in different flavors. There are those starting with s3://, http://, or https://. Then there are the ones with s3.amazonaws.com, s3.us-east-1.amazonaws.com, or even s3-us-west-2.amazonaws.com (note the dash instead of the dot between s3 and the region code). And where do you put the bucket: is it <bucket>.s3.us-east-1.amazonaws.com/<key> or s3.us-east-1.amazonaws.com/<bucket>/<key>? And when it comes to static website hosting, of course, there are also <bucket>.s3-website-us-east-1.amazonaws.com and <bucket>.s3-website.us-east-2.amazonaws.com (again, note the dash and the dot).

There are even more when you include the dual-stack, FIPS, access point, and S3 control endpoints; the full list is in the Amazon S3 endpoints documentation. For this post, though, I will focus on the more common URLs mentioned above.

Global

The global URL has the simplest format, with the following structure: s3://<bucket>/<key>. This is also the URL displayed by the AWS Management Console.
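Because the global format consists only of the scheme, the bucket, and the key, formatting and parsing it takes just a few lines. The following is a minimal sketch; `formatGlobalS3Url`, `parseGlobalS3Url`, and `S3Location` are hypothetical names used for illustration (not part of `amazon-s3-url` or any AWS SDK), and the parser assumes the key is non-empty.

```typescript
// Hypothetical helpers: format and parse the global s3://<bucket>/<key> URL.
type S3Location = { bucket: string; key: string };

function formatGlobalS3Url({ bucket, key }: S3Location): string {
  return `s3://${bucket}/${key}`;
}

function parseGlobalS3Url(s3Url: string): S3Location {
  // The bucket ends at the first "/" after the scheme; everything after it
  // is the key, which may itself contain further "/" characters.
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(s3Url);
  if (!match) {
    throw new Error(`Not a global S3 URL: ${s3Url}`);
  }
  return { bucket: match[1], key: match[2] };
}

const globalUrl = formatGlobalS3Url({ bucket: "my-bucket", key: "photos/2023/cat.jpg" });
console.log(globalUrl); // s3://my-bucket/photos/2023/cat.jpg
console.log(parseGlobalS3Url(globalUrl)); // bucket "my-bucket", key "photos/2023/cat.jpg"
```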


Path-style vs. Virtual-hosted-style

The difference between path-style and virtual-hosted-style URLs is how the bucket name is included in the URL. Path-style URLs have the bucket name in the pathname of the URL:

https://s3.<region>.amazonaws.com/<bucket>/<key>

On the other hand, virtual-hosted-style URLs have the bucket name in the hostname of the URL:

https://<bucket>.s3.<region>.amazonaws.com/<key>

Having the bucket name in the hostname lets S3 use DNS to route requests for different buckets to different IP addresses. If the bucket name is in the path, all requests go to the same hostname, regardless of the bucket. That is the reason path-style URLs are deprecated. Support for this style was supposed to end in 2020, but AWS changed its plan and continues to support path-style URLs for buckets created on or before September 30, 2020. There's an interesting blog post about the background: Amazon S3 Path Deprecation Plan – The Rest of the Story
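To make the difference concrete, here is a minimal sketch that builds a regional URL in either style. The helper `formatRegionalUrl` and the `AddressingStyle` type are hypothetical, and the sketch assumes the key is already URL-safe (real code would percent-encode it).

```typescript
// Hypothetical helper: build a regional REST URL in either addressing style.
type AddressingStyle = "path" | "virtual-hosted";

function formatRegionalUrl(bucket: string, key: string, region: string, style: AddressingStyle): string {
  if (style === "path") {
    // Bucket name in the pathname: every bucket shares the same hostname.
    return `https://s3.${region}.amazonaws.com/${bucket}/${key}`;
  }
  // Bucket name in the hostname: DNS can resolve each bucket independently.
  return `https://${bucket}.s3.${region}.amazonaws.com/${key}`;
}

console.log(formatRegionalUrl("my-bucket", "cat.jpg", "eu-west-1", "path"));
// https://s3.eu-west-1.amazonaws.com/my-bucket/cat.jpg
console.log(formatRegionalUrl("my-bucket", "cat.jpg", "eu-west-1", "virtual-hosted"));
// https://my-bucket.s3.eu-west-1.amazonaws.com/cat.jpg
```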

Legacy vs. Regional

Some regions, such as US East (N. Virginia) us-east-1, have a legacy global endpoint that doesn't need a region code in the hostname:

# Legacy hostname with path-style
https://s3.amazonaws.com/<bucket>/<key>
# Legacy hostname with virtual-hosted-style
https://<bucket>.s3.amazonaws.com/<key>

If you use this type of URL for a bucket in a region that doesn't support it, you might get either an HTTP 307 Temporary Redirect or, in the worst case, an HTTP 400 Bad Request error, depending on the region and when the bucket was created.

AWS recommends always using the regional endpoints with the region code in the hostname:

# Regional hostname with path-style
https://s3.<region>.amazonaws.com/<bucket>/<key>
# Regional hostname with virtual-hosted-style
https://<bucket>.s3.<region>.amazonaws.com/<key>
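Going the other way, a parser has to accept both the legacy and the regional hostnames. This sketch handles only the four dot-style REST variants shown so far; `parseRestUrl` and `ParsedS3Url` are hypothetical names (this is not how `amazon-s3-url` is implemented), the region pattern is deliberately loose, and `region` stays undefined for legacy hostnames.

```typescript
// Hypothetical parser for dot-style REST URLs:
// legacy or regional hostname, path-style or virtual-hosted-style.
type ParsedS3Url = { bucket: string; key: string; region?: string };

function parseRestUrl(url: string): ParsedS3Url {
  // Path-style: https://s3[.<region>].amazonaws.com/<bucket>/<key>
  const pathStyle = /^https:\/\/s3(?:\.([a-z0-9-]+))?\.amazonaws\.com\/([^/]+)\/(.+)$/.exec(url);
  if (pathStyle) {
    const [, region, bucket, key] = pathStyle;
    return { bucket, key, region };
  }

  // Virtual-hosted-style: https://<bucket>.s3[.<region>].amazonaws.com/<key>
  const virtualHosted = /^https:\/\/([^/]+)\.s3(?:\.([a-z0-9-]+))?\.amazonaws\.com\/(.+)$/.exec(url);
  if (virtualHosted) {
    const [, bucket, region, key] = virtualHosted;
    return { bucket, key, region };
  }

  // Dash-style, website, dual-stack, and other endpoints are not handled here.
  throw new Error(`Unsupported S3 URL: ${url}`);
}

// Legacy path-style: region is undefined
console.log(parseRestUrl("https://s3.amazonaws.com/my-bucket/photos/cat.jpg"));
// Regional virtual-hosted-style: region is "us-west-1"
console.log(parseRestUrl("https://my-bucket.s3.us-west-1.amazonaws.com/photos/cat.jpg"));
```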

Dot-style vs. Dash-style

There is another caveat here: some regions used to have a dash - instead of a dot . between s3 and <region>:

# Dot-style
https://s3.<region>.amazonaws.com/<bucket>/<key>
# Dash-style
https://s3-<region>.amazonaws.com/<bucket>/<key>

For example, the US West (Oregon) us-west-2 region supports the legacy dash-style URL https://s3-us-west-2.amazonaws.com/<bucket>/<key>. Nevertheless, the standard format https://s3.us-west-2.amazonaws.com/<bucket>/<key> is also available for these outliers.
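A parser that must accept both variants can normalize dash-style hostnames to dot-style before doing anything else. In this sketch, `toDotStyleHost`, `regionFromHost`, and the `REGION` pattern are hypothetical; the pattern is only an approximation of AWS region codes (such as us-west-2 or us-gov-west-1), not an official grammar, so check it against the regions you actually use.

```typescript
// Approximate shape of a region code, e.g. us-west-2, ap-southeast-2, us-gov-west-1.
// This is an assumption for illustration, not an official AWS grammar.
const REGION = "[a-z]{2}(?:-gov)?-[a-z]+-\\d+";

// Rewrite "s3-<region>.amazonaws.com" (optionally prefixed by "<bucket>.")
// to the dot-style "s3.<region>.amazonaws.com"; other hosts are returned unchanged.
function toDotStyleHost(host: string): string {
  const dashStyle = new RegExp(`^((?:[^/]+\\.)?)s3-(${REGION})\\.amazonaws\\.com$`);
  return host.replace(dashStyle, "$1s3.$2.amazonaws.com");
}

// Extract the region from a dot-style or dash-style REST host;
// returns undefined for legacy hosts without a region code.
function regionFromHost(host: string): string | undefined {
  const match = new RegExp(`(?:^|\\.)s3[.-](${REGION})\\.amazonaws\\.com$`).exec(host);
  return match ? match[1] : undefined;
}

console.log(toDotStyleHost("s3-us-west-2.amazonaws.com")); // s3.us-west-2.amazonaws.com
console.log(regionFromHost("my-bucket.s3.eu-west-1.amazonaws.com")); // eu-west-1
```

Website hostnames such as <bucket>.s3-website-us-east-1.amazonaws.com do not match the pattern and are returned unchanged, which keeps REST and website endpoints from being mixed up.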

REST vs. Website

All the URL formats we have seen so far, except the global S3 URL, are REST endpoints. They are hosted on s3.amazonaws.com, s3.<region>.amazonaws.com, or s3-<region>.amazonaws.com (with or without the bucket name as a subdomain), but more importantly, they support secure HTTPS connections. That means all these URLs work with https:// as the protocol.

Amazon S3 also has a website endpoint for static website hosting. The website endpoint does not support HTTPS, only HTTP. These URLs have the following formats:

# Website hostname with dot-style
http://<bucket>.s3-website.<region>.amazonaws.com/<key>
# Website hostname with dash-style
http://<bucket>.s3-website-<region>.amazonaws.com/<key>

Again, depending on the region, there is a dash - or a dot . separating s3-website and <region>. To see which one is right for your region, you have to check the list of Amazon S3 website endpoints.
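Since the separator depends on the region, formatting a website URL needs a per-region lookup. The sketch below hard-codes only a handful of regions as an illustrative subset; `formatWebsiteUrl` and `WEBSITE_SEPARATOR` are hypothetical, and the official list of Amazon S3 website endpoints remains the source of truth for any other region.

```typescript
// Illustrative subset only: separator between "s3-website" and the region code.
const WEBSITE_SEPARATOR: Record<string, "-" | "."> = {
  "us-east-1": "-",
  "us-west-2": "-",
  "eu-west-1": "-",
  "us-east-2": ".",
  "eu-central-1": ".",
};

function formatWebsiteUrl(bucket: string, region: string, key = ""): string {
  const separator = WEBSITE_SEPARATOR[region];
  if (separator === undefined) {
    throw new Error(`Unknown website endpoint style for region ${region}`);
  }
  // Website endpoints are HTTP-only, so the scheme is always http://
  return `http://${bucket}.s3-website${separator}${region}.amazonaws.com/${key}`;
}

console.log(formatWebsiteUrl("my-site", "us-east-1", "index.html"));
// http://my-site.s3-website-us-east-1.amazonaws.com/index.html
console.log(formatWebsiteUrl("my-site", "eu-central-1"));
// http://my-site.s3-website.eu-central-1.amazonaws.com/
```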

Format and Parse S3 URLs

Depending on how you interact with Amazon S3, you might use one of the previous URLs. For example, the AWS CLI for S3 expects the S3 URL in the global format s3://<bucket>/<key>. Other clients and SDKs probably use the regional REST endpoint with the bucket name either in the hostname or pathname.

If you're using the wrong format or endpoint, you might get an error like this:

com.amazonaws.services.s3.model.AmazonS3Exception:
The bucket is in this region: eu-west-1.
Please use this region to retry the request (Service: Amazon S3; Status Code: 301; Error Code: PermanentRedirect;)
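An s3:// URL does not carry a region, so converting it into a REST URL for a client or SDK means supplying the bucket's region yourself; addressing the bucket through the wrong regional endpoint is one way to run into a redirect error like the one above. A minimal sketch, assuming the region is already known (`s3UriToHttpsUrl` is a hypothetical helper, not part of the library):

```typescript
// Hypothetical helper: convert "s3://<bucket>/<key>" to a regional
// virtual-hosted-style HTTPS URL. The caller must supply the bucket's region.
function s3UriToHttpsUrl(s3Uri: string, region: string): string {
  const match = /^s3:\/\/([^/]+)\/(.+)$/.exec(s3Uri);
  if (!match) {
    throw new Error(`Not an s3:// URL: ${s3Uri}`);
  }
  const [, bucket, key] = match;
  return `https://${bucket}.s3.${region}.amazonaws.com/${key}`;
}

console.log(s3UriToHttpsUrl("s3://my-bucket/reports/2023.csv", "eu-west-1"));
// https://my-bucket.s3.eu-west-1.amazonaws.com/reports/2023.csv
```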

The right URL really depends on the individual client and how it sends requests to S3. To lift some of this burden, I created a tiny JavaScript library to check, format, and parse S3 URLs in the various formats I described earlier.

zirkelc / amazon-s3-url

Format and parse Amazon S3 URL


Amazon S3 URL Formatter and Parser

This small, dependency-free library is designed to help you check, format, and parse Amazon S3 URLs. Please note that this library performs only rudimentary validation of the URL's structure. It currently does not validate bucket names and object keys against the rules defined in the AWS documentation.

Amazon S3 URL Formats

Amazon S3 supports a combination of different styles:

Virtual-hosted-style and Path-style

The difference between these two styles is how the bucket name is included in the URL, either as part of the hostname or as part of the pathname.

  • Virtual-hosted-style URLs have the bucket name as part of the host, e.g. <bucket>.s3.amazonaws.com/<key>
  • Path-style URLs have the bucket name as part of the path, e.g. s3.amazonaws.com/<bucket>/<key>

Warning

Path-style URLs will be discontinued in the future. See Amazon S3 backward compatibility for more information.

Regional and Legacy

The difference between these two…

At the moment, the library exports only three functions: formatS3Url, parseS3Url, and isS3Url.

import { formatS3Url, parseS3Url, isS3Url } from 'amazon-s3-url';

/* Types (mirroring the library's exported types; redeclared here for reference) */
type S3UrlFormat =
  | "s3-global-path"
  | "s3-legacy-path"
  | "s3-legacy-virtual-host"
  | "https-legacy-path"
  | "https-legacy-virtual-host"
  | "s3-region-path"
  | "s3-region-virtual-host"
  | "https-region-path"
  | "https-region-virtual-host";

type S3Object = {
  bucket: string;
  key: string;
  region?: string;
};

/* Signatures (implemented by the library; listed here for reference) */
// function formatS3Url(s3Object: S3Object, format?: S3UrlFormat): string;
// function parseS3Url(s3Url: string, format?: S3UrlFormat): S3Object;
// function isS3Url(s3Url: string, format?: S3UrlFormat): boolean;

/* Examples */
// Global path
// Without format param (defaults to s3-global-path)
formatS3Url({ bucket: 'bucket', key: 'key' });
parseS3Url('s3://bucket/key');
isS3Url('s3://bucket/key');

// Legacy path-style
// With format param for explicit formatting and parsing
formatS3Url({ bucket: 'bucket', key: 'key' }, 'https-legacy-path');
parseS3Url('https://s3.amazonaws.com/bucket/key', 'https-legacy-path');
isS3Url('https://s3.amazonaws.com/bucket/key', 'https-legacy-path');

// Regional virtual-hosted-style
// With region property for regional endpoints
formatS3Url({ region: 'us-west-1', bucket: 'bucket', key: 'key' }, 'https-region-virtual-host');
parseS3Url('https://bucket.s3.us-west-1.amazonaws.com/key', 'https-region-virtual-host');
isS3Url('https://bucket.s3.us-west-1.amazonaws.com/key', 'https-region-virtual-host');

Limitations

The library performs only rudimentary validation of the URL's structure; it doesn't validate bucket names, object keys, or regions. It also doesn't support the dual-stack, FIPS, access point, control, and website endpoints yet. But I'm happy to welcome external contributions.
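As a starting point for the missing bucket name validation, here is a sketch that checks a subset of the documented naming rules for general purpose buckets: 3 to 63 characters; only lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit; no adjacent dots; not formatted as an IPv4 address; and not using the reserved xn-- prefix or -s3alias suffix. The function `isPlausibleBucketName` is hypothetical, is not part of the library, and does not cover every rule (for example, other reserved prefixes and suffixes).

```typescript
// Hypothetical helper: checks a SUBSET of the general purpose bucket naming rules.
function isPlausibleBucketName(bucketName: string): boolean {
  // 3-63 characters; lowercase letters, digits, dots, and hyphens;
  // must begin and end with a letter or digit.
  if (!/^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$/.test(bucketName)) return false;
  // No two adjacent periods.
  if (bucketName.indexOf("..") !== -1) return false;
  // Must not be formatted as an IPv4 address, e.g. 192.168.5.4.
  if (/^\d{1,3}(\.\d{1,3}){3}$/.test(bucketName)) return false;
  // Reserved prefix and suffix (partial list).
  if (/^xn--/.test(bucketName) || /-s3alias$/.test(bucketName)) return false;
  return true;
}

console.log(isPlausibleBucketName("my-bucket")); // true
console.log(isPlausibleBucketName("My_Bucket")); // false
```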
