If you do web development you will, at some point, encounter three particular terms: URI, URL and URN (the last one is less familiar, but you may have encountered ARNs in AWS).
You may also have seen URI and URL used interchangeably, but it's important to note that they are not the same thing, even if they serve very similar purposes: identifying things and finding things on the internet.
Let's break down what those acronyms mean:
- URI stands for Uniform Resource Identifier
- URL stands for Uniform Resource Locator
- URN stands for Uniform Resource Name
URLs and URNs are specific classifications of URIs.
URIs can look very different from one another (examples from rfc3986#section-1.1.2):
- ftp://ftp.is.co.za/rfc/rfc1808.txt
- http://www.ietf.org/rfc/rfc2396.txt
- ldap://[2001:db8::7]/c=GB?objectClass?one
- mailto:John.Doe@example.com
- news:comp.infosystems.www.servers.unix
- tel:+1-816-555-1212
- telnet://192.0.2.16:80/
- urn:oasis:names:specification:docbook:dtd:xml:4.1.2
Follow me in a deep dive into URIs, URLs and URNs and some good old RFC digging. Put on your safety 🥽, grab your ⛏ and let's go!!
A Uniform Resource Identifier is a generic way to uniquely identify any resource.
The complete definition is in RFC 3986, where you can hunt for all the details.
It takes the form of a string with this syntax:
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
There are 5 components:
- `scheme` (required): arbitrary, but there are popular ones like those in the examples above (`http`, `ftp`, `mailto`)
- `authority` (optional): user information and the top-level namespace (usually a domain or IP address), with the syntax `authority = [ userinfo "@" ] host [ ":" port ]`
- `path` (required but can be empty - you know, parsers 🤷): a hierarchical structure separated by `/`
- `query` (optional): it starts with `?` and can contain non-hierarchical data, often `key=value` pairs
- `fragment` (optional): it starts with `#` and runs until the end of the URI
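To make the five components concrete, here is a minimal sketch that splits a few of the RFC examples with Python's standard `urllib.parse.urlsplit`. The helper name `uri_components` is made up for this post, and note that `urllib` calls the authority `netloc`.

```python
from urllib.parse import urlsplit


def uri_components(uri: str) -> dict:
    """Split a URI into the five generic components described above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,  # urllib's name for the authority
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Examples taken from RFC 3986, section 1.1.2
for uri in (
    "http://www.ietf.org/rfc/rfc2396.txt",
    "ldap://[2001:db8::7]/c=GB?objectClass?one",
    "mailto:John.Doe@example.com",
    "urn:oasis:names:specification:docbook:dtd:xml:4.1.2",
):
    print(uri_components(uri))
```

Components that are missing come back as empty strings, which is why `mailto:` and `urn:` URIs end up with an empty authority and everything after the scheme in the path.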
This is quite convoluted and the RFC is incredibly detailed. The Wikipedia page for URI helps!
It's important to understand the URI as it's the foundation on which URL and URN are based.
The Uniform Resource Locator is a string representation of a resource available via the Internet.
It has its own RFC, RFC 1738, where we find all the familiar names and strings we see as Web developers.
It defines some specific schemes we all know and love:
- ftp: File Transfer protocol
- http: Hypertext Transfer Protocol
- gopher: The Gopher protocol
- mailto: Electronic mail address
- news: USENET news
- nntp: USENET news using NNTP access
- telnet: Reference to interactive sessions
- wais: Wide Area Information Servers
- file: Host-specific file names
- prospero: Prospero Directory Service
(wait, what is `wais`??? Think I'm too young for that!)
It also defines the common "Internet" scheme syntax for all URL schemes that involve the direct use of an IP-based protocol (rfc1738#section-3.1): `//<user>:<password>@<host>:<port>/<url-path>`.
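As a rough illustration of that common syntax, the sketch below pulls the user, password, host, port and path out of a URL with `urllib.parse.urlsplit`. The FTP URL and the helper name `internet_scheme_parts` are made up for the example.

```python
from urllib.parse import urlsplit


def internet_scheme_parts(url: str) -> dict:
    """Extract the pieces of the common Internet scheme syntax from a URL."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,      # None when absent
        "password": parts.password,  # None when absent
        "host": parts.hostname,
        "port": parts.port,          # int, or None when absent
        # urllib keeps the leading "/" in the path
        "url_path": parts.path,
    }


# Hypothetical URL, invented for illustration
print(internet_scheme_parts("ftp://alice:s3cret@ftp.example.com:2121/pub/readme.txt"))
# Example from RFC 3986: no user information at all
print(internet_scheme_parts("telnet://192.0.2.16:80/"))
```

Everything except the host is optional, so a parser has to cope with `None` for the user, password and port.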
I'm young enough that I basically only used `mailto`! (And well... watching Star Wars over `telnet` 😆) Did you use any of the others? Let me know in the comments, I want to read your story!!
Quite simply, a URN is a URI with the `urn` scheme. URNs are location-independent, persistent identifiers.
This means there is only 1 unique URN for a given resource in a given namespace forever (or until that resource doesn't exist any more).
URNs are defined by RFC 8141.
Being location independent and persistent makes them useful for some very interesting use cases.
Their syntax definition (rfc8141#section-2) is considerably more complex; here is a simplified version:
URN = "urn" ":" NID ":" NSS [ "?+" r-component ] [ "?=" q-component ] [ "#" f-component ]
Like a URI, a URN is made of multiple components:
- `NID` (required): the namespace identifier
- `NSS` (required): the namespace-specific string
- `r-component` (optional): parameters to pass to URN resolution services; note that its use is discouraged: "Thus, r-components SHOULD NOT be used for URNs before their semantics have been standardized."
- `q-component` (optional): query parameters for the named resource or the service supplying the named resource
- `f-component` (optional): a fragment representing a location within, or a region of, the named resource; it is ignored during URN equivalence operations.
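The simplified grammar above can be sketched as a small parser. This is not a full RFC 8141 validator: it only checks the `urn` prefix and the shape of the NID, and the names `Urn` and `parse_urn` are invented for this post.

```python
import re
from typing import NamedTuple, Optional


class Urn(NamedTuple):
    nid: str
    nss: str
    r_component: Optional[str]
    q_component: Optional[str]
    f_component: Optional[str]


# NID: 2 to 32 characters, letters/digits/hyphens, no hyphen at either end
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,30}[A-Za-z0-9]")


def _split_off(text: str, marker: str):
    """Return (head, tail), with tail None when the marker is absent."""
    head, found, tail = text.partition(marker)
    return head, (tail if found else None)


def parse_urn(urn: str) -> Urn:
    # Peel the optional components off from the right-hand side first
    rest, f_component = _split_off(urn, "#")
    rest, q_component = _split_off(rest, "?=")
    rest, r_component = _split_off(rest, "?+")
    parts = rest.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    _, nid, nss = parts
    if not NID_RE.fullmatch(nid) or not nss:
        raise ValueError(f"invalid NID or empty NSS: {urn!r}")
    # The NID is case-insensitive, so it is normalised to lowercase here
    return Urn(nid.lower(), nss, r_component, q_component, f_component)


print(parse_urn("urn:oasis:names:specification:docbook:dtd:xml:4.1.2"))
print(parse_urn("urn:example:weather?=op=map&lat=39.56#frag"))
```

Only the first two colons are structural: everything after the NID belongs to the NSS, colons included.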
Amazon Resource Names (ARNs) uniquely identify AWS resources. We require an ARN when you need to specify a resource unambiguously across all of AWS, such as in IAM policies, Amazon Relational Database Service (Amazon RDS) tags, and API calls.
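Sounds familiar? The format too is very URN-like (there are different formats, look at the docs!); the general shape is `arn:partition:service:region:account-id:resource-id`.
From the look of it, it does not seem to be an RFC-compliant URN (for a start, the prefix is `arn` rather than `urn`), but it's extremely similar.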
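Here is a minimal sketch of how an ARN could be taken apart, assuming the general six-field, colon-separated layout from the AWS documentation. The names `Arn` and `parse_arn` are made up for this post, and the resource part is kept as a raw string because its shape varies by service.

```python
from typing import NamedTuple


class Arn(NamedTuple):
    partition: str
    service: str
    region: str      # empty for global services
    account_id: str  # empty for some resources
    resource: str    # may itself contain ":" or "/"


def parse_arn(arn: str) -> Arn:
    # Split on the first five colons only; the rest is the resource
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])


# The account ID and names below are placeholders
print(parse_arn("arn:aws:iam::123456789012:user/johndoe"))
print(parse_arn("arn:aws:rds:us-east-1:123456789012:db:my-db"))
```

Limiting the split to five colons matters: resources written as `resource-type:resource-id` would otherwise be cut in two.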
Google Cloud Platform relies on URIs to identify resources on the platform.
[Resource names](https://cloud.google.com/apis/design/resource_names) are scheme-less URIs similar to:
Here `logging.googleapis.com` is the service name, sitting where a URI's authority goes, and what follows it is the path to the resource. Because the path is hierarchical, it can represent the GCP resource structure this way (project -> collection -> resource).
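Because these names are scheme-less URIs, a generic URI parser can already split them. The sketch below uses a made-up resource name (the project and log IDs are placeholders, not taken from the GCP docs) and a hypothetical helper, `split_resource_name`.

```python
from urllib.parse import urlsplit


def split_resource_name(name: str):
    """Split a scheme-less resource name into (service, path segments)."""
    parts = urlsplit(name)
    if parts.scheme or not parts.netloc:
        raise ValueError(f"expected a scheme-less //service/path name: {name!r}")
    segments = [segment for segment in parts.path.split("/") if segment]
    return parts.netloc, segments


# Hypothetical resource name, invented for illustration
service, segments = split_resource_name(
    "//logging.googleapis.com/projects/my-project/logs/my-log"
)
# Pair each collection with the ID that follows it
hierarchy = dict(zip(segments[::2], segments[1::2]))
print(service, hierarchy)
```

Reading the path as alternating collection and ID segments is what makes the hierarchy visible from the name alone.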
Another at-scale example is LinkedIn:
URNs are used to represent foreign associations to an entity (persons, organizations, and so on) in an API. A URN is a string-based identifier with the format:
Simple relational database designs generally rely on (auto-incrementing) `int` row IDs in tables. This system is effective and works in a single-database scenario.
When scaling to multiple DBs or distributed applications (e.g. microservices), using integers is not enough anymore. Some common problems are:
- conflicting auto-incrementing numbers: each database runs its own counter, so two instances can assign the same ID to different records, and a shared counter is exposed to race conditions when creating records
- too generic: neither the system nor its operators can tell, just by looking at an ID, what kind of resource it refers to. If you think that's not so important, Atlassian recently blew up 883 customer websites because of a similar confusion: a script was given the IDs of websites rather than apps in the Atlassian backend ecosystem. Those IDs were then used for deletion, so the thing deleted wasn't, as expected, the customer's app instance but their entire website.
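To show how a URN-style ID helps with the "too generic" problem, here is a small sketch in which every ID carries its entity type and a destructive operation refuses IDs of the wrong type. The `urn:example:...` layout and the function names are assumptions made for this post, not any vendor's actual scheme.

```python
def make_id(entity_type: str, local_id: int) -> str:
    """Build a URN-style ID that says what kind of thing it identifies."""
    return f"urn:example:{entity_type}:{local_id}"


def require_type(identifier: str, expected_type: str) -> int:
    """Return the numeric ID, but only if the entity type matches."""
    parts = identifier.split(":", 3)
    if len(parts) != 4 or parts[:2] != ["urn", "example"]:
        raise ValueError(f"not a recognised ID: {identifier!r}")
    _, _, entity_type, local_id = parts
    if entity_type != expected_type:
        raise ValueError(
            f"expected an ID of type {expected_type!r}, got {entity_type!r}"
        )
    return int(local_id)


def delete_app(identifier: str) -> str:
    # A bare integer 42 could be an app or a site; a typed ID cannot be confused
    app_id = require_type(identifier, "app")
    return f"deleted app {app_id}"


print(delete_app(make_id("app", 42)))
try:
    delete_app(make_id("site", 42))
except ValueError as error:
    print("refused:", error)
```

The same number 42 now yields two distinct IDs, so passing a site ID to an app deletion fails loudly instead of deleting the wrong thing.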
Do you have any other examples of URNs being used in systems? I'm curious to know about them so please let me know in the comments!