This post was originally published on Hackmamba.
Most of us are familiar with Uniform Resource Locators (URLs) - the simple and effective way to access our favorite websites. But how much do we know about URLs and how they function? The infrastructure of the internet is intricate and constantly changing. Thus, understanding a URL's anatomy can be daunting for those unfamiliar with the web.
URLs are more than merely typed addresses. Each one comprises several important components that together determine where a resource lives and how it is accessed.
Every URL component, from the protocol to the domain name, serves a specific purpose. Thus, it's essential to understand the many parts of a URL to unravel the mysteries of the internet. This article will give us an in-depth explanation of the anatomy of a URL and its components.
URLs are also called web addresses or document locators. A URL is the address we use to find and access a particular page or resource online. In essence, URLs are the web's version of street addresses: each one lays out a path to follow when looking for web pages, images, videos, and other online resources. A URL consists of several components that give a computer the data it needs to reach the desired resource. Below are examples of URLs:
The URLs above represent various web addresses, from articles to social media posts. We can access the URLs by entering them into our browser's address bar or clicking the link.
The URL has various key components, each serving a specific purpose. The following are examples of URL components:
The scheme is also known as the protocol component of the URL. It is the starting point of a URL and instructs the web browser on the protocol to access a resource. Thus, it specifies the rules the web browser and server must adhere to when exchanging data. These rules include error handling procedures, request types, and data transmission protocol. Here is an example of a scheme component:
In the URL above, the scheme is the Hypertext Transfer Protocol Secure (HTTPS). A scheme is always followed by a colon (:). In web URLs, two forward slashes (//) come next and mark the start of the domain name; schemes that do not name a host, such as mailto:, omit them. These separators show where the scheme ends and the resource location begins, which lets web browsers and servers interpret URLs accurately.
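To see how software picks out the scheme, here is a minimal sketch using Python's standard `urllib.parse` module. The helper name `scheme_of` is our own, chosen only for illustration.

```python
from urllib.parse import urlsplit


def scheme_of(url: str) -> str:
    """Return the lowercase scheme of a URL, or an empty string if it has none."""
    return urlsplit(url).scheme


print(scheme_of("https://www.hackmamba.io/blog/"))         # https
print(scheme_of("mailto:firstname.lastname@example.org"))  # mailto
```

Note that the second URL has a scheme and a colon but no double slashes, because it does not point at a host.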
There are various common protocols, each catering to different forms of online communication. Here are a few examples of typical schemes:
HTTP is a request-response protocol enabling efficient communication between clients and servers. The client (web browser) sends an HTTP request to the server to start communication. The web server receives the request, processes it, and produces an HTTP response. In HTTP responses, there are status codes such as 200 for successful completion or 404 for not found. The server then returns the HTTP response, which our browser interprets and displays.
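The request-response exchange can be sketched without touching the network. The snippet below builds the text of a minimal GET request and reads the status code from a response's first line. The function names and the www.example.com address are assumptions made for illustration; a real browser sends many more headers.

```python
from http import HTTPStatus
from urllib.parse import urlsplit


def build_get_request(url: str) -> str:
    """Build the text of a minimal HTTP/1.1 GET request for the given URL."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    return (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {parts.netloc}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )


def describe_status(status_line: str) -> str:
    """Turn a status line such as 'HTTP/1.1 404 Not Found' into '404 Not Found'."""
    code = int(status_line.split()[1])
    return f"{code} {HTTPStatus(code).phrase}"


print(build_get_request("http://www.example.com/index.html"))
print(describe_status("HTTP/1.1 200 OK"))         # 200 OK
print(describe_status("HTTP/1.1 404 Not Found"))  # 404 Not Found
```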
The main aim of HTTP is to make it easier to send hypertext, like text and images, across the internet. By clicking on hyperlinks, text, or image links, users can navigate from one web page to another. Also, HTTP is a stateless protocol; it does not keep track of past requests or sessions. Modern websites provide personalized user experiences and simulate statefulness through extra mechanisms. These mechanisms include using cookies, sessions, or tokens to maintain user-related information.
It's important to note that HTTP is a plain text protocol. Thus, the data transferred between the web browser and the web server is not encrypted, and anyone who can intercept the traffic can read it. Websites that collect user data, like passwords and credit card details, over plain HTTP therefore put that data at risk.
HTTPS stands for Hypertext Transfer Protocol Secure. It is a more secure version of HTTP that runs over an encrypted connection (TLS). When a browser connects to an HTTPS-enabled website, the browser and server first agree on a shared encryption key. The browser uses that key to encrypt the data it sends, and the web server uses the same key to decrypt it; responses travel back the same way. Thus, even if an attacker intercepts the data, it will be hard to decode without the encryption key. Security-conscious websites use HTTPS wherever sensitive data, such as passwords, is entered.
When a website uses HTTPS, the browser's address bar usually displays a padlock icon and an "https" prefix in the URL. The padlock icon indicates that our data, like credit card numbers, is encrypted while it travels to the website. Also, search engines like Google consider HTTPS when ranking sites. Google's algorithm evaluates many factors to determine the most relevant content to display, and security and trustworthiness are among them. Thus, all else being equal, websites that use HTTPS may rank higher in search results.
FTP stands for File Transfer Protocol. It's a protocol that moves files between a client and a server across a TCP/IP (Transmission Control Protocol/Internet Protocol)-based network like the internet. We use FTP to download files like images from a server or to upload files to a server.
FTP operates on a client-server architecture. It's a typical setup in networking where the client (the FTP program) asks the server for resources. The server, in turn, responds to the commands the client sends and returns the requested files.
FTP has two modes of file transfer: ASCII and binary mode. ASCII mode transfers text-based files, such as documents, HTML, and plain text. Binary mode, by contrast, transfers non-text files like images, audio, and other binary data. Using the correct mode for each file type ensures the integrity and readability of files. Below is an example of a URL:
It is important to note that FTP sends files and login credentials unencrypted by default. Secure alternatives such as FTPS (FTP over TLS) and SFTP (SSH File Transfer Protocol, a separate protocol that runs over SSH) address this issue.
iv. mailto: URL scheme
"mailto:" is a prefix used in URLs that refer to email addresses. It stands for "Mail To." It creates hyperlinks that start the process of composing an email when clicked. It gives users an easy way to send emails without copying the person's address and typing it into their email app by hand.
<a href="mailto:firstname.lastname@example.org">Send an email to Cess</a>
Assume the above hyperlink is on a web page that says, "Send an email to Cess." When users click the link, their web browser reads the "mailto:" scheme in the hyperlink. In response, the browser launches the user's default mail client, like Outlook or the Gmail app. The recipient's email address, "firstname.lastname@example.org," will be pre-filled in the email's "To" field. The user can then fill out the email with a subject, main message, and other vital details before sending it.
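As a rough sketch, a mailto link can be built and taken apart with Python's `urllib.parse`. The helper names `build_mailto` and `recipient_of` are hypothetical, and the optional `subject` field is an addition to the example above; it pre-fills the subject line in mail clients that support it.

```python
from urllib.parse import quote, urlsplit


def build_mailto(address: str, subject: str = "") -> str:
    """Build a mailto: link, optionally with a pre-filled subject."""
    link = "mailto:" + address
    if subject:
        # Spaces and other special characters must be percent-encoded.
        link += "?subject=" + quote(subject)
    return link


def recipient_of(mailto_url: str) -> str:
    """Return the address a mailto: link points to."""
    parts = urlsplit(mailto_url)
    if parts.scheme != "mailto":
        raise ValueError("not a mailto URL")
    return parts.path


link = build_mailto("firstname.lastname@example.org", "Hello Cess")
print(link)                # mailto:firstname.lastname@example.org?subject=Hello%20Cess
print(recipient_of(link))  # firstname.lastname@example.org
```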
A crucial part of any URL is the domain name, sometimes called the "hostname." It is a human-readable label used to identify and locate resources on the internet. In essence, a domain name is the name of the website or server that is hosting the resource. It's like a real-life street address that points to a specific web page on the internet.
In the example above, the domain name is "www.hackmamba.io." When a user enters the domain name into their web browser's address bar, the browser queries a Domain Name System (DNS) server to get the site's IP address. The DNS translates human-readable domain names into machine-readable IP addresses. The DNS server replies with the IP address, and the browser then connects to the web server at that address and requests the page. The web server returns the page, which the browser interprets and displays to the user.
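The lookup step can be sketched with Python's `socket` module, which asks the operating system's resolver for the addresses behind a name. The `resolve` helper is our own name, and the addresses returned for a real domain depend on the network the code runs on, so no output is shown for that case.

```python
import socket


def resolve(hostname: str) -> list[str]:
    """Return the unique IP addresses the system resolver reports for a hostname."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    addresses: list[str] = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


# An IP literal needs no DNS query and resolves to itself.
print(resolve("127.0.0.1"))  # ['127.0.0.1']

# A real lookup needs network access, for example:
# print(resolve("www.hackmamba.io"))
```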
NOTE: To secure and use a domain name, the user must register with a domain name registrar. These registrars mediate between domain name owners and the domain name system.
Domain names play a part in the search engine visibility of websites. A domain name that contains relevant keywords may help a site's SEO on the search engine results page (SERP), where results whose domain names closely match a user's search query are easier to recognize.
Also, a domain name that reflects a site's content allows a search engine crawler to understand it. In turn, it leads to accurate indexing and improved website visibility. Well-chosen domain names create an online presence and foster trust with potential customers.
Domain names are made up of letters, numbers, and hyphens, with dots (.) separating the name into segments. Each segment, called a label, represents a level within the domain name hierarchy. Below are the main components of a domain name:
Subdomain: Subdomains are optional prefixes to the main domain. They're located to the left of the main domain and separated from it by a dot. In our example above, "www" is the subdomain. Subdomains help classify and organize content under a primary domain name.
Second-Level Domain (SLD): The second-level domain comes after the subdomain and is the core part of the domain name. In our example above, "hackmamba" is the second-level domain. It is the part of the domain name that owners choose to represent their identity, brand, or purpose.
Top-Level Domain (TLD): The TLD comes after the final dot and is the last part of the domain name. In the hierarchical domain name system (DNS) structure, it is the rightmost part of a domain name. It conveys information about the website's purpose or geographical location. A typical example is the ".com" TLD for commercial websites or ".edu" for educational websites. In our example above, ".io" is the top-level domain.
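The three levels can be separated with a few lines of Python. This is a deliberately naive sketch: it assumes the TLD is a single label, so it would misread names under multi-label suffixes such as "co.uk". The `split_domain` helper is hypothetical.

```python
def split_domain(hostname: str) -> dict[str, str]:
    """Split a hostname into subdomain, second-level domain, and top-level domain.

    Assumes a single-label TLD such as "io" or "com".
    """
    labels = hostname.rstrip(".").split(".")
    if len(labels) < 2:
        raise ValueError("expected at least a second-level and a top-level domain")
    return {
        "subdomain": ".".join(labels[:-2]),  # empty string when there is none
        "second_level_domain": labels[-2],
        "top_level_domain": labels[-1],
    }


print(split_domain("www.hackmamba.io"))
# {'subdomain': 'www', 'second_level_domain': 'hackmamba', 'top_level_domain': 'io'}
```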
The path is a URL component that offers an organized approach to browsing a website's content. It helps locate a specific page, image, file, or directory on a web server. The path acts as a navigational guide, much like a file path on a computer. It comes after the domain name and consists of segments, often resembling folder names, separated by forward slashes ("/"). The first forward slash marks where the path starts.
In the example above, "/blog/2023/08/libraries-vs-frameworks-which-is-right-for-your-next-web-project/" is the path component. It consists of several path segments: "blog," "2023," "08," and "libraries-vs-frameworks-which-is-right-for-your-next-web-project." These segments trace a route through the website's structure to the resource: a blog post from 08/2023 on libraries and frameworks.
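A short sketch of how the path breaks into segments, using the path from the example. The `path_segments` helper is our own name.

```python
from urllib.parse import urlsplit


def path_segments(url: str) -> list[str]:
    """Return the non-empty segments of a URL's path, in order."""
    path = urlsplit(url).path
    return [segment for segment in path.split("/") if segment]


path = "/blog/2023/08/libraries-vs-frameworks-which-is-right-for-your-next-web-project/"
print(path_segments(path))
# ['blog', '2023', '08', 'libraries-vs-frameworks-which-is-right-for-your-next-web-project']
```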
Query parameters are optional URL parameters that allow a web browser to pass data to a server. They are usually written as "key=value" pairs and come after a question mark ("?") that follows the path. Each "key=value" pair holds a parameter's name and the data related to it. Ampersands ("&") separate the pairs.
The query string enhances the functionality and interactivity of web applications and websites. Sites often use query parameters to filter content, customize views, or conduct searches. For example, Google's search engine uses the 'q' parameter to decide what to show the user in search results. Query strings can also include campaign tracking parameters used in marketing efforts, which help track user visits for analytics purposes.
In the URL above, the query string is everything that comes after the question mark "?". It includes the following key-value pairs:
- "q=what+is+a+url": This is a query parameter that carries the search query "What is a URL"
- "sourceid=chrome": This parameter identifies the search query's source as the Chrome browser.
- "ie=UTF-8": This parameter shows the query's character encoding, which is UTF-8.
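The three pairs listed above can be decoded with `parse_qs` from Python's standard library, and encoded again with `urlencode`. Only the query string is used here, so no host name is assumed.

```python
from urllib.parse import parse_qs, urlencode

query = "q=what+is+a+url&sourceid=chrome&ie=UTF-8"

# parse_qs decodes "+" as a space and collects each key's values in a list,
# because a key may appear more than once in a query string.
params = parse_qs(query)
print(params)
# {'q': ['what is a url'], 'sourceid': ['chrome'], 'ie': ['UTF-8']}

# urlencode goes the other way, turning key-value pairs back into a query string.
rebuilt = urlencode({"q": "what is a url", "sourceid": "chrome", "ie": "UTF-8"})
print(rebuilt)  # q=what+is+a+url&sourceid=chrome&ie=UTF-8
```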
The fragment identifier is also known as a "hash" or simply a "fragment." It is an optional part of a Uniform Resource Identifier (URI) that points a web browser to a specific area of a web page. A hash (#) sign always comes before a fragment and separates it from the rest of the URL, and the fragment is always the last part of the URL. It's usually used with the HTML anchor tag to link to a specific section of a lengthy webpage.
In the example above, "#heading-what-is-a-framework" is the fragment. Opening the URL scrolls the browser to the element on the page whose ID is "heading-what-is-a-framework," which is the heading of the "What is a framework" section.
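Here is a small sketch that separates a fragment from the rest of a URL with `urldefrag`. The full URL below is assembled from the scheme, domain, path, and fragment mentioned in this article, so treat it as an illustration rather than a verified link.

```python
from urllib.parse import urldefrag


def split_fragment(url: str) -> tuple[str, str]:
    """Return (URL without fragment, fragment without the leading '#')."""
    base, fragment = urldefrag(url)
    return base, fragment


url = (
    "https://www.hackmamba.io/blog/2023/08/"
    "libraries-vs-frameworks-which-is-right-for-your-next-web-project/"
    "#heading-what-is-a-framework"
)
base, fragment = split_fragment(url)
print(fragment)  # heading-what-is-a-framework
```

The browser handles the fragment on its own; it is not sent to the server as part of the request.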
Port numbers are optional parts of a URL. They are numerical components that specify which port on the web server a web client should connect to. Ports are vital in routing network traffic to specific services or apps on a hosting server. The server environment can then hand the request to the right service and send the correct data to the user's browser.
Port numbers also appear in server logs. Because each logged connection records the port it used, administrators can tell which service handled a request, which gives them insight into server activity and performance.
In a URL, the port number comes directly after the domain name, separated from it by a colon (":"). Port numbers aren't used only in URLs; they belong to the transport protocols (TCP and UDP) that run on top of the Internet Protocol (IP) and route traffic to the right service on a host. A port number is a 16-bit unsigned integer ranging from 0 to 65,535, so it consists only of numerical digits (0-9); characters such as "+" or "/" are not valid in a port.
In the URL above, the port number is "8080". It tells the web browser to connect to the web server at www.portexample.com using port 8080. Several well-known port numbers are associated with common internet services: a typical HTTP server uses port 80, an HTTPS server uses port 443, FTP uses port 21, and SMTP uses port 25. When a URL does not specify a port number, the web browser assumes the default port for the URL's scheme.
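The default-port rule can be sketched as follows. The `DEFAULT_PORTS` table and the `effective_port` helper are our own, and the "http" scheme in front of www.portexample.com is an assumption, since only the host and port are named above.

```python
from typing import Optional
from urllib.parse import urlsplit

# Well-known default ports for a few schemes.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str) -> Optional[int]:
    """Return the explicit port if the URL has one, else the scheme's default."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


print(effective_port("http://www.portexample.com:8080/"))  # 8080
print(effective_port("http://www.portexample.com/"))       # 80
print(effective_port("https://www.hackmamba.io/"))         # 443
```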
The following are some advantages of using URLs:
- Unique identification: URLs offer a unique way to identify and access information on the internet. They allow for more efficient and user-friendly navigation across digital platforms.
- SEO: URLs play an essential role in search engine optimization (SEO). Search engines use URLs as one signal of a web page's relevance, so it helps to include keywords in our URL structure. These keywords tell search engines more about the content and topic of the page, which makes it easier for users to find our website through search engines like Google.
- Analytics and tracking: URLs are an essential tool for analytics and tracking. They enable us to track user behavior on a website. Tools like Google Analytics make analyzing user behavior on a website easy. They help us enhance user experience, improve website functionality, and achieve online objectives.
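As one illustration of URL-based tracking, campaign parameters can be appended to a link's query string. The sketch below assumes the widely used `utm_source` and `utm_campaign` parameter names; the `add_query_params` helper and the www.example.com address are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_params(url: str, extra: dict[str, str]) -> str:
    """Return the URL with extra key-value pairs appended to its query string."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    pairs.extend(extra.items())
    return urlunsplit(parts._replace(query=urlencode(pairs)))


tagged = add_query_params(
    "https://www.example.com/pricing?plan=pro",
    {"utm_source": "newsletter", "utm_campaign": "launch"},
)
print(tagged)
# https://www.example.com/pricing?plan=pro&utm_source=newsletter&utm_campaign=launch
```

An analytics tool can then read these parameters to attribute the visit to a particular campaign.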
There are several types of URLs, but the two most common are absolute and relative URLs.
An absolute URL is a complete web address that contains all the data we need to locate an online resource. It includes the scheme and the domain name, along with the path and any other components. Absolute URLs are essential for creating precise links to internal and external websites. Because they do not depend on the location of the page that contains them, the links keep pointing to the same resource wherever they are used.
Alternatively, relative URLs specify a resource's location relative to the current webpage's URL. Using relative URLs is much like navigating through folders on a computer. They are typically used to link to other pages or documents on the same website.
Relative URLs do not include every URL component. They omit the scheme, domain name, and port number; the path is usually the only component they need.
Relative URLs are often used in HTML anchor tags to specify the location of resources. When the user clicks on the link, the web browser resolves the relative URL against the current page's URL and loads the resource.
<a href="CSS/about.html">About Us</a>
In the example above, the relative URL is "CSS/about.html." When a user clicks on the link, the browser will find the "about.html" page in the "CSS" subdirectory.
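How a browser turns a relative URL into an absolute one can be sketched with `urljoin`. The base page address below is hypothetical, and `is_absolute` is a helper of our own.

```python
from urllib.parse import urljoin, urlsplit


def is_absolute(url: str) -> bool:
    """Treat a URL as absolute when it has both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


# Hypothetical address of the page that contains the link.
base = "https://www.example.com/pages/index.html"

print(is_absolute("CSS/about.html"))     # False
print(urljoin(base, "CSS/about.html"))   # https://www.example.com/pages/CSS/about.html
print(urljoin(base, "/CSS/about.html"))  # https://www.example.com/CSS/about.html
print(urljoin(base, "../contact.html"))  # https://www.example.com/contact.html
```

A path without a leading slash is resolved against the current page's directory, while a leading slash starts again from the site root.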
URL shorteners are services that convert a long URL into a shorter, more manageable link. They simplify the sharing of web links in situations with limited character counts. For instance, they are handy when posting on social media platforms that impose a character limit.
URL shorteners function by creating redirect links back to the original URL. When a user clicks on these shortened URLs, it redirects them to the original, lengthy URL. They make outgoing links neater, easier to remember, and more user-friendly.
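The redirect idea can be sketched as a lookup table from short codes to original URLs. Everything here is hypothetical: the `Shortener` class, the short.example domain, and the counter-based codes. A real service would also store the table persistently and answer each request with an HTTP redirect.

```python
import string

ALPHABET = string.digits + string.ascii_letters  # 62 URL-safe characters


class Shortener:
    """A toy in-memory URL shortener."""

    def __init__(self, base_url: str = "https://short.example/") -> None:
        self.base_url = base_url
        self._by_code: dict[str, str] = {}

    @staticmethod
    def _encode(number: int) -> str:
        """Encode a non-negative integer in base 62."""
        if number == 0:
            return ALPHABET[0]
        chars = []
        while number:
            number, remainder = divmod(number, len(ALPHABET))
            chars.append(ALPHABET[remainder])
        return "".join(reversed(chars))

    def shorten(self, long_url: str) -> str:
        """Store the long URL under the next free code and return the short link."""
        code = self._encode(len(self._by_code))
        self._by_code[code] = long_url
        return self.base_url + code

    def resolve(self, short_url: str) -> str:
        """Return the original URL that a short link redirects to."""
        code = short_url.rsplit("/", 1)[-1]
        return self._by_code[code]


shortener = Shortener()
long_url = "https://www.example.com/a/very/long/path?with=many&query=parameters"
short_url = shortener.shorten(long_url)
print(short_url)                     # https://short.example/0
print(shortener.resolve(short_url))  # prints the original long URL
```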
Many URL shortening services provide the ability to track and analyze data. The number of clicks on a link, the source of the clicks, and other important data are all visible to users, which makes shorteners useful for affiliate links and product links. URL shorteners can be a real lifesaver, but it's also worth being aware of the possible drawbacks. Because a shortened link hides its destination, users should exercise caution when clicking shortened links from untrusted sources.
The anatomy of a URL is an important concept to understand when navigating the internet. A URL is more than simply a random collection of letters and numbers. It is an intricately planned digital address acting as a virtual online compass.
URLs have many different parts that all work together to help web browsers find and access web pages. By understanding each component of URLs, we can better understand how they work.
We will find the following resources useful: