DNS is basically the address book of the internet. When you type www.duckduckgo.com into your browser, the computer needs to figure out how to get your request to the right server. Routing over the internet works via IP addresses, so to get to www.duckduckgo.com, you first need to figure out what its IP address is.
This is where DNS, the Domain Name System, comes in. At a high level, when you hit enter in your browser's location bar, the browser does a DNS lookup to find the IP address of the server name you entered. Traditionally, a DNS lookup consists of calling a C library function called gethostbyname() (strictly speaking a library call rather than a syscall; newer code tends to use getaddrinfo() instead). This function takes a string such as "www.duckduckgo.com" and returns a structure that includes the IP address of the host, among other information. On my system, that struct looks like this [1]:
struct hostent {
    char  *h_name;       /* official name of host */
    char **h_aliases;    /* alias list */
    int    h_addrtype;   /* host address type */
    int    h_length;     /* length of address */
    char **h_addr_list;  /* list of returned addresses */
};
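If you'd rather poke at this without writing C, Python's standard socket module wraps the same kind of lookup. The following is only a minimal sketch: the helper name describe_host is made up for illustration, and the way the returned tuple lines up with the struct fields is my reading of it, not something the struct definition promises.

```python
import socket


def describe_host(name):
    """Look up `name` and return the hostent-like fields as a dict."""
    # gethostbyname_ex returns (hostname, aliaslist, ipaddrlist), which
    # corresponds roughly to h_name, h_aliases and h_addr_list (IPv4 only).
    official_name, aliases, addresses = socket.gethostbyname_ex(name)
    return {
        "name": official_name,
        "aliases": aliases,
        "addresses": addresses,
    }


if __name__ == "__main__":
    # A literal IP address should not need a network lookup at all.
    print(describe_host("127.0.0.1"))
    # For a real name, try describe_host("www.duckduckgo.com") on a
    # machine with working DNS; the addresses you get back will vary.
```

The dict is just for readability; the tuple that gethostbyname_ex returns carries the same information.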
How does gethostbyname() know what the IP address of www.duckduckgo.com is? Unix-like computers (including Macs) have a file called /etc/resolv.conf that tells the operating system where to send name-to-address lookups. You can check yours by typing cat /etc/resolv.conf in your terminal program.
Mine looks like this:
$ cat /etc/resolv.conf
# Generated by iwm0 dhclient
nameserver 192.168.0.1
lookup file bind
That tells me a few things:
- # Generated by iwm0 dhclient: This tells me the file itself was generated by the dhclient program. dhclient is the DHCP (Dynamic Host Configuration Protocol) client: it asks a DHCP server on the network for an IP address and other settings, including which name server to use.
- nameserver 192.168.0.1: This tells me the IP address of a server I can use to look up names I don't already know about. This will be important soon.
- lookup file bind: This tells the computer which databases it should use to look up IP addresses, and in what order. 'file' tells the computer to first check the /etc/hosts file, and 'bind' tells the computer to ask DNS for the IP address.
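To make those three lines concrete, here is a small sketch of how a program might read them. The function name parse_resolv_conf and the SAMPLE constant are invented for this example, and the parsing is deliberately simplified (it treats everything after a '#' as a comment and only understands the two keywords shown in my file; not every system's resolv.conf has a lookup line).

```python
def parse_resolv_conf(text):
    """Return (nameservers, lookup_order) from resolv.conf-style text."""
    nameservers = []
    lookup_order = []
    for raw_line in text.splitlines():
        # Simplification: drop everything after '#' and trim whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "nameserver" and values:
            nameservers.append(values[0])
        elif keyword == "lookup":
            lookup_order = values
    return nameservers, lookup_order


# The same content as the file shown above.
SAMPLE = """\
# Generated by iwm0 dhclient
nameserver 192.168.0.1
lookup file bind
"""

if __name__ == "__main__":
    servers, order = parse_resolv_conf(SAMPLE)
    print("ask these servers:", servers)
    print("try these sources in order:", order)
```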
If you're a web developer, you've likely used the /etc/hosts file to have a web address point to your computer temporarily for testing. If not, that's OK, we'll talk about it now.
If you look in /etc/hosts you'll see something that looks a bit like this:
$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost
This is a mapping of IP address, hostname, and aliases. In the above example, there are no aliases defined. If I wanted to prank myself, I could do something like this:
$ cat /etc/hosts
127.0.0.1 localhost
::1 localhost
127.0.0.1 www.duckduckgo.com
Now, when I try to browse to https://www.duckduckgo.com/ I get an error, because gethostbyname("www.duckduckgo.com") is returning 127.0.0.1, which is the IP address of my local computer, and not the address of www.duckduckgo.com. If I knew the IP address of www.duckduckgo.com, I could put it here and everything would work, at least until the IP of www.duckduckgo.com changed.
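Here is a rough sketch of what the 'file' step amounts to: read the hosts file and build a table from names to addresses. The names parse_hosts and PRANK_HOSTS are hypothetical, and a real resolver handles more edge cases than this does.

```python
def parse_hosts(text):
    """Map each hostname or alias to the list of addresses given for it."""
    table = {}
    for raw_line in text.splitlines():
        # Ignore comments and blank lines.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the address; everything after it is a name or alias.
        address, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), []).append(address)
    return table


# The pranked hosts file from above.
PRANK_HOSTS = """\
127.0.0.1 localhost
::1 localhost
127.0.0.1 www.duckduckgo.com
"""

if __name__ == "__main__":
    table = parse_hosts(PRANK_HOSTS)
    print(table["www.duckduckgo.com"])
```

Because the name is found in this table, the lookup never gets as far as asking DNS, which is why the prank works.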
A long, long time ago, there was no DNS, just a bunch of computers and a bunch of people adding entries to /etc/hosts files by hand, or maybe copy-pasting from a file that was distributed by the Stanford Research Institute [2]. Eventually this became too cumbersome and someone decided to write some code, and thus DNS was born.
Now we come to the second option in our resolv.conf example, bind.
If gethostbyname() can't find an entry in /etc/hosts, it then queries the DNS server specified on the nameserver line of /etc/resolv.conf. This is your DNS server.
So, the operating system sends a query to 192.168.0.1 asking for the IP address of "www.duckduckgo.com". If 192.168.0.1 already knows the IP address (usually because it answered the same question recently and cached the result), it returns it to you. If not, it asks its own upstream DNS server, which on a home network is typically one run by your ISP. If that server knows the IP address, it returns it, and if not, it asks further up in turn: in practice a recursive resolver works its way down from the root name servers, to the servers for "com", to the name servers responsible for "duckduckgo.com". This continues until one of two things happens:
- No one knows the IP address of www.duckduckgo.com - In this case you will get an error, and someone will have to go fix the DNS records for that website.
- Eventually someone will say "Oh, yeah... I know that guy, here's his address" - In this case your browser opens a TCP connection to the IP address that you just found (there's at least one other full blog post there) and asks the server for a web page.
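The "ask the next server if you don't know" chain above can be sketched as a toy model. Nothing here speaks real DNS: the ToyResolver class, the three-server chain and the caching behaviour are all assumptions made for illustration, and the address is simply one that appears in this article.

```python
class ToyResolver:
    """A pretend DNS server: answers from its own table or asks upstream."""

    def __init__(self, known, upstream=None):
        self.known = dict(known)   # name -> IP address this server knows
        self.upstream = upstream   # the server it asks when it does not know

    def resolve(self, name):
        if name in self.known:
            return self.known[name]
        if self.upstream is None:
            # Nobody further up the chain to ask: the lookup fails.
            raise LookupError(f"no server knows {name}")
        answer = self.upstream.resolve(name)
        # Remember the answer so the next query is answered locally.
        # (Real servers also expire cached answers; this toy does not.)
        self.known[name] = answer
        return answer


# Hypothetical chain: home router -> ISP -> a server that knows the answer.
authoritative = ToyResolver({"www.duckduckgo.com": "184.72.104.138"})
isp = ToyResolver({}, upstream=authoritative)
home_router = ToyResolver({}, upstream=isp)

if __name__ == "__main__":
    print(home_router.resolve("www.duckduckgo.com"))
```

Asking home_router for a name nobody knows raises LookupError, which corresponds to the first outcome in the list; a name the last server knows comes back down the chain, which is the second.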
That's basically how DNS works, in a nutshell. But, but, but, I hear you ask... how does the DNS server that finally responds know what the IP address is for "www.duckduckgo.com"? Excellent question. Basically, the administrator or the devops team, or the developer (now that things are all orchestrated and defined in code) will tell the "authoritative" name server for the domain "Hey, I just set up this server, it's going to host web pages, its name is 'www.duckduckgo.com' and its IP address is '184.72.104.138'."
You do this in what's called a zone file.
You can play with DNS queries yourself using the dig command:
$ dig www.duckduckgo.com
; <<>> DiG 9.4.2-P2 <<>> www.duckduckgo.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 57389
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;www.duckduckgo.com. IN A
;; ANSWER SECTION:
www.duckduckgo.com. 123 IN CNAME duckduckgo.com.
duckduckgo.com. 59 IN A 23.21.193.169
duckduckgo.com. 59 IN A 107.20.240.232
duckduckgo.com. 59 IN A 184.72.104.138
;; Query time: 51 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Wed Apr 25 17:41:15 2018
;; MSG SIZE rcvd: 98
Most of that you can ignore for now. The interesting parts are the QUESTION SECTION, which basically just repeats your query back to you, and the ANSWER SECTION, which in this case tells us that "www.duckduckgo.com" is a CNAME (Canonical Name), which is kind of like an alias, for "duckduckgo.com", which has 3 IP addresses: 23.21.193.169, 107.20.240.232, and 184.72.104.138. You should be able to confirm that any of those go to the duckduckgo.com website by copy-pasting them into your browser (this won't always work, because you can have multiple websites running on the same server at the same IP address).
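To see how the CNAME and A records fit together, here is a small sketch that follows the alias until it reaches addresses. The RECORDS table is copied from the dig output above; the function name resolve_a and the max_hops limit are my own additions for the example.

```python
# Records copied from the dig output above: name -> (type, values).
RECORDS = {
    "www.duckduckgo.com.": ("CNAME", ["duckduckgo.com."]),
    "duckduckgo.com.": (
        "A",
        ["23.21.193.169", "107.20.240.232", "184.72.104.138"],
    ),
}


def resolve_a(name, records, max_hops=8):
    """Follow CNAME records from `name` until A records are found."""
    for _ in range(max_hops):
        if name not in records:
            raise LookupError(f"no record for {name}")
        record_type, values = records[name]
        if record_type == "A":
            return values
        # A CNAME holds a single target name; look that name up next.
        name = values[0]
    raise LookupError("too many CNAME hops (possible loop)")


if __name__ == "__main__":
    for address in resolve_a("www.duckduckgo.com.", RECORDS):
        print(address)
```

The hop limit is only there so that two CNAMEs pointing at each other can't make the loop run forever.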
That's all I've got to say about that.
Questions, feedback? Leave a comment!
Top comments (7)
Hey this is kind of awesome, thanks for writing this. Question about CNAME: is the CNAME yet another shortcut to the IP? In other words, if I type the CNAME into the browser, what does the process of resolving to one of those IPs look like? Does it resemble something like this data structure?
Hi Eugene, exactly. A CNAME is basically a shortcut or an alias to an A record. A CNAME can't point directly to an IP address, it always points to another name.
So, you can have any number of CNAMEs that point to any of your A records (or names) but not directly to an IP. A records point a name to an IP.
Thanks a lot. Could you please explain the last line of the article "this won't always work because you can have multiple websites running on the same server at the same IP address". If you can please explain this I would be very grateful to you.
Sure, I can have multiple names (A records) pointing to the same IP address, for example:
mysite.com -> 192.168.1.30
othersite.com -> 192.168.1.30
thirdsite.org -> 192.168.1.30
When I type 192.168.1.30 into my browser, the server won't know which website I'm asking for, so it will return either the default website for that server, or an error.
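The reason is that the browser sends the name you typed along with the request, in the HTTP Host header, and the server uses that to pick a site. A minimal sketch of that choice, assuming made-up page contents and a hypothetical pick_site function:

```python
# Hypothetical sites all served from the same IP address.
SITES = {
    "mysite.com": "<h1>My site</h1>",
    "othersite.com": "<h1>Other site</h1>",
    "thirdsite.org": "<h1>Third site</h1>",
}
DEFAULT_PAGE = "<h1>Default site</h1>"


def pick_site(host_header):
    """Choose a page based on the Host header the browser sent."""
    # Drop an optional ":port" suffix and ignore case.
    hostname = host_header.split(":", 1)[0].lower()
    # An unknown name, or a bare IP address, falls back to the default.
    return SITES.get(hostname, DEFAULT_PAGE)


if __name__ == "__main__":
    print(pick_site("othersite.com"))
    print(pick_site("192.168.1.30"))
```

When you browse to the bare IP, the Host header contains 192.168.1.30, which matches none of the names, so you get the default.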
Awesome explanation! This is the kind of article we need to read while learning network or dev. Thank you!
Glad you liked it!
great explanation. Short, concise and very clear. Thanks !