Randomly the other day, I was wondering (when is anything I wonder not random?) how browsers work. Correction: I was wondering what happens when you enter a URL into your browser and hit “Enter.” I wasn’t wondering about the DNS resolutions or TCP connections or other networking related aspects around the Internet. I was wondering how a browser connects all of this into a single application. I’ve learned about these concepts before, but I’ve always wondered how they are all connected into a single, functional application.
As per usual, I want to learn more about this by reading the source code for an application. At first, I figured I might read the code for Chromium or Firefox or some other open source browser. Then I realized that there was probably a lot of code in these applications that would cloud my analysis of the networking aspect of this. Instead, I figured that I would try to figure out how these concepts connect by looking at the code base for curl
.
curl
is a command line tool that allows the user to transfer data from one server to another. Although it is often used to query sites through the hypertext transfer protocol (HTTP), you can actually use it transfer data through other protocols as well, like FTP. I actually didn’t know this before starting this exploration so I guess that’s one new thing learned!
Anyways, to get started, I want to figure out what happens when I run the following on the command line.
$ curl https://safia.rocks
This command print the HTML source for my personal website to the console. You can try it and see.
The source code for curl
is hosted on GitHub. The first order of business was to find the entry point for the curl
executable. I snooped around the source code to try to find this.
Sidebar: I use the phrase “snooped around” a lot in these blog posts. Generally, it means looking through directories, reading the first couple of lines in files that I think might be related to what I want, and reading through the source for header files.
After poking around for a little bit, I started to notice some trends in how the source code for curl
is structured. In particular, the source code associated with the command line tool is sprinkled around files under the src/tool_*.c
path. This includes code that parses parameters, prints out the help statement, processes the URLs passed into curl
, and interfaces with libcurl.
Sidebar: When I refer to curl
, I am referring to the command line tool. There is also libcurl
which is the library that contains the code for handling all the transfer protocols.
Once I figured out this src/tool_*.c
pattern, it didn’t take long for me to find the entry point of the command line tool. It’s located in the main
function of the src/tool_main.c
file. Sweet! Now, we can get to the fun parts.
The most critical code referenced in this function is the following.
/* Start our curl operation */
result = operate(&global, argc, argv);
So, it looks like most of the heavy lifting is done by the vaguely named operate
function.
Sidebar: The argc
and argv
parameters are self-explanatory. The global
parameter is a reference to the global configuration for curl
which is the .curlrc
file stored in the users home directory by default.
Alright! So It’s time to look into the operate
function which is defined in the src/tool_operate.c
file. I decided to start my exploration by looking at the function declaration for the operate
function.
CURLcode operate(struct GlobalConfig *config, int argc, argv_item_t argv[])
In particular, I was interested in the CURLcode
object that is returned from the function. I tried to find the definition of this object somewhere in the code base. I assumed that it would be a struct so I searched for that definition. The query I was using to look for the struct definition didn’t turn up anything (I actually looked for it using the command line on a local copy of curl). Instead, I opted for the documentation on CURLcode
which I found here. So it turns out that a CURLcode is actually an enum-like structure which contains integer representations of different possible error codes that might occur in curl
. This makes sense when you consider the fact that most (all) C functions return their status using integers and the fact tha the result of the operate
function is typecast into an integer in the return for the entry point of the main function for curl
.
Alright! This blog post is getting a little long so in the next blog post, I’ll dive into the meat of the operate
function and try to analyze what’s going on there. So I don’t lead myself to far astray, I’d like to answer the following questions as I’m reading through the code.
- What calls to the
libcurl
are made in theoperate
function? - What preprocessing, if any, is done to the arguments passed to the
operate
function?
See you in the next blog post!
Top comments (0)