For those who are new to *nix shells like Bash, the underlying framework of streams may seem a little hazy. Even less clear is the value proposition of developing a firm understanding of these facilities. In this article I will endeavor to pass along some information I wish someone had told me 30 years ago, when my *nix shell journey began.
Every program, whether it is the shell into which you type commands, a shell program you are writing, or the 'ls' command, has three standard I/O streams, commonly referred to as stdin, stdout, and stderr. These streams are always assigned file descriptor numbers 0, 1, and 2 respectively. The purpose of these streams is twofold:
- Route information between a running process (program) and a file.
- Route information between two running processes.
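Both purposes can be seen in a short sketch. The file name fruits.txt is purely illustrative, and 'sort' stands in for any program that reads stdin and writes stdout.

```shell
# Purpose 1: route data between a process and a file.
# printf's stdout goes into a file; sort's stdin then comes from that file.
printf 'pear\napple\nfig\n' > fruits.txt
sorted_from_file=$(sort < fruits.txt)

# Purpose 2: route data between two running processes.
# The pipe connects printf's stdout directly to sort's stdin.
sorted_from_pipe=$(printf 'pear\napple\nfig\n' | sort)

echo "$sorted_from_file"
echo "$sorted_from_pipe"
```

Either route should hand 'sort' the same bytes, so the two results are expected to match.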
Each of these streams is unidirectional (simplex): data flows in one direction only.
stdin is used to move data into a process, whereas stdout and stderr are two channels for moving data out of a process. stdout is normally for the information you get from a program if everything goes well, while stderr is normally reserved for information about things which have gone wrong. Each stream passes a series of bytes; whether or not these bytes are text, binary, JSON, XML, etc. is only a matter of concern to the sender and receiver. This is important, because it means the shell can connect any sender to any receiver without needing to understand the data.
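To make the two output channels concrete, here is a minimal sketch. The function 'report' and the file name report-errors.txt are hypothetical, invented only for this illustration.

```shell
# report is a hypothetical stand-in for any program that uses both
# output channels.
report() {
    echo "3 files processed"             # normal result, written to stdout (fd 1)
    echo "warning: 1 file skipped" >&2   # diagnostic, written to stderr (fd 2)
}

# Command substitution captures stdout only. stderr is sent to a file
# here so the two channels can be inspected separately.
result=$(report 2> report-errors.txt)

echo "stdout carried: $result"
echo "stderr carried: $(cat report-errors.txt)"
```

Because the two channels are separate, the warning never ends up mixed into the captured result.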
When you use a shell such as Bash interactively, the three streams mentioned earlier are connected as follows:
- stdin of your shell is connected to your terminal program, which feeds in the strings you have typed when you press the Enter key.
- stdout and stderr of the shell are also connected to your terminal program, which prints whatever arrives on these streams in the terminal window so you can read it.
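One way to check these connections for yourself is the '-t' test, which succeeds when a file descriptor is attached to a terminal. The helper name 'describe_fd' below is my own invention for this sketch.

```shell
# describe_fd is a hypothetical helper: it reports whether the given
# file descriptor is attached to a terminal at the point of the call.
describe_fd() {
    if [ -t "$1" ]; then
        echo "fd $1: terminal"
    else
        echo "fd $1: not a terminal"
    fi
}

describe_fd 0   # stdin
describe_fd 1   # stdout
describe_fd 2   # stderr
```

Typed at an interactive prompt, all three calls should report a terminal. Running 'describe_fd 0 < /dev/null' instead should report that stdin is not a terminal, since it has been pointed elsewhere for that one call.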
It is possible to redirect these streams as necessary.
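As a brief sketch of what that looks like, the following redirects all three streams of a single command. The file names are illustrative, and 'tr' is used only because it reads stdin and writes stdout.

```shell
# Prepare an input file (the file names here are illustrative).
printf 'hello\n' > input.txt

# <   connects stdin  (fd 0) to a file for reading
# >   connects stdout (fd 1) to a file for writing
# 2>  connects stderr (fd 2) to a file for writing
tr 'a-z' 'A-Z' < input.txt > output.txt 2> errors.txt

cat output.txt
```

If nothing goes wrong, errors.txt is created but stays empty, because 'tr' had nothing to report on stderr.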
Let's look at what happens when you issue the command:
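A minimal form of such a command, assuming a bare invocation with no options or arguments (the discussion that follows relies only on the command line beginning with '/bin/ls'):

```shell
# List the current directory; the output goes to stdout,
# which is connected to the terminal in an interactive shell.
/bin/ls
```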
Your shell invokes the kernel fork() call to create a child process which is identical to the interactive shell you are using, save for the return value from the fork() call itself. The main implication here is that the child's stdin, stdout and stderr are all connected to the same endpoints as those of the shell you are using. Because the command line began with '/bin/ls', this child process then invokes the kernel exec() call to overlay the '/bin/ls' program - which is to say that the child process becomes '/bin/ls' while retaining the stream assignments of the parent. Now '/bin/ls' does its thing chasing up file meta-information, and sends the resulting series of text characters to stdout, which you'll recall gets printed by your terminal program.
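The fork-then-exec sequence can be approximated by hand from the shell itself. This is only a rough sketch of the idea, not a description of Bash's internals: the parentheses create a child shell, and the 'exec' builtin replaces that child with the named program.

```shell
# The parentheses ask the shell to run the enclosed command in a
# child process (a subshell). Inside it, the exec builtin replaces
# that child with /bin/ls, much like the sequence described above.
# The child inherits stdin, stdout and stderr from the parent shell.
( exec /bin/ls )

# The parent shell is still running, because only the child was replaced.
echo "parent shell still here, ls exit status: $?"
```

Without the parentheses, 'exec /bin/ls' would replace the shell you are typing into, and your session would end when 'ls' finished.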
If you wish to store the output from '/bin/ls' in a file, you may redirect stdout of '/bin/ls' to a file like so:
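A sketch of that redirection, assuming the file name "ls-output.txt" used in the discussion below and with the redirection written ahead of the command:

```shell
# Send stdout of /bin/ls to ls-output.txt instead of the terminal.
>ls-output.txt /bin/ls
```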
Here I've placed the redirection at the beginning of the command. You can also place it after the command. Location will become important only when the command line is more complex, involving more than one command.
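To see that both placements behave the same for a single command, the sketch below tries each and compares the results. The file names are illustrative, and the '/' argument is my own addition so that both runs list a directory that does not change between them.

```shell
# Redirection before the command...
>before.txt /bin/ls /
# ...and after it. For a simple command, the shell treats both forms alike.
/bin/ls / >after.txt

# cmp is silent and returns success when the two files are identical.
cmp before.txt after.txt && echo "identical output"
```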
This time the shell creates (or truncates, if it already exists) a file, "ls-output.txt", and connects it to the stdout stream of the child process before '/bin/ls' starts running. When '/bin/ls' does its thing, the result gets written to the "ls-output.txt" file. Your terminal program doesn't get involved in any way with this redirection. This is a key concept.
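The create-or-truncate behavior is easy to observe with a small sketch. The file name demo-output.txt is illustrative, and 'echo' stands in for any command that writes to stdout.

```shell
# First run: the file does not exist yet, so the shell creates it.
echo "first run" > demo-output.txt

# Second run: the file exists, so the shell truncates it before the
# command writes anything. The earlier contents are gone.
echo "second run" > demo-output.txt

cat demo-output.txt
```

Neither 'echo' prints anything in the terminal window, since its stdout was connected to the file rather than to the terminal.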
Using a terminal program is just one of many ways to move data in and out of a shell, albeit a very handy one for interactive use. If you can accomplish something by typing interactive commands, then automating it is straightforward, because the same commands can be supplied from a file or another program instead of the terminal. This is probably the most compelling reason to learn how to do things from a command line.
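As a small sketch of that idea, the commands below are stored in a file and then fed to a fresh shell through its stdin, with no typing involved. The file names list-root.sh and root-listing.txt are illustrative, and the example assumes 'bash' is available on your PATH.

```shell
# Commands that would otherwise be typed at a prompt, stored as text.
# The script name and its contents are illustrative.
cat > list-root.sh <<'EOF'
/bin/ls / > root-listing.txt
echo "listing saved"
EOF

# Feed the same commands to a fresh shell through its stdin
# instead of through a terminal.
bash < list-root.sh
```

The new shell reads its commands from the file exactly as it would read them from a terminal, one line at a time.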
Understanding standard streams is central to the endeavor of shell programming on a *nix system. Once you become confident about how streams work, you will be better able to create mashup applications quickly using Bash (or other shells). This is the key to automation, which is probably the largest value proposition Bash has to offer.