DEV Community

harsh banthiya
harsh banthiya

Posted on

Writing my own minimal shell in C - Part 3(The parsing process leading to an execution tree)

The Parsing Process

Let's try to put a mental model of how the entire command line string needs to be parsed and what are the various modules we might require along the way.

  • Scanner -> Parser -> Executer -> Command
  • Token_list -> Abstract Syntax Tree -> Convert to Execution Unit -> Execute-Execute

The Tentative Process:

  1. Get the whole string until new line using read_line library.
  2. Search the string from index 0 until '\0' for ('')
  3. Divide string in tokens based on whitespace
    • it can be any amount of whitespaces, abcd efgh or abcd efgh; They should both make two tokens.
    • whitespace cannot be included in token characters except when there are double or single quotes. ** "hello world" is one token hello world ** "" hello world "" is two tokens hello and world; and the empty string before and after hello world will not be tokenized. ** "'hello world'" -- A single quote inside a double quote will be included in the token, so the result will be one token called 'hello world'; All the above rules apply to single quotes as well.
  4. After dividing the tokens, if there is a token with a $ sign, whatever comes after the $ sign is compared with the list of all the env variables available and replaced with the value of the said variable. ** Important sub-point - In case of a string in single quote, even if there is a $ sign, it is not replaced with the value of the said env variable.
  5. Environment variables are received using envp parameter from the main function and stored in a key value structure t_env and put in a linked list.

Converting stuff into an Abstract Syntax Tree.

  1. A scanner will scan through the string and make a list of tokens based on the rules, we listed above.
  2. A parser takes that list of tokens verifies the syntax (represents an error if there is a syntax error), and converts it into a tree structure.

Lets try to build the schematics top down:

----- Command line -----
Job ; command line ; job
(Identify command line (unparsed line) as either a formed job or a combination of jobs and unparsed command_lines)

----- Job ------
Command | job ; command
(Identify a job as a command or another job in a list)

---------- Command ----
simple_command redirection list simple_command
(Identify a command as either a simple command or a redirection list )

---------- Redirection List ----
redirection redirection_list

----------- Redirection ------
filename < filename >> filename

---------- Simple_Command -------
pathname token_list

----------- Token_List --------
token token_list

For example:
cat > a > b < c should end up as:

          command
         /        \
  simple cmd      redirection
  /                      \ 
cat                        >
                          /  \
                       aaaa   >
                             /  \
                          bbbb   <
                                 /
                               cccc 
Enter fullscreen mode Exit fullscreen mode
          (hold)
        /        \
   cat            >
                 /  \
              aaaa   >
                    /  \
                 bbbb   <
                       /
                    cccc 
Enter fullscreen mode Exit fullscreen mode
          (hold) 
        /        \
   cat            aaaa (type : NODE_REDIRECT_IN)
                    \
                    bbbb (type: NODE_REDIRECT_IN)
                      \
                      cccc (type: NODE_REDIRECT_OUT)
Enter fullscreen mode Exit fullscreen mode

The tree could finally evaluate to Abstract Tree

       aaaa (type : NODE_REDIRECT_IN)
      /      \
    cat       bbbb (type: NODE_REDIRECT_IN)
                 \
                 cccc (type: NODE_REDIRECT_OUT)
Enter fullscreen mode Exit fullscreen mode

Let's breakdown the logic of the above command tree:

We identify command as one of two structures:

  1. Simple_command and redirection_list structures
  2. Simple_command structure.

  3. In case of simple command

First we check if it is simple_command
** We check the token_list with pathnames
** If it just one token we check once but it is a list we check each one.

  1. In case of redirection list

** We check if it just one redirection or a list.
** We attach one case ( <, >, <<, >>) and attach the filename information, create a node and add it to the list.

Top comments (0)