DEV Community

Junior Nascimento
Junior Nascimento

Posted on • Updated on

Building my Own wc Tool in Rust

Introduction

In this post, I'll walk you through my journey of building a custom version of the Unix wc (word count) tool, which I named ccwc (Coding Challenges Word Count). This project was inspired by a coding challenge designed to teach the Unix Philosophy of creating simple, composable command-line tools. You can find more details and source code in my GitHub repository.

The Challenge

The challenge was to build a wc tool that can:

  • Count the number of bytes, characters, lines, and words in a file.
  • Handle command-line options to specify what to count.
  • Read from standard input if no file is specified.

The Unix Philosophy encourages writing simple programs that do one thing well and can be combined to perform complex tasks. Following this principle, I broke down the challenge into several steps, each adding a new feature to the tool.

Step-by-Step Development

Step Zero: Data Structure

One of my favorite things in Rust has to be the type system, thats why I created a struct to represent the stats we might have to get from the input

struct Stats {
    pub bytes: usize,
    pub chars: usize,
    pub lines: usize,
    pub max_line_lenght: usize,
    pub words: usize,
    pub path: String,
}
Enter fullscreen mode Exit fullscreen mode

Step One: Counting Bytes

The first task was to implement the -c option to count the number of bytes in a file. I did this using a 'constructor'

impl Stats {
    pub fn new(chars: Vec<u8>, path: String) -> Self {
        let mut stats = Stats {
            bytes: chars.len(),
            chars: 0,
            lines: 0,
            max_line_length: 0,
            words: 0,
            path,
        };
        // Additional processing...
        stats
    }
}
Enter fullscreen mode Exit fullscreen mode

This function initializes a Stats struct with the number of bytes based on the length of the input vector.

Step Two: Counting Lines

Next, I added support for the -l option to count the number of lines. This required iterating through the characters and incrementing the line count whenever a newline character (\n) was encountered:

for c in chars {
    if c == '\n' {
        stats.lines += 1;
    }
}
Enter fullscreen mode Exit fullscreen mode

Step Three: Counting Words

The -w option was a bit more complex, as it required detecting word boundaries. I used a simple state machine to keep track of whether the current character is part of a word:

let mut in_word = false;
for c in chars {
    if !c.is_whitespace() {
        in_word = true;
    } else if in_word {
        stats.words += 1;
        in_word = false;
    }
}
Enter fullscreen mode Exit fullscreen mode

Step Four: Counting Characters

The -m option counts the number of characters. This is straightforward unless the locale supports multibyte characters. For simplicity, I treated each byte as a character in this implementation.

Step Five: Default Behavior

I'm handling flags in a very straightforward:

let args = env::args().skip(1); // Skips the frist argument in the command (most likely the binary name i.e: ccwd ....)
    let mut flags: Vec<char> = vec![];
    let mut files_paths: Vec<String> = vec![];
    let avaliable_flags = ['c', 'm', 'l', 'L', 'w']; // All avaliable flags

    for arg in args {
        if arg.starts_with('-') { // if the argument start if a - it must be a flag or multiple flags
            for flag in arg.chars().skip(1) { // skiping the -
                flags.push(flag); 
            }
        } else { // otherwise is treated like a file path
            files_paths.push(arg);
        }
    }

    let invalid_flag = flags.iter().find(|flag| !avaliable_flags.contains(flag));
Enter fullscreen mode Exit fullscreen mode

If no options are provided, the tool should count bytes, lines, and words by default:

if flags.is_empty() {
    flags.push('c');
    flags.push('l');
    flags.push('w');
}
Enter fullscreen mode Exit fullscreen mode

Final Step: Reading from Standard Input

The final feature was to allow the tool to read from standard input if no file is specified:

let number_of_files = files_paths.len();

if number_of_files < 1 {
    let mut buffer = Vec::new();
    match io::stdin().read_to_end(&mut buffer) {
        Ok(_) => {
            stats.push(Stats::new(buffer, "".to_string())); 
        }, 
        Err(err) => { panic!("Error reading from stdin: {err}")}
    }
}
Enter fullscreen mode Exit fullscreen mode

Conclusion

Building ccwc was a rewarding experience that really helped me learning rust. By breaking the project into manageable steps, I was able to incrementally add features and ensure each part worked correctly before moving on.

Feel free to check out the complete source code on my GitHub repository. I hope this post inspires you to tackle similar challenges and build your own command-line tools.

Take care folks!

Top comments (0)