Nivethan

Posted on Nov 9, 2020 • Edited on Feb 20, 2021 • Originally published at nivethan.dev

A Gemini Client in Rust - 08 Parsing Gemini Pages

Hello! Let's do a recap of what we currently have done. Our client now has a visit function that can travel to a Gemini server, it can handle URLs, so we can visit any Gemini page, and we have statuses so we can correctly handle the different types of responses Gemini can send. We also have caching and we can select pages we want to view from the cache.

So far we've been just displaying everything that the server sends to the user. If there are links or headings in a page, we leave them as is. This is really cumbersome, especially for links! The user needs to copy and paste the link to get to it.

In this chapter we're going to add some pizzaz!

Let's get started!

The Gemini type - text/gemini

We've currently hardcoded text/gemini as the mime type for our responses. For now we'll leave this in, but we can change the behavior of our client based on the mime type, for instance if it is anything other than text/*, we can save the file or save the file and open the correct program like an image viewer or browser.

text/gemini is a special format made for Gemini. It has a very simple rule set where the first 3 characters of a line decides the line's type.

Lines started with => are links, the first space after the link is the splitter between the link and the user visible name
3 back ticks is the preformat toggle
Lines started with ### are headings
* is for bullets

5.> is for quotes
6.Everything else is a regular line

We are making a basic command line client so really the only thing we care about are the link lines. Everything else we can get away with by printing directly to the screen!

Let's look at some code!

...
fn display(page: &Page) {
    let lines: Vec<&str> = page.content.split("\n").collect();

    for line.trim() in lines {
        println!("{}", line.trim());
    }
}
...

The first thing we need to do is move our printing logic all to one place, we currently print out in our hx option, the ls, option and in the visit option. Now we'll use one general purpose display function!

This function splits the page based on new line characters, this is because Gemini allows lines to be delimited by \r\n or \n. We then print each line onto the screen. This is where we will add the logic to do the parsing.

But before that, let's update our main function to use our new display function.

...
            _ if tokens[0].starts_with("h") => {
                let option = tokens[0][1..].to_string().parse::<i32>().unwrap_or(-1);
                if option < 0 || option >= cache.len() as i32 {
                    println!("Invalid history option.");
                } else {
                    display(&cache[option as usize]);
                }
            },
            "ls" => {   
                if cache.len() > 0 {
                    display(&cache.last().unwrap());
                } else {
                    println!("Nothing cached.");
                }
            }
            "v" | "visit" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    return;
                }

                let url = Url::new(tokens[1]);
                let content = visit(&url);
                let page  = Page::new(url, content);
                display(&page);
                cache.push(page);
            },
...

We have changed our println! statements to calls to our display function passing in the Page object.

Now we should be able to visit a Gemini page and still have everything displaying as text.

Let's work on the link item type first!

Link Item Type

The Gemini specification says that a link is any line starting with =>, with or without a space followed by a relative or absolute path, which is then followed by a space and some text or nothing at all. If there is some text, that text should appear to the user and the link hidden, if there is no text then the link should appear.

Now the first thing we need to do is change the way our Page struct works. Currently we have a url field and a content field. Now we will remove the content field and instead save a list of lines.

Let's see the code!

...
struct Link {
    text: String,
    link: String,
}

struct Page {
    url: Url,
    lines: Vec<String>,
    links: Vec<Link>
}

impl Page {
    fn new(url: Url, content: String) -> Self {
        let content: Vec<&str> = content.split("\n").collect();

        let mut links: Vec<Link> = vec![];
        let mut lines: Vec<String> = vec![];

        for line in content {
            match line {
                _ if line.starts_with("=>") => {
                    let mut link_line = line.replacen("=>", "", 1);
                    link_line = link_line.replace("\t", " ");
                    let tokens: Vec<&str> = link_line.trim().splitn(2, " ").collect();

                    let link: String;
                    let text: String;

                    if tokens[0].starts_with("/") {
                        link = format!("{url}{relative_path}",
                            url=url.address,
                            relative_path=tokens[0].to_string()
                        );

                    } else if tokens[0].starts_with("gemini://") 
                        || tokens[0].starts_with("http://") 
                        || tokens[0].starts_with("https://")
                    {
                        link = tokens[0].to_string();

                    } else {
                        link = format!("{url}{relative_path}",
                            url=url.request().trim(),
                            relative_path=tokens[0].to_string()
                        );
                    }

                    if tokens.len() <= 1 {
                        text = tokens[0].to_string();
                    } else {
                        text = tokens[1].trim().to_string();
                    }

                    links.push(Link { text, link });
                    lines.push(line.to_string());
                },
                _ => { lines.push(line.to_string()); }
            }
        }
        Page { url, lines, links}

    }
}
...

The first thing to note is we now have a new struct for links, Link only contains a text that we show the user and the underlying link the text points to.

Next we update the Page struct so that instead of content it has a Vector of strings for lines and it has a vector for links.

Now the biggest change will be in our constructor. We still pass in the same variables so this means that our Page creation in various places doesn't need to get updated.

Let's go through what our constructor is doing.

We first split the content by new line characters. We then set up a list of links and lines. As we go through each line in the content, we will add it to these buckets.

We loop through each line in the content.

We then match the line. The base case is that we simply add the line to our lines variable.

The link case is when a line starts with =>.

First we will remove the => characters.

Next we replace any tabs with spaces.

We then tokenize the link line. We split the line into 2 on the first space we find and collect this into a vector.

Next we check to see if this is a relative path or if this is an absolute path.

If it is a relative path starting with / we will append the base url of the page we are on.

If it is an absolute path starting with gemini, http, or https we will leave it as is.

Anything else is a relative path, relative to our current location.

Next we check to see how many tokens we made when we split on the first space. If the link line didn't contain any user visible text then our tokens would just be the link. In that case we set the text that the user sees to the link itself. If we do have have a second part to our tokens variable then we use that as the the text we will show the user.

We finally add the link to our list of links.

We also add the line into our lines. This is so when we go to display everything the display routine can figure out when to use the links variable.

Let's now take a look at our updated display routine!

...
fn display(page: &Page) {
    let mut link_counter = 0;

    for line in &page.lines {
        match line {
            _ if line.starts_with("=>") => {
                println!("{link_counter} => {text}", 
                    link_counter=link_counter,
                    text=page.links[link_counter].text
                );
                link_counter = link_counter + 1;
            },
            _ => println!("{}", line)
        }
    }
}
...

Instead of looping through page.content we loop through page lines.

We match the line against the link type identifier, =>, at which point if we do hit that, instead of printing the line we print the user visible text.

To know which link we need to print, we need to know the index of the link, this is where the link counter comes in. We keep track of how many => we've seen, which will tell us which link we need to display for the user.

This may be a bit weird, but make sure to understand this part, in our Page constructor, we added links as we found them, this means that when we display them, as we find links we can reference them in the Links list by knowing how many links we've seen!

The base case in our match statement is printing the line directly to the screen.

Now we also print the link_counter out as well as we'll use that number to follow a link.

Following Links!

Whew! We have links getting formatted properly now. Let's add the ability to follow a link. We'll do something like cd x where x is the link number we want to go to. This is because I really enjoy thinking of everything in a file system. This works for me so using similar syntax is easier. The great thing is we can modify the code ever so slightly for any sort of syntax!

(I had originally had this with vx syntax but I didn't like it as much so I changed it.)

Let's get started!

...
            "cd" => {
                if tokens.len() < 2 {
                    println!("Didn't specify a destination.");
                    return;
                }

                if cache.len() <= 0 {
                    println!("Nothing cached.");
                    return;
                }

                let page = &cache.last().unwrap();
                let option = tokens[1].to_string().parse::<i32>().unwrap_or(-1);
                if option < 0 || option >= page.links.len() as i32 {
                    println!("Invalid link option.");
                } else {
                    let url = Url::new(&page.links[option as usize].link);
                    let content = visit(&url);
                    let new_page  = Page::new(url, content);
                    cache.push(new_page);
                }
            },
...

This option mixes a few different things that we have already done. The first thing we do is make sure we are on a page, we can do that by making sure we have something in the cache. We also need to make sure the number of tokens the user entered is correct.

Next we get the last page in the cache as that is where we really are. The last item in the cache is the last place we visited.

Next we parse the number that the user entered. v3 would mean that the user wants to visit the 3rd link on the current page.

We validate the option and then we do the same steps as our visit command. We first get the associated link with that option and we create a Url object from it. We then call the visit function on that url to get the content. We then create a new_page from the url and content. We display the content and then we cache this new page.

Voila! We can now handle links in Gemini! We have an honest to goodness Gemini client now. We can read Gemini pages and follow links as we find them.

One thing to note here is that when we do use our cd command, we don't immediately display the screen. Our visit command does. One is not better than other objectively but I like having to type in ls to display the page.

Let's bring out visit command in line with our cd command.

...
            "v" | "visit" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    return;
                }

                let url = Url::new(tokens[1]);
                let content = visit(&url);
                let page  = Page::new(url, content);
                cache.push(page);
            },
...

Now we don't display the page immediately, we simply add it to our cache!

Back Option

Let's do a couple more thing before we end this chapter. We should implement a back option so we can return to our previous page. This is actually very simple! Our cache is a stack, if pop off from it then we're immediately back one page!

...
            "cd" => {
                if tokens.len() < 2 {
                    println!("Nowhere to visit.");
                    continue
                }

                if cache.len() <= 0 {
                    println!("Nothing cached.");
                    continue;
                }

                if tokens[1] == ".." {
                    cache.pop();
                    if cache.len() > 0 {
                        println!("Back to {}", cache.last().unwrap().url.request().trim());
                    } 
                } else {

                    let page = &cache.last().unwrap();
                    let option = tokens[1].to_string().parse::<i32>().unwrap_or(-1);
                    if option < 0 || option >= page.links.len() as i32 {
                        println!("Invalid link option.");
                    } else {
                        let url = Url::new(&page.links[option as usize].link);
                        let content = visit(&url);
                        let new_page  = Page::new(url, content);
                        cache.push(new_page);
                    }
                }
            },
...

Now we can go back by doing cd ..! (Originally I had implemented this as it's own command, back, but I didn't like it as much)

Now that we have the back option, we should also implement a where option. This way we can tell what page we currently have as the last item in the cache.

            "where" => {
                if cache.len() > 0 {
                    println!("{}", cache.last().unwrap().url.request().trim());
                } else {
                    println!("Nowhere.");
                }
            },

Another simple little command that adds a bit of life to our client. We make sure we have something in our cache and if we do we print out the url request we made to get that page.

Now with where, cd, history and ls we have a functioning client that can do all sorts of things! History is a little wonky right now due to it not really being history but really being just the current path we took to get to where we are. If we cd we are modifying the history. Ideally, we would save the history of all the places we visit and then rename the current history to pwd. Let's leave that out for now but just a thought. :)

Now! We are done with this chapter! We hooked up links in Gemini and we can now traverse Gemini space comfortably. We can follow links and go back. We can use the ls command to reprint pages.

In the next chapter we will add some pagination to our pages, that way we don't have to manually scroll up and down!

See you soon!

DEV Community

A Gemini Client in Rust - 08 Parsing Gemini Pages

The Gemini type - text/gemini

Link Item Type

Following Links!

Back Option

Top comments (0)

Read next

Just Joined the Community – Lots of Questions as a Fresher React Developer

How Adam Smith Differentiates Money, Capital, Wealth & Goods

Revolutionize Your Flutter Apps with These Top Cross-Platform CMS Solutions

Inferência de Tipos com o Operador Losango