
Brian Berns


F# Tip 3: Recursive sequence expressions

F# sequence expressions are a great way to generate data in a loop. A simple sequence expression like this generates square numbers:

// 1, 4, 9, ...
seq {
    for i in 1 .. 10 do
        yield i * i
}
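As a small aside, here is a sketch of how such a sequence might be consumed. The name `squares` is introduced only for this illustration. Because a `seq { ... }` is lazy, nothing is computed until something enumerates it.

```fsharp
// `squares` is a name chosen for this illustration.
let squares =
    seq {
        for i in 1 .. 10 do
            yield i * i
    }

// Enumerating the sequence is what actually runs the loop.
squares |> Seq.take 3 |> List.ofSeq |> printfn "%A"   // [1; 4; 9]
squares |> Seq.sum |> printfn "%d"                     // 385
```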

Often, however, we need to process a hierarchical data structure instead of working through a simple for loop. It turns out that sequence expressions can handle this as well.

For example, let's say that we have a snippet of HTML and we want to extract each node's text. So, for this HTML:

<div>
    before the spans
    <span>inside the first span</span>
    between the spans
    <span>inside the second span</span>
    after the spans
</div>

We want to generate output that looks like this:

div: before the spans + between the spans + after the spans
span: inside the first span
span: inside the second span

Note that the output for each node contains text from only that node, not its children. How can we do this in F#?

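Using the HtmlAgilityPack library, we can extract the text from a given node like this. The XPath expression text() selects only the node's direct text children, and because SelectNodes returns null when nothing matches, a small helper converts null into an empty sequence: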

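open System
open HtmlAgilityPack

// SelectNodes returns null (not an empty collection) when nothing matches.
let toSeqSafe (items : seq<_>) =
    if isNull items then Seq.empty
    else items

// Joins the text of the node's own text children, ignoring descendants.
let getNodeText (node : HtmlNode) =
    let texts =
        node.SelectNodes("text()")
            |> toSeqSafe
            |> Seq.map (fun textNode -> textNode.InnerText)
    String.Join(" + ", texts)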
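To see the null handling in isolation, here is a minimal sketch that needs no HTML library. The values `noMatches` and `someMatches` and the helper `joinTexts` are hypothetical stand-ins for what a node lookup might return; only `toSeqSafe` comes from the code above.

```fsharp
open System

let toSeqSafe (items : seq<_>) =
    if isNull items then Seq.empty
    else items

// Hypothetical stand-ins: a lookup that found nothing, and one that found two texts.
let noMatches : seq<string> = null
let someMatches : seq<string> = Seq.ofList [ "a"; "b" ]

// Hypothetical helper mirroring the join step in getNodeText.
let joinTexts (texts : seq<string>) =
    String.Join(" + ", toSeqSafe texts)

printfn "[%s]" (joinTexts noMatches)     // []
printfn "[%s]" (joinTexts someMatches)   // [a + b]
```

Without the helper, passing the null value straight into Seq.map would throw as soon as the result is enumerated.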

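Now we can process an entire hierarchy recursively. The function below uses the array form of a sequence expression, [| ... |], which builds its results eagerly; the same body also works inside seq { ... } if you want lazy evaluation: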

let rec getHierarchyText node =
    [|
        let text = getNodeText node
        if text <> "" then
            yield node, text

        for child in node.ChildNodes do
            yield! getHierarchyText child
    |]

Inside the expression, we first yield the text for the current node (if it has any), then loop over the children and use yield! to splice in the sub-sequence generated by each child node. F# automatically concatenates all the sequences together for us!
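The same shape works on any tree, not just HTML. Here is a minimal, library-free sketch using a lazy `seq { ... }`; the `Tree` record, `getTreeText`, and `sample` are hypothetical names invented for this illustration and are not part of HtmlAgilityPack.

```fsharp
// Hypothetical tree type, used only for this illustration.
type Tree =
    { Name : string
      Text : string
      Children : Tree list }

/// Yields (name, text) for every node with non-empty text, parent first.
let rec getTreeText (tree : Tree) : seq<string * string> =
    seq {
        // Emit the current node first ...
        if tree.Text <> "" then
            yield tree.Name, tree.Text

        // ... then splice in everything each child produces.
        for child in tree.Children do
            yield! getTreeText child
    }

let sample =
    { Name = "div"
      Text = "before + between + after"
      Children =
        [ { Name = "span"; Text = "first"; Children = [] }
          { Name = "span"; Text = "second"; Children = [] } ] }

for (name, text) in getTreeText sample do
    printfn "%s: %s" name text
```

The difference between the two keywords is that yield adds a single element, while yield! adds every element of another sequence, which is what lets the recursive call flatten the tree into one stream.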

We can test drive the function like this:

open System.IO

let doc = HtmlDocument()
use rdr =
    new StringReader(
        "<div>\
            before the spans\
            <span>inside the first span</span>\
            between the spans\
            <span>inside the second span</span>\
            after the spans\
        </div>")
doc.Load(rdr)

for (node, text) in getHierarchyText doc.DocumentNode do
    printfn "%s: %s" node.Name text

This pattern of a yield followed by a recursive yield! is something that I find comes up often when processing hierarchies. I hope it's useful for you as well.
