F# sequence expressions are a great way to generate data in a loop. A simple sequence expression like this generates square numbers:
// 1, 4, 9, ...
seq {
    for i in 1 .. 10 do
        yield i * i
}
Often, however, we need to process a hierarchical data structure instead of working through a simple for loop. It turns out that sequence expressions can handle this as well.
For example, let's say that we have a snippet of HTML and we want to extract each node's text. So, for this HTML:
<div>
before the spans
<span>inside the first span</span>
between the spans
<span>inside the second span</span>
after the spans
</div>
We want to generate output that looks like this:
div: before the spans + between the spans + after the spans
span: inside the first span
span: inside the second span
Note that the output for each node contains text from only that node, not its children. How can we do this in F#?
Using the HtmlAgilityPack, we can extract the text from a given node like this:
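open System
open HtmlAgilityPack

// SelectNodes returns null when nothing matches, so treat null as empty.
let toSeqSafe (items : seq<_>) =
    if isNull items then Seq.empty
    else items

// Join the text of this node's direct text children, ignoring descendants.
let getNodeText (node : HtmlNode) =
    let texts =
        node.SelectNodes("text()")
        |> toSeqSafe
        |> Seq.map (fun node -> node.InnerText)
    String.Join(" + ", texts)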
Now we can use a sequence expression to process an entire hierarchy recursively:
let rec getHierarchyText node =
    seq {
        let text = getNodeText node
        if text <> "" then
            yield node, text
        for child in node.ChildNodes do
            yield! getHierarchyText child
    }
Inside the expression, we first yield the text for the current node (if it has any), then loop over the children and yield! the sub-sequence generated by each child node. F# automatically concatenates all the sequences together for us!
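The same shape works on any tree, not just HTML. Here is a minimal sketch that needs no external library. The Tree type, the leaf helper, and the labels function are hypothetical names made up for this illustration; they are not part of the HtmlAgilityPack example.

```fsharp
// Hypothetical tree type, used only for illustration.
type Tree =
    { Label : string
      Children : Tree list }

// Yield the current node's label, then splice in each child's sequence.
let rec labels (tree : Tree) : seq<string> =
    seq {
        yield tree.Label
        for child in tree.Children do
            yield! labels child
    }

// Small helper for building childless nodes.
let leaf label = { Label = label; Children = [] }

let sample =
    { Label = "div"; Children = [ leaf "first span"; leaf "second span" ] }

// Force the lazy sequence into a list (parent first, then children in order).
let result = labels sample |> Seq.toList
// result = ["div"; "first span"; "second span"]
```

Because yield! flattens each child's sequence into the parent's, the caller sees a single flat sequence in depth-first order and never has to concatenate anything by hand.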
We can test drive the function like this:
open System.IO

let doc = HtmlDocument()
use rdr =
    new StringReader(
        "<div>\
            before the spans\
            <span>inside the first span</span>\
            between the spans\
            <span>inside the second span</span>\
            after the spans\
        </div>")
doc.Load(rdr)
for (node, text) in getHierarchyText doc.DocumentNode do
    printfn "%s: %s" node.Name text
This pattern of a yield followed by a recursive yield! is something that I find comes up often when processing hierarchies. I hope it's useful for you as well.
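As one more illustration of the pattern on a different kind of hierarchy, here is a rough sketch that walks a directory tree. The allPaths name is made up for this example, and the sketch assumes every directory is readable; it does no error handling for access-denied or missing folders.

```fsharp
open System.IO

// Lazily list a directory and everything beneath it, parent before children.
// Assumes all directories are readable (no exception handling in this sketch).
let rec allPaths (dir : string) : seq<string> =
    seq {
        // The current node first...
        yield dir
        yield! Directory.EnumerateFiles dir
        // ...then the sub-sequence from each child directory.
        for subDir in Directory.EnumerateDirectories dir do
            yield! allPaths subDir
    }
```

Calling something like allPaths "." |> Seq.iter (printfn "%s") would print each path as it is discovered, since the sequence is evaluated lazily.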