The problem
A user recently reported that VidCoder would hang in the middle of a long encode. After dusting off WinDbg and running some !clrstack and !eestack commands, I found a thread stuck in garbage collection for FlowDocument. We use this in only one place: the log window.
VidCoder has had a colored log window for years, powered by a RichTextBox with a FlowDocument inside. The document has a single paragraph, and we add Runs to that paragraph that contain the actual log entries. At the time I wrote it, it worked great for the length of logs that were typically created. But after I added some optional extended debug logging for the interprocess communication, long encodes would cause VidCoder to eat up progressively more memory, eventually leading to the hang inside a GC prompted by FlowDocument memory allocation.
Virtualization
After the log window was implicated, I knew exactly what was going on. UI elements are typically orders of magnitude more expensive than raw data, and the standard answer is virtualization: create the UI elements only for what's in the viewport, then add placeholder space for areas outside the viewport. As the user scrolls, create new UI elements on-demand and clean up the old ones.
The approach
Virtualizing list containers come standard with most UI frameworks, but we couldn't use one here. I also decided I didn't want to load the entire file into memory, as the log files could easily reach 200MB, and that would double with .NET's UTF-16 string storage.
I decided I'd break the file up into "chunks", each containing a certain number of lines. As you scroll around, you read the chunks from the file and add them as Runs in the FlowDocument.
I made an outer ScrollViewer with the RichTextBox inside it, and modified the Margin on the RichTextBox to adjust where it shows up. I used VerticalOffset, ViewportHeight and ExtentHeight on the ScrollViewer for all the viewport logic.
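The viewport logic itself can be sketched as pure math, independent of WPF. The following is an illustrative sketch, not VidCoder's actual code (ViewportMath and GetVisibleRange are assumed names): given each chunk's estimated or measured height plus the ScrollViewer's VerticalOffset and ViewportHeight, it finds which chunks should be loaded and how much placeholder space (the top Margin) belongs above them.

```csharp
using System;

public static class ViewportMath
{
    // heights[i] is the estimated or measured height of chunk i.
    public static (int First, int Last, double TopMargin) GetVisibleRange(
        double[] heights, double verticalOffset, double viewportHeight)
    {
        if (heights.Length == 0)
        {
            return (0, -1, 0);
        }

        double y = 0;
        int first = -1;
        int last = heights.Length - 1;
        double topMargin = 0;

        for (int i = 0; i < heights.Length; i++)
        {
            double bottom = y + heights[i];

            // The first chunk whose bottom edge passes the scroll offset is
            // the first visible chunk; everything above it stays placeholder.
            if (first < 0 && bottom > verticalOffset)
            {
                first = i;
                topMargin = y;
            }

            // Once we're past the bottom of the viewport, stop loading.
            if (y >= verticalOffset + viewportHeight)
            {
                last = i - 1;
                break;
            }

            y = bottom;
        }

        if (first < 0)
        {
            // Scrolled past the end of the content; clamp to the last chunk.
            first = heights.Length - 1;
            topMargin = y - heights[first];
        }

        return (first, last, topMargin);
    }
}
```

With three 100-pixel chunks and a 100-pixel viewport scrolled to offset 150, this loads chunks 1 and 2 and leaves 100 pixels of placeholder above them.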
Problem #1: Where are these chunks, even?
Being able to read a chunk from the file means knowing what byte to jump to in order to start reading, and how many bytes to read. But you can't just declare a fixed number of bytes for each chunk, because that might cut a line in half, which would complicate the layout logic. You need to know where the newlines are.
So I figured I'd do an initial pass through the file, reading it line by line and noting where the chunks are. Normally you do that with a StreamReader in .NET, but the trouble is that it doesn't expose the byte position in the underlying file. The StreamReader is a buffered reader: it reads 1024 (usually) bytes at a time, then decodes them into a character array with the specified encoding. It's that character array the StreamReader advances through. It tracks where it is in that array, but doesn't expose that position publicly.
An answer on StackOverflow provides a creative solution: go in with reflection and find the current position in the character array. Then look at the remaining unread characters, calculate how many bytes they would occupy under the current encoding, and subtract that from the underlying stream's position to get the byte position in the base stream.
But relying on reflection is dangerous; refactoring can always break you. In fact, the switch to .NET Core broke that approach. I decided I didn't want to deal with that, so I made my own stripped-down TrackingStreamReader. Since I knew I would only be using UTF-8 with no BOM, I could strip out all the preamble and encoding detection logic. An added BytePosition property gives us the goods:
public long BytePosition
{
    get
    {
        return this.stream.Position - this.encoding.GetByteCount(charBuffer, charPos, charLen - charPos);
    }
}
Now we can compile a list of chunks with proper byte positions!
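The indexing pass itself is then straightforward. Here's a simplified, self-contained sketch that scans the raw UTF-8 bytes for newlines instead of going through the TrackingStreamReader (LogChunk, BuildChunkMap, and the chunk size are assumptions, not VidCoder's actual names):

```csharp
using System;
using System.Collections.Generic;

public class LogChunk
{
    public long StartBytePosition; // where in the file this chunk begins
    public int ByteLength;         // how many bytes to read for this chunk
    public int LineCount;          // how many lines the chunk contains
}

public static class ChunkIndexer
{
    // Scan the file's bytes once, cutting a new chunk every linesPerChunk lines.
    // This works on UTF-8 without a BOM, where '\n' never appears inside a
    // multi-byte sequence.
    public static List<LogChunk> BuildChunkMap(byte[] fileBytes, int linesPerChunk)
    {
        var chunks = new List<LogChunk>();
        long chunkStart = 0;
        int lineCount = 0;

        for (long i = 0; i < fileBytes.Length; i++)
        {
            if (fileBytes[i] == (byte)'\n')
            {
                lineCount++;
                if (lineCount == linesPerChunk)
                {
                    chunks.Add(new LogChunk
                    {
                        StartBytePosition = chunkStart,
                        ByteLength = (int)(i + 1 - chunkStart),
                        LineCount = lineCount
                    });
                    chunkStart = i + 1;
                    lineCount = 0;
                }
            }
        }

        // Trailing partial chunk, if any bytes remain.
        if (chunkStart < fileBytes.Length)
        {
            bool endsWithNewline = fileBytes[fileBytes.Length - 1] == (byte)'\n';
            chunks.Add(new LogChunk
            {
                StartBytePosition = chunkStart,
                ByteLength = (int)(fileBytes.Length - chunkStart),
                LineCount = lineCount + (endsWithNewline ? 0 : 1)
            });
        }

        return chunks;
    }
}
```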
Problem #2: Variable height lines
Virtualization is much more straightforward when you have fixed-height items. You always know exactly how much space you need to insert above and below your real UI items. But in this case, the lines could wrap if they were long enough or the window was small enough.
The answer: measure and guess.
Measure
But nodes inside a FlowDocument don't expose ActualHeight. You have to measure this way:
Rect start = firstRun.ElementStart.GetCharacterRect(LogicalDirection.Forward);
Rect end = lastRun.ElementEnd.GetCharacterRect(LogicalDirection.Forward);
In my case the "end" was always on the blank next line, so I just had to use end.Top - start.Top.
Usually, you can measure immediately after adding the Runs to the UI. That's ideal because you can immediately update the placeholder size and avoid having the UI jump around. But you don't always get a measurement right away; sometimes you get Rect.Empty instead and need to try again inside a Dispatcher call.
Guess
The chunk map tells you how many lines each chunk has, but you need to translate that into an expected height. That means you need data on what past chunks have actually measured to. We can store a double MeasuredHeight on each chunk to keep track of this; then you can look over all the chunks and get a pretty good guess of the average line height.
Then, when calculating the placeholder heights, use MeasuredHeight if you have it, or fall back to a guess based on the average line height if you don't.
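That measure-and-guess logic might look like this. This is a sketch under assumed names (ChunkHeightInfo, AverageLineHeight, EstimateHeight), not VidCoder's actual code: measured chunks feed the average, and the average fills in for chunks that haven't been measured yet.

```csharp
using System;
using System.Collections.Generic;

public class ChunkHeightInfo
{
    public int LineCount;
    public double? MeasuredHeight; // null until the chunk has been on screen
}

public static class HeightEstimator
{
    // Average measured height per line across all measured chunks, falling
    // back to a default when nothing has been measured yet.
    public static double AverageLineHeight(IEnumerable<ChunkHeightInfo> chunks, double fallbackLineHeight)
    {
        double totalHeight = 0;
        int totalLines = 0;
        foreach (ChunkHeightInfo chunk in chunks)
        {
            if (chunk.MeasuredHeight.HasValue)
            {
                totalHeight += chunk.MeasuredHeight.Value;
                totalLines += chunk.LineCount;
            }
        }

        return totalLines > 0 ? totalHeight / totalLines : fallbackLineHeight;
    }

    public static double EstimateHeight(ChunkHeightInfo chunk, double averageLineHeight)
    {
        // Use the real measurement when we have one; otherwise guess.
        return chunk.MeasuredHeight ?? chunk.LineCount * averageLineHeight;
    }
}
```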
But what about resizing?
If the user resizes the window slightly, one way to handle it would be to just clear all the MeasuredHeight values and wait for them to be re-measured. But this might cause the content currently in the viewport to vanish as the user resizes: a chunk above it could shrink or grow drastically as it reverts to the estimated height, pushing the content out of the viewport. One approach might be to doctor the scroll value based on this shift, but I decided to add another chunk property: bool HeightIsDirty. When the window resizes, you mark all the chunks with HeightIsDirty. They still count for estimation purposes, but MeasuredHeight is re-calculated the next time the chunk is loaded. That way, the scroll position naturally stays stable.
Problem #4: Unloading chunks
When a chunk is too far outside the viewport, we need to remove the Runs associated with it. For this, I added a List<Run> Runs property on the chunk, which also doubles as an indicator of whether the chunk is loaded. The chunk unload logic becomes simple and efficient with RemoveRange:
int firstRunIndex = this.logParagraph.Inlines.IndexOf(chunk.Runs[0]);
this.logParagraph.Inlines.RemoveRange(firstRunIndex, chunk.Runs.Count);
Problem #5: Adding log lines
The log window can naturally have lines added to it after being loaded. When a new message comes in, we actually don't need to do any file operations: the logger has already taken care of writing to the file, and it notifies the log window of the message through an event that the window subscribes to only while it's open.
But we can't just take the log string here, because we need to update the chunk map in case the user scrolls back up there later. I switched the log writer from a StreamWriter to a FileStream: I encode the log entry to bytes manually and write those bytes to the FileStream, then fire the event and include the size in bytes. That way we only need to do the encoding once, and we can keep our chunk map up to date.
When the user is scrolled to the bottom of the log, we keep them there, scrolling down after every new entry is added. But if they've scrolled up a bit, we let new entries accumulate without changing the scroll position, so they can keep inspecting whatever they were interested in.
Staying at the bottom
In this case we can add Runs to our paragraph, keeping our chunk map up to date by creating new chunks as needed. The scroll to the end actually triggers our normal scroll handler, which conveniently unloads old chunks without any extra work.
Colors and Run batching
In my log system, lines are marked with the log source and type, and can be colored accordingly. We need to make sure log lines from different sources go in different Runs, so we can apply different colors to them. But that doesn't mean every line has to have its own Run. We can batch up multiple lines within a Run, as long as the log coloring hasn't changed. But we also need to take care not to cross chunk boundaries with a Run, to allow seamless loading and unloading.
This means that as log entries come in, they might create a bunch of small Runs, but if you scroll back up to them after the chunk has unloaded, they come back batched into a single Run.
Final result:
Maybe I shouldn't have put most of the logic in one file, but at least it's less than 1000 lines.