Recently I was asked about the steps that take place when a web page is rendered. Weeeeeeell, the DOM gets built (I suppose) and then... the stuff shows up on the screen! Right? Hmm, I am sure I can step up my game, so, in good rabbit-hole fashion, I set out on a quest to research the steps.
I was told that, alongside the DOM tree, there is something called a CSS tree. So I Googled "css tree vs dom tree". The first result was this Stack Overflow thread. From there, I landed here.
Constructing the DOM and CSSOM tree
DOM stands for Document Object Model, and it is an API that allows the developer to interact with the nodes (it is a tree) in the browser. In order to build these nodes, the browser receives the HTML through the network and runs it through a process called tokenization. The article says that tokens are groups of characters which provide a template for the DOM tree ('input' will become a token [input]). The browser then uses these tokens to determine how they relate to each other and to build the nodes. The nodes are then used to build the tree. So, for instance, the article explains that:
- [div]
  - [span]
    - [p]
    - [img]
  - [ul]
    - [li]
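To make the tokens-to-nodes-to-tree idea concrete, here is a minimal sketch in Python that builds the same example tree. It relies on the standard library's `html.parser.HTMLParser`, which calls `handle_starttag` and `handle_endtag` as it recognises tags; the `Node` and `TreeBuilder` classes, the `outline` helper and the `VOID_ELEMENTS` set are hypothetical names made up for illustration. `HTMLParser` is not an HTML5-spec parser (no error recovery, no implied tags), so this is a toy, not how a browser does it.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML (a small, hand-picked subset).
VOID_ELEMENTS = {"img", "input", "br", "hr", "meta", "link"}


class Node:
    """A minimal stand-in for a DOM node: a tag name plus its children."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []


class TreeBuilder(HTMLParser):
    """Turns start/end tag tokens into a tree, using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # the last item is the current parent

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)  # attach to the current parent
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)  # it becomes the new current parent

    def handle_endtag(self, tag):
        # Pop back to the matching open element (no HTML5 error recovery here).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def outline(node, depth=0):
    """Return the tree as indented lines, one per node."""
    lines = ["  " * depth + f"[{node.tag}]"]
    for child in node.children:
        lines.extend(outline(child, depth + 1))
    return lines


builder = TreeBuilder()
builder.feed("<div><span><p></p><img></span><ul><li></li></ul></div>")
builder.close()
print("\n".join(outline(builder.root.children[0])))
```

Each start tag becomes a node attached to whatever element is currently open, and end tags pop back up the stack. That stack is what turns a flat stream of tokens into a tree (the HTML spec uses a similar "stack of open elements").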
We now have the DOM tree, and the browser can apply the styles to each DOM node. The author says that this is a process called reflow. In this process, the CSS is converted to a structure similar to the DOM tree, called the CSSOM tree (CSS Object Model). So far so good. The author promises a follow-up post, called Constructing the CSSOM Tree. But guess what... he never wrote it! Anyway, the earlier SO thread is packed with resources, so I moved on to the next one.
"What Every Frontend Developer Should Know About Webpage Rendering" sounds really exciting. This is what I am looking for, so I dive in.
It first goes through a recap of the steps that take place when a browser renders a page, expanding upon what I learned in the previous article:
- The DOM (Document Object Model) is formed from the HTML that is received from a server.
- Styles are loaded and parsed, forming the CSSOM (CSS Object Model).
- On top of DOM and CSSOM, a rendering tree is created, which is a set of objects to be rendered (WebKit calls each of those a "renderer" or "render object", while in Gecko it's a "frame"). The render tree reflects the DOM structure except for invisible elements (like the <head> tag or elements that have display: none; set). Each text string is represented in the rendering tree as a separate renderer. Each of the rendering objects contains its corresponding DOM object (or a text block) plus the calculated styles. In other words, the render tree describes the visual representation of a DOM.
- For each render tree element, its coordinates are calculated, which is called "layout". Browsers use a flow method which only requires one pass to lay out all the elements (tables require more than one pass).
- Finally, this gets actually displayed in a browser window, a process called "painting".
The author then explains that a repaint happens when only the visual style of an element changes (for example, its color) without affecting its geometry, so the browser just 'repaints' the element. However, if a change affects content, structure or position, something called a reflow takes place. We seem to have another reflow here: the first author talked about it in the context of the step after the CSSOM is created. Which brings me to the 3rd resource shared in the original SO thread, a more extensive one.
Behind the scenes of modern web browsers
These are the components of a browser:
Browsers have a rendering engine, responsible for displaying content on the browser screen. Firefox uses an engine called Gecko, while Safari and Chrome use one called WebKit (Chrome has since moved to Blink, a fork of WebKit).
The rendering engine gets the contents of the document requested by the user in the browser from the networking layer. The article goes through the steps, which we already know now:
- The render engine parses the HTML document and turns the tags into DOM nodes - this is the DOM tree.
- The same goes for the styles, which are parsed as well.
- Both are used to create another tree - the render tree.
- The render tree goes through a layout process, giving each node the coordinates where it should appear on the screen.
- Finally, in the painting stage, the render tree is traversed and each node is painted using the UI backend layer (see image above).
Stage 1 - Parsing HTML
In step 1 above, the HTML parsing algorithm is key to creating the DOM nodes. The author says that this algorithm is described in detail by the HTML5 specification and has 2 stages, tokenization and tree construction.
Tokenization parses the input into HTML tokens: start tags, end tags, attribute names and attribute values.
The tokenizer recognises a token and passes it to the tree constructor before moving on to recognise the next token, and so on and so forth until the end of the input.
During tree construction, the DOM tree, with the Document at its root, is modified and elements are added as the tokens come in.
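The back-and-forth between the two stages can be sketched with a Python generator: the tokenizer yields one token, the tree constructor handles it, and only then is the next token recognised. This is a toy built on regular expressions, assuming quoted attributes and ignoring text, comments and void elements; the real HTML5 tokenizer is a character-by-character state machine. All names (`tokenize`, `construct_tree`, `Token`, `Element`) are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Toy token patterns: a start tag <name attr="value"> or an end tag </name>.
# Text, comments, unquoted attributes and void elements such as <img> are
# deliberately ignored in this sketch.
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>")
ATTR_RE = re.compile(r'([a-zA-Z-]+)="([^"]*)"')


@dataclass
class Token:
    kind: str  # "start" or "end"
    name: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Element:
    name: str
    attrs: dict
    children: list = field(default_factory=list)


def tokenize(html, log):
    """Recognise one token at a time and hand it over before scanning further."""
    for match in TAG_RE.finditer(html):
        slash, name, rest = match.groups()
        token = Token("end" if slash else "start", name.lower(), dict(ATTR_RE.findall(rest)))
        log.append(f"tokenizer: {token.kind} {token.name}")
        yield token  # pauses here until the tree constructor asks for more


def construct_tree(tokens, log):
    """Consume tokens as they arrive, growing a tree with a Document at its root."""
    document = Element("#document", {})
    open_elements = [document]  # stack: the last item is the current parent
    for token in tokens:
        if token.kind == "start":
            element = Element(token.name, token.attrs)
            open_elements[-1].children.append(element)
            open_elements.append(element)
        elif len(open_elements) > 1 and open_elements[-1].name == token.name:
            open_elements.pop()  # a matching end tag closes the current element
        log.append(f"tree: handled {token.kind} {token.name}")
    return document


log = []
doc = construct_tree(tokenize('<div id="main"><p class="intro"></p></div>', log), log)
print("\n".join(log))
```

The printed log alternates between `tokenizer:` and `tree:` lines, which is the interleaving described above: the tree grows while the input is still being tokenized.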
Stage 2 - Parsing CSS
It is now the turn of CSS to be parsed. The CSS file is parsed into a
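A real CSS parser is far more involved, but as a rough illustration of what parsing CSS into an object model means, here is a toy Python function (`parse_stylesheet` is a hypothetical name) that turns stylesheet text into a list of rules, each holding its selectors and its declarations. It assumes plain `selector { property: value; }` rules only.

```python
import re

# Toy pattern for "selector, selector { property: value; ... }" rules.
# Comments, @-rules (such as @media) and nested blocks are NOT handled.
RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_stylesheet(css):
    """Return a list of rules, each with its selectors and declarations."""
    rules = []
    for selector_text, body in RULE_RE.findall(css):
        selectors = [s.strip() for s in selector_text.split(",") if s.strip()]
        declarations = {}
        for declaration in body.split(";"):
            if ":" in declaration:
                prop, value = declaration.split(":", 1)
                declarations[prop.strip().lower()] = value.strip()
        rules.append({"selectors": selectors, "declarations": declarations})
    return rules


stylesheet = parse_stylesheet("""
    body { font-size: 16px; }
    p, li { color: #333; margin: 0 }
    .hidden { display: none; }
""")
for rule in stylesheet:
    print(rule["selectors"], rule["declarations"])
```

Each rule keeps its selectors separate from its declarations, so that later on they can be matched against DOM nodes to compute each node's styles.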
Stage 3 - Render tree
The render tree is composed of the visual elements in the order they will be displayed: it is the visual representation of the document. It does not map 1 to 1 to the DOM elements, because some DOM nodes are not displayed, e.g. the head tag. Building the render tree requires calculating the visual properties of each element to be displayed on the screen, which takes resources.
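A minimal sketch of how a render tree could be derived from a DOM tree, in Python. It assumes each DOM node already carries its computed styles in a plain dict (in a browser, that comes from matching the CSSOM against the DOM) and uses a hand-picked, hypothetical set of non-visual tags. Nodes with `display: none` are dropped together with their whole subtree, while each text node gets its own renderer. `DomNode`, `RenderObject`, `build_render_tree` and `preorder` are made-up names.

```python
from dataclasses import dataclass, field

# Tags that never produce a box (a hand-picked subset for this sketch).
NON_VISUAL_TAGS = {"head", "script", "style", "meta", "link", "title"}


@dataclass
class DomNode:
    tag: str  # "#text" marks a text node
    style: dict = field(default_factory=dict)  # computed styles, assumed given
    text: str = ""
    children: list = field(default_factory=list)


@dataclass
class RenderObject:
    node: DomNode  # the DOM node this renderer draws
    style: dict
    children: list = field(default_factory=list)


def build_render_tree(node):
    """Return a RenderObject for node, or None when nothing should be drawn."""
    if node.tag in NON_VISUAL_TAGS or node.style.get("display") == "none":
        return None  # skipped together with its whole subtree
    renderer = RenderObject(node, node.style)
    for child in node.children:
        child_renderer = build_render_tree(child)
        if child_renderer is not None:
            renderer.children.append(child_renderer)
    return renderer


def preorder(renderer):
    """Yield the tag of every renderer, parents before children."""
    yield renderer.node.tag
    for child in renderer.children:
        yield from preorder(child)


dom = DomNode("html", children=[
    DomNode("head", children=[DomNode("title")]),
    DomNode("body", children=[
        DomNode("p", children=[DomNode("#text", text="Hello")]),
        DomNode("div", style={"display": "none"}, children=[DomNode("p")]),
    ]),
])
render_root = build_render_tree(dom)
print(list(preorder(render_root)))  # ['html', 'body', 'p', '#text']
```

The head and the hidden div never reach the render tree, which is why it is not a 1 to 1 copy of the DOM.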
Stage 4 - Layout process
The render tree above does not have a position and size yet. These values are calculated during the layout, or reflow, stage, using a coordinate system relative to the root frame (the html document) with top and left coordinates. The root element's position is 0,0 and its dimensions are those of the viewport.
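A minimal sketch of a one-pass flow layout in Python, assuming only block boxes that stack vertically and take the full width of their parent (no margins, padding, inline boxes, floats or tables). `Box`, `layout` and `layout_document` are hypothetical names, and the 1024x768 viewport and box heights are made-up values.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    name: str
    height: int = 0  # intrinsic height in px, only used for leaf boxes (assumed)
    children: list = field(default_factory=list)
    x: int = 0
    y: int = 0
    width: int = 0


def layout(box, x, y, width):
    """Place a block box at (x, y) with its parent's width and stack its
    children top to bottom. Returns the box height; each box is visited once."""
    box.x, box.y, box.width = x, y, width
    if box.children:
        cursor_y = y
        for child in box.children:
            cursor_y += layout(child, x, cursor_y, width)
        box.height = cursor_y - y  # a parent is as tall as its stacked children
    return box.height


def layout_document(root, viewport_width, viewport_height):
    """The root sits at 0,0 and its dimensions are the viewport's."""
    layout(root, 0, 0, viewport_width)
    root.height = viewport_height  # taller content would overflow (and scroll)


root = Box("html", children=[
    Box("header", height=80),
    Box("main", children=[Box("p", height=40), Box("p", height=60)]),
])
layout_document(root, 1024, 768)
header, main = root.children
for box in [root, header, main, *main.children]:
    print(box.name, box.x, box.y, box.width, box.height)
```

All coordinates are relative to the root, so the second paragraph ends up at y=120: below the 80px header and the first 40px paragraph.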
Stage 5 - Painting
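This is the stage where the render tree is traversed and the content is "painted" on the screen. There is an order to it: for a block element, CSS 2 defines the painting order as background color, background image, border, children and, finally, outline.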
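A sketch of painting as a traversal that emits draw commands, assuming the simplified CSS 2 order for a block (background color, background image, border, children, outline) and ignoring stacking contexts, floats and positioned elements. `Renderer` and `paint` are hypothetical names, and the "commands" are just strings standing in for calls to the UI backend.

```python
from dataclasses import dataclass, field


@dataclass
class Renderer:
    name: str
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def paint(renderer, commands):
    """Append draw commands for a renderer and its children, in block order."""
    style = renderer.style
    if "background-color" in style:
        commands.append(f"fill {renderer.name} {style['background-color']}")
    if "background-image" in style:
        commands.append(f"image {renderer.name} {style['background-image']}")
    if "border" in style:
        commands.append(f"border {renderer.name} {style['border']}")
    for child in renderer.children:
        paint(child, commands)  # children are drawn on top of the parent
    if "outline" in style:
        commands.append(f"outline {renderer.name} {style['outline']}")
    return commands


tree = Renderer("div", {"background-color": "white", "border": "1px solid black",
                        "outline": "2px dashed blue"},
                [Renderer("p", {"background-color": "yellow"})])
for command in paint(tree, []):
    print(command)
```

Because the outline comes last, it is drawn over the children, while the parent's background ends up underneath them.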
In response to a change, the browser may simply repaint (if only the visual aspect of the element changes), which redoes stage 5, or reflow (if the position changes, a DOM element is added, etc.), which redoes stages 4 and 5.
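To make the repaint-vs-reflow distinction concrete, here is a hypothetical Python helper that decides which stages need to run again after a change. The set of "paint-only" properties is an illustrative assumption, not taken from any browser's source; anything not on it is conservatively treated as affecting layout.

```python
# Hypothetical, illustrative list of properties whose changes only affect how
# things look, not where they are or how big they are.
PAINT_ONLY_PROPERTIES = {"color", "background-color", "border-color",
                         "outline-color", "box-shadow", "visibility"}


def stages_to_rerun(changed_properties, dom_changed=False):
    """Return the rendering stages that must run again after a change."""
    changed = set(changed_properties)
    if not changed and not dom_changed:
        return []  # nothing changed, nothing to redo
    if not dom_changed and changed <= PAINT_ONLY_PROPERTIES:
        return ["paint"]  # a repaint: only stage 5
    # Geometry or DOM changes (and, conservatively, any property not listed
    # above) need a reflow: layout (stage 4) followed by paint (stage 5).
    return ["layout", "paint"]


print(stages_to_rerun(["color"]))              # ['paint']
print(stages_to_rerun(["color", "width"]))     # ['layout', 'paint']
print(stages_to_rerun([], dom_changed=True))   # ['layout', 'paint']
```

Erring on the side of a reflow means doing more work than strictly needed, but never skipping a layout that was actually required.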
That's it for today. If you want to continue geeking out on the topic, the next step may be Google's Web Fundamentals Docs. Happy reading!
Top comments (2)
Great article, filled a couple of gaps in my knowledge.
I don't think the parse-through-to-layout process is a single task anymore, depending on the CSS (pure speculation, hence why I'm asking: you might actually be able to help me understand something that bugs me!).
See this question and answer on Stack Overflow, where there were some strange layout shifts on a significantly long page with flexbox.
My understanding is poor, but from observation it appears that the parsing of the HTML was split into 2 tasks: the browser lays out / renders the first part early, then discovers the second part and lays out / renders that, resulting in a layout shift that on paper shouldn't happen (the CSS is inlined, seems to be valid, etc.).
It appears to be flexbox related; I'm just wondering if you knew why that would be, as I never quite understood it.
This is a page that exhibits this behaviour
I would love to know the cause of this as it bugged me that I could not find an answer on it and my best guess was not satisfactory!
Thanks! I don't, I am afraid. It is such a vast topic; let me know if you find the answer!