DEV Community

Discussion on: How to create code compressor in JavaScript | HTML Minifier

Prabhu

Would you please explore this a bit more?

Piotr "MoroS" Mrożek • Edited

But of course. :)

We can parse a string into a DOM Document with the DOMParser class. From there we can use a function to traverse the DOM and remove any comment nodes and whitespace-only text nodes (every node has a type assigned, so they're easy to tell apart). This is going to be a bit lengthy.

Let's parse a sample document:

const dom = new DOMParser().parseFromString(`
<!doctype html>
<html>
    <head>
        <title>Test</title>
    </head>
    <body>
        <strong>Simple text<\/strong>
        <!-- comment -->
        <script>
            document.write('<em>This is not</em>      <em>a part of the document</em>');
            console.log('This is not as well');
        <\/script>
    </body>
</html>`, "text/html");

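We have here a simple HTML document with new lines, tabs/spaces, a comment and a script block. I had to escape the closing script tag (<\/script>), otherwise Firefox and VSCode complained about an unterminated string. When code like this sits in an inline script block, the HTML parser treats a literal closing script tag as the end of that block, even if it appears inside a JavaScript string.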
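Since the traversal will branch on node types, here's a rough cheat sheet of the numeric nodeType values we're likely to meet. The values come from the DOM standard; the NODE_TYPE_NAMES table and the childTypes helper are names I made up for illustration.

```javascript
// Numeric nodeType values defined by the DOM standard
// (the same numbers as Node.ELEMENT_NODE, Node.TEXT_NODE, and so on).
const NODE_TYPE_NAMES = {
  1: "ELEMENT_NODE",
  3: "TEXT_NODE",
  8: "COMMENT_NODE",
  9: "DOCUMENT_NODE",
  10: "DOCUMENT_TYPE_NODE",
  11: "DOCUMENT_FRAGMENT_NODE",
};

// Hypothetical helper: lists the type names of a node's direct children.
function childTypes(parent) {
  return Array.from(
    parent.childNodes,
    (node) => NODE_TYPE_NAMES[node.nodeType] ?? `UNKNOWN(${node.nodeType})`
  );
}
```

In a browser, calling childTypes(dom) on the document parsed above should list a doctype node followed by the html element, and childTypes(dom.body) shows the whitespace text nodes and the comment we want to get rid of.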

Let's write a simple minify function (recursive - I'm lazy ;) ):

function minify(parent) {
  // childNodes is a live list, so we copy it into an array first: we cannot
  // safely iterate through what we'll be modifying at the same time
  const children = Array.from(parent.childNodes);
  for (const node of children) {
    if (node.nodeType === Node.COMMENT_NODE) {
      // remove comment node
      parent.removeChild(node);
    } else if (node.nodeType === Node.TEXT_NODE) {
      // test for pure whitespace node (not containing characters other than whitespaces)
      if (!/\S/.test(node.nodeValue)) {
        // remove pure whitespace node
        parent.removeChild(node);
      }
    } else {
      // process child node recursively
      minify(node);
    }
  }
}

It's simple and won't turn into a mess once you try implementing corner cases (like preventing regex from parsing what's inside a script tag). It also gives more flexibility and control (as is the case with code vs regex).
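To show what one such corner case might look like in code, here's a minimal sketch that leaves whitespace alone inside elements where it matters. The KEEP_WHITESPACE set and the minifyKeepingPre name are my own assumptions, not part of any API, and the upper-case tag names assume an HTML (not XML) document. I use numeric node type values instead of the Node constants only so the sketch also runs outside a browser.

```javascript
// Numeric nodeType values from the DOM standard (Node.TEXT_NODE, Node.COMMENT_NODE).
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// Assumption: elements whose whitespace-only text should be kept as-is.
const KEEP_WHITESPACE = new Set(["PRE", "TEXTAREA"]);

function minifyKeepingPre(parent) {
  // nodeName is upper-case for HTML elements in an HTML document
  const preserve = KEEP_WHITESPACE.has(parent.nodeName);
  // copy the live list before removing anything from it
  for (const node of Array.from(parent.childNodes)) {
    if (node.nodeType === COMMENT_NODE) {
      parent.removeChild(node);
    } else if (node.nodeType === TEXT_NODE) {
      // drop whitespace-only text unless the parent is whitespace-sensitive
      if (!preserve && !/\S/.test(node.nodeValue)) {
        parent.removeChild(node);
      }
    } else {
      minifyKeepingPre(node);
    }
  }
}
```

It only checks the direct parent, so whitespace inside an element nested within a pre would still be removed. Passing the preserve flag down the recursion would cover that.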

Finally, let's use it:

console.log(`<!doctype ${dom.doctype.name}>\n${dom.documentElement.outerHTML}`); // original HTML
minify(dom);
console.log(`<!doctype ${dom.doctype.name}>${dom.documentElement.outerHTML}`); // minified HTML

Yes, I know doctypes are a bit more complex when you take pre-HTML5 document types into account (they also carry public and system identifiers), but for the sake of simplicity let's assume we're only dealing with the simple HTML5 document type.
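If you did want to cover the older doctypes, a rough sketch could look like the one below. It relies only on the name, publicId and systemId properties that DocumentType nodes expose; the serializeDoctype function itself is a hypothetical helper of mine, and it doesn't escape quotes inside the identifiers.

```javascript
// Hypothetical helper: builds a doctype declaration from a DocumentType-like
// object (anything with name, publicId and systemId properties).
function serializeDoctype(doctype) {
  // document.doctype is null when the source had no doctype at all
  if (!doctype) return "";
  // upper-case DOCTYPE is valid in both HTML and XML serializations
  let out = `<!DOCTYPE ${doctype.name}`;
  if (doctype.publicId) {
    out += ` PUBLIC "${doctype.publicId}"`;
    if (doctype.systemId) out += ` "${doctype.systemId}"`;
  } else if (doctype.systemId) {
    out += ` SYSTEM "${doctype.systemId}"`;
  }
  return out + ">";
}
```

With it, the logging above could use serializeDoctype(dom.doctype) + dom.documentElement.outerHTML instead of the hard-coded HTML5 prefix.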
The first log will print the formatted HTML code generated from the unminified DOM Document. The second log will print it after minification (removal of unnecessary nodes). Outputs to compare below:

First logging - before minify:

<!doctype html>
<html><head>
        <title>Test</title>
    </head>
    <body>
        <strong>Simple text</strong>
        <!-- comment -->
        <script>
            document.write('<em>This is not</em>      <em>a part of the document</em>');
            console.log('This is not as well');
        </script>
</body></html>

Second logging - after minify:

<!doctype html><html><head><title>Test</title></head><body><strong>Simple text</strong><script>
            document.write('<em>This is not</em>      <em>a part of the document</em>');
            console.log('This is not as well');
        </script></body></html>

While the HTML document has been minified, the JavaScript code remains unchanged, including its indentation. In our minify function we could add another condition for detecting script tags and minifying them differently (e.g. check that node.nodeType === Node.ELEMENT_NODE and that node.nodeName === 'SCRIPT').
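A minimal sketch of that extra condition could look like this. The transformScript parameter is a hypothetical hook of mine where you'd plug in a real JavaScript minifier; the default only trims the surrounding whitespace, because anything smarter done by hand (like joining lines) can easily break the code. Numeric node type values stand in for the Node constants so the sketch isn't tied to a browser.

```javascript
// Numeric nodeType values from the DOM standard.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;
const COMMENT_NODE = 8;

// transformScript is a hypothetical hook: pass in a real JavaScript minifier.
// The default just trims leading and trailing whitespace of the script body.
function minifyWithScripts(parent, transformScript = (code) => code.trim()) {
  // copy the live list before modifying it
  for (const node of Array.from(parent.childNodes)) {
    if (node.nodeType === COMMENT_NODE) {
      parent.removeChild(node);
    } else if (node.nodeType === TEXT_NODE) {
      if (!/\S/.test(node.nodeValue)) {
        parent.removeChild(node);
      }
    } else if (node.nodeType === ELEMENT_NODE && node.nodeName === "SCRIPT") {
      // handle the script body separately instead of recursing into it
      node.textContent = transformScript(node.textContent);
    } else {
      minifyWithScripts(node, transformScript);
    }
  }
}
```

Run on the sample document, the default would only strip the whitespace right after the opening script tag and right before the closing one; the indentation between the two statements stays until a proper minifier is passed in.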

It's just a simple example of how you could use the DOM to minify your HTML. The same approach also works for XML documents (DOMParser accepts "application/xml" as the type too), among other use cases.

Cullan

I do like your answer and I think it has a great specific use case, but I am confused by the rigidity of this approach. For example, what if someone minifies a chunk of HTML that has no doctype, head or body? How would your code handle both full HTML files and HTML chunks? From my testing of your code, you can do one or the other but not both. Is there something I am missing? I do like your answer, but I'm not sure it has the flexibility to minify any HTML you throw at it.