Discussion on: Demystifying the Long Arrow "Operator"


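Note that in every modern browser “the Robin Hood operator” is actually treated as the start of a single-line comment if it occurs at the start of a line or is preceded only by whitespace. This is the HTML-like comment syntax from Annex B of the ECMAScript specification, and it does not apply to module code.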
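A minimal sketch of that rule, assuming an engine that implements the Annex B HTML-like comments (as browser engines and Node.js do). Indirect `eval` is used only because it parses its argument as a classic script and returns the completion value; the variable names are made up for illustration.

```javascript
// Indirect eval parses its argument as a classic Script, where the
// Annex B HTML-like comment rules apply (they do not apply to modules).
const scriptEval = (src) => (0, eval)(src);

// Mid-line, "-->" is just the long arrow: (a--) > 0, i.e. 1 > 0.
const midLine = scriptEval("var a = 1; a --> 0");

// At the start of a line (only whitespace before it), "-->" opens a
// single-line comment, so "b + 100" is never evaluated; the result is b.
const lineStart = scriptEval("var b = 1;\n   --> b + 100\nb");

console.log(midLine, lineStart); // true 1
```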

So if you accidentally put a line break after “x”, that code becomes a syntax error.
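The effect of that stray line break can be checked without running the loop. This sketch assumes the `Function` constructor as a convenient way to compile a script body and surface a `SyntaxError`; `isValidScript` is a hypothetical helper name.

```javascript
// Hypothetical helper: true if `src` compiles as a function body.
// The Function constructor compiles the code without running it.
function isValidScript(src) {
  try {
    new Function(src);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// One line: "x --> 0" means (x--) > 0, a valid loop condition.
const sameLine = "let x = 3;\nwhile (x --> 0) {}";

// Line break after "x": "--> 0) {}" becomes a comment, leaving "while (x".
const splitLine = "let x = 3;\nwhile (x\n--> 0) {}";

console.log(isValidScript(sameLine), isValidScript(splitLine)); // true false
```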

OTOH, it is possible to hide malicious or polyglot code behind the --> operator.
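As a hypothetical illustration (not taken from the linked question), the three lines below form an HTML/JS polyglot. In classic scripts `<!--` anywhere on a line also starts a single-line comment, so an HTML parser and a JS engine see different content: HTML hides the middle line inside one comment, while JS executes only that middle line.

```javascript
// Hypothetical polyglot text, assembled line by line.
// As HTML: everything from "<!--" to "-->" is one comment; only <p> renders.
// As a classic JS script: "<!--" and the line-leading "-->" each start a
// single-line comment, so only the middle line runs.
const polyglot = [
  "<!-- this line is a comment in both HTML and JS",
  "globalThis.ranInJs = 'middle line executed';",
  "--> <p>Only this paragraph renders as HTML</p>",
].join("\n");

(0, eval)(polyglot); // indirect eval: parsed as a classic script
console.log(globalThis.ranInJs); // middle line executed
```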

stackoverflow.com/questions/578004...

Basti Ortiz • Dec 12 '18

Wowwwwww... This is just outrageous! As much as I love the simplicity of HTML (and the language itself), I absolutely hate the fact that it forced the JavaScript specification to accommodate its strange commenting syntax. In all my years of writing HTML, I have never liked this way of commenting code.

gsmoffln • Dec 13 '18 • Edited

Yes, all that overly tolerant HTML tag-soup syntax, with its complex parsing and fallback rules, is inherently evil. The browser should not try to read the programmer's mind and guess on their behalf, especially in an environment as unsafe as public network computing over plain-text communication channels.

That's why I prefer XHTML5 over HTML5, and always put my foreign-language code (CSS, scripts, templates, data blobs, or simply user-entered strings from a database) into well-formed <![CDATA[ ... ]]> sections.

On the other hand, we might as well think of ECMAScript and server-side languages such as PHP as specially retrofitted to be mixed with well-formed XML.

Gecko's JavaScript 1.7 once had the E4X extension, which actually allowed well-formed XML tags to be valid JS tokens (and first-class data types). Combine that with JS string template interpolation:

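<script><![CDATA[
   var cls = "today";  // hypothetical class name; `class` itself is a reserved word
   // E4X interpolates attribute values as attr={expr}, without quotes
   var span = <span class={cls}>Today is: {new Date().toLocaleDateString()}</span>;
   // an E4X XML object is not a DOM node, so serialize it before inserting it
   document.querySelector(`div#output`).insertAdjacentHTML("beforeend", span.toXMLString());
]]></script>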

And you have a perfectly validatable polyglot grammar. As you can see, the XHTML/JS/E4X parser contexts are (almost) completely isolated.

Unfortunately, E4X is dead in Gecko, but with its simplistic reincarnation in React as JSX you can get essentially the same.
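Without E4X or a JSX compiler, a rough plain-JS approximation of the E4X example is a tagged template literal that escapes every interpolated value, so data can never be mistaken for markup. The `xml` tag and `escapeXml` helper are hypothetical names, and unlike E4X or JSX the result is only a string, not a first-class XML or element value.

```javascript
// Hypothetical helper: escape a value so it is character data, never markup.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Hypothetical tag: literal template parts pass through untouched,
// while every ${...} value is escaped before being spliced in.
function xml(strings, ...values) {
  let out = strings[0];
  values.forEach((v, i) => {
    out += escapeXml(v) + strings[i + 1];
  });
  return out;
}

const cls = "today";
const userText = "<b>not markup</b>";
const span = xml`<span class="${cls}">Today is: ${userText}</span>`;
console.log(span);
// <span class="today">Today is: &lt;b&gt;not markup&lt;/b&gt;</span>
```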

The only problem is lazy web developers. Terseness of XML (and hence of XHTML5) was never a design goal. So basically we only need to evangelize high-quality validating editors, and to teach our students to use them properly, without counting characters.

Lastly, note that the syntax and semantics of <![CDATA[ ... ]]> are specifically designed for embedding large blocks of unencoded character data verbatim. Unlike XML/HTML text nodes or attribute values, its contents are not escaped or protected in any way. It has no internal structure except the closing trigraph. It does not interfere with the contents inside, as opposed to ordinary <script>, <style>, <pre>, <textarea>, <template> or similar tags. And unlike the <script><!-- ... //--></script> hack, it does not leave junk behind.
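To make the difference concrete, here is a sketch of the same script text embedded both ways. `asEscapedText` and `asCdata` are hypothetical helpers, and each returns only the serialized markup string.

```javascript
// As an ordinary XML text node: "<" and "&" must be escaped
// (">" is escaped too, for safety).
function asEscapedText(code) {
  return code.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// As a CDATA section: the text goes in verbatim; only "]]>" is special.
function asCdata(code) {
  if (code.includes("]]>")) throw new Error('text contains "]]>"');
  return "<![CDATA[" + code + "]]>";
}

const code = "if (a < b && b > c) alert('ok');";
console.log(asEscapedText(code));
// if (a &lt; b &amp;&amp; b &gt; c) alert('ok');
console.log(asCdata(code));
// <![CDATA[if (a < b && b > c) alert('ok');]]>
```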

One highly unlikely problem remains: you may really need the literal string "]]>" inside your text block. In CSS it is practically confined to comments and string literals. In JS it can also show up in template or regular expression literals and in expressions such as a[b[i]]>0, but in every case the workaround is trivial (for an expression, a single space: a[b[i]] > 0). Inside a string you can split the trigraph by means of JS/CSS concatenation or escapes, or you can exit and reenter the CDATA section. Even if you fail to notice the bug yourself, your editor will probably catch it, because both the structure of your script and the XHTML will be severely broken.
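The "exit and reenter" workaround can be automated. A minimal sketch, with `toCdata` as a hypothetical helper name: every `]]>` is split into `]]` at the end of one CDATA section and `>` at the start of the next, and an XML parser concatenates the two sections back into the original text.

```javascript
// Hypothetical helper: wrap arbitrary text in CDATA, even if it contains
// "]]>", by closing the section after "]]" and reopening it before ">".
function toCdata(text) {
  return "<![CDATA[" + text.split("]]>").join("]]]]><![CDATA[>") + "]]>";
}

console.log(toCdata("a]]>b"));
// <![CDATA[a]]]]><![CDATA[>b]]>

// Inside JS itself, a string literal can avoid spelling out the sequence:
const marker = "]]" + ">"; // same value as the trigraph, split in source
```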

Basti Ortiz • Dec 13 '18

Wow, I never knew how much of a hassle this was until you explained it to me. Thanks for the new information (at least, new to me)!