DEV Community

theoluyi


What is HTML, really, deep down?

HTML is something every web developer has to know.

But do you know where it came from? That its creator, Tim Berners-Lee, wishes he had called it HTM instead of HTML, because of the confusion that arises from SGML, XML, and HTML all ending in “ML”? That HTML evolved out of SGML but is an application of SGML, not a subset of it? (XML is a subset of SGML. HTML is not.)

In the olden days, people were just trying to tag different types of documents. The point wasn’t to create webpages; that came later. There were all sorts of academic and military reasons for annotating documents so that particular pieces of data could be located more easily within them. So, in the same way that an English teacher marks up student papers with blue ink, IBM developed Generalized Markup Language (GML), which eventually evolved into Standard Generalized Markup Language (SGML).

An important aspect of these markup languages (or should we say, markup META languages) is that, because each piece of text is enclosed between a start tag and an end tag, the tags form an unambiguous hierarchy of elements that can be arranged into a tree data structure.
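
To make the tree idea concrete, here is a minimal Python sketch that turns nested start and end tags into a tree. It borrows the standard library's `html.parser.HTMLParser` purely as a convenient tag tokenizer (real SGML parsing is more involved), the element names `memo`, `to`, `note`, and `stress` are made up for illustration, and it assumes the markup is properly nested.

```python
from html.parser import HTMLParser


class Node:
    """A minimal tree node: a tag name, its child nodes, and any text directly inside it."""

    def __init__(self, tag):
        self.tag = tag
        self.children = []
        self.text = ""


class TreeBuilder(HTMLParser):
    """Turns properly nested start/end tags into a tree of Node objects."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)  # nest inside the innermost open element
        self.stack.append(node)

    def handle_endtag(self, tag):
        # An end tag closes the innermost open element; this sketch assumes
        # well-nested markup, so that element should carry the same name.
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].text += data


def show(node, depth=0):
    """Print the tree, indenting each element under its parent."""
    label = node.tag + (f": {node.text.strip()}" if node.text.strip() else "")
    print("  " * depth + label)
    for child in node.children:
        show(child, depth + 1)


builder = TreeBuilder()
builder.feed("<memo><to>Staff</to><note>Meeting at <stress>noon</stress></note></memo>")
builder.close()
show(builder.root)
```

Running it prints `memo` under `#document`, with `to: Staff` and `note: Meeting at` one level deeper and `stress: noon` nested inside `note`. The start/end pairs alone are enough to recover that shape.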

Later on, Tim Berners-Lee (who invented the World Wide Web!) adapted SGML to create HTML for displaying text on webpages. This is a big turning point and important for understanding what HTML is. SGML is a hierarchical system used for punctuation and identification: angle brackets define tags that wrap around pieces of text to identify what sort of content or category that text is. HTML used it as the template for a system that tells a web browser how that information should be DISPLAYED. To emphasize, SGML itself is agnostic about presentation; its purpose is simply to label text data, whereas HTML uses tags in much the same way for a completely different purpose, i.e., to describe how the webpage should look.

Another important contrast between HTML and SGML is that HTML is a FIXED tagset. You can’t just decide you want a new tag in HTML (say, one for marking keyboard mashes), but you could in SGML if you decided that was the tag you needed.
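
One way to picture a FIXED tagset is as a lookup against a known list of element names. The Python sketch below assumes a small, hand-picked subset of real HTML element names (the full list is defined by the HTML standard) and a hypothetical made-up tag, `<keyboardmash>`. It only reports unknown names; in practice a browser won't refuse such a tag, it simply treats it as an element with no built-in meaning.

```python
from html.parser import HTMLParser

# Illustrative subset of real HTML element names; the HTML standard defines many more.
KNOWN_HTML_TAGS = {"html", "head", "title", "body", "h1", "p", "a", "img", "em", "strong"}


class TagCollector(HTMLParser):
    """Records the name of every start tag it sees, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def unknown_tags(markup):
    """Return tag names in `markup` that fall outside the illustrative fixed tagset."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return [tag for tag in collector.tags if tag not in KNOWN_HTML_TAGS]


print(unknown_tags("<p>Hello <em>there</em></p>"))  # []
# <keyboardmash> is a hypothetical, made-up tag: fine in your own SGML/XML vocabulary, not in HTML.
print(unknown_tags("<p>I typed <keyboardmash>asdfjkl</keyboardmash></p>"))  # ['keyboardmash']
```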

Given how much effort went into tagging and categorizing data, it’s ironic that the very names of these systems do such a poor job of distinguishing them from one another, which leads to much confusion.

The two technologies HTML is most often mentioned alongside (SGML and XML) both end with ML, which implies to most people that SGML, XML, and HTML are all similar. However, SGML and XML are technologies for defining tagsets (essentially, punctuation rules), in contrast to HTML, which is a FIXED tagset.

EXTRA BITS: SGML HTML XML - Computerphile

In reality, while Tim Berners-Lee could have averted some confusion by calling HTML simply HTM, the naming convention was screwed from the get-go. It is really SGML and XML that are misnamed; they should have been SGMML and XMML (“markup META language”).

This would clue more people in to the fact that XML is a SUBSET of SGML. In other words, SGML contains XML. HTML is an APPLICATION of SGML, NOT a subset of SGML. These are different sorts of relationships, different evolutionary chains.

“Meta language & Markup language.
XML and HTML are not the same kind of markup language. Xml can be used to describe and generate other language markup that are interoperable with any kind of application in various presentation [sic] for different target groups and purposes. Given its flexibility, Xml is perfectly suited for marking up information of any kind.”

Here’s an example of an explanation that isn’t exactly incorrect but could easily mislead someone:

“HTML was born out of the Standard Generalized Markup Language (SGML). It provides a set of rules for tagging elements in a document and defining markup languages such as HTML. SGML is not a markup language rather it acts as a language to create markup languages. Another markup language which is born out of SGML is XML.” https://jaxenter.com/html-origin-171035.html

It says both of these languages are “born out of” SGML, but does not mention that they are related to SGML in different ways. XML is like a lightweight clone of SGML; both are used for categorizing text inline using tags. HTML, on the other hand, is used for creating web pages.

Next, what’s up with “hypertext”? Hypertext is text containing hyperlinks, which let us navigate from one webpage to another by clicking on HTML elements such as text, images, and buttons. While we take this for granted, hypertext is a fundamental technology of the web that didn’t always exist, and it is literally what enables us to “browse” in a web browser.
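
Under the hood, the hyperlinks that make browsing possible are most commonly `<a>` elements whose `href` attribute names the destination. As a rough sketch (using Python's standard `html.parser`; the page snippet is made up), this collects every link target a browser would follow when the corresponding text is clicked:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href of every <a> element, i.e. the page's hyperlinks."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; pick out href if present.
            href = dict(attrs).get("href")
            if href is not None:
                self.links.append(href)


# A made-up page snippet with one external link and one in-page link.
page = (
    '<p>Read about <a href="https://info.cern.ch/">the first website</a> '
    'or jump to <a href="#sources">the sources</a>.</p>'
)

collector = LinkCollector()
collector.feed(page)
collector.close()
print(collector.links)  # ['https://info.cern.ch/', '#sources']
```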

“HTML is the combination of Hypertext and Markup language. Hypertext defines the link between the web pages. A markup language is used to define the text document within tag which defines the structure of web pages. This language is used to annotate (make notes for the computer) text so that a machine can understand it and manipulate text accordingly.” Geeks for Geeks: HTML vs XML

So we’ve figured out what the hypertext part means, and we’ve already figured out what markup is about (it’s for annotating text inline like an English teacher, assuming your English teacher is trying to categorize and scrape data from all of their students’ papers).
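
Here is that English-teacher idea as a small Python sketch using the standard library's `xml.etree.ElementTree`. The vocabulary (`<paper>`, `<typo>`, `<claim>`) is entirely hypothetical, which is the point: XML, like SGML, lets you invent your own tags, and because the annotations form a tree, scraping one category of text back out is just a search by tag name.

```python
import xml.etree.ElementTree as ET

# A hypothetical, made-up vocabulary: XML only requires that tags are
# properly nested and closed, not that they come from a fixed list.
paper = """<paper student="Ada">
The <typo>quik</typo> brown fox <claim>is faster than any dog</claim>.
Also, <typo>teh</typo> end.
</paper>"""

root = ET.fromstring(paper)

# Every annotation is an element in the tree, so each category can be
# "scraped" by iterating over the elements with that tag name.
typos = [el.text for el in root.iter("typo")]
claims = [el.text for el in root.iter("claim")]

print(root.get("student"))  # Ada
print(typos)                # ['quik', 'teh']
print(claims)               # ['is faster than any dog']
```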

Finally, what about “language”? Is it really a language? Is it a programming language? Apparently, yes, but it’s not Turing complete, so it can’t be considered a full programming language.

However, you can think of it as a declarative programming language specific to building webpages, where a tag acts as a function and the text enclosed by that tag is like an argument. You are asking HTML to return, for example, an h1 element that says “I’m a little teapot.” You don’t have to tell HTML how to create that element; it just does it for you, which is what makes this declarative, not imperative, programming.
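
The tag-as-function analogy can be taken literally in a few lines of Python. The helpers `element` and `text` below are hypothetical, not part of any HTML tooling: each call just wraps its arguments in a matching start/end tag pair, so nested calls mirror nested tags. The result only says what the page contains; how an h1 is actually drawn (size, weight, spacing) is left to the browser, which is the declarative part.

```python
from html import escape


def text(s):
    """Escape raw text so characters like < and & are not mistaken for markup."""
    return escape(s, quote=False)


def element(tag, *children):
    """Hypothetical helper: a 'tag as a function' whose arguments become its contents."""
    return f"<{tag}>" + "".join(children) + f"</{tag}>"


# Nested calls mirror nested tags: we describe WHAT we want, not how to draw it.
page = element("body", element("h1", text("I'm a little teapot")))
print(page)  # <body><h1>I'm a little teapot</h1></body>
```
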

More practically relevant, HTML is only for making web pages, and thus could never be considered a general-purpose programming language.

HTML IS a Programming Language (Imperative vs Declarative) - Computerphile

SOURCES

  • Professor David Brailsford and Computerphile with amazing explanations of these topics.

Top comments (1)

Steve Fan

Wait, is the web using S-expression (Lisp) from the beginning?

Always has been.


Wait, is every compiler and data format hiding a half-assed, bug-ridden Lisp compiler from the beginning?

Always has been. (too)