Have you ever wondered what the best tool is for writing an article, user manual, book, or any other kind of text document?
There are many options to choose from. Most people use a What-You-See-Is-What-You-Get (WYSIWYG) editor (also called a word processor), such as Google Docs, LibreOffice or Word. However, more and more people are writing their documents using another, lesser-known option: a document markup language.
Should you, too, use a document markup language instead of a WYSIWYG editor? Let's see.
Note: This article does not compare or evaluate different writing solutions/products. It will not tell you why product X is better than product Y. The purpose of this article is to point out general advantages of document markup languages.
Offline or online WYSIWYG editors are often the best solution for non-technical people who occasionally write short or medium-size documents. Some websites have their own WYSIWYG editor integrated in the website, which makes it very easy to write formatted text. WYSIWYG software is also the right choice for design-intensive publications where you want to have total control of the position, size, font, and other visual properties of the document's elements, and you want to immediately see and fine-tune the end result while working on the document. Examples of such documents are flyers, advertisements, party invitations, posters, etc.
There are many word processors to choose from.
Some word processors offer advanced features for particular tasks, such as writing a novel.
However, as said already, this article focuses on markup languages, so let's move on and see why many people prefer them over WYSIWYG editors.
Note: Readers familiar with markup languages can skip the following two chapters.
A document markup language consists of a set of rules and symbols (special characters) used to annotate plain text. The annotated text can then be read by a markup processor to generate styled documents (e.g. HTML, PDF, ePub, etc.) or any other kind of data.
For example, in some markup languages an underscore (_) is used to emphasize text and render it in italics. Writing:
A _good_ girl.
... results in:
A good girl. (The word "good" is rendered in italics.)
Hence, markup code is just plain text intermixed with markup instructions.
A markup document consists of one or more text files that contain markup code.
There are many document markup languages to choose from.
Suppose you create a text file with the following content (written in Markdown syntax):
# Simple Markup Example

This is just a _simple_ example.

Here is a list:
- orange
- banana
- apple
After the above text has been converted to HTML (by the markup processor), the result in the browser looks like this:
The style of the final document can be customized. This is often done by modifying a separate CSS file.
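To make the conversion step more concrete, here is a minimal sketch in Python of what a markup processor does. It is not a real Markdown implementation: the `convert` function is hypothetical and handles only the few constructs used in the example above (a top-level heading, paragraphs, a flat list, and underscore emphasis), and it treats every text line as its own paragraph.

```python
import html
import re


def convert(markup: str) -> str:
    """Convert a tiny Markdown-like subset to HTML (illustrative only)."""
    out = []          # finished HTML lines
    in_list = False   # True while we are inside a <ul> block

    def inline(text: str) -> str:
        # Escape HTML special characters, then turn _word_ into <em>word</em>.
        return re.sub(r"_(.+?)_", r"<em>\1</em>", html.escape(text))

    for line in markup.splitlines():
        line = line.strip()
        is_item = line.startswith("- ")
        if in_list and not is_item:
            out.append("</ul>")   # the list ends at the first non-item line
            in_list = False
        if not line:
            continue              # blank lines only separate blocks
        if line.startswith("# "):
            out.append(f"<h1>{inline(line[2:])}</h1>")
        elif is_item:
            if not in_list:
                out.append("<ul>")
                in_list = True
            out.append(f"  <li>{inline(line[2:])}</li>")
        else:
            out.append(f"<p>{inline(line)}</p>")
    if in_list:
        out.append("</ul>")
    return "\n".join(out)


SOURCE = """\
# Simple Markup Example

This is just a _simple_ example.

Here is a list:
- orange
- banana
- apple
"""

print(convert(SOURCE))
```

The printed HTML contains no styling at all. How the heading, the emphasized word, and the list look in the browser is decided by the stylesheet, which is the separation of content and presentation discussed in this article.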
All document markup languages work like this:
A markup document consists of plain text.
Content and presentation are defined in separate files.
The content file contains the text and markup instructions. The presentation file contains the stylesheet (e.g. a CSS file).
It turns out that these two simple concepts lead to an astonishing set of practical advantages, explained in the following chapters.
When you write, you focus on content, not on presentation. You focus on what you want to say, instead of how it should be displayed or printed.
Moreover, you can customize your writing environment (editor) without worrying about the end result. For example, you can use a different font and a different number of characters displayed per line, without thinking about how this will affect the final document.
Thus, when you write, it's easier to be in the flow (in the zone), which Wikipedia describes as a "mental state of operation in which a person performing an activity is fully immersed in a feeling of energized focus, full involvement, and enjoyment in the process of the activity".
This is a big deal!
You can use your preferred text editor or Integrated Development Environment (IDE) to write your document. You are not tied to a specific editor. There is no vendor lock-in.
Imagine a team of writers collaborating on the same document. Everybody just uses the text editor he/she likes the most for the task at hand. For example, Bob and Alice are working on a new user manual, but Bob uses Emacs on Linux, while Alice uses Notepad++ on Windows.
Some high-end text editors provide incredibly powerful features (some out-of-the-box, some via extensions) and are highly customizable, so that you can set up your ideal writing software. As a result, you have a more enjoyable writing experience and you are more productive than with a WYSIWYG editor.
Because content and presentation are defined in separate files, you can change presentation by simply choosing another stylesheet (e.g. CSS file) from a predefined set, and adapt it if needed. If your document is read on different reading/printing devices, you can use different presentations for each device.
Sometimes the same stylesheet is used for many documents. Thus, presentation remains consistent over large sets of documents. Moreover, global presentation changes can often be done in a matter of seconds, because only one file needs to be changed.
Depending on the language and tools you use, you can transform your markup code into final documents of different formats, such as HTML, PDF, ePub etc.
And if your tool can't do it, there is Pandoc, the Swiss Army knife of document conversion. At the time of writing, Pandoc can convert no fewer than 31 input formats into no fewer than 49 output formats. That's 31 x 49 = 1,519 possible conversions supported by one tool.
There are many tools and online services available to handle plain text files - some possibly pre-installed on your PC. You can use them to handle your markup documents, in whatever way you want.
You can use a version control service such as GitHub, GitLab, or Bitbucket to track changes and issues, collaborate on documents, synchronize documents across devices, and use the other powerful features these services offer.
To get an idea of free tools for technical people, look at this List of Unix Text Processing Tools. Nowadays, you can also easily install these Linux tools on Windows.
Reading and writing plain text files is very well supported in most programming languages. Therefore it is easier for programmers to develop customized tools to explore and manipulate documents.
For instance, pre-processors and post-processors can be created to add features and automate recurring tasks. A concrete example would be a tool that displays a sorted list of website links used in your document and checks for any broken links.
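A link-listing tool of this kind could look roughly like the following Python sketch. The function names are hypothetical, the URL pattern is a deliberately simple approximation, and the reachability check is passed in as a parameter so that the listing logic can be used without network access. `is_reachable_http` is one possible check based on the standard library; some servers reject HEAD requests, so a real tool would need to be more forgiving.

```python
import re
import urllib.request
from typing import Callable, Iterable

# Simplified pattern: a URL ends at whitespace, a bracket, or a quote.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]\"']+")


def extract_links(markup: str) -> list[str]:
    """Return the unique URLs found in the text, sorted alphabetically."""
    # Strip punctuation that belongs to the sentence, not to the URL.
    links = {match.rstrip(".,;:!?") for match in URL_PATTERN.findall(markup)}
    return sorted(links)


def find_broken(links: Iterable[str], is_reachable: Callable[[str], bool]) -> list[str]:
    """Return the links for which the supplied check reports failure."""
    return [link for link in links if not is_reachable(link)]


def is_reachable_http(url: str, timeout: float = 5.0) -> bool:
    """One possible check: send a HEAD request and accept any non-error status."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        # Covers URLError/HTTPError, timeouts, and malformed URLs.
        return False


text = "See https://example.com/docs and [home](https://example.org). Again: https://example.com/docs."
print(extract_links(text))
```

Calling `find_broken(extract_links(text), is_reachable_http)` would then report the links that could not be reached.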
Moreover, it is easy to programmatically create documents. For instance, a product catalog or a reference manual could be created automatically based on structured data stored in a database.
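As a hedged illustration, the following Python sketch generates a small catalog in Markdown syntax from rows stored in a database. The `products` table, its columns, and the sample rows are invented for the example, and an in-memory SQLite database stands in for whatever database you actually use.

```python
import sqlite3


def build_catalog(connection: sqlite3.Connection) -> str:
    """Render every row of the (hypothetical) products table as markup."""
    rows = connection.execute(
        "SELECT name, price, description FROM products ORDER BY name"
    )
    lines = ["# Product Catalog", ""]
    for name, price, description in rows:
        # One section per product: heading, price, description.
        lines += [f"## {name}", "", f"Price: {price:.2f}", "", description, ""]
    return "\n".join(lines)


# Sample data, made up for this sketch.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE products (name TEXT, price REAL, description TEXT)")
connection.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Banana", 0.25, "A ripe banana."), ("Apple", 0.5, "A crisp apple.")],
)

catalog = build_catalog(connection)
print(catalog)
```

The generated text is an ordinary markup document, so it can be fed to the markup processor like any hand-written file.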
As content and presentation are defined in plain text files, documents are portable among different operating systems (Windows, Unix/Linux, macOS, etc.). All operating systems have very good support for text files.
In this chapter we'll look at additional advantages found in some document markup languages.
Some markup languages allow you to split a document into different files.
For example, each chapter of a book (and maybe also each sub-chapter) can be stored in a different file and in a directory hierarchy of your choice.
This can be a game-changer when a team collaborates on mid-size or big documents, because it makes editing, reorganizing, and collaborating much more convenient.
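Markup languages with this feature usually provide their own include mechanism. Where one is missing, the idea can be approximated with a few lines of Python, as in this sketch. The `assemble` function and the file names are assumptions made for the example.

```python
import tempfile
from pathlib import Path


def assemble(chapter_files: list[Path]) -> str:
    """Join the given chapter files, in order, into one markup document."""
    parts = [path.read_text(encoding="utf-8").strip() for path in chapter_files]
    return "\n\n".join(parts) + "\n"


# Demo with throwaway files; a real book would keep its chapters on disk.
with tempfile.TemporaryDirectory() as directory:
    root = Path(directory)
    (root / "part1").mkdir()
    (root / "intro.md").write_text("# Introduction\n\nWelcome.\n", encoding="utf-8")
    (root / "part1" / "basics.md").write_text("# Basics\n\nFirst steps.\n", encoding="utf-8")
    book = assemble([root / "intro.md", root / "part1" / "basics.md"])

print(book)
```

Because the chapter order is just a list of paths, reorganizing the book means reordering that list, and two authors can edit different chapter files without touching the same file.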
Some document markup languages support only presentation tags. Better ones favor semantic tags over presentation tags. This means that, when you use markup, you specify the meaning of a piece of text. You do not specify how the text will be displayed or printed. You define the What, not the How.
A first benefit is that this leads to much more flexibility in the rendering process.
Suppose your text contains several warning messages that need to stand out. If you use a markup language that supports only presentation tags, you could decide to aggressively display a centered text in red on yellow, like this:
This works well if the warnings are displayed on a color screen. But if the document is printed on a monochrome printer, or displayed on a black-and-white e-ink device, the result is a mess.
On the other hand, in a markup language that provides semantic tags, you would simply adorn your warnings with a warning tag. The stylesheet used in the conversion process specifies how all warnings are displayed. Hence, you can globally change the presentation of all warnings for a given output device by simply changing one entry in the corresponding stylesheet. For example, in the stylesheet used for e-ink devices, you could specify to display the warnings in italics with a bigger font. Moreover, if you have other messages that have to stand out, like errors or tips, you can use different, specific tags and handle them separately, without any interference.
A second advantage is that semantic markup opens the door for searchable documentation databases. You can query your markup code and extract useful information. For example, you could create a tool to count the number of warnings contained in the document or extract and save the warnings in a separate file for further exploration.
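Such a query tool can be very small. The Python sketch below assumes a hypothetical XML-like syntax in which warnings are written as `<warning>...</warning>`; the tag name and the sample text are made up, and a real tool would use the parser of the markup language in question rather than a regular expression.

```python
import re

# Hypothetical syntax: <warning>text</warning>, possibly spanning several lines.
WARNING_PATTERN = re.compile(r"<warning>(.*?)</warning>", re.DOTALL)


def extract_warnings(markup: str) -> list[str]:
    """Return the text of every warning, with whitespace normalized."""
    return [" ".join(text.split()) for text in WARNING_PATTERN.findall(markup)]


manual = """\
Plug in the device.
<warning>Do not open the case
while the device is plugged in.</warning>
<tip>Keep the receipt.</tip>
<warning>Keep away from water.</warning>
"""

warnings_found = extract_warnings(manual)
print(len(warnings_found), "warnings")
for warning in warnings_found:
    print("-", warning)
```

The tip is ignored because it carries a different semantic tag. With presentation-only markup, a warning and a tip that happen to share the same colors could not be told apart this way.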
Advanced markup languages support parameters embedded in the markup code. You first define a parameter by assigning a value (e.g. email@example.com) to a name. Then, later in the document, you use the parameter name, instead of the value. If the value changes later, you just need to change it in one place, which is easy, fast, and less error-prone.
This is an application of the important Don't Repeat Yourself (DRY) principle. It improves maintainability, productivity, and reliability. It is useful for all kinds of recurring text and markup attribute values, especially if they are subject to change. For example: your email address, the price of your product, the name of your dog, or whatever.
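The mechanism can be imitated with Python's standard `string.Template` class, which replaces `$name` placeholders with values. This is only a stand-in for a markup language's own parameter syntax, and the parameter names and the product name are invented for the sketch.

```python
from string import Template

# Each value is defined exactly once.
parameters = {
    "email": "email@example.com",
    "product": "Hypothetical Editor",
}

# The document refers to the parameters by name, as often as needed.
document = Template(
    "Questions about $product? Write to $email.\n"
    "Bug reports for $product also go to $email."
)

rendered = document.substitute(parameters)
print(rendered)
```

If the email address changes, only the entry in `parameters` has to be edited. `substitute` raises a `KeyError` when the document uses a name that was never defined, so a misspelled parameter is caught instead of silently ending up in the output.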
Here is a brief summary of additional powerful options:
Sometimes it is convenient to see a preview of the final document (e.g. an HTML page) while typing the markup code. As soon as you edit the markup code, you can immediately see the effect, without the need to re-launch the markup processor. Some editors support this kind of immediate feedback out-of-the-box or by plugins. For example, you type the document in one window, and you see the real-time preview in an adjacent window.
You can think of this as a markup editor with WYSIWYG support.
A public Application Programming Interface (API) allows programmers to programmatically execute, change, or extend the markup processor's operations.
At the bare minimum, an API enables other applications to convert documents. For example, a web server could read markup code stored in a file or entered by the user and convert it to HTML on-the-fly, by using the API. This could be used, for instance, to provide an online markup tester, so that people can try out snippets of markup code, without the need to install anything on their PC.
More advanced APIs can provide additional functionality, such as:
Change the rendering of some tags
Add more tags to the language
Add more output formats to the converter
Create a markup document programmatically, by retrieving data from different sources.
Hooks (also called extension points)
Hooks allow programmers to execute functions when specific events occur.
For example, once the Abstract Syntax Tree (AST) (i.e. tree structure) of the document has been created by the markup processor, an extension point can programmatically explore the AST to extract and report useful information, or even change it to implement the most extravagant requirements.
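The following Python sketch shows one way such a hook mechanism could be structured. Everything in it is hypothetical: the `Node` and `Processor` classes, the `on_ast_ready` registration method, and the sample tree are made up for illustration and do not describe the API of any particular markup processor.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class Node:
    """One AST node: a kind (e.g. 'chapter'), optional text, and child nodes."""
    kind: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants, depth-first."""
    yield node
    for child in node.children:
        yield from walk(child)


class Processor:
    """Stand-in for a markup processor that exposes one extension point."""

    def __init__(self) -> None:
        self._ast_hooks: list[Callable[[Node], None]] = []

    def on_ast_ready(self, hook: Callable[[Node], None]) -> Callable[[Node], None]:
        # Register a function to be called once the AST has been built.
        self._ast_hooks.append(hook)
        return hook

    def run(self, ast: Node) -> Node:
        # A real processor would parse markup here; this sketch receives the AST.
        for hook in self._ast_hooks:
            hook(ast)
        return ast


processor = Processor()
counts: dict[str, int] = {}


@processor.on_ast_ready
def count_nodes(ast: Node) -> None:
    # Hook that only explores the tree and reports statistics.
    for node in walk(ast):
        counts[node.kind] = counts.get(node.kind, 0) + 1


@processor.on_ast_ready
def number_chapters(ast: Node) -> None:
    # Hook that changes the tree: prefix each chapter title with its number.
    chapters = [node for node in walk(ast) if node.kind == "chapter"]
    for number, chapter in enumerate(chapters, start=1):
        chapter.text = f"{number}. {chapter.text}"


document = Node("document", children=[
    Node("chapter", "Getting Started", [Node("warning", "Back up your files.")]),
    Node("chapter", "Advanced Use", [Node("paragraph", "Details."), Node("warning", "Irreversible.")]),
])
processor.run(document)
print(counts)
print([chapter.text for chapter in document.children])
```

The first hook only reads the tree, the second one modifies it before rendering. Neither requires any change to the processor itself.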
Templates allow you to customize or redefine the rendering of specific tags, by modifying text files containing the template code.
You can use configuration files to extend the language and add your own tags to the markup language, and specify how each tag is rendered.
Processor Directives are special instructions inserted in the markup code and interpreted by the markup processor.
Suppose somebody writes a test sheet for students. The sheet contains instructions that should only be visible for teachers. In that case, a directive could be used to display specific text blocks only if the document is printed for teachers.
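The test sheet scenario can be sketched in a few lines of Python. The `%if` and `%endif` directives are a hypothetical syntax chosen for this example, not that of an existing markup language, and the sketch does not support nested blocks.

```python
def apply_directives(markup: str, audience: str) -> str:
    """Keep '%if NAME' ... '%endif' blocks only when NAME equals the audience."""
    kept = []
    visible = True  # text outside any directive block is always kept
    for line in markup.splitlines():
        stripped = line.strip()
        if stripped.startswith("%if "):
            visible = stripped[len("%if "):].strip() == audience
        elif stripped == "%endif":
            visible = True
        elif visible:
            kept.append(line)
    return "\n".join(kept)


sheet = """\
Question 1: What is 6 x 7?
%if teacher
Answer: 42
%endif
Question 2: Name a markup language."""

print(apply_directives(sheet, "student"))
print("---")
print(apply_directives(sheet, "teacher"))
```

Both versions of the sheet come from the same source file, so a question never has to be corrected in two places.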
The advice of a technical writer who uses an "über-powerful text editor" (spoiler: Emacs) and doesn't like the mouse: How and Why to Dump Your Word Processor
The title says it all: Word Processors: Stupid and Inefficient
An interesting article for novel writers (followed by some insightful comments): Best Book Writing Software: Word vs. Scrivener.
I will soon publish a follow-up article with more specific information and comparisons of different document markup languages.
Should you use a WYSIWYG editor or a markup language?
As so often, the answer depends on your use case.
However, as demonstrated in this article, in many cases a document markup language is the better choice, because you can benefit from numerous advantages. In a nutshell:
When you write, you can focus on writing, because you don't have to think about presentation, and you can use your preferred text editor with your customized setup.
Your writing environment is more flexible and powerful, because you have a lot of options to handle plain text files.
It is easier to automate and customize your writing process, which saves time and reduces errors.
Ultimately, a well-designed document markup language makes your writing experience more enjoyable and increases your productivity.
What's your own experience? Please share it by leaving a comment.
Note: For more insight, please read We Need a New Document Markup Language - Here is Why