Jin

MarkedText - healthy person MarkDown

Hello, my name is Dmitriy Karlovskiy, and I write all my articles (and presentations) in MarkDown. And you know what? I'm already pretty sick of it! I write texts in Russian, but most of the special characters exist only in the English keyboard layout. And editing tables is an eternal Leaning Tower of Pisa built from vertical lines. In short, it has problems with both editing convenience and readability. So let's try to design a markup language from scratch, without dragging tons of puzzling constructs along with us.

Principles

  • Unambiguous syntax
  • Easy syntax
  • Uniform syntax
  • Minimal impact on natural looking text
  • Ease of editing regardless of keyboard layout
  • Readability
  • Expandability
  • Fast and reliable memorability

For special formatting characters, it is better to use those that are present in most keyboard layouts, not just the English one. That is, other things being equal, preference should go to the following characters: ! " ; % : ? * ( ) _ + / \ . , - =

Existing solutions

Wikipedia has a comparison of lightweight markup languages, so we won't repeat it here. The following languages are listed there: AsciiDoc, BBCode, Creole, GitHub Flavored Markdown, Markdown, Markdown Extra, MediaWiki, MultiMarkdown, Org-mode, PmWiki, POD, reStructuredText, Textile, Texy, txt2tags.

Almost all of them use broadly similar constructs. However, BBCode is not lightweight: it's practically HTML with square brackets instead of angle brackets. Therefore, we will not consider it further. Let's also ignore POD, whose syntax is too verbose and not very descriptive.

Text blocks

Blocks of text are separated by a double line break, which leaves a visual gap between them. The block type is determined by a special prefix at the beginning of each line of the block. If the prefix is not recognized, the block is a normal paragraph.
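As a rough illustration of this rule, here is a minimal TypeScript sketch that splits a text into blocks and classifies each block by the prefix of its first line. The names `BlockKind`, `classifyBlock` and `splitBlocks` are hypothetical, the prefixes are the ones this article settles on in the sections that follow, and looking only at the first line is a simplifying assumption rather than the behaviour of any real parser.

```typescript
// Block kinds named after the sections of this article; the identifiers are hypothetical.
type BlockKind =
  | 'paragraph' | 'list' | 'ordered-list' | 'quote'
  | 'table' | 'header' | 'preformatted';

interface Block {
  kind: BlockKind;
  lines: string[];
}

// Prefix patterns, tried in order against the first line of a block.
const blockPrefixes: Array<[RegExp, BlockKind]> = [
  [/^- /, 'list'],
  [/^\+ /, 'ordered-list'],
  [/^"( |$)/, 'quote'],
  [/^!( |$)/, 'table'],
  [/^=+ /, 'header'],
  [/^  /, 'preformatted'],
];

function classifyBlock(firstLine: string): BlockKind {
  for (const [pattern, kind] of blockPrefixes) {
    if (pattern.test(firstLine)) return kind;
  }
  return 'paragraph'; // unrecognized prefix: a normal paragraph
}

function splitBlocks(text: string): Block[] {
  return text
    .split(/\n{2,}/) // a double line break separates blocks
    .filter(chunk => chunk.trim() !== '')
    .map(chunk => {
      const lines = chunk.split('\n');
      return { kind: classifyBlock(lines[0]), lines };
    });
}
```

A stricter version would check the prefix of every line in the block, as the rule above requires.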

Lists

Lists come in two types: ordered and unordered. They can be nested in each other in any combination. In this case, nested lists are displayed indented.

The elements of an unordered list are preceded by a marker. In the typographic conventions of many languages, that marker is a dash. Bullets are mostly used on the web. Lightweight markup languages mainly use the following:

- item
* item
+ item

Of these, the hyphen is best suited for the role: it looks like a dash and is easy to type. The list item marker and the text are separated by a space. The text is therefore indented by 2 characters, and we indent nested lists by the same amount so that they line up with the text they refer to. So we settle on this option:

- first
- second
  - first of second
    - first of first of second
  - second of second
- third

  • first
  • second
    • first of second
      • first of first of second
    • second of second
  • third

Ordered lists need an incrementing counter displayed before each element. The counter is separated from the element's text by a separator: a dot, a bracket, etc. Some languages require the counter values to be set manually, which makes them extremely inconvenient to keep up to date:

1. item
2) item

Some languages allow you to specify a special list marker that will be displayed as a counter, which is much more convenient:

# item

True, the hash symbol is available only in the English layout. We will use the plus sign instead, because one is added to the counter for each element of the list:

+ first
+ second
  + first of second
    + first of first of second
  + second of second
+ third

  1. first
  2. second
    1. first of second
      1. first of first of second
    2. second of second
  3. third
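The nesting and numbering rules for lists can be sketched in a few lines of TypeScript. This is only an illustration under assumptions: `ListItem` and `parseList` are hypothetical names, each nesting level is taken to be exactly 2 spaces, and lines that are not list items are simply skipped.

```typescript
interface ListItem {
  depth: number;    // nesting level, 0 for top-level items
  label: string;    // "1.", "2.", ... for "+" items, "•" for "-" items
  text: string;
}

function parseList(lines: string[]): ListItem[] {
  const counters: number[] = []; // counters[depth] = last number issued at that depth
  const items: ListItem[] = [];
  for (const line of lines) {
    const m = /^((?:  )*)([-+]) (.*)$/.exec(line);
    if (!m) continue; // not a list item; ignored in this sketch
    const depth = m[1].length / 2; // nested lists are indented by 2 spaces per level
    counters.length = depth + 1;   // returning to a shallower level forgets deeper counters
    let label = '•';
    if (m[2] === '+') {
      counters[depth] = (counters[depth] || 0) + 1;
      label = counters[depth] + '.';
    }
    items.push({ depth, label, text: m[3] });
  }
  return items;
}
```

Because the counter is computed from position alone, items can be reordered or inserted without renumbering anything by hand.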

Quotes

Quotes combine arbitrary blocks of text to show that someone else is the author. A common practice is to prefix each quoted line with an angle bracket:

> quote
> - list in quote
> > inner quote

However, the angle bracket is available only in the English layout. The quotation mark is a much better fit here: it already marks quotations in ordinary text and is not limited to the English layout:

" quote
" - list in quote
" " inner quote

quote

  • list in quote

  > inner quote

Tables

Tables are a two-dimensional representation of data, and many languages try to keep the markup two-dimensional for the sake of clarity:

|=  |= table |= header |
| a | table  | row     |
| b | table  | row     |

|   | table | header |
|---|-------|--------|
| a | table | row    |
| b | table | row    |

First Header | Second Header
------------ | -------------
Content from cell 1 | Content from cell 2
Content in the first column | Content in the second column

However, such a layout badly hurts editing convenience: you constantly have to realign the columns by hand.

That alone would be tolerable, but this representation is also extremely weak in expressiveness: a cell can hold only a single paragraph. You cannot put several paragraphs, a list, a preformatted block, and so on into it.

Finally, this syntax does not even deliver on clarity once cells contain more than a couple of words: the columns run far to the right and either disappear behind horizontal scrolling or wrap at arbitrary points, breaking the whole alignment:

|   | table | header |
|---|-------|--------|
| a | There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. | row |
| b | table | row |

In short, drawing a table in text is a pointless exercise. However, the two-dimensionality of the view can be preserved even without drawing a grid: it is enough to place the cells of different columns one below the other, each column with its own indent:

!
  ! table
    ! header
! a
  ! There are many variations of passages of Lorem Ipsum available, but the majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable.
    ! row
! b
  ! table
    ! row

Thus, it is easy for a person to navigate such a structure both horizontally and vertically. Exclamation mark markers resemble vertical lines, but are available outside the English layout too. Editing such a representation is much more convenient, and it is not afraid of sudden text wraps. And inside the cells you can put any number of other text blocks.
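To make the structure concrete, here is a hedged TypeScript sketch that turns such an indented table into rows of cells. The function name `parseTable` is hypothetical, and the row rule is my assumption from the examples: a line whose column index is lower than that of the previous line starts a new row, while a repeated column index continues the same cell.

```typescript
// Returns rows of cells; a multi-line cell is joined with "\n".
function parseTable(lines: string[]): string[][] {
  const rows: string[][] = [];
  let prevColumn = -1;
  for (const line of lines) {
    const m = /^((?:  )*)!(?: (.*))?$/.exec(line);
    if (!m) continue; // not a table line; ignored in this sketch
    const column = m[1].length / 2; // each column is indented by 2 more spaces
    const content = m[2] || '';
    // Assumption: moving back to a lower column index starts a new row.
    if (rows.length === 0 || column < prevColumn) rows.push([]);
    const row = rows[rows.length - 1];
    while (row.length < column) row.push(''); // pad skipped columns with empty cells
    if (row.length === column) {
      row.push(content);              // first line of a new cell
    } else {
      row[column] += '\n' + content;  // same column again: another line of the same cell
    }
    prevColumn = column;
  }
  return rows;
}
```

Since a cell is just a run of lines, its content could be passed to a block parser again, which is what lets lists and other blocks live inside cells.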

Headers

In some languages, headings are marked by different kinds of underlines:

Level 1 heading
===============

Level 2 heading
---------------

Level 3
~~~~~~~~~~~~~~~

It looks beautiful, of course, but typing it is tedious, parsing it is difficult, and the different kinds of underlines are easy to confuse.

There is a more convenient form, where the heading level is determined by the number of special characters in the prefix:

# Level 1 Heading #
## Level 2 Heading ##
### Level 3 Heading ###

The same characters may be repeated as a suffix on the right, but it is purely decorative and tedious to type. Different languages use different special characters:

## Level 2 Heading
== Level 2 Heading
** Level 2 heading
!! Level 2 heading
++ Level 2 Heading

The hash sign is available only in the English layout. We use pluses, asterisks and exclamation marks for other semantics. But massive equals signs are just what we need. Headings divide text into sections, and a pair of horizontal lines illustrates this perfectly:

= Level 1 Heading
== Level 2 Heading
=== Level 3 Heading

Level 1 Heading

Level 2 Heading

Level 3 Heading
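With a prefix-only syntax, recognizing a heading comes down to counting characters. A minimal TypeScript sketch follows; `parseHeader` is a hypothetical name, and the single space required after the equals signs is an assumption taken from the examples above.

```typescript
interface Header {
  level: number; // number of "=" characters in the prefix
  title: string;
}

function parseHeader(line: string): Header | null {
  const m = /^(=+) (.*)$/.exec(line);
  return m ? { level: m[1].length, title: m[2] } : null;
}
```

Compare this with underline-style headings, where the parser has to look at the following line and compare its characters before it knows what the current line is.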

Preformatted text

Some languages use special quotation marks (fences) for preformatted text:

```markdown
preformatted
        text
```

However, if the text itself has to contain these quotation marks, you are out of luck. Therefore, the variant with a prefix before each line is preferable. For example, an indent of 2 or 4 spaces can act as such a prefix:

    preformatted
            text

Preformatted text is usually not typed by hand but copied from elsewhere, where it has no prefixes other than plain indents. This option is therefore quite convenient: in many cases such text can simply be pasted in, and nothing more needs to be done for it to display correctly.

In preformatted text, every character is important, so there can't be any formatting inside it. However, often for narrative purposes it is necessary to highlight some lines or mark them as deleted/added.

Therefore, we will split the prefix into two parts:

  • A 2-space marker of preformatted text.
  • A marker of two special characters that sets the formatting of the line.

It will look like this:

    preformatted
            text
  --deleted
  --   text
  ++inserted
  ++    text
  **highlighted
  **       text

But what about specifying the language of this text? To hell with it. Typing the language name every time is tedious. It is not clear where the list of valid names comes from. And it is not clear what to do with languages that are not yet supported. Automation works better here: it must either recognize the language itself or use universal highlighting that suits most languages.
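The two-part prefix of a preformatted line is easy to take apart. Here is a small TypeScript sketch under assumptions: `LineMark` and `parsePreformattedLine` are hypothetical names, and a second pair of spaces is treated as the "no formatting" marker, which is why plain pasted text with a 4-space indent passes through unchanged.

```typescript
type LineMark = 'plain' | 'deleted' | 'inserted' | 'highlighted';

// The second half of the prefix selects the formatting of the whole line.
const lineMarks: { [marker: string]: LineMark } = {
  '  ': 'plain',
  '--': 'deleted',
  '++': 'inserted',
  '**': 'highlighted',
};

function parsePreformattedLine(line: string): { mark: LineMark; text: string } | null {
  // 2 spaces (preformatted marker) + 2 characters (line formatting marker) + raw text.
  const m = /^  (  |--|\+\+|\*\*)(.*)$/.exec(line);
  if (!m) return null; // not a preformatted line
  return { mark: lineMarks[m[1]], text: m[2] };
}
```

Everything after the 4-character prefix is kept verbatim, so the text itself never needs escaping.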

Inline formatting

A typical technique for adding special meaning to certain words is to surround them with special "quotes". Such quotation marks should be rare enough in plain text that you don't have to worry about escaping, which negatively affects readability. As a rule, such quotes consist of several repeated special characters. Consider the options for their number:

  • 1 - too few: there is a high risk that an ordinary character in plain text will be taken for formatting.
  • 3 - too many: pressing the key three times every time is tiring.
  • 2 - the golden mean, so let's stop there.

Accents

Accents serve to attract attention: they highlight important points, allegories, connotations, etc. Two kinds are mainly used: a strong accent and a weak one. While the purpose of a strong accent is more or less clear to everyone, a weak accent is used for anything and everything, and the reader has to guess what the author meant. So I'm not sure it is worth supporting at all, but let it be for now.

Strong accent options:

*strong*
**strong**
__strong__
'''strong'''
''strong''

Apostrophes look too much like quotation marks and the seconds sign, so we discard them immediately. Of the remaining options, the one with "massive" asterisks is the most common, so we choose it.

**strong**

strong

Weak accent options:

'emphasis'
''emphasis''
_emphasis_
/emphasis/
//emphasis//
*emphasis*
~emphasis~

There is more variety here. However, a weak accent is usually displayed in italics, so a slanted line looks most natural and sticks in memory instantly.

//emphasis//

emphasis

Edits

Sometimes it is necessary to mark part of the text as deleted and part as added. This is often rendered as strikethrough and underlined text. Both make reading somewhat harder, so background highlighting, as in merge tools, is preferable. But then underscores and hyphens are no longer associated with the corresponding kind of formatting.

So, the typical forms of highlighting additions:

_insertion_
__insertion__
+insertion+

And deletions:

~deletion~
~~deletion~~
-deletion-
--deletion--

Additions and deletions are naturally associated with pluses and minuses regardless of visualization, so let's use them:

++insertion++
--deletion--

  • insertion
  • deletion
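Because all four inline markers are doubled characters used as matching quotes, one regular expression with a backreference can handle them uniformly. The TypeScript sketch below renders them as HTML tags. It is an illustration under assumptions: `renderInline` and the tag mapping are my own choices, there is no HTML escaping, and a plain replace like this would also trip over `//` inside a bare url, which a real parser would have to recognize first.

```typescript
// Hypothetical mapping from doubled markers to HTML tags.
const inlineTags: { [marker: string]: string } = {
  '**': 'strong',
  '//': 'em',
  '++': 'ins',
  '--': 'del',
};

function renderInline(text: string): string {
  // (marker)(shortest non-empty content)(same marker again)
  return text.replace(
    /(\*\*|\/\/|\+\+|--)(.+?)\1/g,
    (_match: string, marker: string, inner: string) => {
      const tag = inlineTags[marker];
      // Recurse so that different markers can be nested inside each other.
      return '<' + tag + '>' + renderInline(inner) + '</' + tag + '>';
    },
  );
}
```

The uniform shape of the markers is what keeps this to a single pattern; adding another doubled marker would mean one more alternative and one more table entry.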

Links

Links are of two types:

  • Hyperlink - takes you to the target when you click on it. It specifies the displayed content and the url to go to.
  • Embedding - the target is embedded in the document being viewed. It specifies the url the embedded document is loaded from, and alternative content for the case when embedding fails for some reason (unsupported format, disallowed domain, failed connection, etc.).

Hyperlinks look like this in different languages:

"Text":http://example.com
http://example.com[Text]
<http://example.com|Text>
[Text|http://example.com]
[[Text|http://example.com]]
[[http://example.com|Text]]
[text http://example.com]
[http://example.com Text]
[Text](http://example.com)
`Text <http://example.com/>`_

And embeds like this:

![title](http://example.com/image.png)
{{http://example.com/image.png|title}}
.. image:: /path/to/image.jpg

It's important to note that usually only images can be embedded. If you want to embed a video or a website, at best you are told to write HTML. We will allow embedding anything: security checks for URLs should be handled at the system level and regulated separately. The text markup language should not regulate them in any way.

To interpret the markup unambiguously, we need opening and closing quotes, as well as a separator between the url and the content. The special characters must be ones that cannot occur in a url and are easy to type in any layout. There are only two such characters: \ and ".

Note that embedding is actually quoting from a third-party resource, so the quotation mark is more than appropriate for this. As a delimiter for a resource link and its alternative representation, \ is better suited, which is extremely rare in plain text, unlike the quote.

""Embedded image\http://example.org/favicon.ico""
""Embedded video\https://youtube.com/video=1234""
""Embedded site\https://marked.hyoo.ru/""

  • Embedded image
  • Embedded video
  • Embedded site

If alternative content is omitted, then the url itself must be taken as alternative content. That is, the following two options give identical results:

""http://example.org/favicon.ico""
""http://example.org/favicon.ico\http://example.org/favicon.ico""

http://example.org/favicon.ico

For hyperlinks, we use \ everywhere, both as the quotes and as the separator:

\\Clickable text\http://example.org/\\
Clickable url: \\http://example.org/\\

Of course, different types of links can be combined. For example, you can make a link image like this:

\\""Example\http://example.org/favicon.ico""\http://example.org/\\

Example
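Both kinds of links share one shape: doubled quotes around optional content, a \ separator, and a url. That makes a single parsing function enough. The following TypeScript sketch is only an illustration: `LinkNode` and `parseLink` are hypothetical names, and it relies on the premise stated above that a url never contains \, so the last \ in the body is the separator.

```typescript
interface LinkNode {
  kind: 'link' | 'embed';
  content: string; // displayed or alternative content; falls back to the url
  url: string;
}

function parseLink(source: string): LinkNode | null {
  // Opening quotes (two backslashes or two quotation marks), body, same closing quotes.
  const m = /^(\\\\|"")(.*)\1$/.exec(source);
  if (!m) return null;
  const body = m[2];
  const cut = body.lastIndexOf('\\'); // a url contains no "\", so the last one is the separator
  return {
    kind: m[1] === '""' ? 'embed' : 'link',
    content: cut < 0 ? body : body.slice(0, cut), // omitted content: use the url itself
    url: body.slice(cut + 1),
  };
}
```

Splitting at the last separator is also what makes combinations work: for a link image, the content of the outer hyperlink is itself an embed, and it can be passed to `parseLink` again.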

Inline code

Inline code is displayed in a monospaced font and turns off any formatting inside. Each language has something different:

+monospace text+
`monospace text`
``monospace text``
|monospace text|
{{monospace text}}
{{{monospace text}}}
=code=
~verbatim~
@monospace text@
@@monospace text@@

Of these, only + and = are available outside the English layout, and we already use both. That is, we have to come up with something of our own. Different languages use different characters, so there can be no perfect variant. But ;; seems to come as close as possible: a double semicolon is generally meaningless in many languages.

;;monospace text;;

monospace text
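Since inline code turns off all formatting inside it, one way to honour that is to cut the code spans out first and apply other inline rules only to what remains. Here is a minimal TypeScript sketch of that first step; `Span` and `splitInlineCode` are hypothetical names, and the ordering is my assumption rather than a description of the actual parser.

```typescript
interface Span {
  code: boolean; // true for text taken from between ;; markers
  text: string;
}

function splitInlineCode(text: string): Span[] {
  const spans: Span[] = [];
  const pattern = /;;(.+?);;/g;
  let last = 0; // end of the previous match
  let m: RegExpExecArray | null;
  while ((m = pattern.exec(text)) !== null) {
    if (m.index > last) spans.push({ code: false, text: text.slice(last, m.index) });
    spans.push({ code: true, text: m[1] }); // kept verbatim, never formatted further
    last = m.index + m[0].length;
  }
  if (last < text.length) spans.push({ code: false, text: text.slice(last) });
  return spans;
}
```

With this split, something like `C++` inside a code span is never mistaken for the start of an insertion.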

Summary

Now let's collect all the format constructs in one short cheat sheet:

= MarkedText

**Lightweight formatting** for plain text.

--

== Principles

+ Syntax:
  - Unambiguity
  - Simplicity
  - Uniformity
+ Appearance:
  - Minimal impact on natural looking text
  - Readability
+ Editing:
  - Independence from the layout
  - Fast and reliable memorability

== Compare with alternatives

! **Language**
  ! **Pros**
    ! **Cons**
! Marked Text
  ! - Convenient table editing.
  ! - Support for complex formatting within cells.
  ! - Ease of implementation.
  ! - Easy to remember consistent syntax.
  ! - Ease of editing in the Russian layout.
  ! - Columns do not spread far to the right beyond horizontal scrolling and do not wrap to a new line.
    ! - Not supported yet by any third party tools.
! MarkDown
  ! - Wide support for various tools.
  ! - Visual presentation of tables.
    ! - Difficulties with editing tables.
    ! - Strongly limited content of cells.

== Parsing

    const res = [ ... $hyoo_marked_line.parse( '**text**' ) ]
  --$mol_assert_equal( res[0].strong, '**text**' )
  ++$mol_assert_equal( res[0].marker, '**' )
  **$mol_assert_equal( res[0].content, 'text' )

== Reviews

" " " Typical user: Not supported anywhere, go to --ass-- ++assassins++ with this syntax!
" "
" " But we're programmers, we can fix it.. You don't even need to be an expert in ;;C++;; ..
"
" No one needs it (c) Couch Expert

However, it is a useful design exercise.

== Links

- Sandbox: \\https://marked.hyoo.ru/\\
- \\MarkedText article\https://github.com/nin-jin/HabHub/issues/39\\
- \\Parser on TS\https://github.com/hyoo-ru/marked.hyoo.ru/\\
- \\Converter to HTML on TS\https://github.com/hyoo-ru/marked.hyoo.ru/tree/master/to/html\\
- ""Build result $mol_regexp\https://github.com/hyoo-ru/mam_mol/workflows/mol_regexp/badge.svg""
