DEV Community

Time to build a markdown parser and processor (MDL Log #1)

edA‑qa mort‑ora‑y on March 03, 2019

I need to write a markdown parser and processor. My writing projects have exceeded the abilities of the tools I currently have. There's also a dear...
 
Marek Zaluski

I have a bunch of unfinished projects along the same lines. I kept getting frustrated with my tools and I wanted to build something better.

Every time I went down a rabbit hole and eventually woke up to realize I was just yak-shaving.

These days I've settled on just using MDX for my Markdown, even though it's not perfect. I decided to stick with it as a good-enough solution, and that was a good move: I feel like I can finally relax and focus on my content.

 
edA‑qa mort‑ora‑y

I have a fairly clear set of requirements, so at least I won't be chasing a ghost. I've been disappointed with the other tools. Primarily, I need to be able to customize the syntax by adding extensions. I also need advanced output capability -- I don't want things looking like basic Markdown-generated documents.

 
Jean-Guillaume Buret

Looks cool.
Could you list somewhere the features you eventually want?

 
edA‑qa mort‑ora‑y

I'll get back with user stories. I'm going to do it the proper way, as an example.

 
Marek Zaluski

Cool, looking forward to following your progress.

 
Pawan Chaurasia

If you could make a tutorial of it, it would be a great learning experience for beginner programmers like me.
Any help will be appreciated. Any past open-source parser project would also help.
Thanks

 
edA‑qa mort‑ora‑y

I'll keep posting log updates that say what I've done. There's a lot to cover, so if you have specific questions, you'll be able to ask and I can answer.

 
Pawan Chaurasia

Sure.
That will be helpful.
Thanks

 
Pawan Chaurasia

Hi,

I was checking your code and I couldn't find a way to run it. I am a newbie in Python.
It would be great if you could update the readme file so that I can set it up and understand the flow better.

 
edA‑qa mort‑ora‑y

I've updated the readme. I only have a test program at the moment. I'll make it a priority to produce some kind of simple CLI.

 
Pawan Chaurasia

Very helpful, thanks.

 
Raunak Ramakrishnan

Regarding converting markdown to an e-book, have you tried using pandoc? I found it very useful for converting to and from various publishing formats.

 
edA‑qa mort‑ora‑y

I've used pandoc a few times. I'm looking for something I can customize and extend. I didn't dig too deeply into what pandoc supports, but from my initial look, it wasn't the type of tool that would support my needs. I think I still use it to generate some Sphinx docs from markdown for a Python project.

 
Olivier “Ölbaum” Scherler

There’s some power in Pandoc as it lets you access its AST and modify it before feeding it to the output converter. It’s actually pretty nice, but the options in the AST are quite limited (you can’t add a class to a list, for example). The AST is also a pain in the neck to read and write, and a lot of sample code and libraries are outdated.

I wrote a filter that lets me write the ingredients of a recipe for my cooking cards as a list, with the quantities in italics, and output it as a table, for easier formatting:

* _1 cup_ milk
* _1 cup_ flour
* _2 tbsp_ baking powder

is much easier to type than

|        |               |
|--------|---------------|
| 1 cup  | milk          |
| 1 cup  | flour         |
| 2 tbsp | baking powder |

If your project allows the user to modify the AST like that, it could be very powerful and customisable.
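
As a rough illustration of that kind of list-to-table transformation, here is a minimal Python sketch. It is not the pandoc filter described above: a real filter would walk pandoc's AST, while this version works directly on the list lines as text. The function name `ingredients_to_table` and the regular expression are assumptions made for the example.

```python
import re

# Matches an item such as "* _1 cup_ milk": an italic quantity, then the ingredient.
ITEM = re.compile(r"^\*\s+_(?P<qty>[^_]+)_\s+(?P<name>.+)$")


def ingredients_to_table(lines):
    """Turn '* _qty_ name' list items into the lines of a Markdown pipe table."""
    rows = []
    for line in lines:
        match = ITEM.match(line.strip())
        if match is None:
            raise ValueError(f"not an ingredient item: {line!r}")
        rows.append((match["qty"], match["name"]))
    if not rows:
        return []

    # Pad every column to the width of its longest cell.
    qty_w = max(len(qty) for qty, _ in rows)
    name_w = max(len(name) for _, name in rows)

    table = [
        f"| {' ' * qty_w} | {' ' * name_w} |",            # empty header row
        f"|{'-' * (qty_w + 2)}|{'-' * (name_w + 2)}|",    # separator row
    ]
    table += [f"| {qty.ljust(qty_w)} | {name.ljust(name_w)} |" for qty, name in rows]
    return table


if __name__ == "__main__":
    items = ["* _1 cup_ milk", "* _1 cup_ flour", "* _2 tbsp_ baking powder"]
    print("\n".join(ingredients_to_table(items)))
```

Working on text keeps the sketch short, but it only handles this one item shape; anything else raises a `ValueError` instead of being passed through.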

 
edA‑qa mort‑ora‑y

Supporting a recipe integration is a high-level feature I need for my recipe site. It currently uses an external YAML file and I combine the bits together with some Python code.

This will be integrated by allowing custom sections in the markdown file. Those sections can have their own parser, or if simple enough, options on the default ones. They can produce custom entries in the AST.

Table support in Markdown is currently atrocious. I'll provide a YAML-like syntax that generates tables.

The AST will allow custom translations/visitors as well. Each stage will be well defined with a clear format. My project will be all about customization and extension.
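
To make the custom-section idea concrete, here is a minimal Python sketch. It is an assumption-laden illustration, not the project's design: the `@name` section marker, the `Node` class, the `section_parser` decorator, and the `key: value` table form are all hypothetical names and syntax invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A generic AST entry; 'kind' is free-form so sections can add their own."""
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Registry of section name -> parser function (hypothetical extension point).
SECTION_PARSERS = {}


def section_parser(name):
    """Decorator that registers a parser for blocks introduced by '@name'."""
    def register(func):
        SECTION_PARSERS[name] = func
        return func
    return register


def parse_paragraph(lines):
    return Node("paragraph", " ".join(lines))


@section_parser("table")
def parse_table(lines):
    """Each 'key: value' line becomes one two-cell row."""
    rows = []
    for line in lines:
        key, _, value = line.partition(":")
        cells = [Node("cell", key.strip()), Node("cell", value.strip())]
        rows.append(Node("row", children=cells))
    return Node("table", children=rows)


def parse_document(source):
    """Split on blank lines; a block starting with a known '@name' uses that parser."""
    doc = Node("document")
    for block in source.strip().split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        name = lines[0][1:] if lines[0].startswith("@") else None
        if name in SECTION_PARSERS:
            doc.children.append(SECTION_PARSERS[name](lines[1:]))
        else:
            doc.children.append(parse_paragraph(lines))
    return doc
```

With this shape, adding a recipe section would mean registering one more function; the default parser and the tree type stay untouched.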

 
Fred Ross

You can load pandoc as a library if you dig into it, and it has also been incorporated into Hakyll, which adds a framework for producing multiple outputs and extracting document information from a bunch of documents. It's probably worth a look if you're already comfortable in Haskell.

 
Juan Vega

I guess you chose C++ because you already know it, but if you are looking for a modern, fast language, I would suggest taking a look at Rust.
I think it's worth the effort.

 
edA‑qa mort‑ora‑y

I did a lot of Rust programming while doing AI competitions. Unfortunately, tree manipulation, which is what I did then and what I'm doing now, is a weak point for Rust. I had too many questions that the community was unable to answer.

Though, in this case, the component I'd externalize wouldn't be doing much tree manipulation, so perhaps Rust would be an option.

 
Juan Vega

Very nice experience.

I'm still a rookie at Rust, and I was surprised you didn't pick it over C++. Now it all makes sense.

 
Valentin Berlier

You should definitely take a look at markdown-it. It lets you write plugins through which you can create syntax extensions, access the AST, and from there you can basically make it do anything.

 
edA‑qa mort‑ora‑y

It does not appear to support the multiple output case that I want. It's focused on rendering to HTML. I really want an accessible high-level tree where I can do abstract operations and lower to any output format.
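
A minimal Python sketch of what "one high-level tree, lowered to any output format" could look like. The `Node`, `Renderer`, `HtmlRenderer`, and `PlainTextRenderer` names are hypothetical and only illustrate the visitor idea; they are not taken from the project or from markdown-it.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Node:
    kind: str                      # e.g. "document", "heading", "paragraph"
    text: str = ""
    children: list = field(default_factory=list)


class Renderer:
    """Visitor base: dispatches on node.kind to a method named visit_<kind>."""

    def render(self, node):
        return getattr(self, "visit_" + node.kind)(node)

    def render_children(self, node, separator):
        return separator.join(self.render(child) for child in node.children)


class HtmlRenderer(Renderer):
    def visit_document(self, node):
        return self.render_children(node, "\n")

    def visit_heading(self, node):
        return f"<h1>{escape(node.text)}</h1>"

    def visit_paragraph(self, node):
        return f"<p>{escape(node.text)}</p>"


class PlainTextRenderer(Renderer):
    def visit_document(self, node):
        return self.render_children(node, "\n\n")

    def visit_heading(self, node):
        return node.text.upper()

    def visit_paragraph(self, node):
        return node.text


# The same tree feeds both output formats.
tree = Node("document", children=[
    Node("heading", "Log #1"),
    Node("paragraph", "Parsing & lowering"),
])

if __name__ == "__main__":
    print(HtmlRenderer().render(tree))
    print(PlainTextRenderer().render(tree))
```

The tree itself knows nothing about HTML or plain text, so a new output format is one more `Renderer` subclass rather than a change to the parser.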

Note that I currently use pyMarkdown which also offers extensions. And I've used other packages. I'm not keen on going down another path which isn't guaranteed to do what I want.

 
edA‑qa mort‑ora‑y

I got a big chunk of the low-level parsing done today. It's probably enough for me to move on to the tree converter.