A quick note: all of the raw markdown code blocks in the code snippets here are shown with single quotes rather than actual backticks because all the nested backticks plays havoc with Dev.to's markdown renderer. If you're a copy paster, you'll have to change those back to backticks in the python code.
I’m trying out Ulysses for writing, and so far I’m really liking it. But yesterday, I went to export one of my posts to markdown to get ready to post on my blog, and I found a slight hiccup. Ulysses formats all code blocks with tabs instead of spaces. Now, there’s no way that I’m going to get into that whole thing, but I will at least say that PEP8 tells us that the standard for Python is 4 spaces. What kind of self-respecting technical blogger would I be if I posted code samples with—shudder—tabs in my Python code?
So I asked them!
Ryan Palo@paytasticHey @ulyssesapp any way I can control tabs vs spaces in my markdown export? Python code snippets seem like they shouldn’t export with tabs.14:12 PM - 02 Jul 2019
And they responded really quickly and politely told me “not at this time.” Which is fine.
I immediately began researching to figure out how I was going to solve the problem myself. Could I look into customizing Markdown formats? Tweaking the export process? But before I could start spiraling down that rabbit hole, it hit me: it’s just text!
I’m a programmer! I have the power of the universe at my fingertips! (On my machine, at least.) So I got to work on a script for a platform that was designed to work with, modify, and tweak streams of text: the command line.
My first thought was that it’s a real simple substitution. I can do this in a single command!
$ sed $'s/\t/ /g' example.md Hello this is text '''python def thing(): print("Spaces!") ''' There should be four spaces there.
And honestly, that’s probably good enough. But I was on a roll, and I wanted a little more fine-grained control.
I’ll show you the whole thing for those that are just here for the copy/paste, and then I’ll step through the important bits and how they work. Essentially, we loop through each line of the input stream, and if we’re in a Python code block and there’s a tab, we replace the tabs with spaces. I pull in
fileinput from the standard library to put a little polish on the user experience. Here’s what I came up with:
#!/usr/bin/env python3 import argparse import fileinput parser = argparse.ArgumentParser(description="Convert tabs in markdown files to spaces.") parser.add_argument("-a", "--all", action="store_true", help="Convert all tabs to spaces") parser.add_argument("-n", "--number", type=int, default=4, help="Number of spaces to use.") parser.add_argument('files', metavar='FILE', nargs='*', help="files to read, if empty, stdin is used") args = parser.parse_args() if args.all: start_tag = "'''" else: start_tag = "'''python" in_code_block = False for line in fileinput.input(files=args.files): if line.startswith(start_tag) and not in_code_block: in_code_block = True elif line.startswith("'''") and in_code_block: in_code_block = False elif in_code_block: line = line.expandtabs(args.number) print(line, end="")
The meat of the business logic is here:
in_code_block = False for line in fileinput.input(files=args.files): if line.startswith(start_tag) and not in_code_block: in_code_block = True elif line.startswith("'''") and in_code_block: in_code_block = False elif in_code_block: line = line.expandtabs(args.number) print(line, end="")
Loop through each line, keeping track of if we’re in a code block (more on the specifics of that in a minute) or not. If we’re in a code block, expand the tabs! Finally, output the new version of the line.
But what’s all that other stuff? Even in the main code, there’s a reference to
fileinput. What the heck?
fileinput is a neat (and frankly underrated) module in Python’s standard library that allows scripts to load input from one or many file arguments and even STDIN super ergonomically. The most common use case is in the docs for it and it’s almost comically short:
import fileinput for line in fileinput.input(): process(line)
With these lines of code, you can call your script with as many filenames as you want, and Python will string their contents together into one stream of text. For example, if you have a script that prints the capitalized version of all the text it receives called
capitalize.py, you could run it like this:
$ python3 capitalize.py README.md hello.txt banana.rb # THIS IS THE TITLE OF MY README CHECK OUT THE README CONTENTS. SO MANY CONTENTS. NOW HELLO.TXT IS HERE YOOOOOOOO DEF BANANA PUTS 'A MAN, A PLAN, CANAL BANANAMA' END
But Ryan, in your script it looks different! You’re not using it the same way! That’s right. I’m combining it with another CLI power module:
argparse is the standard library way of handling command line arguments, flags, options, and providing a little bit of a user interface. Its particular syntax is one that I always have to look up, but it’s lightweight, works well, and does what I want. Here’s that relevant code. You’ll see how it starts to tie into the
fileinput section above as well.
parser = argparse.ArgumentParser(description="Convert tabs in markdown files to spaces.") parser.add_argument("-a", "--all", action="store_true", help="Convert all tabs to spaces") parser.add_argument("-n", "--number", type=int, default=4, help="Number of spaces to use.") parser.add_argument('files', metavar='FILE', nargs='*', help="files to read, if empty, stdin is used") args = parser.parse_args() if args.all: start_tag = "'''" else: start_tag = "'''python"
We go in three stages:
- First we create the
ArgumentParser. It will managing all of our parsing for us.
- Then we add arguments and specify their behavior. In this case, I added an
--allflag to make it so we could eradicate all tabs and restore order to all of our code blocks, and a
--numberflag to tell it how many spaces to make each tab. This might be useful if I’ve got Ruby or JS examples where I prefer 2 spaces. Lastly, I add a
*args-flavor positional argument for all of the filenames the user wants to provide.
- Finally, now that everything is specified, we parse and process the args. Depending on the type and action we specify for each input, we can expect different behaviors.
The last little trick is how we tie
fileinput together with this little line:
for line in fileinput.input(files=args.files):
fileinput.input takes an optional list of filenames, rather than trying to get them from the passed in script arguments. Because
argparse gobbles up all the command line arguments, we need to tell it to pass those filenames through so
fileinput can do its thing. And it all works like a charm!
No, I don’t have strong opinions on trivial things, why do you ask? In any case, until that feature request to Ulysses makes its way into their queue, I’ve got my little script, and it makes me happy! How well does it work, you ask? Well, do you see tabs or spaces in the code examples in this post?
P.S. I know you command line one-liners are lurking out there just dying to throw an awk or sed string at me that will do exactly what I want without me having to write a single line of extremely readable, maintainable, non-regex Python. 😉 I want to see it! Let 'em fly!