DEV Community

Gabor Szabo

Posted on • Originally published at code-maven.com

One-liner: Remove first two characters of every line in thousands of files

I have a project creating a Ladino dictionary that consists of a few thousand YAML files. They used to include lists of values, but a while ago I split them up into individual entries. I did this because the people who edit them are not used to YAML files, and this makes it a lot easier to explain to them what to do.

However, the previous change left me with 1-item lists in each file. I wanted to clean that up.

Example files

Here are a few example files, also reduced in size for this demo.

- ladino: kaza

- ladino: komer
  inglez: to eat

- ladino: biervo
  inglez: word
# some comment

As you can see, each one has an entry for a Ladino expression. Some of the files have translations into English. Other files in the real data-set had further translations into Hebrew, Turkish, French, Portuguese, and Spanish.

Some files had comments.

The dash in the first row and the indentation of the other rows are left over from the time when each file held more than one of these entries.

So I wanted to get rid of the first two columns of every line, except on lines that start with a hash-mark (#).

Here is the Perl one-liner to do so.

perl -p -i -e 's/^[^#].//' *.yaml
  • The '*.yaml' at the end is a shell glob: the shell expands it to the names of all the YAML files in the current directory and passes them as parameters to the command.
  • The -p tells perl to read the content of each file line-by-line, run the program on each line, and print the line.
  • The -i tells perl to edit the files in place: each original file is replaced with the content that was printed.
  • The -e tells perl that the following string is the perl program itself and not the name of a file containing the program.
  • The perl program 's/^[^#].//' will be executed on every line read from the files.
  • The 's///' is the regex substitution operator. It works on the current line and changes it, so the lines saved back to the files are the modified lines.
  • Between the 1st and 2nd slash is the regex.
  • The first ^ (the one outside the square brackets) means the match must start at the beginning of the line.
  • Inside the square brackets the ^ negates the class, so [^#] means that there must be a character that is not #. This will match any character in the first position of the line except #.
  • The . matches any single character (except a newline), so the regex as a whole matches the first two characters of the line, as long as the first one is not #.
  • The string between the 2nd and 3rd slash is the replacement. It is an empty string, so if there is a match it will be replaced by the empty string.

That's the whole thing.

Improvement

Now that I am explaining it, it occurred to me that this would be a safer solution:

perl -p -i -e 's/^[- ] //' *.yaml

Here the substitution is s/^[- ] //, which means the first character must be either a dash or a space and the second character must be a space; if so, those two characters are replaced by the empty string.
If the line starts with anything else, it will not be changed. This is safer because it is more specific about what we want to replace.

Results

For this article I saved the resulting files in a separate place:

ladino: kaza

ladino: komer
inglez: to eat

ladino: biervo
inglez: word
# some comment
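With thousands of files it can be worth confirming that none were missed. A minimal sketch, assuming every cleaned file is a flat mapping like the ones above (so no legitimate line starts with a space) and using hypothetical file names in a temporary directory: grep -l lists every file that still has a line starting with a dash and a space or with two spaces.

```shell
#!/bin/sh
# Sketch: find files that still have a leftover "- " or "  " prefix.
# Assumes cleaned files contain no nested (indented) YAML keys.
set -e
demo_dir=$(mktemp -d)
trap 'rm -rf "$demo_dir"' EXIT
cd "$demo_dir"

# One cleaned file and one that was missed (hypothetical names).
printf '%s\n' 'ladino: kaza' > kaza.yaml
printf '%s\n' '- ladino: komer' '  inglez: to eat' > komer.yaml

# grep -l prints the names of matching files and exits with 1 when nothing
# matches, so "|| true" keeps set -e from stopping the script in that case.
leftovers=$(grep -l '^[- ] ' -- *.yaml || true)
echo "still to clean: ${leftovers:-none}"
```

On the real data set the core of it is just grep -l '^[- ] ' *.yaml, which prints nothing once every file has been cleaned.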

Top comments (5)

Masashi

Perl seems amazing. I don't think it is that popular nowadays, but if I wanted to learn it, where could I? I learnt Perl regex, but maybe there is a good "top to bottom" Perl guide. I'm really interested in Perl after a few of your posts.

Gabor Szabo

I am not sure what you mean by "top to bottom", but I can point you to the Perl Tutorial I wrote.

Masashi
Masashi

Thanks :).

𒎏Wii 🏳️‍⚧️

Nice one, but you can do this with sed instead of perl too and save a bunch of characters 😀

sed -i -e 's/^[- ] //' *.yaml

will do the trick just fine, and if you really have a lot of work to do:

find . -name '*.yaml' -print0 | xargs -0 -n 100 -P "$(nproc)" sed -i -e 's/^[- ] //'

will run it in parallel, with as many sed processes as you have CPU cores 😁

Randal L. Schwartz

You kids and your fancy "sed -i". Back in the day, sed didn't have that, but Perl had -i long before! The sed folks wised up and took that idea as their own!