
efi shtain

Posted on • Originally published at itnext.io on

VSCode find and replace using regex super powers to remove duplicates

Photo by Markus Spiske on Unsplash

It happens sometimes, mainly in documentation or comments, that a word slips in twice. Not the end of the world, but if there is an easy and quick solution, why leave the wrongs unfixed?

Let’s fix “a a word” to “a word” without even knowing there is a problem

Well, the first problem is identifying such cases. It won’t be easy to spot a a pair of consecutive repeated words in a long document (did you see the a a?).

Then, if it happens once, it will happen again. What about other occurrences of the problem, or similar ones? It can happen a lot when you use the dark magic of copy-pasting.

Some background knowledge about regex

Regex is kind of powerful (at least in spec). In case you have never met them, regex has a notion of grouping and capturing. Every time you wrap part of the expression in (), it is captured, meaning grouped and numbered in increasing order, so you can refer to it later in the expression (without knowing the value in advance).

So, defining our problem: we look for full words (at least one word character), the words should be bounded (so “a ab” won’t count as a duplicate “a”), and there should be at least one whitespace character between the words (otherwise it is still the same word, of course).

\b(\w+)\s+\1\b

This regex does exactly what we want: \b anchors the match at a word boundary, \w+ gives us a word, and \s+ catches at least one whitespace character. But what is the \1, and why is \w+ wrapped in parentheses? Well, these are exactly the capture groups we talked about before. (\w+) matches the first occurrence of a word and captures it; \1 matches whatever the first group captured, which in our case is the exact same word!
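To see the pattern at work outside the editor, here is a minimal sketch in Python, whose re module supports the same \b, \w, \s and \1 constructs. The helper name find_duplicates is made up for illustration and is not part of VSCode or the original article.

```python
import re

# Same pattern as above: a bounded word, at least one whitespace
# character, then a backreference to that same word.
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b")


def find_duplicates(text: str) -> list[str]:
    """Return every word that is immediately followed by a copy of itself.

    Hypothetical helper, for illustration only.
    """
    return [match.group(1) for match in DUPLICATE_WORD.finditer(text)]


print(find_duplicates("fix a a word"))   # ['a']
print(find_duplicates("a ab is fine"))   # [] (the trailing \b rejects "a ab")
```

Note that this sketch is case sensitive, so “The the” would not be reported unless the pattern is compiled with re.IGNORECASE.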

VSCode usage

VSCode has a nice feature in its search tool: it can search using regular expressions. Press cmd+f (on a Mac, or ctrl+f on Windows) to open the search tool, then press cmd+option+r (alt+r on Windows) to enable regex search.

VSCode find tool, regex search activated, duplicate regex inserted

Using this, you can find duplicate consecutive words easily in any document. You can also search across all your documents at once.
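If you want the same kind of project-wide check outside the editor, for example in a script, the idea can be sketched in Python. Everything here is an assumption made for illustration: the function name scan_tree, the choice of .md and .txt files, and the line-by-line scan are not taken from VSCode.

```python
import re
from pathlib import Path

DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b")


def scan_tree(root: str, suffixes=(".md", ".txt")) -> list[tuple[str, int, str]]:
    """Return (file path, line number, duplicated word) for each hit under root.

    Hypothetical helper. It checks one line at a time, so a duplicate
    that is split across a line break is not reported.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and files with other extensions.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in DUPLICATE_WORD.finditer(line):
                hits.append((str(path), lineno, match.group(1)))
    return hits
```

Calling scan_tree("docs") on a hypothetical docs folder would return one tuple per duplicate, which is enough to jump to the right file and line.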

Matching all duplicate consecutive words easily

Now we just need to replace the duplicate with one instance of the word. For that, toggle the replace mode (click the right arrow) and type $1 into the replace field. The $1 references the same capture group that \1 referred to in the search expression.
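The replace step can be sketched in Python as well. Python writes the group reference in the replacement string as \1, where VSCode's replace field uses $1. The function name remove_duplicates is hypothetical.

```python
import re

DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b")


def remove_duplicates(text: str) -> str:
    """Replace each 'word word' pair with a single 'word'.

    Hypothetical helper; r"\1" plays the role of $1 in VSCode's replace field.
    """
    return DUPLICATE_WORD.sub(r"\1", text)


print(remove_duplicates("spot a a word"))  # spot a word
print(remove_duplicates("a a a"))          # a a (one pass removes one copy)
```

As the second call shows, a word repeated three times is only reduced to two in a single pass with this pattern, so such a case needs the replacement to be run again.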

VSCode find and replace expanded mode

Now click replace and watch the magic happen!

For brevity, of course there are no duplicates.

