Sundeep

Posted on Sep 17, 2020 • Originally published at learnbyexample.github.io

Search and replace tricks with ripgrep

#tutorial #linux #ripgrep #substitution

ripgrep (command name rg) is a grep tool, but it supports search and replace as well. rg is far from a like-for-like alternative to sed, but it has nifty features like multiline replacement, fixed string matching, PCRE2 support, etc. This post gives an overview of the substitution syntax and highlights some of the cases where rg is a handy replacement for sed.

Global search and replace

$ cat ip.txt
dark blue, light blue
light orange
blue sky

# by default, line numbers are displayed if the output destination is a terminal
# by default, only lines that matched the given pattern are displayed
# 'blue' is the search pattern and -r 'red' specifies the replacement string
$ rg 'blue' -r 'red' ip.txt
1:dark red, light red
3:red sky

# --passthru option is useful to print all lines, whether or not they matched
# -N will disable line number prefix
# this command is similar to: sed 's/blue/red/g' ip.txt
$ rg --passthru -N 'blue' -r 'red' ip.txt
dark red, light red
light orange
red sky

Matching Nth occurrence

As seen in the previous example, rg replaces all occurrences. So, you'll have to get creative with the regexp to replace only a specific occurrence per input line.

$ s='see bat hot at but at go gate at sat at but at'

# replace first occurrence only
# same as: sed 's/\bat\b/[xyz]/'
$ echo "$s" | rg --passthru -N '\bat\b(.*)' -r '[xyz]$1'
see bat hot [xyz] but at go gate at sat at but at

# same as: sed 's/\bat\b/[xyz]/3'
# the number within {} is N-1 to replace Nth occurrence, for N>1
$ echo "$s" | rg --passthru -N '^((.*?\bat\b){2}.*?)\bat\b' -r '$1[xyz]'
see bat hot at but at go gate [xyz] sat at but at

# replace last but Nth occurrence, for N>=0 (the number within {} is N)
$ echo "$s" | rg --passthru -N '^(.*)\bat\b((.*\bat\b){3})' -r '$1[xyz]$2'
see bat hot at but [xyz] go gate at sat at but at
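
The replacement strings above refer to capture groups as $1 and $2. As a small sketch (the date string is just sample input, and rg is assumed to be installed), wrapping the group number in braces keeps it separate from any word characters that follow. Without the braces, something like $3_ would be treated as a reference to a group named 3_ instead of group 3 followed by an underscore.

```shell
s='2020-09-17'
# ${N} delimits the group number from the text that follows it
out=$(echo "$s" | rg --passthru -N '(\d+)-(\d+)-(\d+)' -r '${3}_${2}_${1}')
echo "$out"
# 17_09_2020
```

Single quotes around the replacement string matter here, otherwise the shell would try to expand $1 and friends before rg sees them.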

In-place workaround

rg doesn't support an in-place editing option, so you'll have to do it yourself.

# -N isn't needed here as the output destination is a file
# same as: sed -i 's/blue/red/g' ip.txt
$ rg --passthru 'blue' -r 'red' ip.txt > tmp.txt && mv tmp.txt ip.txt

$ cat ip.txt
dark red, light red
light orange
red sky

If you have moreutils installed, then you could use sponge as well.

$ rg --passthru 'blue' -r 'red' ip.txt | sponge ip.txt
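
If you do this often, the temporary file approach can be wrapped in a small function. The sketch below is one possible way to do it, not something rg provides: rg_inplace is a hypothetical helper name, and it assumes mktemp is available. Since rg exits with a non-zero status when nothing matched, the helper only overwrites the file after a successful substitution and always cleans up the temporary file.

```shell
# rg_inplace PATTERN REPLACEMENT FILE  (hypothetical helper, not part of rg)
# the body runs in a subshell so that the variables stay local
rg_inplace() (
    pat=$1; rep=$2; file=$3
    tmp=$(mktemp) || exit 1
    # rg exits with a non-zero status if nothing matched (or on error),
    # in which case the original file is left untouched
    if rg --passthru -e "$pat" -r "$rep" -- "$file" > "$tmp"; then
        # overwrite via redirection so the file keeps its permissions
        cat "$tmp" > "$file"
    fi
    rm -f "$tmp"
)

# sample usage with the same content as the earlier ip.txt
printf 'dark blue, light blue\nlight orange\nblue sky\n' > ip.txt
rg_inplace 'blue' 'red' ip.txt
cat ip.txt
```

Writing back with cat instead of mv is a design choice: it keeps the original file's permissions, at the cost of copying the data once more.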

Rust regex and PCRE2

By default, rg uses Rust regular expressions, which are much more feature-rich than those of GNU sed. The main feature not supported is backreferences within the regexp definition (for performance reasons). See the Rust regex documentation for regular expression syntax and features. rg supports Unicode by default.

# non-greedy quantifier is supported
$ s='food land bark sand band cue combat'
$ echo "$s" | rg --passthru 'foo.*?ba' -r '[xyz]'
[xyz]rk sand band cue combat

# unicode support
$ echo 'fox:αλεπού,eagle:αετός' | rg --passthru '\p{L}+' -r '($0)'
(fox):(αλεπού),(eagle):(αετός)

# set operator example, remove all punctuation characters except . ! and ?
$ para='"hi", there! how *are* you? all fine here.'
$ echo "$para" | rg --passthru '[[:punct:]--[.!?]]+' -r ''
hi there! how are you? all fine here.

The -P switch enables the PCRE2 flavor, which has even more tricks. You can also use --engine=auto to let rg switch to PCRE2 automatically when a pattern needs it. For example, this is useful as an alias for the rg command: you get the performance of the Rust engine by default and PCRE2 only when needed.

# backreference within regexp definition
$ s='cocoa appleseed tool speechless'
$ echo "$s" | rg --passthru -wP '([a-z]*([a-z])\2[a-z]*){2}' -r '{$0}'
cocoa {appleseed} tool {speechless}

# replace all whole words except 'imp' and 'ant'
$ s='tiger imp goat eagle ant important'
$ echo "$s" | rg --passthru -P '\b(imp|ant)\b(*SKIP)(*F)|\w+' -r '[$0]'
[tiger] imp [goat] [eagle] ant [important]

# recursively match parentheses
$ eqn='(3+a)x * y((r-2)*(t+2)/6) + z(a(b(c(d(e)))))'
$ echo "$eqn" | rg --passthru -P '\((?:[^()]++|(?0))++\)' -r ''
x * y + z

# all lowercase letters and optional hyphen combo from start of string
$ s='apple-fig-mango guava grape'
$ echo "$s" | rg --passthru -P '\G([a-z]+)(-)?' -r '($1)$2'
(apple)-(fig)-(mango) guava grape
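
The --engine=auto option mentioned earlier in this section can be tried out with a pattern that the Rust engine cannot compile. This is a minimal sketch, assuming an rg build that includes PCRE2 support and a version that accepts the --engine option. The pattern and sample string are only for illustration.

```shell
s='cocoa appleseed tool speechless'
# no -P here: the backreference \1 is what makes rg fall back to PCRE2
out=$(echo "$s" | rg --engine=auto --passthru '\b\w*(\w)\1\w*\b' -r '{$0}')
echo "$out"
# cocoa {appleseed} {tool} {speechless}
```

Patterns without such features would still be handled by the default Rust engine under the same option.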

Extract and modify

The -r option can be used when the -o option is active too. The example shown below is not easy to do with sed.

$ s='0501 035 154 12 26 98234'

# extract numbers >= 100, ignoring leading zeros
$ echo "$s" | rg -woP '0*+(\d{3,})' -r '"$1"' | paste -sd,
"501","154","98234"
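
Here is another small sketch of the same -o plus -r combination, using made-up key=value input. Each match is printed on its own line, already rewritten by the replacement string, so this works as a quick way to extract and reshape fields.

```shell
s='name=joe;age=42;city=pune'
# every match goes on a separate line, transformed by -r
out=$(echo "$s" | rg -o '(\w+)=(\w+)' -r '$1: $2')
echo "$out"
# name: joe
# age: 42
# city: pune
```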

Fixed string matching

Like grep, rg has the -F option to match fixed strings, a handy feature that I feel every search and replace tool should provide.

$ printf '2.3/[4]*6\nfoo\n5.3-[4]*9\n' | rg --passthru -F '[4]*' -r '2'
2.3/26
foo
5.3-29

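-F doesn't extend to the replacement section though, so you need $$ instead of $ to represent it literally. In the first example below, $x is treated as a reference to a non-existent capture group and is replaced with an empty string.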

$ echo 'a.*{2}-b' | rg --passthru -F '.*{2}' -r '+$x\tc'
a+\tc-b
$ echo 'a.*{2}-b' | rg --passthru -F '.*{2}' -r '+$$x\tc'
a+$x\tc-b
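
When both the search and replacement strings come from shell variables and should be treated literally, the $ characters in the replacement can be doubled before calling rg. The sketch below is one way to do that with sed. The variable names are only for illustration.

```shell
search='.*{2}'
replace='+$x\tc'
# double every $ so that rg treats it as a literal character
safe_replace=$(printf '%s' "$replace" | sed 's/\$/$$/g')
out=$(echo 'a.*{2}-b' | rg --passthru -F -e "$search" -r "$safe_replace")
echo "$out"
# a+$x\tc-b
```

The -e option is used here so that a search string starting with a hyphen isn't mistaken for an option.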

Multiline matching

Another handy option is -U, which enables multiline matching.

$ s='hi there\nhave a nice day\nbye'

# (?s) flag will allow . to match newline characters as well
$ printf '%b' "$s" | rg --passthru -U '(?s)the.*ice' -r ''
hi  day
bye
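
Two variations on the above, as a rough sketch with the same sample input: the --multiline-dotall option can be used instead of the (?s) flag, and \n can be matched explicitly in the pattern once -U is active.

```shell
s='hi there\nhave a nice day\nbye'
# --multiline-dotall has the same effect as the (?s) flag for this pattern
out1=$(printf '%b' "$s" | rg --passthru -U --multiline-dotall 'the.*ice' -r '')
# \n can also be matched explicitly when -U is active
out2=$(printf '%b' "$s" | rg --passthru -U 'there\nhave' -r 'X')
printf '%s\n' "$out1" "$out2"
# hi  day
# bye
# hi X a nice day
# bye
```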

Handling dos-style input

rg supports dos-style files with the --crlf option.

# same as: sed -E 's/\w+(\r?)$/xyz\1/'
# note that output will retain CR+LF as line ending
# similar to the sed solution, this will work for unix-style input too
$ printf 'hi there\r\ngood day\r\n' | rg --passthru --crlf '\w+$' -r 'xyz'
hi xyz
good xyz
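
Carriage returns are not visible in terminal output, so it helps to inspect them separately. The sketch below, which assumes the usual tr and wc utilities, prints the text without the CR characters and then counts how many CR characters are present in the rg output.

```shell
out=$(printf 'hi there\r\ngood day\r\n' | rg --passthru --crlf '\w+$' -r 'xyz')
# strip the CR characters to view just the text
printf '%s\n' "$out" | tr -d '\r'
# count the CR characters that survived the substitution
printf '%s\n' "$out" | tr -cd '\r' | wc -c
```

A count equal to the number of input lines indicates that the dos-style line endings were kept intact.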

Speed comparison with GNU sed

Another advantage of rg is that it is likely to be faster than sed. See the ripgrep benchmark against other grep implementations, written by ripgrep's author, for a methodical and detailed analysis.

# for small inputs, the startup time of rg is a large component of the total
$ time echo 'aba' | sed 's/a/b/g' > f1
real    0m0.002s
$ time echo 'aba' | rg --passthru 'a' -r 'b' > f2
real    0m0.007s

# for larger files, rg is likely to be faster
# 6.2M sample ASCII file
$ wget 'https://norvig.com/big.txt'
$ time LC_ALL=C sed 's/\bcat\b/dog/g' big.txt > f1
real    0m0.060s
$ time rg --passthru '\bcat\b' -r 'dog' big.txt > f2
real    0m0.048s
$ diff -s f1 f2
Files f1 and f2 are identical

# nearly 8 times faster!!
$ time LC_ALL=C sed -E 's/\b(\w+)(\s+\1)+\b/\1/g' big.txt > f1
real    0m0.725s
$ time rg --no-unicode --passthru -wP '(\w+)(\s+\1)+' -r '$1' big.txt > f2
real    0m0.093s
$ diff -s f1 f2
Files f1 and f2 are identical

Other alternatives for sed

rpl — search and replace tool, has interesting options like interactive mode and recursive mode
sd — simple search and replace, implemented in Rust
perl and ruby — programming languages with excellent command line support
