When contributing to a new open source project,
I search the codebase from time to time for occurrences of "the the".
This is a common mistake in comments in English codebases.
My friend Miroslav came up with an even better way:
Use a regex to find duplicate words!
rg --pcre2 "\b(\w+)\s+\1\b"
rg stands for ripgrep,
a very fast command-line regex search tool written in Rust.
The --pcre2 flag is needed here because ripgrep's default regex engine does not support backreferences such as \1.
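To see what the pattern does without running it over a whole codebase, here is a minimal sketch using Python's re module. Python's engine is not PCRE2, but it accepts the same syntax for this particular pattern. The function name find_duplicate_words and the sample sentence are made up for illustration.

```python
import re

# Same pattern as in the rg command: a whole word, some whitespace,
# then the same word again (\1 refers back to the first group).
DUPLICATE_WORD = re.compile(r"\b(\w+)\s+\1\b")


def find_duplicate_words(text):
    """Return every word that appears twice in a row in `text`."""
    return [match.group(1) for match in DUPLICATE_WORD.finditer(text)]


# Hypothetical sample comment, not taken from a real codebase.
print(find_duplicate_words("This is the the first comment and and more."))
# prints ['the', 'and']
```

The trailing \b is what keeps "the theory" from being reported. One difference to keep in mind: \s also matches newlines in Python, so this sketch finds duplicates split across two lines, while rg searches line by line unless multiline mode is enabled.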
While trying to understand the regex above, I found an interesting Stack Overflow question that mentions an alternative regex, one that even handles words with apostrophes, hyphens…
rg --pcre2 "(\b\S+\b)\s+\b\1\b"
The Stack Overflow question linked above also explains the regex.
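The difference between the two patterns is easiest to see side by side. The following is a rough sketch in Python's re module (again not PCRE2, though both patterns behave the same way here as far as I can tell). The names WORD_CHARS_ONLY, NON_WHITESPACE and duplicates, as well as the sample strings, are made up for illustration.

```python
import re

# First pattern: a "word" consists of \w characters only.
WORD_CHARS_ONLY = re.compile(r"\b(\w+)\s+\1\b")
# Alternative pattern: a "word" is any run of non-whitespace characters
# that starts and ends at a word boundary.
NON_WHITESPACE = re.compile(r"(\b\S+\b)\s+\b\1\b")


def duplicates(pattern, text):
    """Return the repeated words that `pattern` finds in `text`."""
    return [match.group(1) for match in pattern.finditer(text)]


# Hypothetical samples: a plain word, an apostrophe, a hyphen.
for sample in ["the the", "don't don't", "well-known well-known"]:
    print(sample, duplicates(WORD_CHARS_ONLY, sample), duplicates(NON_WHITESPACE, sample))
```

In this sketch both patterns report "the the", but only the second one reports the repeated "don't" and "well-known": \w+ stops at the apostrophe or hyphen, and the \s+ that follows then has no whitespace to match.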