Regex

General

https://en.wikipedia.org/wiki/Regular_expression - a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations. The concept arose in the 1950s, when the American mathematician Stephen Kleene formalized the description of a regular language, and came into common use with the Unix text processing utilities ed, an editor, and grep, a filter.

In modern usage, "regular expressions" are often distinguished from the derived, but fundamentally distinct concepts of regex or regexp, which no longer describe a regular language.
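One concrete way to see the gap: a backreference lets a pattern demand that the same text appears twice in a row, and the set of all such "doubled" strings is a textbook example of a language that is not regular. A minimal sketch, assuming a POSIX grep; the sample words are made up for illustration.

```shell
# \(.*\)\1 matches any text followed by an exact copy of itself.
# -x requires the whole line to match the pattern.
doubled=$(printf '%s\n' abcabc abcabd xyzxyz | grep -x '\(.*\)\1')
printf '%s\n' "$doubled"
```

Only abcabc and xyzxyz are printed; abcabd is rejected because its second half differs from its first.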

Regexps are so useful in computing that the various systems to specify regexps have evolved to provide both a basic and extended standard for the grammar and syntax; modern regexps heavily augment the standard. Regexp processors are found in several search engines, search and replace dialogs of several word processors and text editors, and in the command lines of text processing utilities, such as sed and AWK.
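The basic and extended grammars mentioned above differ mostly in which operators need a backslash. A small sketch of the same pattern in both, assuming a POSIX grep (basic syntax by default, extended with -E); the input words are hypothetical.

```shell
# Basic syntax (BRE): interval braces are written with backslashes.
bre=$(printf '%s\n' ab aab aaab | grep -x 'a\{2\}b')
# Extended syntax (ERE, grep -E): the same operator is written bare.
ere=$(printf '%s\n' ab aab aaab | grep -E -x 'a{2}b')
printf 'BRE: %s\nERE: %s\n' "$bre" "$ere"
```

Both commands select only aab, the one line with exactly two a's before the b.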

Many programming languages provide regexp capabilities, some built-in, for example Perl, JavaScript, Ruby, AWK, and Tcl, and others via a standard library, for example .NET languages, Java, Python, POSIX C and C++ (since C++11). Most other languages offer regexps via a library.

https://drewdevault.com/2017/08/13/When-not-to-use-a-regex.html

POSIX

PCRE

Guides

YouTube: Best of Fluent 2012: /Reg(exp){2}lained/: Demystifying Regular Expressions

https://news.ycombinator.com/item?id=14976648

Learn regular expressions in about 55 minutes [1]

http://regex.learncodethehardway.org/book/

http://net.tutsplus.com/tutorials/javascript-ajax/you-dont-know-anything-about-regular-expressions/

http://www.regular-expressions.info/

Why Using the Greedy .* in Regular Expressions Is Almost Never What You Actually Want [2]

http://www.rexegg.com/regex-best-trick.html [3]

Regular Expression Matching Can Be Simple And Fast (but is slow in Java, Perl, PHP, Python, Ruby, ...)

The 1960's elegance behind Go's regexp https://news.ycombinator.com/item?id=13920469

Lookahead and Lookbehind Tutorial [4]

Web tools

regex101.com - Online regex tester and debugger: PHP, PCRE, Python, Golang and JavaScript

RegExr - an online tool to learn, build, & test Regular Expressions (RegEx / RegExp). Results update in real-time as you type. Roll over a match or expression for details. Save & share expressions with others. Explore the Library for help & examples. Undo & Redo with Ctrl-Z / Y. Search for & rate Community patterns.

http://refiddle.com/

Debuggex - Online visual regex tester. JavaScript, Python, and PCRE.

http://www.rexv.org/

RegexPal - a JavaScript regular expression tester

Automatic Generation of Regular Expressions from Examples - with examples and generation from csv

Rubular - a Ruby regular expression editor

Regexper - JS

PCREck - a multi-dialect regular expression editor

Regular Expression Analyzer - An online regular expression tool that helps analyzing regular expression structure.

txt2re - regular expression generator (perl php python java javascript coldfusion c c++ ruby vb vbscript j# c# c++.net vb.net)

http://regviz.org/ [5]

Search

https://medium.com/@savolai/regular-expressions-you-can-read-a-new-visual-syntax-526c3cf45df1 [6]

https://beyondgrep.com/feature-comparison [7]

grep

https://en.wikipedia.org/wiki/grep - a command-line utility for searching plain-text data sets for lines matching a regular expression. Its name comes from the ed command g/re/p (globally search a regular expression and print), which has the same effect: doing a global search with the regular expression and printing all matching lines. Grep was originally developed for the Unix operating system, but is available today for all Unix-like systems.

grep "apple" *.txt

grep '^a.ple' oldbashimplementations.txt
  # lines that begin with the letter a, followed by any one character, followed by the letter sequence ple.
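To make the anchor and the dot concrete, here is a small sketch run against a made-up word list instead of a file. The pattern is quoted so the shell passes it to grep unchanged.

```shell
# Hypothetical word list, one word per line.
words=$(printf '%s\n' apple ample maple 'a ple' applesauce)
# ^ anchors the match at the start of the line; . matches exactly one character.
matches=$(printf '%s\n' "$words" | grep '^a.ple')
printf '%s\n' "$matches"
```

This prints apple, ample, "a ple" and applesauce. maple is skipped because it does not start with a, and applesauce still matches because nothing anchors the end of the line.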

http://jlebar.com/2012/11/28/GNU_grep_is_10x_faster_than_Mac_grep.html [8]
- http://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html

grep "stuff" sqldump.sql | fold -w 200 | grep -C 1 "stuff"

The first grep gets the (mile-wide) line that has the match, then fold will split the mile-wide line into 200 char long lines, and "grep -C 1" will show only the one 200 char wide line where the match is + 1 line of context before and after. [9]
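The pipeline can be tried without a real SQL dump. The sketch below builds one artificial 1005-character line with awk and runs the same three stages over it; the line contents and lengths are invented for the demonstration.

```shell
# Build one 1005-character line: 500 x's, the word "stuff", then 500 y's.
long_line=$(awk 'BEGIN {
    for (i = 0; i < 500; i++) printf "x"
    printf "stuff"
    for (i = 0; i < 500; i++) printf "y"
    print ""
}')

# fold cuts the line into 200-character pieces; grep -C 1 keeps the piece
# that contains the match plus one piece on either side of it.
context=$(printf '%s\n' "$long_line" | grep "stuff" | fold -w 200 | grep -C 1 "stuff")
pieces=$(printf '%s\n' "$context" | grep -c '')
printf '%s pieces shown\n' "$pieces"
```

Three of the six folded pieces survive. One caveat of this approach: if fold happens to cut through the middle of the search term, the second grep will not find it.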

https://medium.com/@rualthanzauva/grep-was-a-private-command-of-mine-for-quite-a-while-before-i-made-it-public-ken-thompson-a40e24a5ef48

http://ridiculousfish.com/blog/posts/old-age-and-treachery.html [10]

Chris's Wiki :: blog/unix/GNUGrepForceText - [11]

sgrep

sgrep - search a file for a structured pattern
- http://sgrep.sourceforge.net/

sift

sift - grep on steroids [12]

ack

ack - a tool like grep, designed for programmers with large heterogeneous trees of source code. ack is written purely in portable Perl 5 and takes advantage of the power of Perl's regular expressions.
- http://linux.die.net/man/1/ack
- http://www.rustyrazorblade.com/2012/03/making-better-use-of-your-ackrc-file/

ag

The Silver Searcher - or Ag is a tool for searching code. It started off as a clone of Ack, but their feature sets have since diverged slightly. In typical usage, Ag is 5-10x faster than Ack.
- https://github.com/ggreer/the_silver_searcher - ag

ag --hidden --ignore .git --ignore .winscp -l -g ""
  # lists all files

ripgrep

ripgrep - faster than {grep, ag, git grep, ucg, pt, sift} [13]
- https://github.com/BurntSushi/ripgrep

rg 'foo' --files-with-matches | xargs sed -i 's/foo/bar/g'
  (GNU sed)
rg 'foo' --files-with-matches | xargs sed -i '' 's/foo/bar/g'
  (BSD sed, which includes macOS; its -i takes a backup suffix argument, here empty) [14]
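Where a script has to run against both sed flavours, one option is to avoid -i altogether. This is only a sketch of that idea, not a drop-in replacement: it uses grep -l in place of rg --files-with-matches, works in a scratch directory with made-up file names, and does not preserve file permissions.

```shell
# Scratch directory with two hypothetical files; only a.txt contains "foo".
dir=$(mktemp -d)
printf 'foo one\nfoo two\n' > "$dir/a.txt"
printf 'nothing here\n' > "$dir/b.txt"

# grep -l lists the files that contain a match. Writing sed's output to a
# temporary file and moving it back sidesteps the -i incompatibility.
grep -l 'foo' "$dir"/*.txt | while IFS= read -r f; do
    sed 's/foo/bar/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

result=$(cat "$dir/a.txt")
untouched=$(cat "$dir/b.txt")
printf '%s\n' "$result"
rm -r "$dir"
```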

strings

https://en.wikipedia.org/wiki/strings_(Unix) - a program in Unix-like operating systems that finds and prints text strings embedded in binary files such as executables. It can be used on object files and core dumps. Strings are recognized by looking for sequences of at least 4 (by default) printable characters terminating in a NUL character (that is, null-terminated strings). Some implementations provide options for determining what is recognized as a printable character, which is useful for finding non-ASCII and wide character text. Common usage includes piping its output to grep and fold or redirecting the output to a file.
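The "at least 4 printable characters" rule can be approximated with tr and grep. This is a rough emulation for illustration only, not how strings itself is implemented, and the sample bytes are invented.

```shell
# Sample "binary": short runs "ab" and "xyz" and one long run, separated by
# NUL and control bytes. tr turns every non-printable byte into a line
# break, then grep keeps only runs of 4 or more characters.
found=$(printf 'ab\0hello world\0\001\002xyz\0' |
    LC_ALL=C tr -c '[:print:]' '\n' |
    grep -E '.{4,}')
printf '%s\n' "$found"
```

Only "hello world" is reported; the two shorter runs fall below the threshold. With the real tool the usual pattern is the one described above, for example piping strings output into grep.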

Google Code Search

https://github.com/google/codesearch - Fast, indexed regexp search over large file trees

qgrep

https://github.com/zeux/qgrep - Fast regular expression grep for source code with incremental index updates

zeux.io - qgrep internals

drep

https://github.com/maxpert/drep - grep with dynamic reloadable filter expressions. This allows filtering stream of logs/lines, while changing filters on the fly.

CUDA grep

CUDA grep - We successfully created a parallel regular expression matcher using CUDA for GPUs. Our implementation is anywhere from 2x-10x faster than grep depending on the workload and about 68x faster than the perl regex engine. We think that this makes it a viable candidate for use in the real world. [15]

Search and replace

regular expressions 101 — an online regex tester for javascript, php, pcre and python.
- http://regex101.com/?tour

http://www.regexplanet.com/advanced/perl/index.html

My Regex Tester - PHP PCRE with search and replace

REGex TESTER - ver. 1.5.3
Regex Tester 2.0 alpha

regexxer is a nifty GUI search/replace tool featuring Perl-style regular expressions

https://github.com/LightboxTech/liblightgrep

Hyperscan is a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, yet functions as a standalone library with its own API written in C. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions, as well as matching of regular expressions across streams of data. [16]

https://code.google.com/p/kiki-re/

sed

World's best introduction to sed [17]
- http://docstore.mik.ua/orelly/unix/sedawk/appa_03.htm
- http://www.pement.org/sed/sedfaq.html

echo "test string oldWord yadayada" | sed 's/oldWord/newWord/g'

sed -i 's/search/replace/' filename
sed -i 's#test#replace#' filename
  # in-place editing of a file; the second form uses # as an alternative separator
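Alternative separators pay off when the pattern itself contains slashes. A short sketch with a made-up path, showing the same substitution written both ways:

```shell
# With / as the separator, every slash in the path has to be escaped.
escaped=$(echo '/usr/local/bin' | sed 's/\/usr\/local/\/opt/')
# With # as the separator, the slashes can be written as they are.
plain=$(echo '/usr/local/bin' | sed 's#/usr/local#/opt#')
printf '%s\n%s\n' "$escaped" "$plain"
```

Both commands produce /opt/bin; the second is simply easier to read.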

find . -name "*.html" -exec sed -i "s/oldWord/newWord/g" '{}' \;
  replace text in multiple files [18]

 echo '<a href="index.html"><img src="logo.svg" id="site-logo"></a>
          <h1>Site Title</h1>' | sed 'N; s@</a>\n          <h1>Site Title</h1>@\
          <h1>Site Title</h1></a>@g'
   # multiline replacement: N joins the next line into the pattern space so the match can span the line break

http://www.pement.org/awk/awk_sed.txt [19]

https://news.ycombinator.com/item?id=18761307

https://github.com/SoptikHa2/desed/ - Demystify and debug your sed scripts, from comfort of your terminal. [20]

https://github.com/lhoursquentin/sed-bin - sed to C translator written in sed

awk

https://en.wikipedia.org/wiki/AWK - a domain-specific language designed for text processing and typically used as a data extraction and reporting tool. It is a standard feature of most Unix-like operating systems. The AWK language is a data-driven scripting language consisting of a set of actions to be taken against streams of textual data, either run directly on files or used as part of a pipeline, for purposes of extracting or transforming text, such as producing formatted reports. The language extensively uses the string datatype, associative arrays (that is, arrays indexed by key strings), and regular expressions. While AWK has a limited intended application domain and was especially designed to support one-liner programs, the language is Turing-complete, and even the early Bell Labs users of AWK often wrote well-structured large AWK programs. AWK was created at Bell Labs in the 1970s, and its name is derived from the surnames of its authors: Alfred Aho, Peter Weinberger, and Brian Kernighan. The acronym is pronounced the same as the name of the bird auk (which acts as an emblem of the language, such as on The AWK Programming Language book cover; the book is often referred to by the abbreviation TAPL). When written in all lowercase letters, as awk, it refers to the Unix or Plan 9 program that runs scripts written in the AWK programming language.

Gawk - If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. To write a program to do this in a language such as C or Pascal is a time-consuming inconvenience that may take many lines of code. The job is easy with awk, especially the GNU implementation: gawk. The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs with just a few lines of code.
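As an illustration of the pattern/action style, associative arrays and regular expressions described above, here is a small sketch that totals a column per key. The log format and names are hypothetical.

```shell
# Hypothetical access log: user, action, bytes.
log='alice GET 120
bob GET 300
alice POST 80
bob GET 50'

# The $2 ~ /GET/ pattern selects lines, the action sums field 3 per user in
# an associative array, and END prints the totals. sort fixes the output
# order, because the order of awk's for-in loop is unspecified.
totals=$(printf '%s\n' "$log" |
    awk '$2 ~ /GET/ { sum[$1] += $3 } END { for (u in sum) print u, sum[u] }' |
    sort)
printf '%s\n' "$totals"
```

This prints "alice 120" and "bob 350"; the POST line is filtered out by the pattern before the action runs.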

awk

http://www.wra1th.plus.com/awk/awkfri.txt

awk '{print $NF}'
  # print the last field of each input line

  # "GG      TC    CC" to "G G      T C       C C"
awk ' { gsub("GG","G G");gsub("TC","T C");gsub("CC","C C");print } ' file
  # [21]

Wasting time with gawk while parsing lsof output [22]

http://ferd.ca/awk-in-20-minutes.html [23]

https://ia802309.us.archive.org/25/items/pdfy-MgN0H1joIoDVoIC7/The_AWK_Programming_Language.pdf [24]

mawk – pattern scanning and text processing language - an interpreter for the AWK Programming Language.

https://github.com/cup/lake - Portable standard library for Awk

https://github.com/TheMozg/awk-raycaster [25]

https://github.com/patsie75/awk-fps - First Person Shooter in gawk

GoAWK - an AWK interpreter written in Go [26]
- https://github.com/benhoyt/goawk

https://github.com/ezrosent/frawk - a small programming language for writing short programs processing textual data. To a first approximation, it is an implementation of the AWK language; many common Awk programs produce equivalent output when passed to frawk. You might be interested in frawk if you want your scripts to handle escaped CSV/TSV like standard Awk fields, or if you want your scripts to execute faster.

sd

https://github.com/chmln/sd - an intuitive find & replace CLI.

bsed

https://github.com/andrewbihl/bsed - Simple, English syntax on top of Perl text processing. Designed to replace simple uses of sed/grep/AWK/Perl. Bsed is a stream editor. In contrast to interactive text editors, stream editors process text in one go, applying a command to an entire input stream or open file. [27]

Library

Regex Colorizer - JS library

https://github.com/fancy-regex/fancy-regex - Rust library for regular expressions using "fancy" features like look-around and backreferences

Hyperscan

Hyperscan - a high-performance multiple regex matching library available as open source with a C API. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers of regular expressions across streams of data.
- https://github.com/intel/hyperscan

Other

Regex Crossword is a crossword puzzle game, where the crossword clues are defined using regular expressions. [28] [29]

https://github.com/VerbalExpressions

https://news.ycombinator.com/item?id=12269468

https://github.com/intel/hyperscan
- Paper: Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs – Branch Free - [31]

YouTube: RubyConf 2021 - Do regex dream of Turing Completeness? by Daniel Magliola
