Regex

From Things and Stuff Wiki
Revision as of 20:25, 21 March 2017 by Milk (talk | contribs) (→‎Guides)


General

  • https://en.wikipedia.org/wiki/Regular_expression - a sequence of characters that defines a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. "find and replace"-like operations. The concept arose in the 1950s, when the American mathematician Stephen Kleene formalized the description of a regular language, and came into common use with the Unix text processing utilities ed, an editor, and grep, a filter.

In modern usage, "regular expressions" are often distinguished from the derived, but fundamentally distinct concepts of regex or regexp, which no longer describe a regular language (features such as backreferences go beyond what a regular language can express).
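A minimal sketch of one way a regexp steps outside regular languages, assuming a POSIX-style grep: the backreference \1 must repeat whatever the group matched, so the pattern accepts only lines made of some string written twice. The sample input is made up for illustration.

```shell
# \(..*\) captures one or more characters; \1 must repeat them exactly.
doubled=$(printf 'abab\nabc\naa\n' | grep '^\(..*\)\1$')
printf '%s\n' "$doubled"
# prints:
# abab
# aa
```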

Regexps are so useful in computing that the various systems to specify regexps have evolved to provide both a basic and extended standard for the grammar and syntax; modern regexps heavily augment the standard. Regexp processors are found in several search engines, search and replace dialogs of several word processors and text editors, and in the command lines of text processing utilities, such as sed and AWK.
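As a rough illustration of the basic versus extended grammars, assuming grep's default basic syntax and its -E switch for extended syntax: the same pattern text means different things in each. The two sample lines are invented for the demo.

```shell
sample=$(printf 'a+b\naab\n')

# Basic syntax: "+" is an ordinary character, so only the literal "a+b" matches.
bre=$(printf '%s\n' "$sample" | grep 'a+b')

# Extended syntax: "+" means "one or more of the preceding item", so "aab" matches.
ere=$(printf '%s\n' "$sample" | grep -E 'a+b')

printf 'basic: %s\nextended: %s\n' "$bre" "$ere"
# prints:
# basic: a+b
# extended: aab
```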

Many programming languages provide regexp capabilities, some built-in, for example Perl, JavaScript, Ruby, AWK, and Tcl, and others via a standard library, for example .NET languages, Java, Python, POSIX C and C++ (since C++11). Most other languages offer regexps via a library.

POSIX

PCRE

See also Languages#Perl

Guides



Search

Software

grep

  • https://en.wikipedia.org/wiki/grep - a command-line utility for searching plain-text data sets for lines matching a regular expression. Its name comes from the ed command g/re/p (globally search a regular expression and print), which has the same effect: doing a global search with the regular expression and printing all matching lines. Grep was originally developed for the Unix operating system, but is available today for all Unix-like systems.
grep "apple" *.txt
grep '^a.ple' oldbashimplementations.txt
  # matches lines that begin with the letter a, followed by any one character, followed by the letter sequence ple.
grep "stuff" sqldump.sql | fold -w 200 | grep -C 1 "stuff"

The first grep gets the (mile-wide) line that has the match, fold then splits that line into 200-character lines, and "grep -C 1" shows only the 200-character line containing the match plus one line of context before and after. [5]
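The pipeline can be tried without a real SQL dump. This sketch builds a single 605-character line (the filler letters and the variable names long_line and context are assumptions for the demo) and runs it through the same three stages.

```shell
# One long line: 300 "a", the word "stuff", then 300 "b".
long_line="$(printf '%300s' '' | tr ' ' 'a')stuff$(printf '%300s' '' | tr ' ' 'b')"

# Same stages as above, reading from the variable instead of sqldump.sql.
context=$(printf '%s\n' "$long_line" | grep "stuff" | fold -w 200 | grep -C 1 "stuff")

# fold yields 4 pieces; the match is in piece 2, so pieces 1-3 are kept.
printf '%s\n' "$context" | wc -l
```

One caveat of this approach: if the search term happens to straddle a 200-character boundary, fold cuts it in two and the second grep no longer finds it.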

sgrep

sift

ack

ag

ag --hidden --ignore .git --ignore .winscp -l -g ""
  # lists every file ag would search: hidden files included, .git and .winscp ignored

ripgrep

strings

  • https://en.wikipedia.org/wiki/strings_(Unix) - a program in Unix-like operating systems that finds and prints text strings embedded in binary files such as executables. It can be used on object files and core dumps. Strings are recognized by looking for sequences of at least 4 (by default) printable characters terminating in a NUL character (that is, null-terminated strings). Some implementations provide options for determining what is recognized as a printable character, which is useful for finding non-ASCII and wide character text. Common usage includes piping its output to grep and fold or redirecting the output to a file.
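A small sketch of the minimum-length rule and the pipe-to-grep usage, assuming a strings that supports the -n option. The scratch file and its contents are made up for the demo.

```shell
# Scratch file holding three NUL-terminated runs ("ab", "hello world", "xyz")
# and two non-printable bytes (\001 \002).
tmp=$(mktemp)
printf 'ab\0hello world\0\001\002xyz\0' > "$tmp"

# Default minimum length is 4, so "ab" and "xyz" are dropped.
found=$(strings "$tmp")

# -n 3 lowers the minimum to 3 characters; grep then filters the output.
short=$(strings -n 3 "$tmp" | grep 'xyz')

printf '%s\n%s\n' "$found" "$short"
rm -f "$tmp"
# prints:
# hello world
# xyz
```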


Web tools

  • RegExr is an online tool to learn, build, & test Regular Expressions (RegEx / RegExp). Results update in real-time as you type. Roll over a match or expression for details. Save & share expressions with others. Explore the Library for help & examples. Undo & Redo with Ctrl-Z / Y. Search for & rate Community patterns.
  • Debuggex - Online visual regex tester. JavaScript, Python, and PCRE.
  • Rubular - a Ruby regular expression editor
  • PCREck - a multi-dialect regular expression editor
  • txt2re - regular expression generator (perl php python java javascript coldfusion c c++ ruby vb vbscript j# c# c++.net vb.net)

Search and replace

Software

  • regexxer is a nifty GUI search/replace tool featuring Perl-style regular expressions


  • Hyperscan is a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, yet functions as a standalone library with its own API written in C. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions, as well as matching of regular expressions across streams of data. [9]

sed

echo "test string oldWord yadayada" | sed 's/oldWord/newWord/g'
find . -name "*.html" -exec sed -i "s/oldWord/newWord/g" '{}' \;
  # replace text in multiple files [11]
echo '<a href="index.html"><img src="logo.svg" id="site-logo"></a>
          <h1>Site Title</h1>' | sed 'N; s@</a>\n          <h1>Site Title</h1>@\
          <h1>Site Title</h1></a>@g'
  # multiline replacement: moves the closing </a> to after the <h1> line
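The -i option used in the find example is not part of POSIX sed, and GNU and BSD sed disagree on whether it takes an argument. A possible portable variant, sketched here in a throwaway directory with invented file names, writes to a temporary file and moves it over the original.

```shell
# Sandbox with two sample files so nothing real is touched.
workdir=$(mktemp -d)
printf '<p>oldWord</p>\n' > "$workdir/a.html"
printf '<p>keep</p>\n' > "$workdir/b.html"

# For each file: write the edited copy next to it, then replace the original.
find "$workdir" -name "*.html" -exec sh -c '
  for f do
    sed "s/oldWord/newWord/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
' sh {} +

after_a=$(cat "$workdir/a.html")
after_b=$(cat "$workdir/b.html")
printf '%s\n%s\n' "$after_a" "$after_b"
rm -rf "$workdir"
# prints:
# <p>newWord</p>
# <p>keep</p>
```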

awk

awk '{print $NF;}'
  # prints the last field of each input line

  # "GG      TC    CC" to "G G      T C       C C"
awk ' { gsub("GG","G G");gsub("TC","T C");gsub("CC","C C");print } ' file
  # [13]
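Both awk commands can be tried on inline input. The last command is a hypothetical generalization, not from the cited source: it swaps the three hard-coded gsub calls for a single sed substitution with two capture groups, assuming the data consists of pairs of the letters A, C, G and T.

```shell
# Last whitespace-separated field of the line.
last_field=$(printf 'one two three\n' | awk '{ print $NF }')

# The three gsub calls from above, fed from printf instead of a file.
spaced=$(printf 'GG TC CC\n' | awk '{ gsub("GG","G G"); gsub("TC","T C"); gsub("CC","C C"); print }')

# Assumed generalization: put a space between any two adjacent A/C/G/T letters.
generalized=$(printf 'GG TC CC AT\n' | sed 's/\([ACGT]\)\([ACGT]\)/\1 \2/g')

printf '%s\n%s\n%s\n' "$last_field" "$spaced" "$generalized"
# prints:
# three
# G G T C C C
# G G T C C C A T
```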

Library

Other

  • Regex Crossword is a crossword puzzle game, where the crossword clues are defined using regular expressions. [16] [17]