Comparison table for Emacs regexp and Perl Compatible Regular Expressions (PCRE)

emacs regex

Is there a nice table or cheat sheet on the web that compares the syntax of Emacs regex and PCRE?

When I'm using Emacs regex I have to remember to escape grouping parentheses and braces, among other differences, and it all gets confusing. A syntax comparison table would help minimize the confusion.

Best Answer

I will collect the syntax differences that I know of here. This answer is community wiki, so add more if you wish. Always check the preview before adding more.

When to escape ( ) { } |

In Emacs regexp, (, ), {, }, | are literal and escaped ones (\(, \), \{, \}, \|) are meta.

In Perl-compatible regexp, (, ), {, }, | are meta, and escaped ones are literal.
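The swap described above can be sketched as a small converter. This is only an illustration: the function name emacs_to_perl_grouping is made up for this example, it handles only the five characters listed above, and it does not treat bracket expressions such as [(|)] specially, so real patterns may need more care. Python's re module is used as a stand-in for a Perl-style engine.

```python
import re

# Characters whose escaped/unescaped meaning is swapped between the two syntaxes.
SWAPPED = set("(){}|")


def emacs_to_perl_grouping(pattern: str) -> str:
    """Swap the escaping of ( ) { } | in an Emacs-style pattern.

    Hypothetical helper for illustration; bracket expressions are not handled.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            if nxt in SWAPPED:
                out.append(nxt)        # \( in Emacs becomes ( in Perl style
            else:
                out.append(ch + nxt)   # any other escape is passed through
            i += 2
        elif ch in SWAPPED:
            out.append("\\" + ch)      # literal ( in Emacs becomes \( in Perl style
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


converted = emacs_to_perl_grouping(r"\(foo\|bar\)\{2\}")
print(converted)                              # (foo|bar){2}
print(bool(re.fullmatch(converted, "foobar")))  # True
```

Going the other way (Perl style to Emacs style) is the same swap, so under these assumptions the function is its own inverse for these five characters.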

* and +

\* is the literal star in both Emacs and Perl. If an expression starts with an unescaped star, that leading star is literal in Emacs regexp but a syntax error in Perl regexp.

Similarly for the plus.
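The Perl-style side of this is easy to check from a script. The sketch below uses Python's re module, which follows Perl-style syntax on this point but is not PCRE itself, so treat it as an approximation; the helper name compiles is made up for this example. The Emacs behaviour (a leading star being literal) cannot be shown this way.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's Perl-style engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(compiles("*a"))    # False: nothing precedes the star
print(compiles("+a"))    # False: same for a leading plus
print(compiles(r"\*a"))  # True: escaped star is a literal
print(bool(re.fullmatch(r"\*a", "*a")))  # True
```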

Character classes

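The Perl shorthand classes \d (for digits) and \s (for whitespace characters) do not work in Emacs regular expressions: \d just matches a literal d, and \s must be followed by a syntax class code (for example, \s- for whitespace). \w does exist in Emacs, but it means "word constituent according to the current syntax table". In Emacs, use [[:digit:]], [[:word:]], [[:space:]] instead (with double brackets). Perl also supports the POSIX classes [:digit:], [:word:], [:space:], but there too they are only valid inside a bracket expression, so they are written with double brackets as well.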

\w in Emacs matches ' and " too, unlike Perl. This is because text-mode syntax table has ' and " as word characters.
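As a rough sketch of the mapping above, the following rewrites the Perl shorthands into the Emacs bracket classes. The function name perl_shorthand_to_emacs is invented for this example, and the approach is deliberately naive: it does not handle a shorthand that already sits inside a bracket expression (for example [\d_]), and the Emacs classes [[:word:]] and [[:space:]] depend on the buffer's syntax table, so the result is not guaranteed to match exactly the same text.

```python
import re

# Mapping taken from the text above. In Emacs, [[:word:]] and [[:space:]]
# are resolved through the current syntax table, so this is approximate.
PERL_TO_EMACS_CLASS = {
    "d": "[[:digit:]]",
    "w": "[[:word:]]",
    "s": "[[:space:]]",
}


def perl_shorthand_to_emacs(pattern: str) -> str:
    """Replace \\d, \\w, \\s with Emacs bracket classes (hypothetical helper)."""

    def replace(match: re.Match) -> str:
        # Unknown escapes (including an escaped backslash) are left untouched.
        return PERL_TO_EMACS_CLASS.get(match.group(1), match.group(0))

    # Consuming escapes as pairs keeps an escaped backslash followed by a
    # plain "d" from being mistaken for the \d shorthand.
    return re.sub(r"\\(.)", replace, pattern, flags=re.DOTALL)


print(perl_shorthand_to_emacs(r"\d+\s\w*"))
# [[:digit:]]+[[:space:]][[:word:]]*
```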

Backslash constructs

Of the backslash constructs listed in Emacs Regexp Backslash, the following are NOT in Perl-compatible regular expressions.

\` \' \= \< \> \_< \_> \sC \cC

See also what \< and \> can do that \b cannot do

\A, \Z, \z are NOT in Emacs. In Emacs, use instead:

\` (beginning of buffer or string, like \A) or \' (end of buffer or string, like \z)

Complications regarding newlines and interactive usage

See the second section in Text Pattern Matching in Emacs. It also explains why \n and \t don't match newlines and tabs in incremental regexp search (C-M-s or M-x isearch-forward-regexp) and what to do about it.

Etc

Emacswiki regular expression