Regex – Regular expression search replace in Sublime Text 2

regexreplacesublimetext2

I'm looking to do search replace with regular expressions in Sublime Text 2. The documentation on this is rather anemic. Specifically, I want to do a replace on groups, so something like converting this text:

Hello my name is bob

And this search term:

Find what: my name is (\w)+

Replace with: my name used to be $(1)

The search term works just fine but I can't figure out a way to actually do a replace using the regexp group.

Best Answer

Usually a back-reference is either $1 or \1 (backslash one) for the first capture group (the first match of a pattern in parentheses), and indeed Sublime supports both syntaxes. So try:

my name used to be \1

my name used to be $1

Also note that your original capture pattern:

my name is (\w)+

is incorrect and will only capture the final letter of the name rather than the whole name. You should use the following pattern to capture all of the letters of the name:

my name is (\w+)

Use `\r` instead of `\n`.

Substituting by \n inserts a null character into the text. To get a newline, use \r. When searching for a newline, you’d still use \n, however. This asymmetry is due to the fact that \n and \r do slightly different things:

\n matches an end of line (newline), whereas \r matches a carriage return. On the other hand, in substitutions \n inserts a null character whereas \r inserts a newline (more precisely, it’s treated as the input CR). Here’s a small, non-interactive example to illustrate this, using the Vim command line feature (in other words, you can copy and paste the following into a terminal to run it). xxd shows a hexdump of the resulting file.

echo bar > test
(echo 'Before:'; xxd test) > output.txt
vim test '+s/b/\n/' '+s/a/\r/' +wq
(echo 'After:'; xxd test) >> output.txt
more output.txt

Before:
0000000: 6261 720a                                bar.
After:
0000000: 000a 720a                                ..r.

In other words, \n has inserted the byte 0x00 into the text; \r has inserted the byte 0x0a.

Regex – regular expression to detect a valid regular expression

/
^                                             # start of string
(                                             # first group start
  (?:
    (?:[^?+*{}()[\]\\|]+                      # literals and ^, $
     | \\.                                    # escaped characters
     | \[ (?: \^?\\. | \^[^\\] | [^\\^] )     # character classes
          (?: [^\]\\]+ | \\. )* \]
     | \( (?:\?[:=!]|\?<[=!]|\?>)? (?1)?? \)  # parenthesis, with recursive content
     | \(\? (?:R|[+-]?\d+) \)                 # recursive matching
     )
    (?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )?   # quantifiers
  | \|                                        # alternative
  )*                                          # repeat content
)                                             # end first group
$                                             # end of string
/

This is a recursive regex, and is not supported by many regex engines. PCRE based ones should support it.

Without whitespace and comments:

/^((?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>)?(?1)??\)|\(\?(?:R|[+-]?\d+)\))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*)$/

.NET does not support recursion directly. (The (?1) and (?R) constructs.) The recursion would have to be converted to counting balanced groups:

^                                         # start of string
(?:
  (?: [^?+*{}()[\]\\|]+                   # literals and ^, $
   | \\.                                  # escaped characters
   | \[ (?: \^?\\. | \^[^\\] | [^\\^] )   # character classes
        (?: [^\]\\]+ | \\. )* \]
   | \( (?:\?[:=!]
         | \?<[=!]
         | \?>
         | \?<[^\W\d]\w*>
         | \?'[^\W\d]\w*'
         )?                               # opening of group
     (?<N>)                               #   increment counter
   | \)                                   # closing of group
     (?<-N>)                              #   decrement counter
   )
  (?: (?:[?+*]|\{\d+(?:,\d*)?\}) [?+]? )? # quantifiers
| \|                                      # alternative
)*                                        # repeat content
$                                         # end of string
(?(N)(?!))                                # fail if counter is non-zero.

Compacted:

^(?:(?:[^?+*{}()[\]\\|]+|\\.|\[(?:\^?\\.|\^[^\\]|[^\\^])(?:[^\]\\]+|\\.)*\]|\((?:\?[:=!]|\?<[=!]|\?>|\?<[^\W\d]\w*>|\?'[^\W\d]\w*')?(?<N>)|\)(?<-N>))(?:(?:[?+*]|\{\d+(?:,\d*)?\})[?+]?)?|\|)*$(?(N)(?!))

From the comments:

Will this validate substitutions and translations?

It will validate just the regex part of substitutions and translations. s/<this part>/.../

It is not theoretically possible to match all valid regex grammars with a regex.

It is possible if the regex engine supports recursion, such as PCRE, but that can't really be called regular expressions any more.

Indeed, a "recursive regular expression" is not a regular expression. But this an often-accepted extension to regex engines... Ironically, this extended regex doesn't match extended regexes.

"In theory, theory and practice are the same. In practice, they're not." Almost everyone who knows regular expressions knows that regular expressions does not support recursion. But PCRE and most other implementations support much more than basic regular expressions.

using this with shell script in the grep command , it shows me some error.. grep: Invalid content of {} . I am making a script that could grep a code base to find all the files that contain regular expressions

This pattern exploits an extension called recursive regular expressions. This is not supported by the POSIX flavor of regex. You could try with the -P switch, to enable the PCRE regex flavor.

Regex itself "is not a regular language and hence cannot be parsed by regular expression..."

This is true for classical regular expressions. Some modern implementations allow recursion, which makes it into a Context Free language, although it is somewhat verbose for this task.

I see where you're matching []()/\. and other special regex characters. Where are you allowing non-special characters? It seems like this will match ^(?:[\.]+)$, but not ^abcdefg$. That's a valid regex.

[^?+*{}()[\]\\|] will match any single character, not part of any of the other constructs. This includes both literal (a - z), and certain special characters (^, $, .).

Best Answer

Related Solutions

How to replace a character by a newline in Vim

Use \r instead of \n.

Regex – regular expression to detect a valid regular expression

Related Topic

Use `\r` instead of `\n`.