The notion that regex doesn't support inverse matching is not entirely true. You can mimic this behavior by using negative look-arounds:
^((?!hede).)*$
The regex above will match any string, or line without a line break, not containing the (sub)string 'hede'. As mentioned, this is not something regex is "good" at (or should do), but still, it is possible.
And if you need to match line break chars as well, use the DOT-ALL modifier (the trailing s
in the following pattern):
/^((?!hede).)*$/s
or use it inline:
/(?s)^((?!hede).)*$/
(where the /.../
are the regex delimiters, i.e., not part of the pattern)
If the DOT-ALL modifier is not available, you can mimic the same behavior with the character class [\s\S]
:
/^((?!hede)[\s\S])*$/
Explanation
A string is just a list of n
characters. Before, and after each character, there's an empty string. So a list of n
characters will have n+1
empty strings. Consider the string "ABhedeCD"
:
┌──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┬───┬──┐
S = │e1│ A │e2│ B │e3│ h │e4│ e │e5│ d │e6│ e │e7│ C │e8│ D │e9│
└──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┴───┴──┘
index 0 1 2 3 4 5 6 7
where the e
's are the empty strings. The regex (?!hede).
looks ahead to see if there's no substring "hede"
to be seen, and if that is the case (so something else is seen), then the .
(dot) will match any character except a line break. Look-arounds are also called zero-width-assertions because they don't consume any characters. They only assert/validate something.
So, in my example, every empty string is first validated to see if there's no "hede"
up ahead, before a character is consumed by the .
(dot). The regex (?!hede).
will do that only once, so it is wrapped in a group, and repeated zero or more times: ((?!hede).)*
. Finally, the start- and end-of-input are anchored to make sure the entire input is consumed: ^((?!hede).)*$
As you can see, the input "ABhedeCD"
will fail because on e3
, the regex (?!hede)
fails (there is "hede"
up ahead!).
You can access capturing groups like this:
var myString = "something format_abc";
var myRegexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
var myRegexp = new RegExp("(?:^|\s)format_(.*?)(?:\s|$)", "g");
var match = myRegexp.exec(myString);
console.log(match[1]); // abc
And if there are multiple matches you can iterate over them:
var myString = "something format_abc";
var myRegexp = new RegExp("(?:^|\s)format_(.*?)(?:\s|$)", "g");
match = myRegexp.exec(myString);
while (match != null) {
// matched text: match[0]
// match start: match.index
// capturing group n: match[n]
console.log(match[0])
match = myRegexp.exec(myString);
}
Edit: 2019-09-10
As you can see the way to iterate over multiple matches was not very intuitive. This lead to the proposal of the String.prototype.matchAll
method. This new method is expected to ship in the ECMAScript 2020 specification. It gives us a clean API and solves multiple problems. It has been started to land on major browsers and JS engines as Chrome 73+ / Node 12+ and Firefox 67+.
The method returns an iterator and is used as follows:
const string = "something format_abc";
const regexp = /(?:^|\s)format_(.*?)(?:\s|$)/g;
const matches = string.matchAll(regexp);
for (const match of matches) {
console.log(match);
console.log(match.index)
}
As it returns an iterator, we can say it's lazy, this is useful when handling particularly large numbers of capturing groups, or very large strings. But if you need, the result can be easily transformed into an Array by using the spread syntax or the Array.from
method:
function getFirstGroup(regexp, str) {
const array = [...str.matchAll(regexp)];
return array.map(m => m[1]);
}
// or:
function getFirstGroup(regexp, str) {
return Array.from(str.matchAll(regexp), m => m[1]);
}
In the meantime, while this proposal gets more wide support, you can use the official shim package.
Also, the internal workings of the method are simple. An equivalent implementation using a generator function would be as follows:
function* matchAll(str, regexp) {
const flags = regexp.global ? regexp.flags : regexp.flags + "g";
const re = new RegExp(regexp, flags);
let match;
while (match = re.exec(str)) {
yield match;
}
}
A copy of the original regexp is created; this is to avoid side-effects due to the mutation of the lastIndex
property when going through the multple matches.
Also, we need to ensure the regexp has the global flag to avoid an infinite loop.
I'm also happy to see that even this StackOverflow question was referenced in the discussions of the proposal.
Best Answer
Use zero width assertion lookahead
or