JavaScript: why does a regex with global search and a {0,} quantifier match the end of the string?

javascript, regular expressions, string-matching, strings

I previously asked a question about JS, regex, quantifiers and global search. I finally understand how that works, but let's take a concrete example and then I'll ask my question.

Based on the same example

var str = 'ddd';
var r = /d*/g;
console.log(str.match(r));

it outputs this array: ["ddd", ""]

I understand that the first item in the array is there because the regex matches the letter d, and that the last item (the empty string) is there because it matches at the end of the string, where nothing is left. That is consistent with *, which matches 0 or more occurrences…

So, my questions are:

  • Why is this happening?
  • Why does it have to query the end of the string to finally obtain a
    match?

In my opinion, the end of the string (ddd) should not be queried, because it's not as if my string contains a trailing space, as in 'ddd '. If my string were empty, a match would be logical, but not in this case. My logic here is this:

for every character in the string, do this search/regex (d*) … so why does it continue on to the end of the string? It should stop at the last character of my string, which in this case is d.

Best Answer

match is just a wrapper for exec, per ES5 15.5.4.10, step 8(f)(i):

Let result be the result of calling the [[Call]] internal method of exec with rx as the this value and argument list containing S.

For a global regex, match calls exec repeatedly until exec returns null.
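That loop can be sketched in a few lines. This is a simplified illustration, not the specification text: the helper name globalMatchSketch is hypothetical, and Unicode handling is left out. The step that bumps lastIndex after an empty match mirrors what the built-in method does to avoid finding the same zero-length match forever.

```javascript
// Hypothetical helper: a simplified model of String.prototype.match
// for a regex that has the global flag set.
function globalMatchSketch(str, pattern) {
  if (!pattern.global) {
    throw new TypeError('this sketch only models global regexes');
  }
  // Work on a copy so the caller's lastIndex is left untouched.
  const re = new RegExp(pattern.source, pattern.flags);
  re.lastIndex = 0;
  const results = [];
  let result;
  // Keep calling exec until it reports that nothing more can match.
  while ((result = re.exec(str)) !== null) {
    results.push(result[0]);
    // An empty match does not move lastIndex, so step over it manually;
    // otherwise the loop would find the same empty match forever.
    if (result[0] === '') {
      re.lastIndex += 1;
    }
  }
  return results.length > 0 ? results : null;
}

console.log(globalMatchSketch('ddd', /d*/g)); // an array holding "ddd" and ""
```

For the example string, the third call to exec is the one that returns null, which ends the loop with two collected matches.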

When we look at exec, we see that each call to exec with a global regex increases the regex object's lastIndex after the match is made:

  1. Let e be r's endIndex value.
  2. If global is true,
    • Call the [[Put]] internal method of R with arguments "lastIndex", e, and true.
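The effect of that step can be observed directly by calling exec by hand and reading lastIndex between calls. The variable names below are only for illustration.

```javascript
const text = 'ddd';
const globalRe = /d*/g;

console.log(globalRe.lastIndex); // 0 before any call

// First call: starts at index 0 and consumes all three d's.
const firstHit = globalRe.exec(text);
console.log(firstHit[0], globalRe.lastIndex); // ddd 3

// Second call: starts at index 3, where only an empty match is possible.
const secondHit = globalRe.exec(text);
console.log(secondHit[0].length, secondHit.index, globalRe.lastIndex); // 0 3 3
```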

However (here's the real mechanical answer), lastIndex is only reset by exec when it is strictly greater than the length of the string:

  1. Let i be the value of ToInteger(lastIndex).

...

  1. If i < 0 or i > length, then
    • Call the [[Put]] internal method of R with arguments "lastIndex", 0, and true.
    • Return null.

(Note i > length, not i >= length.)
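A quick way to see the difference between the two comparisons is to set lastIndex by hand. This is a small experiment rather than anything from the specification, and the variable names are arbitrary.

```javascript
const subject = 'ddd';
const boundaryRe = /d*/g;

// lastIndex equal to the length: exec still makes an attempt,
// and d* succeeds with an empty match at index 3.
boundaryRe.lastIndex = subject.length;
const atLength = boundaryRe.exec(subject);
console.log(atLength !== null, atLength.index); // true 3

// lastIndex beyond the length: exec returns null and resets lastIndex to 0.
boundaryRe.lastIndex = subject.length + 1;
const pastLength = boundaryRe.exec(subject);
console.log(pastLength, boundaryRe.lastIndex); // null 0
```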

Therefore, there will be a final attempt to match the substring whose left bound is lastIndex and whose right bound is the end of the string. Since the final match is done when those positions are identical, a last match attempt is always done on the empty string.

As it happens, d* matches the empty string (since * matches zero and up), so that match is included in the match results.
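To check that claim in isolation, the pattern can be run against an empty string. The comparison with d+ is an extra illustration that is not part of the original answer; it shows that a quantifier requiring at least one character produces no trailing empty match.

```javascript
// d* succeeds on an empty input by matching zero characters.
const emptyHit = /d*/.exec('');
console.log(emptyHit !== null, emptyHit[0].length); // true 0

// d+ needs at least one d, so it fails on the same input...
const plusHit = /d+/.exec('');
console.log(plusHit); // null

// ...and a global search with d+ yields only the run of d's.
const plusAll = 'ddd'.match(/d+/g);
console.log(plusAll.length, plusAll[0]); // 1 ddd
```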

I cannot offer a surefire explanation of why the matching does not stop when lastIndex equals the string length. My guess is that this final empty-string check is necessary to match zero-length terminal regexes like /$/, which would never match if considered only against a non-zero-length substring.
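The behaviour of /$/ is easy to observe, even though it does not prove the design rationale guessed at above. In this short sketch the only place the anchor matches is the position equal to the string length.

```javascript
const endAnchor = /$/g;

// The anchor consumes no characters and matches at index 3,
// which is the position just past the last d.
const endHit = endAnchor.exec('ddd');
console.log(endHit.index, endHit[0].length); // 3 0

// A global search therefore reports a single empty match.
const endAll = 'ddd'.match(/$/g);
console.log(endAll.length, endAll[0].length); // 1 0
```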
