A number of people have mentioned composing from smaller parts, but no one's provided an example yet, so here's mine:
import java.util.regex.Pattern;

String number = "(\\d+)";                                    // a captured run of digits
String unit = "(?:" + number + "\\s*:\\s*)";                 // digits followed by a colon
String optionalDecimal = "(?:\\s*[.,]\\s*" + number + ")?";  // optional fractional part
Pattern re = Pattern.compile(
    "^\\s*(?:" + unit + "?" + unit + ")?" + number + optionalDecimal + "\\s*$"
);
Not the most readable, but I feel like it's clearer than the original.
Also, C# has the @ prefix (a verbatim string literal), which can be prepended to a string to indicate that it is to be taken literally (no escape sequences), so number would be @"(\d+)";
match is just a wrapper for exec, per ES5 15.5.4.10, step 8(f)(i):
Let result be the result of calling the [[Call]] internal method of exec with rx as the this value and argument list containing S.
For a global regex, match repeatedly calls exec until exec returns a null value.
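To make that loop concrete, here is a rough sketch of it written by hand. The helper name matchViaExec and the sample inputs are my own, not anything from the spec. One step I've added that isn't described above: after an empty match, the loop has to bump lastIndex by one, otherwise exec would keep returning the same empty match forever (the spec's match loop has an equivalent step).

```javascript
// Hypothetical helper: emulates String.prototype.match for a global regex
// by calling exec over and over until it returns null.
function matchViaExec(str, re) {
  re.lastIndex = 0; // match starts from the beginning of the string
  const results = [];
  let result;
  while ((result = re.exec(str)) !== null) {
    results.push(result[0]);
    // After an empty match, step lastIndex forward so the loop cannot stall.
    if (result[0] === "") {
      re.lastIndex += 1;
    }
  }
  return results;
}

console.log(matchViaExec("a1b22", /\d+/g)); // [ '1', '22' ]
console.log("a1b22".match(/\d+/g));         // [ '1', '22' ]
```

Unlike the real match, this sketch returns an empty array rather than null when nothing matches.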
When we look at exec, we see that each call to exec with a global regex advances the regex object's lastIndex after the match is made:
- Let e be r's endIndex value.
- If global is true,
  - Call the [[Put]] internal method of R with arguments "lastIndex", e, and true.
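You can watch lastIndex move by recording it after every exec call. This is only a small sketch; the function name traceLastIndex and the sample text are made up for illustration.

```javascript
// Hypothetical helper: records each match together with the value of
// lastIndex that exec leaves behind on the regex object.
function traceLastIndex(text, re) {
  const trace = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    trace.push({ match: m[0], lastIndex: re.lastIndex });
  }
  return trace;
}

const digits = /\d+/g;
console.log(traceLastIndex("10 20 30", digits));
// [ { match: '10', lastIndex: 2 },
//   { match: '20', lastIndex: 5 },
//   { match: '30', lastIndex: 8 } ]
console.log(digits.lastIndex); // 0, reset by the final failed exec
```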
However (here's the real mechanical answer), lastIndex is only reset by exec when it is strictly greater than the length of the string:
- Let i be the value of ToInteger(lastIndex).
...
- If i < 0 or i > length, then
  - Call the [[Put]] internal method of R with arguments "lastIndex", 0, and true.
  - Return null.
(Note i > length, not i >= length.)
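The boundary is easy to probe by setting lastIndex by hand. In this sketch the pattern, the input, and the function name probeLastIndex are all assumptions chosen for illustration.

```javascript
// Hypothetical probe of the i > length check.
function probeLastIndex() {
  const text = "dd";
  const re = /d*/g;

  // lastIndex equal to the length: exec still makes an attempt,
  // and d* succeeds with an empty match at the very end.
  re.lastIndex = text.length;
  const atEnd = re.exec(text);

  // lastIndex strictly greater than the length: exec gives up at once,
  // returns null, and resets lastIndex to 0.
  re.lastIndex = text.length + 1;
  const pastEnd = re.exec(text);

  return { atEnd, pastEnd, lastIndexAfter: re.lastIndex };
}

const probe = probeLastIndex();
console.log(probe.atEnd[0] === "", probe.atEnd.index); // true 2
console.log(probe.pastEnd, probe.lastIndexAfter);      // null 0
```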
Therefore, there will be a final attempt to match the substring whose left bound is lastIndex and whose right bound is the end of the string. Since that final attempt happens when those two positions are identical, the last match attempt is always made against the empty string.
As it happens, d* matches the empty string (since * matches zero or more occurrences), so that match is included in the match results.
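Here is what that looks like in practice, using a made-up input string of my own:

```javascript
const input = "dd"; // hypothetical input
const found = input.match(/d*/g);
console.log(found); // [ 'dd', '' ]

// Stepping through with exec shows where the second, empty match sits.
const stepper = /d*/g;
const first = stepper.exec(input);  // "dd" at index 0, lastIndex becomes 2
const second = stepper.exec(input); // "" at index 2, which equals input.length
console.log(first.index, second.index); // 0 2
```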
I cannot offer a surefire explanation of why the matching does not stop when lastIndex equals the string length. My guess is that this final empty-string check is necessary to match zero-length terminal regexes like /$/, which would never match if considered against a non-zero-length substring.
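As a quick check on that guess (it shows the behaviour, not the spec authors' reasoning), an end anchor does match when the search starts exactly at the end of the string. The input here is made up.

```javascript
const sample = "abc"; // hypothetical input
const endAnchor = /$/g;

// Start the search exactly at the end of the string.
endAnchor.lastIndex = sample.length;
const hit = endAnchor.exec(sample);

console.log(hit.index === sample.length); // true
console.log(hit[0].length);               // 0
console.log(sample.match(/$/g));          // [ '' ]
```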
Because underscores are second-nature for identifiers in almost all computer languages that matter. Dashes are not; they're typically used as an operator for subtraction, and are specifically excluded from identifiers.
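A tiny illustration in JavaScript, one language where this holds (the variable names are made up):

```javascript
const max_width = 10; // a single identifier containing an underscore

const max = 7;
const width = 2;
const difference = max-width; // parsed as "max minus width", not as one name

console.log(max_width);  // 10
console.log(difference); // 5
```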