C# – Is it possible to write a regex that does one search then uses its results to do another search

c#, regular expressions

I'm searching for strings matching the pattern [A-Z]\W*[0-9]+, so that in

V-2345
35A235
Q252

the V-2345 and Q252 would match. In another list, I want to find equivalent items that fit the same regex pattern, so that potential matches would include:

V ++ 2345
Q 252

but not

V-252
Q//2345

Basically, if an item matches the pattern in the first list, I want to search for that letter prefix and number suffix in the second list. Is there a term for using regex to do this kind of search? I know that I can just write my own search by using string manipulation to get the letter and number, then use them to compose a separate regex pattern for each search of the second list, but I'm wondering if there's something built into typical regex flavors (I'm using C# in this case) that serves this purpose so that I can just use the original pattern.

Best Answer

Regular expressions work on a single string, not on a list of strings and certainly not on several lists. Whenever you need to process more than one string, you typically need some surrounding code to do the processing. For your example, that code has to apply the regex to every element of the first list, collect the results, and then use these results to process the second list.

That said, the usual way to apply a regexp to a list of strings is to concatenate them with a separator character such as a newline. To concatenate two lists and still tell them apart, you need at least one special "magic" character or word that separates the first list from the second and does not occur in either list. Such magic values can cause maintenance headaches if you are not very careful; nevertheless, combined with backreferences, this can be used to solve your problem.

For example, numbered backreferences like \1 to \9 match the same text that an earlier capturing group matched. Let's assume you use "###" as the separator between the two lists; then a regexp along the lines of

  ^([A-Z])\W*([0-9]+)$.*###.*^(\1\W*\2)$
    ^         ^          ^     ^    ^
    |         |          |     |    |
   first    second       |    backrefs to first
   group    group        |    and second group
                       lists
                       separator

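might be a first approximation of what you are looking for (beware of bugs, I did not test it). It needs the multiline option, so that ^ and $ match at line boundaries, and the single-line option, so that . also matches newlines (RegexOptions.Multiline and RegexOptions.Singleline in .NET). Put into a global regexp search, it yields pairs of matches that fit your constraints. Because successive matches cannot overlap, the pattern as written reports at most one pair; moving everything after the first item into a lookahead lifts that limit.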
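The single-regex idea can be sketched as follows. This is only an illustration in Python rather than C#, under a few assumptions that are not in the original: the helper name `find_pairs` is made up, the separator "###" is assumed never to occur in either list, and the separator and second-list part are moved into a lookahead so that every first-list item gets its own match attempt. The sketch also replaces `\W*` with `[^\w\n]*`, because `\W` would otherwise match the newline that joins the items.

```python
import re

SEPARATOR = "###"  # assumption: never occurs as an item in either list

# The first-list item is consumed; the separator and the second-list item
# are only looked ahead at, so matches for different items do not collide.
PAIRS = re.compile(
    r"^([A-Z])[^\w\n]*([0-9]+)$"                            # item in the first list
    rf"(?=.*^{re.escape(SEPARATOR)}$.*^(\1[^\w\n]*\2)$)",   # equivalent item after the separator
    re.MULTILINE | re.DOTALL,
)

def find_pairs(first_list, second_list):
    """Return (first-list item, equivalent second-list item) pairs."""
    text = "\n".join(first_list + [SEPARATOR] + second_list)
    return [(m.group(0), m.group(3)) for m in PAIRS.finditer(text)]

first = ["V-2345", "35A235", "Q252"]
second = ["V ++ 2345", "Q 252", "V-252", "Q//2345"]
print(find_pairs(first, second))
# [('V-2345', 'V ++ 2345'), ('Q252', 'Q 252')]
```

Note that this reports at most one second-list item per first-list item, so it is not a full replacement for looping over both lists when duplicates matter.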

As a final remark: the resulting code may be very compact, but it will be harder to maintain (and probably slower) than a more explicit solution where you process the two lists individually.
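For comparison, the explicit two-pass approach might look roughly like this. Again this is a Python sketch, not C#, and the name `find_equivalents` is hypothetical; in .NET the same steps would use `Regex.Match`, the match's `Groups`, and `Regex.Escape`. The pattern from the question gets two capturing groups, and for each match a more specific pattern with the same shape is built from the captured letter and digits.

```python
import re

# Pattern from the question, with capturing groups around the letter and the digits.
ITEM = re.compile(r"^([A-Z])\W*([0-9]+)$")

def find_equivalents(first_list, second_list):
    """Map each matching item of first_list to its equivalents in second_list."""
    result = {}
    for item in first_list:
        m = ITEM.match(item)
        if m is None:
            continue  # e.g. "35A235" does not fit the pattern
        letter, digits = m.groups()
        # Same shape as ITEM, but with the captured letter and digits fixed.
        specific = re.compile(rf"^{re.escape(letter)}\W*{re.escape(digits)}$")
        result[item] = [s for s in second_list if specific.match(s)]
    return result

first = ["V-2345", "35A235", "Q252"]
second = ["V ++ 2345", "Q 252", "V-252", "Q//2345"]
print(find_equivalents(first, second))
# {'V-2345': ['V ++ 2345'], 'Q252': ['Q 252']}
```

Unlike the single-regex variant, this needs no separator and returns every equivalent item in the second list.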