Java Regex – How to Generate Strings from Regex

javaregular expressionsstrings

Let's say that we have a regex and an index i. If we suppose that the set of strings that match our regex are sorted in a lexicographical order, how could we get the i element of this list?

Edit : I added this simple example for more explanation:

input :

regex="[a-c]{1,2}";
index=4;

In this case the ordered list of strings that this regex matches contains the following elements:

a
aa
ab
ac
b
ba
bb
bc
c
ca
cb
cc

output :

The 4th element, which is ac

ps: It's known that a given regex could match an infinite number of strings. This shouldn't have an impact on the process of extracting the element in the given finite index.

Best Answer

A regular expression maps directly to a finite state machine. Generating strings from the FSM is straightforward with a backtracking algorithm. The example is for the regular expression /aa?b/

(start) 0 -- a --> 1 -- a --> 2 -- b --> 3 (accepting and terminal)
                   |                     ^
                   |---------b ----------|

State   Previous char   Input char    Input string     History
  0        null                           --             []
                            a              a         
  1        null                                        [{a,1}]
                            a              aa          [{a,1},{a,2}]
  2        null
                            b             aab          [{a,1},{a,2},{b,3}]
  3        null
         Accepting state, emit 'aab' as valid input
         Terminal state, backtrack
  2          b                             aa          [{a,1},{a,2}]
         No further valid inputs, backtrack
  1          a                             a           [{a,1}]
                            b              ab
  3        null                                        [{a,1},{b,3}]
         Accepting state, emit 'ab' as valid input
         Terminal state, backtrack
  1          b                             a           [{a,1}]
         No further valid inputs, backtrack
  0          a                             --          []
         No further valid inputs, stack empty, terminate
Related Topic