Python – Efficient multiple substrings search

pythonstring-matchingstrings

I have many substrings(2-5 words each) which I would like to search in some text of about 40-50 words length. What is the most efficient way to flag matching substrings.

Currently I am simply using:

for substring in substrings:
   if substring in fullText:
      return True

substrings – the list of strings to be searched

fullText – text to be searched on.

Worst case for this solution is all substrings are searched on fullText repeatedly.

Best Answer

Create a regular expression from your list (something like "word1|word2|word3") and use the regular expression functions available for your language. It will hopefully create a data structure optimized for matching, maybe a finite state machine or something equivalent.

Related Topic