I have many substrings(2-5 words each) which I would like to search in some text of about 40-50 words length. What is the most efficient way to flag matching substrings.
Currently I am simply using:
for substring in substrings:
if substring in fullText:
return True
substrings – the list of strings to be searched
fullText – text to be searched on.
Worst case for this solution is all substrings are searched on fullText repeatedly.
Best Answer
Create a regular expression from your list (something like "word1|word2|word3") and use the regular expression functions available for your language. It will hopefully create a data structure optimized for matching, maybe a finite state machine or something equivalent.