What are the main differences between the Knuth-Morris-Pratt and Boyer-Moore search algorithms?

Tags: algorithm, string-search, theory

What are the main differences between the Knuth-Morris-Pratt search algorithm and the Boyer-Moore search algorithm?

I know KMP searches for Y in X, trying to define a pattern in Y, and saves the pattern in a vector. I also know that BM works better for small words, like DNA (ACTG).

What are the main differences in how they work? Which one is faster? Which one uses fewer resources? In which cases?

Best Answer

A rough explanation:

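Boyer-Moore compares the pattern against the text from right to left, starting with the pattern's last character instead of its first, on the assumption that if there is no match at the end, there is no need to check the beginning. When the mismatched text character does not occur in the pattern at all, the pattern can be shifted completely past it (the bad-character rule; full Boyer-Moore also applies a good-suffix rule). These "big jumps" mean BM often inspects only a fraction of the text, so it works best when the pattern and the text resemble natural-language text (e.g. English), where the alphabet is large.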
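A minimal Python sketch of the right-to-left idea. To keep it short, this is the Boyer-Moore-Horspool simplification, which uses only the bad-character rule; full Boyer-Moore adds a good-suffix rule that this sketch omits. The function name `horspool_search` is illustrative, not from any library. Horspool is often sublinear on natural text, but its worst case is O(n·m).

```python
def horspool_search(text: str, pattern: str) -> list[int]:
    """Return start indices of all occurrences of pattern in text.

    Boyer-Moore-Horspool: a simplified Boyer-Moore that keeps only the
    bad-character rule (no good-suffix rule).
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return list(range(n + 1))
    # Shift for each character: distance from its last occurrence
    # (excluding the final pattern position) to the end of the pattern.
    # Characters not in the table allow a full jump of m.
    shift = {}
    for i in range(m - 1):
        shift[pattern[i]] = m - 1 - i

    matches = []
    pos = 0
    while pos <= n - m:
        # Compare right to left, starting with the pattern's last character.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            matches.append(pos)
        # Jump based on the text character under the pattern's last position.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

For example, `horspool_search("here is a simple example", "example")` returns `[17]`; most text characters do not occur in "example", so the window usually jumps a full 7 positions.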

Knuth-Morris-Pratt searches for occurrences of a "word" W within a main "text string" S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. (Source: Wikipedia)
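The "vector" mentioned in the question is KMP's failure (prefix) table, computed from the pattern alone. Below is a minimal Python sketch; the names `build_failure` and `kmp_search` are illustrative. The text index only moves forward, so the search is O(n) after O(m) preprocessing.

```python
def build_failure(pattern: str) -> list[int]:
    """failure[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    return failure


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return start indices of all occurrences of pattern in text."""
    m = len(pattern)
    if m == 0:
        return list(range(len(text) + 1))
    failure = build_failure(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back in the pattern via the table;
        # the text position i never moves backward.
        while k > 0 and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == m:
            matches.append(i - m + 1)
            k = failure[k - 1]  # keep going to find overlapping matches
    return matches
```

For the pattern "ABABCABAB" the table is `[0, 0, 1, 2, 0, 1, 2, 3, 4]`: after matching "ABAB" and then failing, KMP already knows the last "AB" can serve as the start of the next attempt.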

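This is why KMP is better suited to small alphabets such as DNA (A, C, G, T): it never re-examines text characters, so it runs in O(n + m) time for text length n and pattern length m regardless of the alphabet, whereas Boyer-Moore's jumps get shorter when there are only a few distinct characters.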
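To make the alphabet-size argument concrete, this sketch computes the Horspool bad-character shift table for an 8-letter DNA pattern and for an 8-letter lowercase English word, then averages the shift under a simplifying, hypothetical assumption: every alphabet symbol is equally likely to sit under the pattern's last position. Real text is not uniform, and the two patterns are arbitrary examples chosen for illustration.

```python
from fractions import Fraction
import string


def horspool_shifts(pattern: str, alphabet: str) -> dict[str, int]:
    """Bad-character shift for every symbol in the alphabet (Horspool variant)."""
    m = len(pattern)
    shifts = {c: m for c in alphabet}  # symbols absent from the pattern: full jump
    for i in range(m - 1):             # the last pattern position is excluded
        shifts[pattern[i]] = m - 1 - i
    return shifts


def average_shift(pattern: str, alphabet: str) -> Fraction:
    """Mean shift, assuming each alphabet symbol is equally likely
    (a simplifying assumption, not a property of real text)."""
    shifts = horspool_shifts(pattern, alphabet)
    return Fraction(sum(shifts.values()), len(alphabet))


if __name__ == "__main__":
    print(average_shift("GCAGAGAG", "ACGT"))                   # 17/4 (4.25)
    print(average_shift("patterns", string.ascii_lowercase))  # 183/26 (about 7.04)
```

With only four symbols, three of them occur in the DNA pattern and cap the jump well below the pattern length; with 26 letters, 20 never occur in "patterns" and allow the full jump of 8.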
