Algorithms – Smallest and Largest Contiguous Repeating Sub-Sequence in a Sequence

algorithms

I am trying to solve a outdated quiz problem.
I am tasked to find the smallest and largest contiguous sequence of repeating numbers in a larger sequence.
e.g.

{1,3,6,17,19,3,6,5,4,2,5,6,17,19,3,6,7,5,78,100,101}

For the sequence above, the smallest (non-zero element sequence) and largest non extensible sub sequence would be smallest - 3,6 and largest - 6,17,19,3,6

Could not get started with an algorithm. Any help to get me started would be much appreciated.

Best Answer

To avoid having to enumerate every sequence, it is enough to observe that in any repeated subsequence the first two elements of each occurrence must be the same. In other words, every repeated pair could be the start of a longer repeated subsequence and every number that does not begin a repeated pair could not.

So in your example, the repeated pairs are 3,6 and 6,17. Each of these could start a longer sequence, and nothing else could.

So, start with a pass looking at each pair (N-1) and keep all of those that repeat. This will form a set of clusters of sequence starting points.

Then within each cluster take two sequences at a time and compare how far the match extends. Remember the shortest and longest. You have your answer.

In your example, first try to extend 3,6 for each occurrence. The maximum length is 2. Then try to extend 6,17. The maximum length is 5. Problem solved.

The data structure for recording the clusters is an interesting design point. I would probably use a dictionary keyed on the first two elements of the sequence and containing a list (or vector) of starting points (indexes into the original data).

This is an outline of an algorithm. It should be enough to provide a starting point if you already understand the problem and are a reasonably good programmer. I'm not sure I can provide much more help without actually writing the code.

Related Topic