I think since you want the least total R's, you want to reverse your K/V and use the values as keys so your initial list becomes:
- M: R1, R2, R3
- N: R1, R2
- L: R1, R2
- A: R2
Then you can say the count of members intersecting ( http://msdn.microsoft.com/en-us/library/system.linq.enumerable.intersect.aspx ) of M and N is 2, which means you get n-(2 sets * 2 members) number of R's, and if you intersect L you get n-(3*2) number of R's. and if you intersect A as well you get n-(4*1) number of R's, so obviously your best choice is the second intersection set. Unfortunately this is a naive implementation with O(n!) time because you have to interset all possible sets to find the greatest set of intersections which will yield the largest minimization of R's to portray all sets.
After you find the best optimization in this technique where the largest intersection count * sets crossed, remove all the R's in the intersection set from the sets you intersected, then repeat the terribly inefficient combinatoric intersection comparisons against the sets as they are left. Rinse and repeat the comparisons to find largest intersections and remove them from the sets until no Rs are left in any sets.
There is surely a more efficient comparison technique than to intersect each set combination as I'm suggesting. I'm just suggesting an algorithm that should work (I believe) which is likely the naive approach.
Tail call optimization is present in many languages and compilers. In this situation, the compiler recognizes a function of the form:
int foo(n) {
...
return bar(n);
}
Here, the language is able to recognize that the result being returned is the result from another function and change a function call with a new stack frame into a jump.
Realize that the classic factorial method:
int factorial(n) {
if(n == 0) return 1;
if(n == 1) return 1;
return n * factorial(n - 1);
}
is not tail call optimizatable because of the inspection necessary on the return. (Example source code and compiled output)
To make this tail call optimizeable,
int _fact(int n, int acc) {
if(n == 1) return acc;
return _fact(n - 1, acc * n);
}
int factorial(int n) {
if(n == 0) return 1;
return _fact(n, 1);
}
Compiling this code with gcc -O2 -S fact.c
(the -O2 is necessary to enable the optimization in the compiler, but with more optimizations of -O3 it gets hard for a human to read...)
_fact(int, int):
cmpl $1, %edi
movl %esi, %eax
je .L2
.L3:
imull %edi, %eax
subl $1, %edi
cmpl $1, %edi
jne .L3
.L2:
rep ret
(Example source code and compiled output)
One can see in segment .L3
, the jne
rather than a call
(which does a subroutine call with a new stack frame).
Please note this was done with C. Tail call optimization in Java is hard and depends on the JVM implementation (that said, I haven't seen any that do it, because it is hard and implications of the required Java security model requiring stack frames - which is what TCO avoids) -- tail-recursion + java and tail-recursion + optimization are good tag sets to browse. You may find other JVM languages are able to optimize tail recursion better (try clojure (which requires the recur to tail call optimize), or scala).
That said,
There is a certain joy in knowing that you wrote something right - in the ideal way that it can be done.
And now, I'm going to get some scotch and put on some German electronica...
To the general question of "methods to avoid a stack overflow in a recursive algorithm"...
Another approach is to include a recursion counter. This is more for detecting infinite loops caused by situations beyond one's control (and poor coding).
The recursion counter takes the form of
int foo(arg, counter) {
if(counter > RECURSION_MAX) { return -1; }
...
return foo(arg, counter + 1);
}
Each time you make a call, you increment the counter. If the counter gets too big, you error out (in here, just a return of -1, though in other languages you may prefer to throw an exception). The idea is to prevent worse things from happening (out of memory errors) when doing a recursion that is much deeper than expected and likely an infinite loop.
In theory, you shouldn't need this. In practice, I've seen poorly written code that has hit this because of a plethora of small errors and bad coding practices (multithreaded concurrency issues where something changes something outside the method that makes another thread go into an infinite loop of recursive calls).
Use the right algorithm and solve the right problem. Specifically for the Collatz Conjecture, it appears that you are trying to solve it in the xkcd way:
You are starting at a number and doing a tree traversal. This rapidly leads to a very large search space. A quick run to calculate the number of iterations for the correct answer results in about 500 steps. This shouldn't be an issue for recursion with a small stack frame.
While knowing the recursive solution is not a bad thing, one should also realize that many times the iterative solution is better. A number of ways of approaching converting a recursive algorithm to an iterative one can be seen on Stack Overflow at Way to go from recursion to iteration.
Best Answer
To avoid having to enumerate every sequence, it is enough to observe that in any repeated subsequence the first two elements of each occurrence must be the same. In other words, every repeated pair could be the start of a longer repeated subsequence and every number that does not begin a repeated pair could not.
So in your example, the repeated pairs are
3,6
and6,17
. Each of these could start a longer sequence, and nothing else could.So, start with a pass looking at each pair (N-1) and keep all of those that repeat. This will form a set of clusters of sequence starting points.
Then within each cluster take two sequences at a time and compare how far the match extends. Remember the shortest and longest. You have your answer.
In your example, first try to extend
3,6
for each occurrence. The maximum length is 2. Then try to extend6,17
. The maximum length is 5. Problem solved.The data structure for recording the clusters is an interesting design point. I would probably use a dictionary keyed on the first two elements of the sequence and containing a list (or vector) of starting points (indexes into the original data).
This is an outline of an algorithm. It should be enough to provide a starting point if you already understand the problem and are a reasonably good programmer. I'm not sure I can provide much more help without actually writing the code.