I think it depends on what you want to achieve... Do you want to improve the code? parallelize the code? clean it? just understand it?
Besides the great comment given by @Calphool, what I've done in similar cases (but not with 100K lines code to be honest) is this:
Look for whoever wrote the code. Or has use it. Asked them what I needed to know, that saves you a lot of time. This may seem stupid, but is not.
I made a graph of the dependencies. Take a look at this for an example.
Depending on what you need to do, you could measure the execution time of some (or all) function.
Start playing with it... but with modern tools, like git. If possible, started to add some tests.
If you want to see what functions get called, you could just print the functions that are called (take a look at this question). You could add a printf at each function using a script, but I don't think that is a good idea. Also, you have to think how are you going to go through the generated output.
After you know what you want, and before making my implementations, I try to isolate the part I need to work on. Meaning, I cleanup the code a little, put in a different file if needed, compile it and test it. Only then I proceed to actually modify the code adding functionality or whatever needs to be done. This may also include port the code to use modern building tools if needed.
My two cents.
If you repeatedly merge the 2 first sequences you will get a runtime that is much worse than merge sort as you compare the first elements far too many times (n log(n)^2???) if you merge the 2 (or more) smallest (adjacent) sequences each time you should approach merge sorts efficiency.
Finding the smallest adjacent sequences might be done by building a tree with roughly half the sequence in each branch recursively and then merging the lowest branches first.
---- Edit
First version:
Essential a mergesort where the separators are the implicit partition of the mergesort algorithm.
Order separators according to their index in the array
MergeIt(first, last) {
if (only one or zero separator)
return first;
split = separator containing the middle separator (first, last)
return inplace_merge(MergeIt(first, split), MergeIt(split, end));
}
This ensures that you only do the minimum number of merges, but not the minimum number of compares as larger sequences might be merge with the current minimum sequence.
Version two:
Still basically a mergesort where we now take into consideration the length of the sequences.
Order separators according to their index in the array
MergeIt(first, last) {
if (only one or zero separator)
return first;
split = separator containing the middle element of array(first, last) // not the middle separator
return inplace_merge(MergeIt(first, split), MergeIt(split, end));
}
The split insures that the smaller sequences merge first as the larger stop higher up in the call tree. This still does not insure the least number of compares as there still is the chance that larger sequences are merge with the current minimum length, though this is still better than version one as the large sequences here are smaller.
Version three:
Merge the adjacent sequences which span the least number of elements
Make heap of pairs of adjacent sequences, sort after minimum length of the pairs, for each sequence only add the shortest of its prev and next.
// each sequence will appear max twice except the first and last.
while (heap.size() > 1) {
min = heap.pop
remove the possible other occurrence of min.first and min.second
sequence = inplace_merge(min.first, min.second)
insert the minimum of pair(prev(sequence), sequence) and pair(sequence, next(sequence)) in heap.
}
The overhead might make this slower than the second version but the compares in made by inplace_merge should now be the minimum.
Best Answer
There are two approaches: