C++ – How to extract the active code path from a complex algorithm

algorithms, c

I have been puzzled lately by an intriguing idea.

I wonder if there is a (known) method to extract the executed source code from a large, complex algorithm. Let me elaborate:

Scenario: There is a complex algorithm that a large number of people have worked on for many years. The algorithm creates measurement descriptions for a complex measurement device.

The input for the algorithm is a large set of input parameters; let's call this the recipe. The algorithm is executed with this recipe, and the recipe determines which functions, loops and if-then-else constructions are followed within the algorithm. When the algorithm is finished, a set of calculated measurement parameters forms the output. With these output measurement parameters the device can perform its measurement.

Now, there is a problem. Because the algorithm has grown so large and complex over time, it is very difficult to find your way around it when you want to add new functionality for a recipe. Ideally, a person would modify only the functions and code blocks that are affected by his or her recipe. Instead, he or she has to dig through the whole algorithm and analyze the code to see which parts are relevant for that recipe, and only then can new functionality be added in the right place. Even for simple additions, people tend to get lost in the huge amount of complex code.

Solution: Extract the active code path? I have been brainstorming on this problem, and I think it would be great if there were a way to run the algorithm with the input parameters (the recipe) and extract only the active functions and code blocks into a new set of source files or code structure. I'm actually talking about extracting real source code here.

Once the active code is extracted and isolated, the result is a subset that is only a fraction of the original source code structure, and it will be much easier for the person to analyze the code, understand it, and make his or her modifications. Eventually the changes could be merged back into the original source code of the algorithm, or maybe the modified extracted source code could also be executed on its own, as a 'lite' version of the original algorithm.

Extra information: We are talking about an algorithm in C and C++ code, about 200 files, and maybe 100K lines of code. The code is compiled and built with a custom Visual Studio based build environment.

So…: I really don't know if this idea is just naive and stupid, or if it is feasible with the right amount of software engineering. I can imagine that similar situations have come up before in the world of software engineering, but I just don't know.

I have quite some experience with software engineering, but definitely not on the level of designing large and complex systems.

I would appreciate any kind of answer, suggestion or comment.

Thanks in advance!

Best Answer

I think it depends on what you want to achieve... Do you want to improve the code, parallelize it, clean it up, or just understand it?

Besides the great comment given by @Calphool, what I've done in similar cases (but not with 100K lines of code, to be honest) is this:

  • Look for whoever wrote the code, or has used it, and ask them what you need to know. That saves you a lot of time. This may seem stupid, but it is not.

  • Make a graph of the dependencies. Take a look at this for an example.

  • Depending on what you need to do, you could measure the execution time of some (or all) functions.

  • Start playing with it... but with modern tools, like git. If possible, start adding some tests.
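As a rough illustration of the timing suggestion in the list above, here is a minimal C++11 sketch of a scope-based timer that accumulates the time spent per function. The names `ScopedTimer`, `timing_table`, `print_timings` and `compute_exposure` are made up for this example and are not from the code in the question. The sketch also assumes single-threaded use, since the shared table is not protected by a lock.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

// Accumulated statistics for one instrumented function.
struct TimingStats {
    long long calls = 0;
    std::chrono::nanoseconds total{0};
};

// Global table: function name -> accumulated stats (not thread-safe).
inline std::map<std::string, TimingStats>& timing_table() {
    static std::map<std::string, TimingStats> table;
    return table;
}

// RAII timer: starts in the constructor, records in the destructor,
// so early returns and exceptions are measured as well.
class ScopedTimer {
public:
    explicit ScopedTimer(const char* name)
        : name_(name), start_(std::chrono::steady_clock::now()) {
        assert(name != nullptr);
    }
    ~ScopedTimer() {
        auto elapsed = std::chrono::steady_clock::now() - start_;
        TimingStats& stats = timing_table()[name_];
        ++stats.calls;
        stats.total +=
            std::chrono::duration_cast<std::chrono::nanoseconds>(elapsed);
    }
    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    const char* name_;
    std::chrono::steady_clock::time_point start_;
};

// Hypothetical stand-in for one function of the real algorithm.
long long compute_exposure(int steps) {
    ScopedTimer timer(__func__);  // one extra line per function of interest
    long long sum = 0;
    for (int i = 0; i < steps; ++i) sum += static_cast<long long>(i) * i;
    return sum;
}

// One summary line per instrumented function.
void print_timings() {
    for (const auto& entry : timing_table()) {
        std::printf("%-20s calls=%lld total=%lld ns\n", entry.first.c_str(),
                    entry.second.calls,
                    static_cast<long long>(entry.second.total.count()));
    }
}
```

Calling `print_timings()` at the end of a run with a given recipe lists only the instrumented functions that were actually entered, together with their call counts and accumulated time.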

If you want to see what functions get called, you could just print the functions that are called (take a look at this question). You could add a printf to each function using a script, but I don't think that is a good idea. Also, you have to think about how you are going to go through the generated output.
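One way to keep that output manageable is to count calls per function instead of printing a line for every call. The following is only a sketch under assumptions: `CallLog`, `TRACE_CALL`, `Recipe` and the small functions are hypothetical stand-ins for the real algorithm, the macro still has to be placed in each function of interest, and the log is not thread-safe.

```cpp
#include <cassert>
#include <cstdio>
#include <map>
#include <string>

// Hypothetical helper: counts how often each instrumented function is entered.
class CallLog {
public:
    static CallLog& instance() {
        static CallLog log;
        return log;
    }
    void record(const char* function_name) {
        assert(function_name != nullptr);
        ++counts_[function_name];
    }
    const std::map<std::string, int>& counts() const { return counts_; }
    void clear() { counts_.clear(); }

    // One line per distinct function, not one line per call.
    void print_summary() const {
        for (const auto& entry : counts_) {
            std::printf("%-24s %d call(s)\n", entry.first.c_str(),
                        entry.second);
        }
    }

private:
    std::map<std::string, int> counts_;
};

// __func__ expands to the name of the enclosing function.
#define TRACE_CALL() CallLog::instance().record(__func__)

// Stand-ins for the real recipe and algorithm (assumptions for this sketch).
struct Recipe {
    bool use_filter;
    int repeats;
};

int apply_filter(int value) { TRACE_CALL(); return value / 2; }
int measure_once(int value) { TRACE_CALL(); return value + 1; }
int legacy_correction(int value) { TRACE_CALL(); return value - 3; }

// The recipe decides which branch runs and how often the loop body runs.
int run_algorithm(const Recipe& recipe) {
    TRACE_CALL();
    int value = 10;
    for (int i = 0; i < recipe.repeats; ++i) value = measure_once(value);
    if (recipe.use_filter) {
        value = apply_filter(value);
    } else {
        value = legacy_correction(value);
    }
    return value;
}
```

After `run_algorithm(Recipe{true, 3})`, the summary contains `run_algorithm`, `measure_once` and `apply_filter`, while `legacy_correction` does not appear at all, which is the list of functions that are relevant for that recipe.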

Once I know what I want, and before making my changes, I try to isolate the part I need to work on. Meaning, I clean up the code a little, put it in a different file if needed, compile it and test it. Only then do I proceed to actually modify the code, adding functionality or whatever needs to be done. This may also include porting the code to use modern build tools if needed.
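For the "test it before modifying it" step, one option is to pin down the current behavior of the isolated code with a few recorded input/output pairs. This is a sketch only: `settle_time_ms` is an invented stand-in for an isolated routine, and the expected values would in practice be captured from the unmodified code.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Hypothetical isolated routine; in practice this is the code moved
// into its own file, unchanged.
int settle_time_ms(int speed, bool high_precision) {
    int base = 100 + speed * 2;
    return high_precision ? base * 3 : base;
}

// One recorded input/output pair from the unmodified code.
struct GoldenCase {
    int speed;
    bool high_precision;
    int expected;
};

// Returns the number of cases whose output differs from the recording.
int check_golden_cases(const std::vector<GoldenCase>& cases) {
    assert(!cases.empty());
    int failures = 0;
    for (const GoldenCase& c : cases) {
        int actual = settle_time_ms(c.speed, c.high_precision);
        if (actual != c.expected) {
            std::printf("mismatch: speed=%d precise=%d expected=%d actual=%d\n",
                        c.speed, c.high_precision ? 1 : 0, c.expected, actual);
            ++failures;
        }
    }
    return failures;
}
```

Running the check before and after each modification shows quickly whether a change altered behavior that was supposed to stay the same.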

My two cents.
