Design – Best way to analyse a large class before refactoring it into smaller classes

Tags: design, legacy code, planning, refactoring

Foreword

I'm not looking for a way to refactor a large spaghetti-code class; that topic has been covered in other questions.

Question

I'm looking for techniques to begin understanding a class file, written by a coworker, that spans more than 4000 lines and has a single huge update method of more than 2000 lines.

Eventually I hope to build unit tests for this class and to refactor it into many smaller classes that follow DRY and the Single Responsibility Principle.

How can I manage and approach this task? I'd like to be able to eventually draw a diagram of the events that occur within the class and then move on to abstracting functionality, but I'm struggling to obtain a top-down view of what the class's responsibilities and dependencies are.


Edit: The answers have already covered topics relating to input parameters, so this information is for newcomers:

In this case the main method of the class has no input parameters and runs a while loop until instructed to stop. The constructor takes no parameters either.

This adds to the confusion about the class, its requirements and its dependencies. The class gets references to other classes through static methods, through singletons, and by reaching through classes it already has references to.

Best Answer

Interestingly, refactoring is the most efficient way I've found to understand code like this. You don't have to do it cleanly at first. You can do a quick-and-dirty refactoring pass purely for analysis, then revert it and refactor again more carefully, this time with unit tests.

This works because refactoring done properly is a series of small, almost mechanical changes that you can make without really understanding the code. For example, I can see a blob of code that's repeated everywhere and factor it out into a method without needing to know how it works. I might give it some lame name at first, like step1. Then I notice that step1 and step7 frequently appear together, and factor that pair out too. Then suddenly the smoke has cleared enough that I can actually see the big picture and choose some meaningful names.
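As a rough illustration of that mechanical extraction, here is a minimal Python sketch. Everything in it is hypothetical: the order data, the discount rule, and the names update_before, update_after and step1 are stand-ins, not taken from the class in the question. It only shows the shape of the move: a duplicated blob is lifted into a function with a deliberately meaningless name, and the caller shrinks.

```python
# Hypothetical stand-in for a slice of a huge update method.
# None of these names or rules come from the original class.

def update_before(orders, log):
    """Original shape: the same blob of code is pasted in twice."""
    for order in orders["pending"]:
        total = sum(item["price"] * item["qty"] for item in order["items"])
        if total > 100:
            total *= 0.9
        log.append(("pending", order["id"], round(total, 2)))
    for order in orders["retry"]:
        total = sum(item["price"] * item["qty"] for item in order["items"])
        if total > 100:
            total *= 0.9
        log.append(("retry", order["id"], round(total, 2)))


def step1(order):
    """The repeated blob, extracted as-is. The lame name is deliberate:
    it gets a real name once its purpose is clear."""
    total = sum(item["price"] * item["qty"] for item in order["items"])
    if total > 100:
        total *= 0.9
    return round(total, 2)


def update_after(orders, log):
    """Same behaviour as update_before, with the duplication removed."""
    for order in orders["pending"]:
        log.append(("pending", order["id"], step1(order)))
    for order in orders["retry"]:
        log.append(("retry", order["id"], step1(order)))
```

Running both versions against the same input and comparing the resulting logs is a cheap check that the extraction did not change behaviour, even before the code is understood well enough to write proper unit tests.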

For finding the overall structure, I've found that creating a dependency graph often helps. I do this manually using Graphviz, because the manual work helps me learn the code, but there are probably automated tools available. Here's a recent example from a feature I'm working on. I've found this is also an effective way to communicate this information to my colleagues.

[Image: dependency graph]
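One possible way to keep such a hand-built graph is sketched below, assuming the edges are noted down while reading the code and then turned into Graphviz DOT text. This is not necessarily how the graph above was produced. The class names in DEPENDENCIES and the helper to_dot are hypothetical placeholders.

```python
# Edges collected by hand while reading the code: each key is a class,
# each value lists what it reaches. All names here are hypothetical.
DEPENDENCIES = {
    "BigUpdater": ["ConfigSingleton", "Database", "MessageQueue"],
    "MessageQueue": ["Logger"],
    "Database": ["ConfigSingleton", "Logger"],
}


def to_dot(deps, name="dependencies"):
    """Return Graphviz DOT source for a directed dependency graph."""
    lines = [f"digraph {name} {{", "    rankdir=LR;"]
    for source in sorted(deps):
        # Declare the node so classes without outgoing edges still appear.
        lines.append(f'    "{source}";')
        for target in sorted(deps[source]):
            lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)
```

Writing the result of to_dot(DEPENDENCIES) to a file such as deps.dot and running `dot -Tpng deps.dot -o deps.png` renders the picture. Nodes with many incoming arrows (the singletons, in this made-up example) tend to be the hidden dependencies worth isolating first.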
