Methodology for documenting existing code base

documentation, language-agnostic, methodology

I work as part of a team on an existing application that has no inline documentation, nor does it have technical documentation. As I've been working on various bug reports on the application, I've written a sort of breadcrumb trail for myself – bug numbers in various places so that the next developer can refer to that bug number to see what was going on.

My question is thus:

What is the most efficient method for documenting this code? Should I document as I touch each area (the virus method, if you will), or should I document each section on its own, without following paths that branch out into other areas of the application? Should I insert inline comments where none previously existed (with the fear that I may end up incorrectly identifying what the code does)?

What method would you use to accurately and quickly document a rather large application that has no existing inline documentation, nor inline references to external documentation?

Best Answer

Documenting legacy code-bases

I would highly recommend following the Boy Scout rule with legacy code-bases: leave each piece of code you touch a little better documented than you found it.

Documenting a legacy project as a separate effort, independently of working on it, just never happens. Even if you get in contractors to do it, as soon as they finish the project, that documentation will start falling behind all over again, because developers haven't got into the habit of updating it.

In-code documentation

The most important thing is to use the documentation facilities in your chosen development environment, so that means pydoc for Python, Javadoc for Java, or XML comments in C#. These make it easy to write the documentation at the same time as writing the code.

If you rely on coming back and documenting things later, you may not get around to it, but if you do it as you are writing the code, what needs to be documented will be fresh in your mind. C# even has the option to issue a compilation warning if the XML documentation is incomplete or inconsistent with the actual code.
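As a minimal sketch of what this can look like in Python, assuming a hypothetical legacy function `apply_discount` and a made-up bug number, a docstring can record both the behaviour you have verified and a breadcrumb back to the bug tracker:

```python
def apply_discount(price, percent):
    """Return ``price`` reduced by ``percent`` percent.

    Hypothetical example: the function name and the bug number below
    are illustrative and not taken from any real code base.

    The result is rounded to two decimal places. Negative percentages
    are rejected; see bug #1234 (made-up number) for the background.

    >>> apply_discount(200.0, 25)
    150.0
    """
    if percent < 0:
        raise ValueError("percent must be non-negative")
    return round(price * (1 - percent / 100), 2)
```

Running `help(apply_discount)` or pydoc on the module shows this text, and `python -m doctest` on the file checks that the embedded example still matches the code.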

Also, if reviewing documentation becomes part of your code review process, everyone can be encouraged to contribute, fostering a sense of ownership of the documentation as well as of the code.

Tests as documentation

Another important aspect is having good integration and unit tests.

Often documentation concentrates on what classes and methods do in isolation, skipping over how they are used together to solve your problem. Tests often put these into context by showing how they interact with each other.

Similarly, unit tests often make external dependencies explicit, because those dependencies are exactly the things that need to be mocked out.
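A rough illustration of both points, using Python's standard `unittest` and `unittest.mock`. The function `send_overdue_reminders` and its two collaborators are hypothetical names invented for this sketch; the point is only that the test shows how the pieces are used together, and the mocks show what the code depends on:

```python
import unittest
from unittest.mock import Mock


def send_overdue_reminders(invoice_repo, mailer):
    """Email every customer with an overdue invoice; return how many were sent.

    Hypothetical function standing in for a piece of legacy code.
    """
    overdue = invoice_repo.find_overdue()
    for invoice in overdue:
        mailer.send(invoice["email"], "Your invoice is overdue")
    return len(overdue)


class SendOverdueRemindersTest(unittest.TestCase):
    def test_emails_each_overdue_customer(self):
        # The two mocks document the external dependencies:
        # a data store to query and a mail service to call.
        invoice_repo = Mock()
        invoice_repo.find_overdue.return_value = [
            {"email": "a@example.com"},
            {"email": "b@example.com"},
        ]
        mailer = Mock()

        sent = send_overdue_reminders(invoice_repo, mailer)

        # The assertions document the expected interaction.
        self.assertEqual(sent, 2)
        self.assertEqual(mailer.send.call_count, 2)
        mailer.send.assert_any_call("a@example.com", "Your invoice is overdue")
```

Saved in a test module, this runs with `python -m unittest`. A reader who has never seen the function learns from the test alone that it needs an invoice store and a mailer, and what it is expected to do with them.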

I also find that with test-driven development I write software that is easier to use, because I'm using it right from the word go. With a good testing framework, making code easy to test and making it easy to use are often the same thing.

Higher level documentation

Finally, there is the question of what to do about system-level and architectural documentation.

Many would advocate writing such documentation in a wiki, or in Word or another word processor, but for me the best place for such documentation is also alongside the code, in a plain text format that is friendly to version control systems.

Just like with in-code documentation, if you store your higher level documentation in your code repository then you are more likely to keep it up to date. You also get the benefit that when you pull out version X.Y of the code, you also get version X.Y of the documentation. In addition, if you use a VCS friendly format, then it means that it is easy to branch, diff and merge, just like your code.

Not only that, but if you use something like Read the Docs then you can publish version-specific documentation for each software release.

I quite like reStructuredText (rst), as it is easy to produce both HTML pages and PDF documents from it, and it is much friendlier than LaTeX, yet can still include LaTeX math expressions when you need them.
