How to update a large legacy codebase to meet specific quality standards

large-scale-project, legacy, quality

There is a lot of information about tools and techniques for improving legacy codebases, but I haven't come across any successful real-world case studies. Most advice is at the micro level, and while helpful, it doesn't convince many people because there is little evidence that it helps at the macro level.

I am looking specifically for incremental improvements that have been proven to be a success in the real world when updating a large legacy codebase to meet today's quality standards, and not a complete rewrite.

Before:

  • Large: greater than 1MLOC
  • Legacy: no automated tests
  • Poor quality: high complexity, high coupling, many escaped defects

After:

  • Automated tests
  • Easier updates/maintenance
  • High quality: lowered complexity, decoupled code, few escaped defects

What kind of incremental steps have been proven in the real world to update a large legacy codebase successfully to meet the above quality standards, without going through a total rewrite?

If possible, include an example company or case study of a large legacy project that has gone through a "successful" quality improvement process in your answer to back it up.

Best Answer

Books like http://www.amazon.com/Working-Effectively-Legacy-Michael-Feathers/dp/0131177052 should be evidence enough that large, poor-quality legacy code bases are common in the industry.

My guess at why you have not heard of such cases, and, more importantly, why you are unlikely to hear of them until you work on one yourself, is that nobody can come clean and admit that their code base was all of the above without facing non-trivial repercussions.

This could explain the dearth of studies you speak of. If you read enough books, for example Peter van der Linden's Deep C Secrets, you will read about million-dollar bugs where the name of the project that had them is left out.

NOTE: I wanted to make this a comment, but it was too long. I understand this does not answer the question fully.

EDIT: The discussion "C++11 & The long term viability of GCC is questioned" may be relevant: if the developers refactor GCC and make it as toolable as LLVM/clang, it might provide a good example. The discussion notes that the documentation is poor in places, which raises the entry barrier for new developers.
