Legacy Code – Estimating Time Costs in a Legacy Codebase

estimation, legacy code

Recently I started working on a project where a very old monolithic application is being migrated into microservice-based architecture.

The legacy codebase is very messy ("spaghetti code"), and often an apparently simple function (e.g. one named "multiplyValueByTen") later reveals itself as "thousands of lines of validation code involving 10 tables across 3 different schemas".

Now my boss is (rightly) asking me to estimate how long it would take to write feature X in the new architecture. But I'm having difficulty coming up with a realistic estimate; I often hugely underestimate the task for the reasons stated above and embarrass myself because I can't finish in time.

The sensible thing might seem to be to really get into the code, note every branch and every call to other functions, and then estimate the time cost. But documenting the old code that thoroughly takes barely less effort than actually writing the new version.

How should I approach a scenario like this?

While I understand perfectly well how legacy code refactoring works, my question is not "how do I refactor/rewrite?" but how to give a realistic answer to "how long would it take to refactor/rewrite part X?"

Best Answer

Read Bob Martin's "The Clean Coder" (and "Clean Code" while you're at it). The following is from memory, but I strongly suggest you buy your own copy.

What you need to do is a three-point weighted average. You do three estimates for each piece of work:

  • a best case scenario - assuming everything goes right (a)
  • a worst case scenario - assuming everything goes wrong (b)
  • the actual guess - what you think it probably will take (c)

Your estimate is then (a+b+2c)/4

  • No, it won't be accurate. There are better ways of estimating, but this method is quick, easy to understand and mitigates optimism by making you consider the worst case.
  • Yes, you will have to explain to your manager that you are unfamiliar with the code and that it is too unpredictable for you to make firm, accurate estimates without spending a long time investigating the code each time (offer to do this, but say you need n days just to give a firm estimate of how many more days the work will take). If you are a "JuniorDev", this should be acceptable to a reasonable manager.
  • You should also explain to your manager that your estimates are weighted averages based on the best case, worst case and probable case, and give them all three figures, which also gives them the error bars.
  • Do NOT negotiate on an estimate. If your manager takes the best case for every estimate (which is foolish, but I've met some like that) and then tries to bully / motivate you into hitting that deadline, well, they're going to be disappointed sometimes. Keep explaining the rationale behind the estimates (best case, worst case and probable case), keep getting close to the weighted average most of the time, and you should be OK. Also, for your own purposes, keep a spreadsheet of your estimates and add your actuals when you've finished. That should give you a better idea of how to adjust your estimates.
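
The record keeping suggested in the last bullet does not need anything elaborate. Below is a minimal Python sketch of such a log, assuming a hypothetical `TaskRecord` row per finished task; the task names and day counts are made up purely for illustration and are not real data. It prints how far each estimate was off and an overall actual-to-estimate ratio, which is one possible figure to use when adjusting later estimates.

```python
from dataclasses import dataclass


@dataclass
class TaskRecord:
    """One row of the estimates-versus-actuals log (hypothetical layout)."""
    name: str
    estimated_days: float
    actual_days: float

    @property
    def overrun_ratio(self) -> float:
        """Actual divided by estimate; above 1.0 means the task ran over."""
        return self.actual_days / self.estimated_days


def overall_ratio(records: list[TaskRecord]) -> float:
    """Total actual days divided by total estimated days."""
    total_estimated = sum(r.estimated_days for r in records)
    total_actual = sum(r.actual_days for r in records)
    return total_actual / total_estimated


# Illustrative figures only, not taken from a real project.
history = [
    TaskRecord("task A", 10, 12),
    TaskRecord("task B", 14, 13),
    TaskRecord("task C", 8, 15),
]

for record in history:
    print(f"{record.name}: estimated {record.estimated_days}, "
          f"actual {record.actual_days}, ratio {record.overrun_ratio:.2f}")
print(f"overall ratio: {overall_ratio(history):.2f}")
```

An overall ratio well above 1.0 across many tasks suggests the worst case or the probable case is being guessed too low.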

Edit:

My assumptions when I answered this:

  1. The OP is a Junior Developer (based on the chosen username). Any advice given is therefore not from the perspective of a Project Manager or Team Lead, who may be expected to carry out more sophisticated estimates depending on the maturity of the development environment.
  2. The Project Manager has created a Project plan consisting of a fairly large number of tasks planned to take several months to deliver.
  3. The OP is being asked to provide a number of estimates for the tasks they are assigned to by their Project Manager who wants a reasonably accurate number (not a probability curve :)) to feed into the project plan and use to track progress.
  4. OP does not have weeks to produce each estimate and has been burned before by giving over-optimistic estimates and wants a more accurate method than sticking a finger in the air and saying "2 weeks, unless the code is particularly arcane in which case 2 months or more".

The three-point weighted average works well in this case. It's quick, comprehensible to the non-technical, and over several estimates should average out to something approaching accuracy, especially if OP takes my advice about keeping records of estimates and actuals. Once you know what a real-world "worst case" and "best case" look like, you can feed the actuals into your future estimates and even adjust the estimates for your project manager if the worst case is worse than you thought.

Let's do a worked example:

  • Best case: from experience, the fastest I've done a really straightforward one was a week start to finish (5 days)
  • Worst case: from experience, there was that time when there were links everywhere and it ended up taking me 6 weeks (30 days)
  • Actual estimate: it'll probably take me 2 weeks (10 days)

5 + 30 + 2 × 10 = 55

55/4 = 13.75, which is what you tell your PM. Maybe you round up to 14 days. Over time (e.g. ten tasks), it should average out.
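
The arithmetic above can be checked with a few lines of Python. This is only a sketch of the (a+b+2c)/4 formula; the function name `three_point_estimate` is made up for illustration, and the inputs are the 5, 30 and 10 days from the worked example.

```python
import math


def three_point_estimate(best: float, worst: float, likely: float) -> float:
    """Weighted average (a + b + 2c) / 4 of best, worst and probable case."""
    return (best + worst + 2 * likely) / 4


# Days taken from the worked example above.
best, worst, likely = 5, 30, 10

estimate = three_point_estimate(best, worst, likely)
rounded = math.ceil(estimate)  # round up to whole days

print(f"estimate: {estimate} days, rounded up to {rounded}")
print(f"range: {best} to {worst} days")
```

Printing the best and worst case next to the estimate gives the PM the error bars along with the single number.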

Don't be afraid to adjust the formula. Maybe half the tasks end up as nightmares and only ten percent are easy, so you make the estimate a/10 + b/2 + 2c/5. Learn from your experience.
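
One way to make the weights adjustable is to pass them in, as in the sketch below. The function name `weighted_estimate` and the requirement that the weights sum to 1 are assumptions made for this illustration (both formulas in this answer happen to satisfy it); the inputs reuse the 5, 30 and 10 days from the worked example.

```python
from fractions import Fraction


def weighted_estimate(best, worst, likely, weights):
    """Weighted sum of the three guesses.

    `weights` is a (best, worst, likely) triple. Requiring the weights to
    sum to 1 is an assumption of this sketch, so the result stays an average.
    """
    if sum(weights) != 1:
        raise ValueError("weights must sum to 1")
    w_best, w_worst, w_likely = weights
    return w_best * best + w_worst * worst + w_likely * likely


# (a + b + 2c) / 4 expressed as weights
original = (Fraction(1, 4), Fraction(1, 4), Fraction(1, 2))
# a/10 + b/2 + 2c/5 expressed as weights
pessimistic = (Fraction(1, 10), Fraction(1, 2), Fraction(2, 5))

print(float(weighted_estimate(5, 30, 10, original)))     # 13.75
print(float(weighted_estimate(5, 30, 10, pessimistic)))  # 19.5
```

With the same three guesses, shifting weight towards the worst case moves the estimate from 13.75 to 19.5 days.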

Note, I am not making any assumptions about the quality of the PM. A bad PM will give a short estimate to the project board to get approval and then bully the project team to try and reach the unrealistic deadline they've committed to. The only defense is to keep a record so you can be seen giving your estimates and getting close to them.