Agile Estimation – Normalizing Story Points Across Teams

agile, estimation, user-story

We have been thinking about comparing product size/effort at least roughly and this is what some suggested:

  • There are multiple products, each with one scrum team
  • All the scrum teams estimate their stories relative to a common reference story. For example, in Project A, the team looks at a story and estimates it as requiring twice as much effort as the reference story.
  • In the end, all the projects are estimated in relation to this reference story and are roughly comparable in terms of expected effort: if one project has 200 SP and the other 400 SP, the second could be expected to be roughly twice as much WORK.
  • Individual teams have their own velocities; nobody compares them, because productivity is of course different.
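
The arithmetic behind this proposal can be sketched in a few lines of Python. The project names, point totals, and velocities below are made-up illustration values, and the helper functions are hypothetical; the sketch only shows what the proposal assumes, not whether the assumption holds.

```python
# Hypothetical numbers for illustration only.
project_points = {"A": 200, "B": 400}  # total story points per project
team_velocity = {"A": 20, "B": 50}     # story points completed per sprint

def relative_size(points, a, b):
    """Ratio of project b's estimated size to project a's."""
    return points[b] / points[a]

def sprints_needed(points, velocity, project):
    """Forecast duration using only that project's own team velocity."""
    return points[project] / velocity[project]

# Size is compared across projects; duration is forecast per team.
print(relative_size(project_points, "A", "B"))             # 2.0
print(sprints_needed(project_points, team_velocity, "A"))  # 10.0
print(sprints_needed(project_points, team_velocity, "B"))  # 8.0
```

The size comparison is only meaningful if a story point means the same amount of work in both projects, which is exactly the assumption in question.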

An analogy: when digging a hole, I can say that a hole 10 meters deep will be 10x more work than a reference hole (which is 1 meter deep). One team will use an excavator and dig their 10-meter-deep hole in 30 minutes. The other team will use a spade and spend 2 hours digging their 1-meter-deep hole. But the amount of work in each hole does not depend on who digs it, so the two can be compared (1 vs 10) regardless of productivity. Sure, SW is far from that simple to estimate, but it should not be completely off.

Is there a problem with that? To me it seems fine, as the teams only need to compare their work with a common reference story and assign points relative to it (as they would do when estimating with their own reference story). It is completely fine if team A takes a day to finish a story point while team B takes two days; what matters is that the estimation is consistent.

Best Answer

I have tried this approach with several teams and it does not lead to cross-team efficiency. We always used a reference story that everybody understood as the basis, and we gave it a value other than 1 (2 in our case) so that anything we knew was even smaller than the reference story could be given a 1.

The concept falls apart as soon as the team is brought into the estimation room and needs to say what something is that is BIGGER than the reference story. Some teams, for some reason, see something as 4 times the reference story, others estimate at 2 times the reference story. They are both correct, because STORY POINTS ARE NOT HOURS.

As long as a team is consistent in its sizings, its estimates are valid and you can forecast from the velocity that team achieves. But comparing across teams does not work. Team A, regularly using larger increments from the reference, will always seem to be accomplishing more story points in a sprint. Team B will be evaluated as 'low performing', when in actuality they just estimate on a different scale.
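
A minimal sketch of this effect, under simplifying assumptions: both teams work through an identical backlog at the same pace, and each team's estimating habit is modeled as a single constant `scale` (points assigned per unit of effort). The backlog values, the scale factors, and the function names are all hypothetical.

```python
# Hypothetical: identical backlog and identical pace for both teams.
backlog_effort = [1, 2, 3, 5, 3, 4]  # assumed "true" effort per story
STORIES_PER_SPRINT = 2               # both teams finish two stories a sprint

def estimates(effort, scale):
    """Story points a team assigns, given its own sizing scale."""
    return [e * scale for e in effort]

def velocity(points):
    """Average story points completed per sprint."""
    sprints = len(points) / STORIES_PER_SPRINT
    return sum(points) / sprints

def sprints_to_finish(points):
    """Forecast from the team's own estimates and its own velocity."""
    return sum(points) / velocity(points)

team_a = estimates(backlog_effort, scale=4)  # sizes in larger increments
team_b = estimates(backlog_effort, scale=2)  # sizes in smaller increments

print(velocity(team_a), velocity(team_b))                    # 24.0 12.0
print(sprints_to_finish(team_a), sprints_to_finish(team_b))  # 3.0 3.0
```

Team A's velocity is double Team B's even though both deliver the same work in the same three sprints. Each team's own forecast stays correct, while the cross-team comparison says nothing about performance.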

This was made even clearer to me when I took about 3 weeks of vacation, during which two estimation rounds were run in my absence by a team lead who also ran other teams with 'higher' estimates. When I returned, our velocity had jumped dramatically and all the story points for the team were much higher, but we had not actually accomplished any more work.

(This also showed that I had an outsized influence on the size of the team's estimates.)

In conclusion, using a reference story is very helpful. The organization can understand the estimation process and feel like everybody is standardizing. However, beyond that, I would not trust any alignment of estimate sizes across teams unless you also have the same people doing all the estimates and remove the team from the equation.