Realistic Burndown Estimations with Monte Carlo Models (Part 1)

Submitted by Ben on Sun, 08/01/2017 - 09:50

If you're using common-or-garden Scrum delivery at work, you're probably used to seeing things like this:

Classical burndown estimation

This is a basic estimate of work done over the course of a sprint. It assumes that the work in the sprint will be completed uniformly over time (the flat bits are weekends). In practice, work is rarely completed this way: the graph looks a lot more curved, with lots of points coming in towards the end of the sprint.
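For reference, the classical straight-line estimate can be sketched in a few lines of Python. This is only an illustration, not the code behind the graph: the function name, the 14-day calendar and the assumption that the sprint starts on a Monday are all mine.

```python
def ideal_burndown(total_points, calendar_days=14, weekend=(5, 6)):
    """Points remaining at the start of the sprint and after each calendar day.

    Assumes day 0 is a Monday, so days 5 and 6 of each week are the weekend.
    """
    is_working = [day % 7 not in weekend for day in range(calendar_days)]
    per_day = total_points / sum(is_working)  # uniform burn on working days only
    remaining = [float(total_points)]
    for working in is_working:
        # Weekends burn nothing, which produces the flat bits in the graph.
        remaining.append(remaining[-1] - (per_day if working else 0.0))
    return remaining


line = ideal_burndown(100)
print(line)
```

The list has one entry more than the number of days, because it includes the starting value before any work is done.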

First off, it's important to recognise that the burndown estimation is less of an estimation and more of a psychological device. But if we actually want to make realistic estimations of the delivery during a sprint, it's a fairly simple matter to model the team using a Monte Carlo simulation.

The idea behind Monte Carlo simulations is to create a random set of realistic scenarios, figure out what happens in each of them, then look at the results to see what happens in general. We can put together a Monte Carlo simulation of a sprint pretty easily, by modelling the developers and their activity programmatically. I used Python; it would be equally sensible to use Excel or any high-level language.

In this first part, I'm going to look at what happens when our programmers are completely naive, and pick up work at random.
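To make the idea concrete, here is a minimal sketch of one way a naive sprint could be simulated. It is not the code behind the graphs in this post, and everything specific in it is an assumption: the backlog (18 stories totalling 100 points), the nominal speed of 1.25 points per developer per day, the random slowdown factor between 0.5 and 2.0, and the rule that points only count once a whole story is finished.

```python
import random

# Hypothetical backlog: 18 stories totalling 100 points (average about 5.6).
BACKLOG = [13, 13, 8, 8, 8, 8, 8, 8, 5, 5, 3, 3, 3, 2, 2, 1, 1, 1]


def simulate_sprint(backlog, developers=8, working_days=10,
                    points_per_day=1.25, rng=None):
    """Simulate one sprint; return points remaining after each working day."""
    if rng is None:
        rng = random.Random()
    todo = list(backlog)
    rng.shuffle(todo)                 # naive developers: stories come up in random order
    free_at = [0.0] * developers      # time (in days) at which each developer is next free
    completions = []                  # (finish_time, points) for every story started
    while todo:
        dev = min(range(developers), key=free_at.__getitem__)
        if free_at[dev] >= working_days:
            break                     # nobody frees up before the sprint ends
        story = todo.pop()
        # Assumed effort model: nominal duration scaled by a random factor.
        free_at[dev] += story / points_per_day * rng.uniform(0.5, 2.0)
        completions.append((free_at[dev], story))
    total = sum(backlog)
    # Points are only credited once the whole story is finished.
    return [total - sum(p for t, p in completions if t <= day)
            for day in range(working_days + 1)]


# Each seed gives one random scenario, i.e. one line on a burndown chart.
runs = [simulate_sprint(BACKLOG, rng=random.Random(seed)) for seed in range(5)]
for curve in runs:
    print(curve)
```

Developers in this sketch simply stop once the backlog is empty, and stories that finish after the last day contribute nothing.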

Here are the results for a 100-point sprint, with eight programmers, working over two weeks. Each of the coloured lines represents a different simulation:

Monte Carlo Results with Naive Developers

You can immediately see that the classic "last minute dash" appears straight out of a completely random system. In fact, this group of programmers doesn't do very well, because they sit idle once there's no work left for them to pick up. We can make them more efficient by allowing them to work harder when they're programming, but it doesn't make much difference to the results. Here's the average performance of all these experiments:

Average of the Monte Carlo simulations
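Averaging the runs is just a pointwise mean over the individual burndown curves. A small sketch, assuming each simulation is stored as a list of remaining points per day; the function name and the three sample curves are made up for illustration and are not results from the simulations above.

```python
def average_curves(curves):
    """Pointwise mean of several equal-length burndown curves."""
    if not curves:
        raise ValueError("need at least one curve")
    length = len(curves[0])
    if any(len(curve) != length for curve in curves):
        raise ValueError("all curves must cover the same number of days")
    # zip(*curves) groups together the values recorded on the same day.
    return [sum(day) / len(curves) for day in zip(*curves)]


# Made-up sample curves, one list per simulated sprint.
sample = [
    [100, 95, 80, 40, 10],
    [100, 100, 70, 55, 20],
    [100, 90, 75, 25, 0],
]
mean = average_curves(sample)
print(mean)
```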

So how about we use a classic response to this problem: break the sprint up into smaller stories, and check how it affects team performance. I'm going to cut the average story size from 5.6 to 3.8, by cutting some of the bigger stories into two pieces (so, 13-point stories into 8 and 5, and 8-pointers into 5 and 3). Let's see what happens:
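The split itself is easy to express in code. In the sketch below, the backlog is a hypothetical one that I chose so the averages round to the 5.6 and 3.8 quoted above; the real sprint's composition isn't given here. For simplicity it splits every 13 and 8, in a single pass, so an 8 produced from a 13 is not split again.

```python
# Split rules from the text: 13 -> 8 + 5, and 8 -> 5 + 3.
SPLITS = {13: (8, 5), 8: (5, 3)}

# Hypothetical backlog: 18 stories totalling 100 points.
BACKLOG = [13, 13, 8, 8, 8, 8, 8, 8, 5, 5, 3, 3, 3, 2, 2, 1, 1, 1]


def split_stories(backlog, splits=SPLITS):
    """Replace each large story with its two pieces (one pass only)."""
    smaller = []
    for points in backlog:
        # Stories without a split rule are kept as they are.
        smaller.extend(splits.get(points, (points,)))
    return smaller


def average_size(backlog):
    return sum(backlog) / len(backlog)


smaller = split_stories(BACKLOG)
print(len(BACKLOG), round(average_size(BACKLOG), 1))
print(len(smaller), round(average_size(smaller), 1))
```

The total number of points is unchanged by the split; only the number of stories, and so the average size, moves.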

Smaller stories

Wow, look at the difference! The majority of points in the sprint are completed, but the parabolic delivery - the humped curve instead of the classic line-graph - remains. In this situation, it would be reasonable to expect a development team to match a burndown curve like the one below:

Expected burndown

In the next instalment, I'll look at ways of making our simulated developers behave more sensibly when we have larger stories in the sprint, and have a look at what this can tell us about story allocation in the real world.

[For the curious: The graphs were generated using Pygal. The data was munged in raw Python. I'll release the code when I've finished the series of articles.]