There’s an old parable used in introductory statistics classes to illustrate how an average can be misleading when maximum values are of interest. The parable is of a person who drowns while walking across a river.
The person can’t swim but is not concerned because the average depth of the river is only 20cm. The problem is the average depth of the river is not useful information here; what is needed is information about the maximum depth so that they don’t end up over their head.
The river might well be only 20cm deep on average but several metres deep in the middle. As with river crossings, so too with network loads.
While the precise reason for the meltdown of the Australian Bureau of Statistics (ABS) online census system last night remains unclear, there is a lesson to be learned about load testing.
Prior to the census date of Tuesday, August 9, the ABS announced that there was no danger of the system being unable to handle the load on census night. Why? Because it had tested the system.
Or, rather, the ABS paid a considerable sum of money to an external party to test the system. Load testing is performed to some given specifications and here we find what could be a serious problem in the ABS testing procedure.
In order to reassure the public, who were growing nervous about the new online census, the ABS made the following statement:
“The online Census form can handle 1,000,000 form submissions every hour. That’s twice the capacity we expect to need.”
From this statement, it seems the ABS load-tested for 1 million submissions per hour, while expecting 0.5 million per hour. But there are between 9 and 10 million households in Australia, and the ABS was expecting around 15 million census submissions in total, with 65% submitted online.
Of course, not all these submissions would come on August 9, but most would. Moreover, the vast majority of these submissions would be expected to come in the peak-traffic time of early evening (between around 6pm and 10pm AEST).
The ABS’s expected load of 0.5 million submissions per hour only makes sense as an average load across a large part of the day. For example, if submissions arrived at an even rate of 0.5 million per hour across 12 hours on August 9, that would give us 6 million submissions for this period.
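The arithmetic behind these figures can be checked in a few lines. The following is a minimal Python sketch that uses only the numbers quoted above (15 million submissions, 65% online, an expected 0.5 million per hour and a tested 1 million per hour); the variable names are illustrative and not anything the ABS published.

```python
# Figures quoted in the article; the variable names are illustrative.
TOTAL_SUBMISSIONS = 15_000_000      # expected census submissions overall
ONLINE_PERCENT = 65                 # share expected to be submitted online
EXPECTED_RATE_PER_HOUR = 500_000    # the load the ABS said it expected
TESTED_RATE_PER_HOUR = 1_000_000    # the load the form was said to handle

# Integer arithmetic keeps the result exact.
online_submissions = TOTAL_SUBMISSIONS * ONLINE_PERCENT // 100

# Submissions received if the expected rate held steadily for 12 hours.
twelve_hour_total = EXPECTED_RATE_PER_HOUR * 12

# Hours of perfectly even traffic needed to receive every online submission.
hours_at_expected_rate = online_submissions / EXPECTED_RATE_PER_HOUR
hours_at_tested_rate = online_submissions / TESTED_RATE_PER_HOUR

print(f"Expected online submissions: {online_submissions:,}")
print(f"Total over 12 even hours:    {twelve_hour_total:,}")
print(f"Hours needed at 0.5M/hour:   {hours_at_expected_rate}")
print(f"Hours needed at 1.0M/hour:   {hours_at_tested_rate}")
```

On these figures, 9.75 million online submissions would need 19.5 hours of perfectly even traffic at the expected rate, which is why that rate can only be read as an average over a long stretch of the day.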
But it is clear that load would not be spread evenly. And, to stress the obvious, it is the peak load that we’re interested in. Any reasonable estimate of the peak load for the early evening period is in the vicinity of several million per hour.
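A rough way to put numbers on this is to ask what hourly rate results when most online submissions arrive on census night and most of those arrive in the evening window. The sketch below is hedged accordingly: the function name and the two share values (75% on the night, 85% of those between 6pm and 10pm) are assumptions chosen purely for illustration, not ABS figures. Only the 9.75 million online submissions (65% of 15 million) and the 1 million per hour tested capacity come from the numbers quoted above.

```python
def window_rate_per_hour(online_total, share_on_night, share_in_window, window_hours):
    """Average hourly rate inside the busy window, assuming an even spread within it."""
    return online_total * share_on_night * share_in_window / window_hours


ONLINE_SUBMISSIONS = 9_750_000      # 65% of 15 million, from the article's figures
TESTED_RATE_PER_HOUR = 1_000_000    # capacity quoted by the ABS

# ASSUMPTIONS for illustration only; the article gives no exact shares.
SHARE_ON_CENSUS_NIGHT = 0.75        # "most" submissions arrive on August 9
SHARE_IN_EVENING_WINDOW = 0.85      # "vast majority" of those between 6pm and 10pm
WINDOW_HOURS = 4                    # 6pm to 10pm

rate = window_rate_per_hour(
    ONLINE_SUBMISSIONS,
    SHARE_ON_CENSUS_NIGHT,
    SHARE_IN_EVENING_WINDOW,
    WINDOW_HOURS,
)

print(f"Average rate in the evening window: {rate:,.0f} per hour")
print(f"Exceeds tested capacity: {rate > TESTED_RATE_PER_HOUR}")
```

Under these assumed shares the rate is already above the tested 1 million per hour, and that is with traffic spread evenly across the four hours. Any bunching within the window pushes the true peak higher still.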
Worse still, there is no reason to expect the load to be evenly spread within this period. It is not beyond the realms of plausibility that 3 or 4 million people would be trying to log on to the system at, say, precisely 7.10pm.
Of course, all of this is consistent with an average load of 0.5 million submissions per hour for August 9. But from what the ABS has said, it is not clear that it tested for such peaks.
ABS up to its neck
So we should be careful not to take averages too seriously. As any statistician knows, an average is one (very crude) way of summarising data.
Other summaries include information about the most frequent data (mode), the middle of the data (median) and the spread of the data (variance).
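To see how differently these summaries can describe the same data, here is a small Python sketch using the standard statistics module. The depth readings are made up for illustration, chosen so that the mean is 20cm as in the parable; they are not measurements of any real river.

```python
import statistics

# Hypothetical depth soundings (cm) taken at even intervals across a river.
# Invented for illustration: mostly shallow, with a deep channel mid-stream.
depths_cm = [5] * 18 + [60, 120, 300, 140] + [5] * 18

mean_depth = statistics.mean(depths_cm)        # the "average" of the parable
median_depth = statistics.median(depths_cm)    # middle of the data
mode_depth = statistics.mode(depths_cm)        # most frequent reading
variance = statistics.pvariance(depths_cm)     # spread (population variance)
max_depth = max(depths_cm)                     # what a non-swimmer needs to know

print(f"mean:     {mean_depth} cm")
print(f"median:   {median_depth} cm")
print(f"mode:     {mode_depth} cm")
print(f"variance: {variance:.1f} cm^2")
print(f"maximum:  {max_depth} cm")
```

With these made-up readings the mean is 20cm, the median and mode are both 5cm, and the deepest point is 300cm. None of the central summaries hints at the depth that matters for the crossing.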
Taking the average too seriously in some settings, such as the river-crossing parable or the calculation of network loads, is tantamount to confusing the average with the peak: that is, assuming the river is uniformly 20cm deep, or that census submissions arrive at a uniform 0.5 million per hour.
It might seem uncharitable to suggest that such an elementary statistical mistake lies behind the ABS website problems last night – especially when talking about an organisation filled with statisticians.
The ABS’s story this morning is that it deliberately shut down the system to protect it from a number of distributed denial-of-service (DDoS) attacks. This is like the river crossing being hit by a flash flood at the crucial time.
But there is good reason to suspect that even without such DDoS attacks, the system was in serious danger of being overloaded. This means even a small rise in the water level, as it were, could have been enough to cause a catastrophic failure.
Our intrepid river crosser may in fact have been drowned by an unexpected flash flood. But given their failure to recognise the limitations of averages as statistical summaries, they were in trouble the moment they dipped their toe in the water.