If you take away someone’s livelihood for three or four days, it’s just good manners to apologize. And to show you really mean it, why not add 10 days worth of free service to show you really mean it. Well that’s what Amazon is doing.
They’ve sent out a 5,000 word apology to customers in which they provide a detailed account of what happened. As you might expect, human error is the cause.
From the LA Times story..
At the root of the outage was a incorrectly performed network change “as part of our normal AWS scaling activities” at a data center in northern Virgina, Amazon said.
“The configuration change was to upgrade the capacity of the primary network,” Amazon said in the letter. “During the change, one of the standard steps is to shift traffic off of one of the redundant routers” to allow the upgrade to take place.
But the “traffic shift was executed incorrectly and rather than routing the traffic to the other router on the primary network, the traffic was routed onto the lower capacity redundant” network.
That move not only resulted in a downed primary and secondary server network, the letter said.
“Traffic was purposely shifted away from the primary network and the secondary network couldn’t handle the traffic level it was receiving,” Amazon said.