Yesterday's Unavailability

Yesterday Sparkle was unavailable for 51 minutes. This is what happened.


Yesterday Sparkle was unavailable for 51 minutes around lunchtime (UK / Europe time). This is the longest downtime Sparkle's had in years. I'm very sorry for the trouble this caused our customers.

On the up side, no data was lost.

What went wrong?

Sparkle itself was absolutely fine. The problem was a network failure at our data centre which cut off Sparkle (and many other sites) from the rest of the internet.

This sort of failure isn't supposed to happen and the data centre's staff haven't given an explanation yet. Perhaps a construction crew severed a cable buried in the street outside, or somebody tripped over a network cable -- we won't know until they tell us.

In the end, though, it's beside the point. Our customers couldn't use Sparkle and that's what matters.

How can we prevent this in future?

We can't prevent things going wrong at the data centre, but we can keep another server on standby at a completely different data centre and, if the original data centre fails, switch to the reserve.

I've had vague plans for a reserve server for a while. But data centre failures are pretty rare so it hasn't been a priority. I will now move this up the to-do list!

And finally

I post updates on Twitter (follow @sparklehq) if/when something goes wrong, so that's the best place to go for information. And of course you can always email or phone.

I appreciate everyone's patience during the outage, and apologise again for the inconvenience. Downtime is rare for Sparkle and I'll do everything I can to make it rarer still.

