Yesterday morning Sparkle was up and down like a yo-yo: between 8am and 1.30pm London time, it went offline 15 times for a total of 1h28m. No data was lost and Sparkle kept running, but it was achingly slow and intermittently unreachable from the internet.
I am really sorry for all the trouble you had using Sparkle yesterday morning. I know you depend on Sparkle all day, every day - and I know how frustrating yesterday morning was. Monday mornings are bad enough without this kind of thing on top.
We run Sparkle on a server we lease from a high-end provider in the U.S. Last week the provider told us they would do a routine upgrade to our server's operating system on Sunday (20th March).
On Sunday afternoon, once the provider had finished their upgrade, we logged into the server and ran through various checks to ensure Sparkle was working normally. Everything looked good.
Then on Monday morning (yesterday), once our European customers started using Sparkle, the server began slowing down and then dropping off the web altogether.
We spent a hectic morning working with the provider to diagnose the problem and then, finally, fix it.
How will we avoid this in future?
We have moved Sparkle to a much newer server with quadruple the resources. We are also going to do two things differently in future:
Next time our server provider plans an upgrade, we'll make sure we know exactly what they are going to change so we can take any necessary action in advance.
After any future upgrade from our provider, we'll test Sparkle under a heavier load to check it's still going strong before Monday morning traffic arrives.
Keeping you posted
Almost as bad as a disrupted service is not knowing what's going on or when it's going to be fixed. From now on we'll keep you posted on Twitter (follow @sparklehq) whenever something is up.
You can also check Sparkle's performance on our public uptime reports.
Once again, I am very sorry for yesterday's disruption. Thank you for your patience.