Tonight I moved my servers out of Amazon's cloud and into Linode's. Life on Amazon's cloud has not been simple these past few weeks.
But, wait, you ask - isn't Amazon the first and most prominent cloud provider? And I've never heard of Linode...
That's what I thought for a while. It's what I thought when I first started noticing problems connecting to Amazon's instances - hey, it's probably my fault. So I set up multiple "ping" services to check my web site every 10 minutes to see if it's available. I used Pingdom.com and WasItUp.com. And guess what I found out - it's not me, it's them. Their instances are simply flaky. And incredibly weak.
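For anyone who wants to run the same kind of check themselves instead of relying on a hosted service, here is a minimal sketch of a 10-minute availability poller. It is not how Pingdom or WasItUp work internally; the function names (`is_up`, `availability`, `monitor`) and the 10-second timeout are my own assumptions for illustration.

```python
import time
import urllib.error
import urllib.request


def is_up(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, DNS failures, timeouts and HTTP 4xx/5xx.
        return False


def availability(results):
    """Percentage of successful checks in a list of booleans."""
    if not results:
        return 0.0
    return 100.0 * sum(results) / len(results)


def monitor(url, checks, interval_seconds=600, check=is_up, sleep=time.sleep):
    """Poll the URL `checks` times, waiting 600 s (10 minutes) between checks."""
    results = []
    for i in range(checks):
        results.append(check(url))
        if i < checks - 1:
            sleep(interval_seconds)
    return results
```

Calling `availability(monitor("http://example.com/", checks=144))` would cover roughly one day of 10-minute checks; the URL here is a placeholder, not my actual site.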
Let me explain exactly how bad it is. For 2-3 weeks I had the following configuration: 2 load-balanced Amazon instances that serve simple static HTTP pages with Apache. Now, nobody was using them, just to be clear. They were simply there for the purpose of being pinged every 10 minutes.
The result was multiple periods of unavailability every day, each lasting a few minutes. My iPhone's SMS inbox was filling up with downtime alerts. This, you might say, is completely unacceptable already, so why the hell did you launch your Alpha on this platform?
Well, I shouldn't have. I should have trusted my own tests.
But the straw that really broke the camel's back came on launch day (two days ago). The system came up and lots of people got invitations to log in. After about 30 minutes the connection to Amazon became a 3 KBps upload silly-straw. This meant that even relatively simple operations like saving a Gigantt plan with a few hundred work items took 20 seconds, and sometimes timed out. The servers themselves weren't stressed at all - it was Amazon's network, which is also very flaky. This is even after I moved my servers from the U.S. East Coast to their Ireland farm, mind you (I initially suspected that the East Coast might be a bit too far and that the EU farm might improve latency for me).
Now that I've moved to Linode, life seems so much better. I've got my own HAProxy, and I don't have to rely on Amazon's weird ELB product for load-balancing. The instances are much, much stronger (albeit more costly) - but, hey, you get what you pay for. And the bandwidth is terrific. Let's just hope it stays like this.
In the process of choosing a cloud provider I tried all the big players: Amazon, GoGrid, Slicehost, Linode and Rackspace (well, I couldn't even finish the signup process on Rackspace, to be honest, it was just so painful, so let's take them out of the list).
I knew Amazon had problems, and yet their reputation made me suspect the situation would be similar on other providers. Even after I tested and compared their availability I hesitated to switch away from Amazon because, well, it worked, most of the time, and I had more urgent things to accomplish. That all changed once my servers stopped working 30 minutes into the launch. I hope I've learned my lesson.
I also want to commend Linode's support team. I opened a support ticket within the first hour of signing up and they responded in 5 minutes, and followed up for the next 2 hours to help me sort things out. Excellent customer service. Amazon, btw, has no customer service to speak of (unless you're willing to cough up exorbitant fees).
Bottom line, I hope the transition is smooth and that nobody has experienced any downtime at all. If you notice anything fishy please let me know. I'm keeping the Amazon instances on for now, just for backup, until I see everything is nominal.
Thanks, Dima, for telling me about Linode.