Continuing Adventures in Downtime
Posted by Jacques Chester on Friday, December 28, 2007
Not long ago I thought our outage problems might be licked. I was wrong, and it seems like the problem is not ours alone.
Currently Club Troppo runs inside a virtual machine which runs on top of our physical server. The software we use to achieve this is called Xen. It uses a clever technique called paravirtualisation to give the advantages of virtual machines with less of a performance hit.
My UCC colleagues and I migrated Club Troppo to this arrangement to head off future problems. A virtual machine is easier to migrate, back up and manage. Moving to virtual machine infrastructure meant that I would be able to have features such as hot failover, or to run test-only instances of Club Troppo to preview the latest exciting stupidities in new releases of Wordpress. It also meant that we could share some of our physical server’s formidable computing power with our hosts, the University Computer Club. Even on a very busy day, our server hardly breaks a sweat.
So much for the advantages. It turns out, however, that we picked the wrong approach at the wrong time. A sudden shift in the software landscape left us with a buggy, unstable combination.
The problem is this. Xen’s paravirtualisation approach requires both the host operating system and the guest operating system to be modified. So to run Club Troppo we have a Xen-aware kernel running in the virtual machine. It is specially modified to talk to the physical server’s Xen-aware kernel.
However, XenSource, the company that supports Xen development, was acquired not long ago by a company called Citrix. Citrix makes its money selling very expensive packages designed to help Windows catch up with stuff Unix nerds have used for 20 years. They bought XenSource as part of a push into the increasingly lucrative virtual machine market. Since they don’t care about Linux, they stopped moving their code forwards to the latest releases.
The last set of XenSource official patches for Xen target the 2.6.18 Linux kernel. The current release we use is 2.6.22, the mainline is at 2.6.23 and 2.6.24 is just around the corner.
And here’s the rub: kernel version 2.6.18 is too old to support our server’s hardware, and third parties have given up on the messy gruntwork required to keep porting Xen to each new kernel release. So we can have virtualisation, or hardware support, but not both. The combination which was lashed together before Christmas is unstable and tends to drop offline on a whim.
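For the curious, the version mismatch can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this post, and treating every kernel at or below 2.6.18 as "covered" is a simplifying assumption, since the official patches target 2.6.18 specifically. The one real point is that kernel versions have to be compared as numbers, not as text.

```python
# Hypothetical helper names; the version numbers come from the post above.
LAST_OFFICIAL_XEN_TARGET = "2.6.18"


def parse_kernel_version(version):
    """Turn '2.6.22' into (2, 6, 22) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def has_official_xen_patches(kernel_version):
    """Assumption: a kernel is 'covered' if it is no newer than 2.6.18."""
    target = parse_kernel_version(LAST_OFFICIAL_XEN_TARGET)
    return parse_kernel_version(kernel_version) <= target


if __name__ == "__main__":
    for version in ("2.6.18", "2.6.22", "2.6.23"):
        if has_official_xen_patches(version):
            status = "official Xen patches exist"
        else:
            status = "no official Xen patches"
        print(version, "->", status)
```

Comparing tuples of integers rather than raw strings matters here: as plain text, "2.6.9" would sort after "2.6.18".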
Today I’ve been chatting to UCC colleagues about our options. There are quite a few, but ultimately the sensible ones boil down to two: rolling back from virtualisation or jumping sideways to a different platform. We’re still trying to work out which is the better option, but in the meantime — please bear with us.
This entry was posted on Friday, December 28th, 2007 at 6:04 PM and filed under Site News.
3 Responses to “Continuing Adventures in Downtime”

Jacques,
I mentioned a few days ago that CT was not being updated on Bloglines (it still isn’t).
I’ve just noticed that Larvatus, since it moved ‘back home’ from exile, isn’t either.
Both blogs are using WP 2.3.1. I wonder if this is a factor.
Posted on 29-Dec-07 at 7:38 am | Permalink
Please ignore the above: Larvatus was just updated at Bloglines.
CT still hasn’t been updated since Dec 17.
Posted on 29-Dec-07 at 7:41 am | Permalink
SB,
Do Bloglines have a mechanism for submitting help requests? I can’t see anything obviously wrong at our end.
Posted on 29-Dec-07 at 1:16 pm | Permalink