A big bloody mess
Posted by Jacques Chester on Friday, June 1, 2007
Being a sysadmin is a mug’s game.
In particular:
- Baffling transient slowdowns.
- Backups failing to back up.
- Leading to loss of 3 days of site activity.
- And loss of sleep.
If authors would like to try and dig up stuff lost in the current round of madness, Google’s cache has a slightly more up-to-date copy.
Update: My current hypothesis is that MySQL is doing a lot of work on disk, leading to very high disk-bound wait times. This causes load average to shoot up as work piles up, which makes the whole thing worse. Once pages are rendered, lighttpd can spit them out very, very fast. But MySQL is the bottleneck. What’s worse, nothing seems to stop it from being one.
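For anyone wanting to test the disk-bound hypothesis on their own Linux box, here is a rough sketch in Python. It assumes the server exposes `/proc/stat`, and it compares two samples of the aggregate `cpu` line to see what share of CPU time was spent waiting on I/O. The function names (`cpu_times`, `iowait_share`, `sample_iowait`) are made up for illustration; this is not what was actually run on the Troppo server.

```python
import time

def cpu_times(stat_line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = stat_line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(x) for x in fields[1:]]

def iowait_share(before, after):
    """Fraction of CPU time spent in iowait between two samples of the 'cpu' line.

    Field order is: user nice system idle iowait irq softirq steal ...
    Only the first eight fields are summed, because guest time is already
    counted inside user time on kernels that report it.
    """
    a = cpu_times(before)[:8]
    b = cpu_times(after)[:8]
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    if total <= 0 or len(deltas) < 5:
        return 0.0
    return deltas[4] / total  # index 4 is iowait

def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart (Linux only) and return the iowait share."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval)
    with open(path) as f:
        after = f.readline()
    return iowait_share(before, after)
```

If `sample_iowait()` stays high while load average climbs, that would be consistent with processes piling up behind the disk rather than behind the CPU. It would not by itself prove MySQL is the culprit.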
Update: The illusion of timeliness you may (I hope) be experiencing at this point is due to the introduction of even more caching at even more levels. It works fine until there’s a “miss”, when you’ll probably notice the delay. It doesn’t help that there are 3 or 4 different spiders constantly churning through Club Troppo’s large, old (by internet standards) archives. I’ve put in directives telling the spider programs to bugger off on anything more than a year old, but it doesn’t seem to have gotten through just yet.
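As an illustration of the "bugger off" directives, here is a minimal Python sketch that builds robots.txt rules and checks them with the standard library's `urllib.robotparser`. It assumes the archives live under year-based paths such as `/2005/`, which is a guess at the URL layout and not something confirmed here. It also approximates "more than a year old" by blocking whole calendar years before last year. The helper names, the `ExampleBot` agent and the `example.org` URLs are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

def build_archive_rules(first_year, current_year):
    """Return robots.txt lines that disallow year directories before last year.

    Assumes (hypothetically) that posts are served under /YYYY/... paths.
    """
    lines = ["User-agent: *"]
    for year in range(first_year, current_year - 1):
        lines.append(f"Disallow: /{year}/")
    return lines

def allowed(rules, url, agent="ExampleBot"):
    """Check whether a well-behaved crawler would be allowed to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(agent, url)

rules = build_archive_rules(2003, 2007)
print("\n".join(rules))
print(allowed(rules, "http://example.org/2005/03/01/some-old-post/"))
print(allowed(rules, "http://example.org/2007/06/01/a-big-bloody-mess/"))
```

Rules like these are only advisory, and crawlers re-fetch robots.txt on their own schedule, so a lag before the spiders back off is to be expected.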
This entry was posted on Friday, June 1st, 2007 at 7:29 AM and filed under Site News, Uncategorised.

Thanks for all your efforts Jacques.
Posted on 01-Jun-07 at 8:07 am | Permalink
Sysadm sleep loss more than a standard deviation above personal modal levels is often inversely proportional to paranoia.
Posted on 01-Jun-07 at 1:03 pm | Permalink
we all appreciate your efforts, Jacques
Posted on 01-Jun-07 at 1:58 pm | Permalink
Jacques
I strongly suspect that there's more to it than just search engine spidering. While load times are now better than they were for the last week or so, I haven't yet experienced an occasion when I've attempted to open the site and had an experience that could remotely be described as "fine". My most recent access is typical - it took 14.801 seconds for the front page to open, and even that appears to be with all the blogroll categories and other database-driven sidebar features disabled.
Blogs like Road to Surfdom and Larvatus Prodeo have databases just as large as Troppo's, get crawled by search engines at least as often, and operate on a WordPress platform. Yet they don't have our drastic speed and reliability problems. However, I know LP DID have such problems up to a year or so ago. It might be a good idea to email Mark Bahnisch (and perhaps Tim Dunlop), find out what techie gurus they used, and pick their brains. For example, it could conceivably be that MySQL isn't up to the task with such large databases including lots of fields containing lots of entries each containing lots of characters (i.e. the posts and comments).
However, as mentioned in my previous email, you might be best advised to leave all this until you have successfully shifted to Perth. We can limp along in the current form until then. It would be best if you don't lose any more sleep or get any more stressed.
Posted on 02-Jun-07 at 11:00 am | Permalink
I've had similar problems as you know, and have experienced a significant speedup with WPCache. However, it still gets flaky at times.
Posted on 03-Jun-07 at 2:09 pm | Permalink