A big bloody mess

Posted by Jacques Chester on Friday, June 1, 2007

Being a sysadmin is a mug’s game.

In particular:

  1. Baffling transient slowdowns.
  2. Backups failing to back up.
  3. Leading to loss of 3 days of site activity.
  4. And loss of sleep.

If authors would like to try to dig up posts lost in the current round of madness, Google’s cache has a slightly more up-to-date copy.

Update: My current hypothesis is that MySQL is doing a lot of work on disk, leading to very long waits on disk I/O. Work piles up behind those waits, so the load average shoots up, which makes the whole thing worse. Once a page has been rendered, lighttpd can spit it out very, very fast; the bottleneck is MySQL. What’s worse, nothing seems to stop it from thrashing the disk.
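One way to test the disk-bound hypothesis is to measure what share of CPU time is spent in iowait, since Linux counts tasks stuck waiting on disk in the load average. What follows is a minimal sketch, assuming a Linux host where the aggregate counters sit on the `cpu` line of `/proc/stat`; the function names are hypothetical and not part of any tool the site actually uses.

```python
def parse_cpu_line(line):
    """Parse an aggregate 'cpu' line from /proc/stat into integer counters.

    Assumed field order after the label: user nice system idle iowait
    irq softirq steal guest guest_nice (older kernels have fewer fields).
    """
    return [int(x) for x in line.split()[1:]]


def read_cpu_times(path="/proc/stat"):
    """Return the aggregate CPU counters from a Linux /proc/stat file."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return parse_cpu_line(line)
    raise ValueError("no aggregate cpu line found in " + path)


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two samples.

    Only the first eight fields (user..steal) are summed, because guest
    time is already included in user/nice and would be double-counted.
    """
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas[:8])
    if total == 0:
        return 0.0
    return deltas[4] / total  # index 4 is the iowait column
```

To use it, take one sample with `read_cpu_times()`, wait a few seconds, take another, and pass both to `iowait_fraction`. A persistently high fraction during a slowdown would be consistent with MySQL being disk-bound; a low one would point elsewhere.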

Update: The illusion of timeliness you may (I hope) be experiencing at this point is due to even more caching at even more levels. It works fine until there’s a cache “miss”, at which point you’ll probably notice the delay. It doesn’t help that there are 3 or 4 different spiders constantly churning through Club Troppo’s large, old (by internet standards) archives. I’ve put in directives telling the spiders to bugger off from anything more than a year old, but they don’t seem to have picked them up just yet.
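The hit-and-miss behaviour described above can be illustrated with a simple read-through cache. This is a minimal sketch, not the site’s actual setup (which layers several caches); `PageCache`, the `render` callback and the TTL value are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class PageCache:
    """Hypothetical read-through cache: serve stored pages, render on a miss."""

    def __init__(self, render: Callable[[str], str], ttl_seconds: float = 300.0):
        self.render = render  # slow path, e.g. PHP querying MySQL
        self.ttl = ttl_seconds
        self.store: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        entry = self.store.get(url)
        if entry is not None and time.monotonic() - entry[0] < self.ttl:
            # Hit: the stored copy is fresh, so the database is never touched.
            self.hits += 1
            return entry[1]
        # Miss (absent or expired): pay the full rendering cost, then store it.
        self.misses += 1
        page = self.render(url)
        self.store[url] = (time.monotonic(), page)
        return page
```

A spider walking through years of archives mostly requests pages nobody has asked for recently, so nearly every one of its requests is likely to be a miss that falls through to MySQL. That is why crawling can hurt even when the cache works well for ordinary readers.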

This entry was posted on Friday, June 1st, 2007 at 7:29 AM and filed under Site News, Uncategorised.

5 Responses to “A big bloody mess”

  1. Amanda said:

    Thanks for all your efforts Jacques.

  2. Dave Bath said:

    Sysadm sleep loss more than a standard deviation above personal modal levels is often inversely proportional to paranoia.

  3. paul frijters said:

    we all appreciate your efforts, Jacques

  4. Ken Parish said:

    Jacques

    I strongly suspect that there's more to it than just search engine spidering. While load times are now better than they were for the last week or so, I haven't yet opened the site and had an experience that could remotely be described as "fine". My most recent access is typical: it took 14.801 seconds for the front page to open, and even that appears to be with all the blogroll categories and other database-driven sidebar features disabled.

    Blogs like Road to Surfdom and Larvatus Prodeo have databases just as large as Troppo's, get crawled by search engines at least as often, and run on WordPress. Yet they don't have our drastic speed and reliability problems. However, I know LP DID have such problems until a year or so ago. It might be a good idea to email Mark Bahnisch (and perhaps Tim Dunlop), find out which techie gurus they used, and pick their brains. For example, it could conceivably be that MySQL isn't up to the task with such large databases, where lots of fields hold lots of entries, each containing lots of text (i.e. the posts and comments).

    However, as mentioned in my previous email, you might be best advised to leave all this until you have successfully shifted to Perth. We can limp along in the current form until then. It would be best if you didn't lose any more sleep or get any more stressed.

  5. John Quiggin said:

    I've had similar problems, as you know, and have seen a significant speedup with WP-Cache. However, it still gets flaky at times.
