The Mystery of the Missing Feed

On and off over the past few months I have received emails to say that our feeds don’t appear in aggregators like Google Reader or Bloglines. Or that they turn up late in big bunches. Or days in arrears.

Each time I would fire up my browser, navigate to the feed URL, confirm that the feed was feeding, and promptly blame Google or Bloglines; sometimes for variety I blamed WordPress.

Then yesterday I got an email from James Andrewartha. He wrote:

http://clubtroppo.com.au/feed/ is sending a last-modified header of Thu, 01 Jan 1970 08:00:00 GMT which is messing up my bookmarks script. In other words, EPOCH FAIL!

James’s joke is a Unixy nerd thing — all computers running Unix, a Unix-derivative or a Unix-like OS measure time by counting the seconds elapsed since midnight, 1st of January 1970, known as the “beginning of the epoch”. The 8am GMT thing has to do with our server living in the Perth timezone.
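To see where the 8am comes from, here is a minimal sketch that renders second zero of the epoch in UTC and in a zone eight hours ahead. The `PERTH` constant and the `epoch_zero_in` helper are names made up for this illustration, and the fixed UTC+8 offset is an assumption standing in for whatever the server's real timezone settings were.

```python
from datetime import datetime, timedelta, timezone

# Assumption: model Perth as a fixed UTC+8 offset labelled "WST".
PERTH = timezone(timedelta(hours=8), "WST")


def epoch_zero_in(tz):
    """Return Unix timestamp 0 as an aware datetime in the given timezone."""
    return datetime.fromtimestamp(0, tz=tz)


for tz in (timezone.utc, PERTH):
    print(epoch_zero_in(tz).isoformat())
# 1970-01-01T00:00:00+00:00
# 1970-01-01T08:00:00+08:00
```

Same instant, two wall clocks: midnight in UTC is already 8am in the UTC+8 zone.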

Jokes aside, this was a big fat clue that I was wrong. Eagerly I scurried to my command line to test the claim:

Alchemist:~ jacques$ curl -I http://clubtroppo.com.au/feed/
HTTP/1.1 304 Not Modified
Status: 304 Not Modified
X-Powered-By: PHP/5.2.3-1ubuntu6.3
X-Pingback: http://clubtroppo.com.au/xmlrpc.php
Last-Modified: Thu, 01 Jan 1970 08:00:00 WST
ETag: "4fa3b9ce812b4c6202cb5560c24e0620"
Content-type: text/html
Date: Mon, 31 Mar 2008 09:57:03 GMT
Server: lighttpd/1.4.18

He was right! The server was insisting that the feed was eternal, unchanging, and stuck forever in that morning in 1970 Perth. Like Groundhog Day, except that Google et al weren’t getting the joke.

Aside on ETag: It gets worse, too. See that header named ETag? That’s supposed to be a universally unique code for a given version of a document which caches and feed-gatherers can read to decide whether or not the file has changed. WordPress is lazy and just hashes the last-modified date, so it just kept returning the same ETag for every. single. request.
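The ETag problem can be sketched in a few lines. This is an illustration rather than WordPress's actual code: the function names are hypothetical, and the choice of MD5 over the header text is an assumption. What it shows is that an ETag derived only from a stuck Last-Modified value is itself stuck, so a reader that sends back its cached ETag is always told nothing has changed.

```python
import hashlib


def etag_from_last_modified(last_modified):
    """Build a quoted ETag by hashing the Last-Modified string.

    Assumption: MD5 over the header text, purely for illustration.
    """
    digest = hashlib.md5(last_modified.encode("utf-8")).hexdigest()
    return '"' + digest + '"'


def is_not_modified(if_none_match, current_etag):
    """True when the client's cached ETag matches, so a 304 would be sent."""
    return if_none_match is not None and if_none_match == current_etag


stuck = "Thu, 01 Jan 1970 08:00:00 WST"
cached = etag_from_last_modified(stuck)   # what a reader stored on its first visit
current = etag_from_last_modified(stuck)  # what the server computes on every later visit

print(is_not_modified(cached, current))   # True
```

Only a different Last-Modified string produces a different ETag, which is why the two headers failed together.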

OK, so far, so bad. The question is: why? Why does it never change?

Again I was inclined to blame WordPress. The bad workman blames his tools, especially if they are as mixed in blessings as WordPress is. So I spent a few hours tracing through the code, watching what called what, before finding this innocuous line in a function called get_lastpostdate:


$lastpostdate = $wpdb->get_var("SELECT post_date_gmt FROM $wpdb->posts WHERE post_status = 'publish' ORDER BY post_date_gmt DESC LIMIT 1");

This was the end-of-the-line as far as my tracing had gone. All function calls of interest eventually led here. So I threw it into the database.


mysql> SELECT post_modified_gmt FROM wp_posts WHERE post_status = 'publish' ORDER BY post_modified_gmt DESC LIMIT 1;
+---------------------+
| post_modified_gmt |
+---------------------+
| 2251-07-12 09:05:49 |
+---------------------+
1 row in set (0.00 sec)

Ah. Oh. Um. According to the database, the most recently-changed story was modified in Anno Domini Two Thousand Two Hundred and Fifty One. That would rather explain why the result never changed.

You see, WordPress works out change by remembering the date each time it runs the query. If the date is the same, it counts that as no change and adjusts its headers to match. Of course, if the last-changed date is hundreds of years in the future, it’s not going to change any time soon. Hopefully my successor will be around to laugh at this.
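A small sketch of why one rogue row freezes everything. The `last_modified` helper is a hypothetical stand-in for the "newest row wins" query, and the ordinary post times are made up; only the 2251 timestamp comes from the query result above.

```python
from datetime import datetime


def last_modified(post_times):
    """Newest modification time, like ORDER BY ... DESC LIMIT 1."""
    return max(post_times)


rogue = datetime(2251, 7, 12, 9, 5, 49)           # the timestamp MySQL returned
posts = [datetime(2008, 3, 23, 12, 0, 0), rogue]  # first entry is a made-up ordinary post

before = last_modified(posts)
posts.append(datetime(2008, 3, 31, 10, 28, 9))    # a new post is published
after = last_modified(posts)
print(before == after)        # True

posts.remove(rogue)           # the timestamp gets fixed
print(last_modified(posts))   # 2008-03-31 10:28:09
```

Until the future-dated row is removed, nothing published in this century can ever be the newest.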

There were two stories from the period in which Kirk, Spock, McCoy et al are due to seek out new frontiers. One about the new Director General of the WTO, and one about Australia v New Zealand in the reform game. I for one find it reassuring that NZ is still there to poke fun at some centuries hence; though with the abolition of money around that period I am puzzled about the purpose of the WTO. Oh well.

How’d this come about, though?

Here’s a clue. That move was aimed at improving the stability and flexibility of the server, but was an unmitigated disaster for reasons which a trip through the Site News category will lay out in heart-breaking detail. Apparently during that period a number of posts decided that they were from the future, which planted the seeds of the problem hence.

How do we fix it? My first instinct was to bust out some fancy UPDATE foo WHERE bar queries, but in practice opening the posts, fixing the timestamps and saving them has done the trick. Troppo is now correctly identifying how recently the post feed is being updated:
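For anyone who would rather hunt the time travellers down with a query first, here is a hedged sketch. It uses Python's built-in `sqlite3` with an in-memory table as a stand-in for the real MySQL database; the cut-down `wp_posts` schema, the post IDs and the `future_dated_posts` helper are all assumptions for illustration. It only lists the suspect rows, leaving the actual repair to be done by hand.

```python
import sqlite3


def future_dated_posts(conn, now_gmt):
    """Return (ID, post_modified_gmt) for published posts modified after now_gmt."""
    cursor = conn.execute(
        "SELECT ID, post_modified_gmt FROM wp_posts "
        "WHERE post_status = 'publish' AND post_modified_gmt > ? "
        "ORDER BY post_modified_gmt DESC",
        (now_gmt,),
    )
    return cursor.fetchall()


# Stand-in table: only the columns the query needs, with made-up IDs.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts (ID INTEGER PRIMARY KEY, post_status TEXT, post_modified_gmt TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts VALUES (?, ?, ?)",
    [
        (1, "publish", "2008-03-23 12:00:00"),
        (2, "publish", "2251-07-12 09:05:49"),
        (3, "draft", "2300-01-01 00:00:00"),  # unpublished, so ignored
    ],
)

print(future_dated_posts(conn, "2008-03-31 09:57:03"))
# [(2, '2251-07-12 09:05:49')]
```

The comparison works on plain text because timestamps in `YYYY-MM-DD HH:MM:SS` form sort the same way as the dates they represent.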


Alchemist:~ jacques$ curl -I http://clubtroppo.com.au/feed/atom/
HTTP/1.1 200 OK
X-Powered-By: PHP/5.2.3-1ubuntu6.3
X-Pingback: http://clubtroppo.com.au/xmlrpc.php
Last-Modified: Mon, 31 Mar 2008 10:28:09 GMT
ETag: "a2ce9e4e5c552e643fa41140e1decd1e"
Content-Type: application/atom+xml; charset=UTF-8
Date: Mon, 31 Mar 2008 11:38:15 GMT
Server: lighttpd/1.4.18

And that, I believe, solves the Mystery of the Missing Feed. I’d be interested to hear from those of you who’ve had feed reader problems in the past as to whether this has, in actual fact, solved the issue.

The moral of the story is that whenever I think I’m smart, I’m not. I forgot the First Rule of Programming: whatever the bug is, whatever the issue, whatever the glitch, never just assume the problem lies with somebody else. Assume that it’s always your fault.

14 thoughts on “The Mystery of the Missing Feed”

  1. Troppo feeds are certainly now appearing in my feed reader, which they never did before (although they always appeared in the communal Google Reader system for some reason, and indeed in Bloglines when I tested it).

  2. ie, WordPress is responsible for some of it. The rest is due to RSS2 being a crappy format and Google Docs vomiting CSS classes into their HTML export.

  3. This morning, I got 20 feeds at once in Bloglines, the first time I’d received anything in months.

    The earliest was dated MAR 23, the latest yesterday.

    The latest feed was the one I’m commenting on.

  4. Ah. Oh. Um.

    This just about sums up my reaction to this most amusing post.

    I figured out how to enjoy it to the best of my abilities, I just skipped the passages containing the computer coding, sort of like you’d skip the genealogies in the book of Genesis if you were there for the myths, or something.

    Nice work, Mr Chester! Ah. Oh. Um… (etc)

  5. BTW, first time I posted a comment this morning, and when I first tried to put that comment through, WordPress told me -

    YOU ARE POSTING COMMENTS TOO QUICKLY. SLOW DOWN.

    Ah! Oh! Um! Millisecond fail?

  6. I’ve found that a very good way of identifying coding problems is to ask someone else to look at it (not always possible, I know). The funny thing is, just having someone hovering at your shoulder, without them even having looked at the offending code, will often make it jump out at you. The outcome is often embarrassing (because of the obviousness of the blunder), but a bit of dented ego is usually an acceptable price (particularly if you’ve been staring blankly at it all morning).

  7. David;

    I have a technique where I go out and try to explain the problem to my mum. She’s very good at nodding and uh-huhing as I babble, but the urge to simplify the problem as I explain it to her often gives a useful insight.

    Not always though. In this case it was just plain plodding.

  8. “Alchemist” – so,… who is trying to turn base comments into gold? Jacques or his machine?

    btw: I forget when epoch end is (somewhere about 2037 from memory, or 2103 for unsigned), when posts then might be from about the time the telephone was invented?

  9. Alchemist – so, who is trying to turn base comments into gold? Jacques or his machine?

    I could pretend it’s a hat-tip to the way that Alchemy feed both into the romantic and scientific views of the world and human nature. But mostly it just sounded cool and I was going for a magician theme.

    btw: I forget when epoch end is (somewhere about 2037 from memory, or 2103 for unsigned), when posts then might be from about the time the telephone was invented?

    http://en.wikipedia.org/wiki/Year_2038_problem

    For contrast, VMS will have a Year 10K problem.

  10. As a user that has had a problem, I was unaware that it has allegedly been fixed since instead of using bloglines for Club Troppo, which has only in the last few days or so updated out of the past 2-3 weeks, I had moved to Google Reader which worked and I note so far has only updated Club Troppo every Saturday with all the feeds for the week.

    I do suspect that Bloglines might be fixed though.

    I hope that made sense.
