On and off over the past few months I have received emails to say that our feeds don’t appear in aggregators like Google Reader or Bloglines. Or that they turn up late in big bunches. Or days in arrears.
Each time I would fire up my browser, navigate to the feed URL, confirm that the feed was feeding, and promptly blame Google or Bloglines; sometimes for variety I blamed WordPress.
Then yesterday I got an email from James Andrewartha. He wrote:
http://clubtroppo.com.au/feed/ is sending a last-modified header of Thu, 01 Jan 1970 08:00:00 GMT which is messing up my bookmarks script. In other words, EPOCH FAIL!
James’s joke is a Unixy nerd thing: all computers running Unix, a Unix-derivative or a Unix-like OS measure time by counting the seconds elapsed since midnight UTC on the 1st of January 1970, known as the “beginning of the epoch”. The 8am thing has to do with our server living in the Perth timezone, which is eight hours ahead of GMT, so second zero of the epoch shows up as 8am local time.
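To make the arithmetic concrete, here is a small Python sketch (not anything from the server itself) that converts timestamp zero into Perth time. It assumes Perth can be modelled as a fixed UTC+8 offset labelled "WST"; the `http_date` helper is a made-up name for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Perth is modelled as a fixed UTC+8 offset labelled "WST".
PERTH = timezone(timedelta(hours=8), "WST")

def http_date(dt):
    """Format a timezone-aware datetime roughly the way an HTTP header would (hypothetical helper)."""
    return dt.strftime("%a, %d %b %Y %H:%M:%S %Z")

# Second zero of the Unix epoch, first in UTC and then as seen from Perth.
epoch_utc = datetime.fromtimestamp(0, tz=timezone.utc)
epoch_perth = epoch_utc.astimezone(PERTH)

print(http_date(epoch_utc))
print(http_date(epoch_perth))
```

The second line printed matches the header the server was sending: a timestamp of zero, dressed up in local time.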
Jokes aside, this was a big fat clue that I was wrong. Eagerly I scurried to my command line to test the claim:
Alchemist:~ jacques$ curl -I http://clubtroppo.com.au/feed/
HTTP/1.1 304 Not Modified
Status: 304 Not Modified
Last-Modified: Thu, 01 Jan 1970 08:00:00 WST
Date: Mon, 31 Mar 2008 09:57:03 GMT
He was right! The server was insisting that the feed was eternal, unchanging, and stuck forever in that morning in 1970 Perth. Like Groundhog Day, except that Google et al weren’t getting the joke [1].

[1] Aside on ETag: It gets worse, too. See that header named ETag? That’s supposed to be a universally unique code for a given version of a document which caches and feed-gatherers can read to decide whether or not the file has changed. WordPress is lazy and just hashes the last-modified date, so it just kept returning the same ETag for every. single. request.
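The ETag aside is easy to demonstrate. The sketch below is not WordPress's code; it assumes an MD5 hash of the Last-Modified string, and `etag_for` is a hypothetical helper. The point is only that a tag derived from a stuck date is itself stuck.

```python
import hashlib

def etag_for(last_modified):
    """Hypothetical helper: build an ETag purely from the Last-Modified string.

    Assumption: MD5 is used as the hash; the post only says the date is hashed.
    """
    digest = hashlib.md5(last_modified.encode("utf-8")).hexdigest()
    return '"' + digest + '"'

stuck = "Thu, 01 Jan 1970 08:00:00 WST"
fixed = "Mon, 31 Mar 2008 10:28:09 GMT"

# Three "requests" against the stuck date all yield the same tag.
tags = {etag_for(stuck) for _ in range(3)}
print(len(tags))

# A different date gives a different tag, which is what a feed reader needs to see.
print(etag_for(stuck) == etag_for(fixed))
```

A feed reader that sends the old tag back to the server gets told nothing has changed, no matter how many posts have gone up since.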
OK, so far, so bad. The question is: why? Why does it never change?
Again I was inclined to blame WordPress. The bad workman blames his tools, especially if they are as mixed in blessings as WordPress is. So I spent a few hours tracing through the code, watching what called what, before finding this innocuous line in a function called get_lastpostdate:
$lastpostdate = $wpdb->get_var("SELECT post_date_gmt FROM $wpdb->posts WHERE post_status = 'publish' ORDER BY post_date_gmt DESC LIMIT 1");
This was the end-of-the-line as far as my tracing had gone. All function calls of interest eventually led here. So I threw it into the database.
mysql> SELECT post_modified_gmt FROM wp_posts WHERE post_status = 'publish' ORDER BY post_modified_gmt DESC LIMIT 1;
| post_modified_gmt |
| 2251-07-12 09:05:49 |
1 row in set (0.00 sec)
Ah. Oh. Um. According to the database, the most recently-changed story was modified in Anno Domini Two Thousand Two Hundred and Fifty One. That would rather explain why the result never changed.
You see, WordPress works out change by remembering the date each time it runs the query. If the date is the same, it counts that as no change and adjusts its headers to match. Of course, if the last-changed date is hundreds of years in the future, it’s not going to change any time soon. Hopefully my successor will be around to laugh at this.
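Here is a rough Python sketch of that logic as I have described it, not a copy of what WordPress actually does. The function names are made up, and the two 2008 dates are illustrative stand-ins for ordinary posts; only the 2251 value comes from the database.

```python
from datetime import datetime

def last_modified(post_dates):
    """Mimics ORDER BY post_modified_gmt DESC LIMIT 1: the newest date wins."""
    return max(post_dates)

def status_for(previous, current):
    """Hypothetical check: an unchanged date means 304 Not Modified, otherwise 200."""
    return 304 if previous == current else 200

# One ordinary post (illustrative date) plus the rogue post from the 23rd century.
posts = [
    datetime(2008, 3, 30, 22, 15, 0),
    datetime(2251, 7, 12, 9, 5, 49),
]
before = last_modified(posts)

# Publish something new today (illustrative date). The newest date does not move.
posts.append(datetime(2008, 3, 31, 10, 28, 9))
after = last_modified(posts)

print(after)
print(status_for(before, after))
```

Until somebody posts in 2252, the newest date is always the same one, so the answer is always "not modified".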
There were two stories from the period in which Kirk, Spock, McCoy et al are due to seek out new frontiers. One about the new Director General of the WTO, and one about Australia v New Zealand in the reform game. I for one find it reassuring that NZ is still there to poke fun at some centuries hence; though with the abolition of money around that period I am puzzled about the purpose of the WTO. Oh well.
How’d this come about, though?
Here’s a clue: a server move. That move was aimed at improving the stability and flexibility of the server, but was an unmitigated disaster for reasons which a trip through the Site News category will lay out in heart-breaking detail. Apparently during that period a number of posts decided that they were from the future, which planted the seeds of this problem.
How do we fix it? My first instinct was to bust out some fancy UPDATE foo WHERE bar queries, but in practice opening the posts, fixing the timestamps and saving them has done the trick. Troppo is now correctly reporting when the post feed was last updated:
Alchemist:~ jacques$ curl -I http://clubtroppo.com.au/feed/atom/
HTTP/1.1 200 OK
Last-Modified: Mon, 31 Mar 2008 10:28:09 GMT
Content-Type: application/atom+xml; charset=UTF-8
Date: Mon, 31 Mar 2008 11:38:15 GMT
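For anyone who would rather hunt for time-travelling posts with a query than by eyeball, here is a hedged sketch. It uses Python's built-in sqlite3 with an in-memory stand-in for wp_posts rather than the real MySQL database, the rows are invented apart from the 2251 timestamp, and "now" is hard-coded so the result is repeatable.

```python
import sqlite3

# In-memory stand-in for the real table; only the columns used here are created.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wp_posts ("
    "ID INTEGER PRIMARY KEY, post_status TEXT, post_modified_gmt TEXT)"
)
conn.executemany(
    "INSERT INTO wp_posts (post_status, post_modified_gmt) VALUES (?, ?)",
    [
        ("publish", "2008-03-31 10:28:09"),  # illustrative row
        ("publish", "2251-07-12 09:05:49"),  # the value the database reported
        ("publish", "2008-03-30 22:15:00"),  # illustrative row
    ],
)

# Assumption: "now" is fixed for the example. Timestamps in this
# YYYY-MM-DD HH:MM:SS form sort correctly as plain strings.
now = "2008-03-31 11:38:15"

future_posts = conn.execute(
    "SELECT ID, post_modified_gmt FROM wp_posts "
    "WHERE post_status = 'publish' AND post_modified_gmt > ? "
    "ORDER BY post_modified_gmt DESC",
    (now,),
).fetchall()

print(future_posts)
```

Anything this turns up is a candidate for the open, fix, save treatment. I would still look at each post before letting an UPDATE loose on it.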
And that, I believe, solves the Mystery of the Missing Feed. I’d be interested to hear from those of you who’ve had feed reader problems in the past as to whether this has, in actual fact, solved the issue.
The moral of the story is that whenever I think I’m smart, I’m not. I forgot the First Rule of Programming: whatever the bug is, whatever the issue, whatever the glitch, never just assume the problem lies with somebody else. Assume that it’s always your fault.