Exercises in Spamology

When is spam not spam?

In the last day or so there has been a curious trend of spam comments turning up which say “hi, nice post, i enjoyed it”. Usually such suckup spams include links to someone who is flogging one of porn, viagra or gambling. I guess the theory is that if you suck up to people, they think it’s genuine.

Generally I never see them at all. The antispam system we use here at Club Troppo is called Spam Karma 2. It puts every comment through a range of tests, assigning “karma” as it goes. If a comment’s total karma falls below the threshold, it gets shunted.
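To make the idea concrete, here is a minimal sketch of karma-style scoring. It is not SK2's actual code: the test functions, the point values and the threshold of zero are all invented for illustration.

```python
from typing import Callable, Dict, List

Comment = Dict[str, object]
KarmaTest = Callable[[Comment], float]


def link_count_test(comment: Comment) -> float:
    """Hypothetical test: lose 2 karma for every link in the body."""
    body = str(comment.get("body", "")).lower()
    return -2.0 * body.count("http")


def registered_user_test(comment: Comment) -> float:
    """Hypothetical test: registered users get a karma bonus."""
    return 5.0 if comment.get("registered") else 0.0


def total_karma(comment: Comment, tests: List[KarmaTest]) -> float:
    """Run the comment through every test and add up the karma."""
    return sum(test(comment) for test in tests)


def is_spam(comment: Comment, tests: List[KarmaTest], threshold: float = 0.0) -> bool:
    """A comment is shunted when its total karma falls below the threshold."""
    return total_karma(comment, tests) < threshold


TESTS: List[KarmaTest] = [link_count_test, registered_user_test]
```

Under these made-up numbers, a link-free comment from an unregistered visitor scores exactly zero and is let through, which is roughly what happened with the comments described here.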

I first installed SK2 because Akismet, the default antispam system for WordPress, was eating legitimate comments and there is no way to retrieve what it throws out. In fact Akismet can be used as one of the karma-assigning tests for SK2 and every day or two this site auto-submits comments to Akismet which have been incorrectly identified as either spam or ham.

But then these little comments turned up. The first one got through and I took a look at it. No links to anything were embedded, so I figured it must be legit, if poorly constructed. But then suddenly a few identical ones turned up — absolutely identical, letter for letter — which is a dead-cert way to spot spam.
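The duplicate check is simple enough to sketch. This is only an illustration, not how SK2 or Akismet do it; the function names are hypothetical, and the whitespace and case normalisation is an extra assumption on top of the letter-for-letter match described above.

```python
from collections import Counter
from typing import Iterable, Set


def normalise(body: str) -> str:
    """Collapse whitespace and lowercase, so trivial variations still match.

    This goes slightly beyond a strict letter-for-letter comparison.
    """
    return " ".join(body.split()).lower()


def find_duplicates(bodies: Iterable[str], min_count: int = 2) -> Set[str]:
    """Return the normalised comment bodies that appear at least min_count times."""
    counts = Counter(normalise(body) for body in bodies)
    return {text for text, n in counts.items() if n >= min_count}
```

Feeding it the recent comment bodies would flag any text posted more than once, links or no links.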

Why are they trying to do it? Such questions are nearly impossible to answer with any certainty. My theory is that someone is trying to pollute Akismet with millions of these apparently OK remarks, so that in future link-bearing comment spam with that content will get through. Or maybe somebody just forgot to type a link into their spamming software.

Spam is annoying and wasteful. To date SK2 has blocked 120,000-odd spams on this site; during the same period some 6,000 comments have been lodged. That’s 20:1 spam to ham. On our VPS, underpowered as it was, the constant torrential spam traffic was a serious load generator. It dropped off a bit when I took steps to deter search engines from spidering our content (this lowers the page rank and thus attractiveness to spammers over time) but it still wasted a lot of compute time.

You know, stupid people who buy stuff advertised in spam have a lot to answer for.

Elsewhere: The real world as an Internet forum.

17 thoughts on “Exercises in Spamology”

  1. I used to get them. Soon afterwards, when I was lax and my blog software wasn’t mature enough to allow me to delete spam without resorting to the command line, the viagra and cialis and poker spam would drop on me like a truck load of bad meat. I have a theory which is kinda like yours.
    My first theory is that they are pioneers, blazing a path to see how carefully the blog is watched. If the scouts get through and stick around, follow up with an all-guns-blazing assault with the real spam. I’m not sure how this theory holds up when tested against reality, though.

  2. My first theory is that they are pioneers, blazing a path to see how carefully the blog is watched.

    It strikes me as too expensive. Given 80 million blogs (an estimate I think I saw on Technorati), several millions use Akismet to identify spam for them. This one is getting through Akismet and Spam Karma, probably because there are no links. My feeling — as I say above — is that it’s probably an attempt to poison the Akismet filter.

    Fat chance, is all I have to say about that. Akismet reckon they have north of 2 billion spams logged. It’d take a long time to wrestle naive Bayesian filters anywhere with a spam corpus like that.

    After reviewing the video clip, I feel disappointed that I didn’t just post First!

    Pwned!

  3. I just left a comment here which I think may have been ironically caught in the spam trap. Is it still there and retrievable?

  4. There may well be several million blogs using Akismet, but there are several million others that are ghost towns; there’s nobody home monitoring them. I’d bet there are more dead LiveJournal and Blogger blogs that the owner grew out of than the entire installation base of some of the B-list blog engines. The trick then is to find them.

  5. gilame;

    They wouldn’t be hard to find. There’s probably blog spambots using the same strategy as email spambots: try every combination. It’s a finite set and you have thousands of zombies to try the combinations.

  6. Hmm, maybe I just forgot to click “submit”. Story of my life really. OK, I’ll try again.

    My theory is that someone is trying to pollute Akismet with millions of these apparently OK remarks, so that in future link-bearing comment spam with that content will get through.

    A bit like chaff perhaps?

    Very interesting to watch this whole semi-automated internet battle unfolding over the past few years.

    Civilised-state bots like Akismet and Spam Karma designed to clean up online highways vs constantly mutating street huckster and pickpocket algorithms, generally hatched in the murkier corners and failed states of Eastern Europe, Russia and Asia. With their human masters only stepping in when a new strategic offense or defence is needed.

    ’Twill be intriguing to see if the first true AI, capable of not just passing but also setting Turing tests, emerges out of the 21st-century spam wars the way the first true binary logic-chopping machines emerged out of WW2.

  7. Jacques,
    If it is any consolation, the same ones have been hitting my site regularly. The “Hi, nice post” and similar ones are particularly annoying as they look legit. For a while I was also getting nearly 1,000 a day from a server in Panama.
    Akismet at wordpress.com picked them all up. So far I have had only two or three legitimate comments go to spam – but then I do not get many comments.

  8. Andrew;

    We gave up on the Akismet-only solution because it began to eat legitimate comments, including from site admins.

    SK2 at least keeps items marked spam for a certain amount of time, so you can go in and rescue them. In the basic configuration Akismet’s rulings are final without appeal. Out they go. Very annoying on false positives.

  9. Hi, nice post; I enjoyed it.

    Seriously, a good read in the 6 August NY’er.

    I’ve had a few people drop in on my blog to do something like this, with only their profile providing a link, always to a commercial site, the last one selling custom t-shirts. Volume is suggestive of a new spam technique, rather than just an innocuous individual trying to drum up some traffic and sales, but if a low-profile site like mine can get a half-dozen, maybe not so much.

  10. Jacques,
    As implemented at wordpress.com, Akismet puts all the spam into a queue and then deletes it after 14 days. I normally wade in every day or two – about 70 is normal. At times, though, it can be huge, so when it is I just delete them.

  11. Andrew;

    I think you’re right about that. Still, it was far too many false positives for my taste. With SK2 at least I can tell it to automatically assign generous commenting karma to registered users etc.

  12. I’ve had a few people drop in on my blog to do something like this, with only their profile providing a link, always to a commercial site, the last one selling custom t-shirts.

    We get a lot of that too. What was interesting about these ones was that they didn’t have any. No links, anywhere, in any field. Most unusual.
