Catching image spam with SpamAssassin
By ArnY on Saturday 16 December 2006, 12:51 - spam - Permalink
Image-only spam now accounts for more than 40% of all spam being sent (here, yesterday: 155833 spams were detected and 67401 of them were image-based, that's a nice 43%). How do you detect it?
Well, first I want to warn you: there is no 100% safe way to tell whether a mail containing an image is spam or not. Of course you don't want to stop every mail containing a jpg or gif file. There is no safe rule, but you can still combine several criteria:
- a lot of them are fake replies
- a lot of them are HTML-based (used for poisoning)
- they contain an image
Detecting fake replies
I posted about it earlier here. This is the rule that will actually make the difference since a "ham" with an image will never be caught as a fake reply unless you are using a really crappy email client.
Detecting html multipart messages
SpamAssassin already has a rule detecting this:
HTML_MESSAGE
Detecting image attachments
You need to use the multipart attachment header for this:
full __JPG_ATTACH /image\/jpeg/i
full __GIF_ATTACH /image\/gif/i
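A `full` rule searches the whole raw message, so it also fires when the string image/jpeg merely appears somewhere in the text. If that turns out to be a problem, a narrower variant could look only at the Content-Type header of each MIME part. This is only a sketch, assuming the MIMEHeader plugin is loaded on your install; the `_CT` rule names are made up for the example and are not used elsewhere in this post:

```
# Assumption: the MIMEHeader plugin is available (it ships with SpamAssassin)
loadplugin Mail::SpamAssassin::Plugin::MIMEHeader

# Hypothetical sub-rules: match the Content-Type of any MIME part
mimeheader __JPG_ATTACH_CT Content-Type =~ /^image\/jpeg/i
mimeheader __GIF_ATTACH_CT Content-Type =~ /^image\/gif/i
```

If you go this way, the meta rule would have to reference the `_CT` names instead.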
The meta rules
Now that you can detect fake replies, images and HTML mails, let's create a meta rule:
describe IMG_SPAM fake reply in html with image
meta IMG_SPAM (__JPG_ATTACH || __GIF_ATTACH) && UN_FAKE_REPLY && HTML_MESSAGE
score IMG_SPAM 3
There you go!
Comments
Works great.
But since the attachment header tests are not assigned a score, the rule names should be __JPG_ATTACH and __GIF_ATTACH.
I use the SARE rules (rulesemporium.org) and therefore can use your rules with a lower score, to avoid false positives.
Thanks!
Mirko
Actually, I meant to put the '__'. (dotclear2 ate them? Weird.)
But you're right. The '__' (double underscore) convention is used to keep a rule from being scored 1.0, the default score. Nevertheless, not using this convention won't break the meta rule. The sub-rule will simply score 1.0 on its own, and the meta rule's score will be added on top of it.
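To make the difference concrete, here is a small side-by-side sketch. The name JPG_ATTACH_SCORED is invented for the illustration; only __JPG_ATTACH is used in the post:

```
# Hypothetical name without the '__' prefix: this is a normal rule,
# so it gets the default score of 1.0 whenever it matches
full JPG_ATTACH_SCORED /image\/jpeg/i

# With the '__' prefix: a sub-rule, never scored by itself,
# only useful as a building block inside a meta rule
full __JPG_ATTACH /image\/jpeg/i
```

With the first form, a matching mail picks up that 1.0 in addition to the 3 points from IMG_SPAM, which is probably more than you want if you are worried about false positives.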