multipart spam

Dave Lovelace dave at firstcomp.biz
Mon Nov 15 15:24:42 CET 2004


Tom Anderson wrote:
> 
> The best thing to do with this class of emails, IMHO, like most others,
> is to simply train with them.  The hammy tokens will tend toward
> neutrality rather than spam, while the spammy tokens will stand out.
> Unless you receive very little ham to balance it out, there's virtually
> zero chance of false positives from doing so.  Give yourself a decent
> unsure range just to be sure though.  If you don't see any effect, train
> til exhaustion.
> 
> Tom
> 
This has been my experience.  For a while I saw quite a lot of this
kind of spam - several every day.  I told bogofilter to classify them as
spam.  Now I almost never see one; when I do, it's apt to have disguising
content related to topics I receive lots of ham on.  (In fact, a couple
of those I decided not to classify as spam, out of fear of generating
false positives later.  Doesn't seem to have opened the floodgates or
anything.)

Relating to what David said, in the early stages, I also actually found
some interesting things in the disguising content - tracked down & read
a couple of works of fiction.  No deathless literature, but readable.

(Oddly, only one of the things I looked at seemed to have been pulled off
the web.  Mostly they seem to have been actually scanned in by hand.
This could be seen in the fact that words were missing at somewhat regular
intervals, where long lines had been chopped off.)

-- 
- Dave Lovelace
  dave at firstcomp dot b i z
  davel at cyberspace dot org


