Tokens including header values?
David Relson
relson at osagesoftware.com
Sat Feb 1 01:42:42 CET 2003
At 07:36 PM 1/31/03, Chris Wilkes wrote:
>This spam got past bogofilter today; the -bfvvv one is the result of
>running it through bogofilter -vvv after registering it as spam:
> http://ladro.com/bf/20020131-01.txt
> http://ladro.com/bf/20020131-01-bfvvv.txt
>
>As you can see there isn't much there to go on; the main words I
>could find that show it to be spam are:
> alice deflowered hymen's
>Which aren't likely to show up in another spam email. Nor is the
>website, kellu.com. They'll just change it to be kellv.com next time.
>
>However I have seen quite a lot of spams with my email address in the
>Subject line:
> Subject: See her get deflowered cwilkes at pobox.com
>Hardly anyone that's going to email me puts my email address in the
>subject.
>
>Likewise, I'm never going to get legitimate email with a bunch of other
>cwilkes's listed in the To:.
>
>Has anyone thought of making bogofilter header- and body-aware? I'm not
>sure how much of a gain that would be versus making the code that much
>more complicated. You could write out to files like spamlist-subject.db
>and goodlist-to.db.
>
>Chris
Chris,
Funny you should mention it. As bogofilter 0.10.1.x is almost ready for
promotion as the stable version of bogofilter, my attention is moving
forward to what's next.
As your message arrived, I was looking at gtkdiff showing bogolexer from the
current version and an experimental version. The goal of the experimental
version is to recognize various standard header phrases and create special
tokens for each word in that header line. For example, "Subject: See
cwilkes at pobox.com" may soon return "subj:see", "subj:cwilkes", and
"subj:pobox.com".
Thanks for asking :-)
David