filter mode and hints.

David Relson relson at osagesoftware.com
Tue Apr 8 00:33:40 CEST 2003


Hello Michael,

Sounds like you're doing something a bit different.  As it is now, 
bogofilter expects to be run by procmail or a comparable MDA as part of 
receiving an email.  It sounds like your plan is to put each message in its 
own file, use bogofilter in a bulk mode to classify each of the files, then 
sort (distribute) the files according to bogofilter's classification.  That 
is probably pretty easy to implement.
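
For illustration only, here is a rough Python sketch of the loop such a bulk 
mode implies: one long-running process reads filenames on stdin and prints 
filename plus spamicity on stdout, so the fork and exec cost is paid once 
rather than per message.  The names score_message, filter_mode and main are 
hypothetical, and score_message is a toy stand-in, not bogofilter's 
algorithm or interface.

```python
import sys


def score_message(text):
    """Toy stand-in for a real classifier (an assumption, NOT bogofilter).

    Returns the fraction of words found in a tiny hard-coded word list,
    so the result is always between 0.0 and 1.0.
    """
    words = text.lower().split()
    if not words:
        return 0.0
    spammy = {"free", "winner", "viagra"}
    return sum(1 for w in words if w in spammy) / len(words)


def filter_mode(names, score=score_message):
    """Yield (filename, spamicity) for each filename in `names`.

    `names` is any iterable of lines, e.g. sys.stdin. Blank lines are skipped.
    """
    for line in names:
        name = line.strip()
        if not name:
            continue
        # One process handles every file, so startup cost is paid only once.
        with open(name, "r", errors="replace") as fh:
            yield name, score(fh.read())


def main(inp=sys.stdin, out=sys.stdout):
    """Read filenames from `inp`, write 'filename<TAB>spamicity' to `out`."""
    for name, spamicity in filter_mode(inp):
        out.write("%s\t%.6f\n" % (name, spamicity))
```

Calling main() at the end of the script and feeding it a list of message 
files, one per line, would print one tab-separated result per file; a second 
script could then move the files according to the score.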

Regarding your second suggestion, there are all sorts of things that could 
be used to generate meta-tokens.  What is not clear is which ones would be 
useful, i.e. which would help bogofilter do a better job.  Making the changes 
and testing them for value would be a great project for someone to take on.
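
As a starting point for such an experiment, meta-token generation could look 
roughly like the Python sketch below.  It covers the three features Michael 
mentions.  Only 'hint:subj-whitespace' comes from his message; the other 
token names, the hint_tokens function and both thresholds are assumptions 
picked for illustration, and whether any of these tokens actually helps is 
exactly what would need testing.

```python
import re
from email import message_from_string

# Both thresholds are arbitrary guesses for illustration, not tuned values.
SUBJ_WS_RUN = 5        # consecutive whitespace characters in the Subject
HTML_COMMENT_MIN = 10  # number of "<!--" openers in the raw message


def hint_tokens(raw):
    """Return a list of hypothetical 'hint:' meta-tokens for one raw message."""
    msg = message_from_string(raw)
    tokens = []

    # Large run of whitespace in the subject line.
    subject = str(msg.get("Subject", ""))
    if re.search(r"\s{%d,}" % SUBJ_WS_RUN, subject):
        tokens.append("hint:subj-whitespace")

    # 'may be forged' appearing in any Received line.
    received = msg.get_all("Received", [])
    if any("may be forged" in str(r).lower() for r in received):
        tokens.append("hint:may-be-forged")

    # Many HTML comments anywhere in the raw text.
    if raw.count("<!--") >= HTML_COMMENT_MIN:
        tokens.append("hint:html-comments")

    return tokens
```

The returned tokens would simply be added to the stream of ordinary word 
tokens, leaving it to the statistics to decide how much weight each deserves.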

David

At 06:16 PM 4/7/03, michael at optusnet.com.au wrote:


>Is there any interest in adding
>a 'filter' mode to bogofilter? I.e. having it
>take a list of filenames on STDIN, and output
>filename plus spamicity rating on STDOUT?
>
>My desire for this arises from wanting to do high
>throughput filtering (multi-million emails per day).
>At the moment, the throughput is (obviously) hugely
>impacted by the fork and exec startup costs.
>
>Is this a reasonable feature to add to bogofilter?
>
>
>Secondly, it seems to me that there are a lot of
>hints in spam emails that bogofilter is ignoring.
>Spamassassin checks for a large number of email
>features (like 'may be forged' in received lines,
>and large amounts of whitespace in subject lines)
>that are used to decide if an email is spam or
>not.
>
>Obviously, spamassassin's biggest problem is that
>the weights attached to those hints are little
>more than guesses.
>
>Would it make sense to check for some of those
>features and add them as tokens into the bogofilter
>stream? (i.e. 'hint:subj-whitespace', etc.)
>
>In particular, checking for a large number of
>HTML comments in email looks like a good
>indicator in recent spam. :)
>
>
>Thanks,
>Michael.




