filter mode and hints.

michael at optusnet.com.au
Tue Apr 8 00:16:46 CEST 2003


Is there any interest in adding
a 'filter' mode to bogofilter? I.e. having it
take a list of filenames on STDIN, and output
filename plus spamicity rating on STDOUT?
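
To make the idea concrete, here is a rough Python
sketch of the loop I have in mind. It is not
bogofilter code: score_message() is a hypothetical
stand-in for the real classifier, and the
tab-separated output format is just an assumption.
The point is only that the process starts once and
then scores one file per input line.

```python
from typing import Callable, Iterable, TextIO


def score_message(raw: bytes) -> float:
    """Hypothetical placeholder for the classifier.

    It always returns a neutral 0.5; a real filter mode would
    call into the existing scoring code here instead.
    """
    return 0.5


def filter_mode(
    names: Iterable[str],
    out: TextIO,
    score: Callable[[bytes], float] = score_message,
) -> None:
    """Read one filename per line, write 'filename<TAB>rating' per line."""
    for line in names:
        name = line.rstrip("\n")
        if not name:
            continue  # ignore blank input lines
        try:
            with open(name, "rb") as fh:
                rating = score(fh.read())
        except OSError as exc:
            # Report unreadable files in-band and keep going, so one
            # bad filename does not stop a long-running batch.
            print(f"{name}\terror: {exc.strerror}", file=out)
            continue
        print(f"{name}\t{rating:.6f}", file=out)
        out.flush()  # let a caller reading a pipe see each result promptly
```

Calling filter_mode(sys.stdin, sys.stdout) would give
the STDIN/STDOUT behaviour described above. Flushing
after every line costs a little, but it lets the
calling program pipeline requests without waiting
for a buffer to fill.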

My desire for this arises from wanting to do
high-throughput filtering (several million emails
per day). At the moment, the throughput is
(obviously) hugely impacted by the fork and exec
startup costs of running bogofilter once per message.

Is this a reasonable feature to add to bogofilter?



Secondly, it seems to me that there are a lot of
hints in spam emails that bogofilter is ignoring.
SpamAssassin checks for a large number of email
features (like 'may be forged' in Received lines,
and large amounts of whitespace in Subject lines)
that are used to decide whether an email is spam
or not.

Obviously, SpamAssassin's biggest problem is that
the weights attached to those hints are little
more than guesses.

Would it make sense to check for some of those
features and add them as tokens into the bogofilter
stream (e.g. 'hint:subj-whitespace', etc.)?

In particular, a large number of HTML comments
in an email looks like a good indicator of
recent spam. :)
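
As a rough illustration of the hint idea, here is a
Python sketch that turns the three features above
into extra tokens. Only 'hint:subj-whitespace' is a
name from this mail; the other token names, the
whitespace run length of 5 and the comment threshold
of 5 are made-up assumptions. No weights are
assigned here: the tokens would just be fed to the
normal training and scoring like any other word.

```python
import re
from email import message_from_string
from typing import List

# Assumed cut-offs, chosen only for illustration.
WHITESPACE_RUN = re.compile(r"\s{5,}")
HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def hint_tokens(raw: str, comment_threshold: int = 5) -> List[str]:
    """Return synthetic 'hint:' tokens for one raw message."""
    msg = message_from_string(raw)
    tokens: List[str] = []

    # Long runs of whitespace inside the Subject header.
    if WHITESPACE_RUN.search(str(msg.get("Subject", ""))):
        tokens.append("hint:subj-whitespace")

    # 'may be forged' in any Received header.
    received = msg.get_all("Received", [])
    if any("may be forged" in str(r).lower() for r in received):
        tokens.append("hint:received-may-be-forged")

    # Many HTML comments, counted over the raw message text.
    if len(HTML_COMMENT.findall(raw)) >= comment_threshold:
        tokens.append("hint:html-comments")

    return tokens
```

Because each hint is an ordinary token, its weight
would come from the training corpus rather than
from a hand-picked score.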


Thanks,
Michael.






