Bogofilter with a specified wordlist.db
Tom Anderson
tanderso at oac-design.com
Thu Apr 6 01:17:56 CEST 2006
mouss wrote:
> sure but bayesian filters require training. so
> - their accuracy is poor at start.
> - for users who don't retrain the filter, accuracy may never be
> satisfactory. (using a "global" wordlist may help, but not if these
> users receive mail that is different from the one used to train the
> global db).
Accuracy has generally been above 95% within 48 hours and a few dozen
messages when training on error from scratch. If starting with a corpus
of previous messages, it's much faster.
If you don't want to train the filter, don't use Bogofilter or any other
statistical filter. You won't be happy with the result, even if you
pair it with a procedural filter.
> Chris's idea is to "shoulder" (or boost?) bogo using SA. I would love to
> see the results of this. (I find this better than using public corpuses).
>
> for example, when you install bogo for the first time, you use SA too.
> if SA score is "sure" (<0 or >10 for instance), then train bogofilter
> with this email. There is still a risk of error (FN or FP) of course,
> but for users who don't retrain bogofilter, this is better than nothing.
>
> once the user's wordlist is "mature", SA can be skipped for that user.
If you don't want to train, Bogofilter should be skipped altogether...
just use SpamAssassin exclusively. The idea of "boosting" Bogofilter
with SpamAssassin is like hand-holding some grandma through her first
time creating a Word document and then walking away and letting her
loose on a Sendmail config.
>>I feel that adding SpamAssassin to the mix would only introduce false
>>positives, of which I currently receive zero.
>
> one can reduce this by using a conservative setup (disable or lower the
> score of rules that generate FPs).
You can't reduce false positives below zero. Just train the 4 errors
per week in Bogofilter and be done with it. SpamAssassin doesn't bring
anything to the table unless you want complete and total automation and
don't mind receiving lots of FNs and discarding FPs. And if that's what
you want, then Bogofilter isn't for you. Bogofilter provides
near-complete automation and unrivalled accuracy, but you have to
provide feedback every few days to keep it on track.
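For anyone who wants to script that feedback step, here is a rough sketch of a train-on-error loop in Python. It relies on bogofilter's documented exit codes (0 spam, 1 ham, 2 unsure, 3 error) and on the -s / -n registration flags. The function names and the injectable `run` parameter are made up for illustration, and it assumes bogofilter is NOT run with -u (auto-update); with -u the corrections would need -Ns / -Sn instead.

```python
import subprocess

# Bogofilter's classification exit codes: 0 = spam, 1 = ham, 2 = unsure.
EXIT_TO_LABEL = {0: "spam", 1: "ham", 2: "unsure"}


def classify(message: bytes, run=subprocess.run) -> str:
    """Ask bogofilter for a verdict on one message passed on stdin."""
    result = run(["bogofilter"], input=message)
    if result.returncode not in EXIT_TO_LABEL:
        raise RuntimeError(f"bogofilter failed with exit code {result.returncode}")
    return EXIT_TO_LABEL[result.returncode]


def correction_flag(verdict: str, truth: str):
    """Return the flag that registers the message under its true label,
    or None when the verdict was already right (nothing to train)."""
    if verdict == truth:
        return None
    # "unsure" verdicts are treated as errors too, so they also get trained.
    return "-s" if truth == "spam" else "-n"


def train_on_error(message: bytes, truth: str, run=subprocess.run):
    """Classify a message whose true label is known; register it only if
    bogofilter got it wrong. Returns the flag used, or None."""
    flag = correction_flag(classify(message, run), truth)
    if flag is not None:
        run(["bogofilter", flag], input=message)
    return flag
```

Calling `train_on_error(raw_message, "spam")` on a spam that slipped through registers it with -s; calling it on a correctly classified message does nothing, which is the whole point of training on error only.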
This can certainly be a little harder to achieve in a multiuser setup,
but in that case, I would recommend training a global wordlist on the
input from a honeypot (known spam) and that of your SMTP server (known
ham). You could probably even manage to use per-user wordlists trained
with their own outgoing mail as ham and the honeypot as spam. It may
not be quite as good as training their own actual errors, but I'd wager
it'd be much better than SpamAssassin, and it would be completely
automated.
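A minimal sketch of how that automated per-user training could look, assuming the honeypot and each user's outgoing mail are stored one message per file. It uses bogofilter's -d (wordlist directory), -s (register as spam) and -n (register as ham) options; the helper names and every path in the comments are hypothetical.

```python
import subprocess
from pathlib import Path


def register_command(wordlist_dir: str, is_spam: bool) -> list:
    """Build the bogofilter command that registers stdin as spam or ham
    in the wordlist found in wordlist_dir."""
    return ["bogofilter", "-d", wordlist_dir, "-s" if is_spam else "-n"]


def train_directory(msg_dir, wordlist_dir, is_spam, run=subprocess.run) -> int:
    """Register every file in msg_dir (one message per file) and return
    how many messages were fed to bogofilter."""
    cmd = register_command(wordlist_dir, is_spam)
    count = 0
    for path in sorted(Path(msg_dir).iterdir()):
        if not path.is_file():
            continue
        result = run(cmd, input=path.read_bytes())
        if result.returncode == 3:  # bogofilter's exit code for errors
            raise RuntimeError(f"bogofilter failed on {path}")
        count += 1
    return count


# Hypothetical nightly job for one user (all paths are placeholders):
#   train_directory("/var/spool/honeypot/new", "/home/alice/.bogofilter", True)
#   train_directory("/var/spool/outgoing/alice", "/home/alice/.bogofilter", False)
```

Run per user from cron, this gives each wordlist the shared honeypot as spam and that user's own outgoing mail as ham, with no manual feedback involved.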
Tom