can I do this with bogofilter

Matthias Andree matthias.andree at gmx.de
Wed Aug 31 19:33:59 CEST 2005


Tom Allison wrote:
> I don't know if I can do this or not and the answer would almost require
> a code review....
> 
> Can I use bogofilter to score in a binary fashion (ham/spam) a generic
> text string to classify it into one of two pools?
> 
> It's definitely not email.
> it's typically only one line, but very long.
> I only need a binary classification.
> 
> Would this still work?
> 
> I've looked at some of the code available in CPAN perl modules and they
> all tend to assume you are using email...

If the long line looks like text and can be broken up into words, then yes.
You may, however, want to make the cutoff and robx settings symmetric:
bogofilter's default settings are deliberately "lopsided", i.e. tilted
towards "ham", to keep the false positive count near zero.

Unless I'm forgetting something, you'd set robx, spam_cutoff and ham_cutoff
all to 0.5.
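
As a rough sketch of how that might look when driven from Python: the three
option names and the value 0.5 are from the suggestion above; everything else
is an assumption for illustration. In particular, the key=value config syntax,
the "-c" option for pointing bogofilter at a config file, the exit codes
(0 = spam, 1 = ham, 2 = unsure) and the ">=" / "<=" decision rule are my
recollection of the documentation and should be checked against your installed
version. The names label_for and classify_line are hypothetical helpers.

```python
import os
import subprocess
import tempfile

# Symmetric settings: option names and the value 0.5 come from the message;
# the key=value file syntax is an assumption.
SYMMETRIC_CONFIG = "robx=0.5\nspam_cutoff=0.5\nham_cutoff=0.5\n"


def label_for(score, spam_cutoff=0.5, ham_cutoff=0.5):
    """Assumed decision rule: at or above spam_cutoff is spam, at or below
    ham_cutoff is ham, anything in between is unsure."""
    if score >= spam_cutoff:
        return "spam"
    if score <= ham_cutoff:
        return "ham"
    return "unsure"


def classify_line(text):
    """Hypothetical helper: pipe one line of text to bogofilter on stdin and
    map its (assumed) exit code to a label. Needs bogofilter installed and
    a word list already trained on both pools."""
    with tempfile.NamedTemporaryFile("w", suffix=".cf", delete=False) as cfg:
        cfg.write(SYMMETRIC_CONFIG)
    try:
        result = subprocess.run(
            ["bogofilter", "-c", cfg.name], input=text, text=True
        )
    finally:
        os.unlink(cfg.name)  # remove the temporary config file
    return {0: "spam", 1: "ham", 2: "unsure"}.get(result.returncode, "error")
```

With both cutoffs equal, label_for never returns "unsure", which is the point
of making the settings symmetric: every input lands in one of the two pools.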

Otherwise, an alternative might be CRM114 <http://crm114.sourceforge.net/>.

-- 
Matthias Andree


