Training without ham.

David Relson relson at osagesoftware.com
Sun Sep 7 19:39:28 CEST 2003


On Sun, 7 Sep 2003 19:05:52 +0200 (CEST)
"Hr. Daniel Mikkelsen" <daniel at copyleft.no> wrote:

> Hi.
> 
> I'm wondering what experiences people have with using bogofilter for
> site wide spam filtering. Since spam is generally the same for all
> accounts, while ham can differ widely (between nationalities for
> instance), is it viable to set up a bogofilter that only uses a spam
> corpus provided by some of the site administrators?
> 
> The alternative right now is to use SpamAssassin, which is very slow
> compared to bogofilter.
> 
> Any input on this would be greatly appreciated.
> 
> -- Daniel Mikkelsen, Copyleft Software AS, Norway

Daniel,

That's an interesting thought, but remember there's a big difference
between how SpamAssassin and bogofilter work.

SpamAssassin (SA) assigns points for each spam-like feature it finds
and, if the total is high enough, calls the message spam.
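
A rough sketch of that additive approach, in Python. The rules and
point values here are made up for illustration and are not
SpamAssassin's real rules or scores; the cutoff is likewise just an
assumed value for this example.

```python
import re

# Hypothetical rules and point values, for illustration only.
RULES = [
    (re.compile(r"free money", re.I), 2.5),
    (re.compile(r"click here", re.I), 1.5),
    (re.compile(r"!!!"), 1.0),
    (re.compile(r"viagra", re.I), 3.0),
]
THRESHOLD = 5.0  # assumed cutoff for this sketch


def rule_score(text):
    """Sum the points of every rule that matches the text."""
    return sum(points for pattern, points in RULES if pattern.search(text))


def is_spam(text):
    """Call the message spam once the total reaches the cutoff."""
    return rule_score(text) >= THRESHOLD
```

Note that nothing in this scheme needs a ham corpus: a message with no
matching rules simply scores zero.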

Bogofilter (BF) compares a message's tokens against its ham and spam
word lists and determines which list they match better.  If you train
only on spam, the comparison becomes one of "known" words (all of which
count as spam) versus "unknown" words.  Since the ham/spam comparison
is lost, the results can't be good.
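
To make that concrete, here is a minimal sketch in Python. It is not
bogofilter's actual code: the per-token smoothing follows the general
form of Gary Robinson's f(w), while the plain averaging of token
scores, the parameter values x=0.5 and s=1.0, and the tiny corpora are
all assumptions chosen for illustration.

```python
from collections import Counter


def train(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts


def token_prob(tok, spam, n_spam, ham, n_ham, x=0.5, s=1.0):
    """Smoothed spam estimate for one token (x, s are assumed values)."""
    n = spam[tok] + ham[tok]
    if n == 0:
        return x  # unknown token: fall back to the neutral prior
    spam_freq = spam[tok] / n_spam if n_spam else 0.0
    ham_freq = ham[tok] / n_ham if n_ham else 0.0
    p = spam_freq / (spam_freq + ham_freq)
    return (s * x + n * p) / (s + n)


def score(msg, spam, n_spam, ham, n_ham):
    """Average the token estimates (a simplification for this sketch)."""
    toks = set(msg.lower().split())
    probs = [token_prob(t, spam, n_spam, ham, n_ham) for t in toks]
    return sum(probs) / len(probs) if probs else 0.5


# Made-up training data, for illustration only.
spam_msgs = ["buy cheap pills now",
             "cheap loans for you now",
             "meeting singles for you"]
ham_msgs = ["meeting for the project now",
            "lunch with you"]
spam = train(spam_msgs)
ham = train(ham_msgs)

msg = "project meeting for you now"  # an ordinary, ham-like message
with_ham = score(msg, spam, len(spam_msgs), ham, len(ham_msgs))
spam_only = score(msg, spam, len(spam_msgs), Counter(), 0)
```

With an empty ham list, every token that has ever appeared in spam gets
p = 1, so no known token can score below the neutral value x.  Everyday
words like "for" and "you" then count as spam evidence, and spam_only
comes out higher than with_ham for the same harmless message.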

I don't think it would work :-(

David



