Training without ham.

Hr. Daniel Mikkelsen daniel at copyleft.no
Sun Sep 7 19:58:32 CEST 2003


On Sun, 7 Sep 2003, David Relson wrote:

> On Sun, 7 Sep 2003 19:05:52 +0200 (CEST)
> "Hr. Daniel Mikkelsen" <daniel at copyleft.no> wrote:
>
> > site wide spam filtering. Since spam is generally the same for all
> > accounts, while ham can differ widely (between nationalities for
> > instance), is it viable to set up a bogofilter that only uses a spam
> > corpus provided by some of the site administrators?

> BF compares the tokens to ham and spam lists and determines which one
> matches better.  If you only train on spam, the comparison becomes one
> of "known" words (which are all spam) and "unknown" words.  As the
> ham/spam comparison is lost, the results can't be good.
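
The quoted point can be illustrated with a small sketch. It is not bogofilter's actual code. It assumes a Robinson-style per-token estimate and a plain average as the message score, and the names (`train`, `token_spam_prob`, `score`), the constants `x` and `s`, and the toy messages are all made up for illustration. With an empty ham list, every token seen in spam gets a probability above 0.5 and everything else sits at the neutral value, so an ordinary message that happens to share words with spam is pushed toward "spam".

```python
from collections import Counter

def train(messages):
    """Count, per token, how many messages contain it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(msg.lower().split()))
    return counts

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham, x=0.5, s=1.0):
    """Robinson-style estimate; x is the prior for unknown tokens, s its weight.
    Both values are illustrative assumptions, not bogofilter defaults."""
    b = spam_counts.get(token, 0)
    g = ham_counts.get(token, 0)
    n = b + g
    if n == 0:
        return x  # token never seen: stay neutral
    bad = b / n_spam if n_spam else 0.0
    good = g / n_ham if n_ham else 0.0
    p = bad / (bad + good)
    return (s * x + n * p) / (s + n)

def score(message, spam_counts, ham_counts, n_spam, n_ham):
    """Simplified message score: mean of the per-token probabilities."""
    tokens = set(message.lower().split())
    probs = [token_spam_prob(t, spam_counts, ham_counts, n_spam, n_ham) for t in tokens]
    return sum(probs) / len(probs)

spam_msgs = ["cheap pills online", "your invoice for cheap pills"]
ham_msgs = ["your invoice for the meeting", "meeting notes online"]
spam_counts, ham_counts = train(spam_msgs), train(ham_msgs)

legit = "your invoice for the meeting"
# Trained on spam only: the ham list is empty.
spam_only = score(legit, spam_counts, Counter(), len(spam_msgs), 0)
# Trained on both lists.
both = score(legit, spam_counts, ham_counts, len(spam_msgs), len(ham_msgs))
print(f"spam-only training: {spam_only:.2f}   spam+ham training: {both:.2f}")
```

In this toy setup the same legitimate message scores above the neutral 0.5 when only spam was trained and below it once ham is available, which is the lost comparison described above.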

So a comparable statistical filter package with a different kind of logic for
the comparison/determination part (not the learning, not the scanning) might
do the trick?

Are there such packages out there?

(Downloading the bogofilter sources now to have a look.)

-- Daniel Mikkelsen, Copyleft Software AS




