Training without ham.
Hr. Daniel Mikkelsen
daniel at copyleft.no
Sun Sep 7 19:58:32 CEST 2003
On Sun, 7 Sep 2003, David Relson wrote:
> On Sun, 7 Sep 2003 19:05:52 +0200 (CEST)
> "Hr. Daniel Mikkelsen" <daniel at copyleft.no> wrote:
>
> > site wide spam filtering. Since spam is generally the same for all
> > accounts, while ham can differ widely (between nationalities for
> > instance), is it viable to set up a bogofilter that only uses a spam
> > corpus provided by some of the site administrators?
> BF compares the tokens to ham and spam lists and determines which one
> matches better. If you only train on spam, the comparison becomes one
> of "known" words (which are all spam) and "unknown" words. As the
> ham/spam comparison is lost, the results can't be good.
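
To illustrate the point quoted above, here is a minimal sketch in Python. It is not bogofilter's actual algorithm; the scoring function, the toy messages and all names (token_spam_prob, count, spam_msgs, ham_msgs) are assumptions made up for illustration. It scores a token by its relative frequency in spam versus ham and falls back to a neutral 0.5 for tokens never seen.

```python
from collections import Counter

def token_spam_prob(token, spam_counts, ham_counts, n_spam, n_ham, unknown=0.5):
    """Hypothetical per-token estimate: spam frequency vs. ham frequency."""
    s = spam_counts[token] / n_spam if n_spam else 0.0
    h = ham_counts[token] / n_ham if n_ham else 0.0
    if s + h == 0:
        return unknown  # token never seen in any corpus
    return s / (s + h)

def count(msgs):
    """Count in how many messages each token appears."""
    c = Counter()
    for m in msgs:
        c.update(set(m))
    return c

# Toy corpora (invented for this example).
spam_msgs = [["buy", "cheap", "pills", "now"], ["meeting", "cheap", "offer", "now"]]
ham_msgs = [["meeting", "agenda", "now"], ["lunch", "meeting", "tomorrow"]]

spam_counts = count(spam_msgs)
ham_counts = count(ham_msgs)
tokens = ["meeting", "pills", "lunch", "zebra"]

# Trained on both corpora: "meeting" leans towards ham, "lunch" is clearly ham.
both = {t: token_spam_prob(t, spam_counts, ham_counts, len(spam_msgs), len(ham_msgs))
        for t in tokens}

# Trained on spam only: every known token scores as pure spam,
# and everything else is merely "unknown".
spam_only = {t: token_spam_prob(t, spam_counts, Counter(), len(spam_msgs), 0)
             for t in tokens}

print(both)
print(spam_only)
```

In this toy setup, an innocent word like "meeting" that happens to occur in one spam message is scored as pure spam once the ham counts are gone, which is the lost comparison described above.
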
So a comparable statistical filter package with another kind of logic for the
comparison/determination part (not the learning, not the scanning) would
possibly do the trick?
Are there such packages out there?
(Downloading the bogofilter sources now to have a look.)
-- Daniel Mikkelsen, Copyleft Software AS