Nearly everything is either 0.500000 or 1.000000
Jochem Huhmann
joh at revier.com
Sat Aug 18 10:42:11 CEST 2007
On 2007-08-15, at 17:58, Tom Anderson wrote:
> At first glance, it seems to me that one or two headers should not have
> that kind of effect. Moving from 0 to 0.5 would require something else
> than all of a sudden having a few tokens slightly more spammy than
> before. Are you classifying on the headers only?
No, I'm classifying on headers and body.
> Run a ham through
> with -vvv and see what all of the body tokens are contributing.
Already did that, but it didn't help me much... looks quite OK actually.
>
> As a quick solution, if it were me, I would just grab my entire archive
> of hams and run it through training once.
Meanwhile I created a fresh database with all my ham and 450000 spam
mails -- bogofilter behaves more normally now, with good mail near 0,
most spam near 1 and false negatives in between. I still have spam
getting through with a bogosity of about 0.5, but these messages
contain large amounts of (hidden) text which drags the bogosity
down, so bogofilter seems to do its best.
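
To illustrate how hidden text can pull a score toward the middle, here is
a minimal sketch of Fisher-style chi-square combining of per-token
probabilities, in the form popularised by SpamBayes. It is an assumption
that this resembles what bogofilter does internally; the function names
(chi2q, fisher_score) and the token probabilities are made up for the
example and are not taken from bogofilter or from my database.

```python
import math

def chi2q(x2, dof):
    """Upper-tail probability of a chi-square variable, for even dof only."""
    m = x2 / 2.0
    term = total = math.exp(-m)
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)

def fisher_score(probs):
    """Combine per-token spam probabilities into one score in [0, 1].

    Illustrative only: real filters also clamp probabilities away from
    0 and 1 and ignore tokens that are close to neutral.
    """
    n = len(probs)
    spam_stat = -2.0 * sum(math.log(1.0 - p) for p in probs)
    ham_stat = -2.0 * sum(math.log(p) for p in probs)
    s = 1.0 - chi2q(spam_stat, 2 * n)  # evidence that the message is spam
    h = 1.0 - chi2q(ham_stat, 2 * n)   # evidence that the message is ham
    return (s - h + 1.0) / 2.0

spammy = [0.99] * 20   # hypothetical visible spam tokens
padding = [0.01] * 20  # hypothetical hidden "innocent" text

print(fisher_score(spammy))            # spam tokens alone
print(fisher_score(spammy + padding))  # same message with padding added
```

With the spam tokens alone the score comes out close to 1. Once an equal
amount of strongly hammy padding is added, the spam and ham evidence
cancel and the score lands at about 0.5, which matches the kind of false
negatives described above.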
>
> BTW, this is why I never do batch training in the first place. Just
> train on error and you should never have problems like this.
This batch *was* training on error: it was all false negatives
caught by other filters running after bogofilter (in my MUA).
Thanks for replying,
Jochem
--
When the revolution comes, I will be shot by both sides.