scaling and learning [was Re: Inline image based spam]
Dwayne Hottinger
dhottinger at harrisonburg.k12.va.us
Sat Oct 7 02:55:56 CEST 2006
My wordlist is around 2 years old. Would a fresh list be better?
Quoting David Relson <relson at osagesoftware.com>:
> On Fri, 6 Oct 2006 16:13:09 -0700
> Chris Wilkes wrote:
>
> ...[snip]...
>
> >
> > Anyway I'm open for other ideas, this is very annoying.
> >
> > Chris
>
> Hi Chris,
>
> I agree. 'Tis annoying. I'm seeing a few such Unsures each day.
> Bogofilter _is_ catching some of the messages, but not all. The
> messages commonly have a passage from a book (or some such) in hopes of
> fooling filters. Since those passages rarely match my ham email, I
> anticipate that bogofilter will eventually come to recognize the new
> words as spammish.
>
> My wordlist is about 4 years old, which means the message count is high
> and some of the tokens have very high counts. That produces a type of
> inertia and slows down learning. For example, here are 2 token counts:
>
> bogoutil -p $BOGOFILTER_DIR osagesoftware.com to:osagesoftware.com
>                           spam    good    Fisher
> .MSG_COUNT              350984  120977  0.500000
> osagesoftware.com        53543   11119  0.624030
> to:osagesoftware.com    322413   39974  0.735452
>
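The Fisher column can be reproduced from the two count columns. The sketch below assumes the score is b / (b + g), where b is the token's spam count divided by the spam .MSG_COUNT and g is its good count divided by the good .MSG_COUNT. That assumption reproduces all three rows above to six decimals. Bogofilter also blends in Robinson's prior (the robx/robs parameters). This sketch leaves the prior out because it is not needed to match these figures at counts this large. The function name fisher_prob is hypothetical.

```python
def fisher_prob(spam: int, good: int, msg_spam: int, msg_good: int) -> float:
    """Token spamminess as b / (b + g), with Robinson's prior omitted."""
    b = spam / msg_spam  # fraction of spam messages containing the token
    g = good / msg_good  # fraction of good messages containing the token
    return b / (b + g)


MSG_SPAM, MSG_GOOD = 350984, 120977  # the .MSG_COUNT row from bogoutil -p

rows = {
    ".MSG_COUNT": (350984, 120977),
    "osagesoftware.com": (53543, 11119),
    "to:osagesoftware.com": (322413, 39974),
}

for token, (spam, good) in rows.items():
    print(f"{token:22s} {fisher_prob(spam, good, MSG_SPAM, MSG_GOOD):.6f}")
# .MSG_COUNT             0.500000
# osagesoftware.com      0.624030
# to:osagesoftware.com   0.735452
```

Because both counts are divided by their own message totals, the score depends only on the two fractions, not on the absolute size of the counts.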
> It'll take a lot of messages for their score to change noticeably. To
> lessen the wordlist's inertia, I may scale the numbers so that
> .MSG_COUNT is 1000/1000 (spam/good) and the others are correspondingly
> small. It'll be interesting to see how this affects the ability to
> learn.
>
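The inertia argument, and the effect of the proposed 1000/1000 scaling, can be checked with a toy model. This is a hypothetical sketch that only models the arithmetic. It is not a bogofilter or bogoutil feature, and it rests on three assumptions:

* Each column is scaled by its own factor, so the normalized fractions, and hence the score, barely change.
* Scaled counts are rounded to integers.
* Each newly trained spam message adds one to both the token's spam count and the spam .MSG_COUNT.

The names Counts, scale, score and train_spam are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Counts:
    spam: int
    good: int


def score(tok: Counts, msg: Counts) -> float:
    """b / (b + g): token counts normalized by the .MSG_COUNT totals."""
    b = tok.spam / msg.spam
    g = tok.good / msg.good
    return b / (b + g)


def scale(tok: Counts, msg: Counts, target: int = 1000) -> Counts:
    """Hypothetical: rescale each column separately so that .MSG_COUNT
    would become target/target, rounding to whole counts."""
    return Counts(round(tok.spam * target / msg.spam),
                  round(tok.good * target / msg.good))


def train_spam(tok: Counts, msg: Counts, n: int) -> tuple[Counts, Counts]:
    """Assumed training model: n new spam messages, each containing the token."""
    return Counts(tok.spam + n, tok.good), Counts(msg.spam + n, msg.good)


msg = Counts(350984, 120977)  # .MSG_COUNT from the bogoutil output
tok = Counts(53543, 11119)    # osagesoftware.com

small_msg = Counts(1000, 1000)
small_tok = scale(tok, msg)   # Counts(spam=153, good=92)

for label, t, m in (("original", tok, msg), ("scaled", small_tok, small_msg)):
    before = score(t, m)
    after = score(*train_spam(t, m, 100))
    print(f"{label:8s} before={before:.4f} after 100 spam={after:.4f}")
```

With these numbers, 100 new spam messages containing osagesoftware.com move the original score by less than 0.001. The same messages move the scaled copy from about 0.62 to about 0.71. One trade-off of rounding is that any token seen in fewer than about 175 spam messages (350984 / 2000) would drop to a spam count of zero after scaling.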
> Regards,
>
> David
> _______________________________________________
> Bogofilter mailing list
> Bogofilter at bogofilter.org
> http://www.bogofilter.org/mailman/listinfo/bogofilter
>
--
Dwayne Hottinger
Network Administrator
Harrisonburg City Public Schools