scaling and learning [was Re: Inline image based spam]
dhottinger at harrisonburg.k12.va.us
Fri Oct 6 20:55:56 EDT 2006
My wordlist is around 2 years old. Would a fresh list be better?
Quoting David Relson <relson at osagesoftware.com>:
> On Fri, 6 Oct 2006 16:13:09 -0700
> Chris Wilkes wrote:
> > Anyway, I'm open to other ideas; this is very annoying.
> > Chris
> Hi Chris,
> I agree. 'Tis annoying. I'm seeing a few such Unsures each day.
> Bogofilter _is_ catching some of the messages, but not all. The
> messages commonly have a passage from a book (or some such) in hopes of
> fooling filters. Since those passages rarely match my ham email, I
> anticipate that bogofilter will eventually come to recognize the new
> words as spammish.
> My wordlist is about 4 yrs old, which means the message count is high
> and some of the tokens have very high counts. That produces a type of
> inertia and slows down learning. For example, here are 2 token counts:
> bogoutil -p $BOGOFILTER_DIR osagesoftware.com to:osagesoftware.com
> token                    spam    good    Fisher
> .MSG_COUNT             350984  120977  0.500000
> osagesoftware.com       53543   11119  0.624030
> to:osagesoftware.com   322413   39974  0.735452
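To see why these scores barely move, it helps to recompute the Fisher column from the raw counts. The sketch below assumes bogofilter scores a token with Robinson's smoothed f(w), using what are believed to be its default parameters (robs = 0.0178, robx = 0.52); the function name `robinson_fw` is hypothetical and not part of bogofilter. Each category's token count is first divided by that category's message count from .MSG_COUNT, so the results should match the column above to within rounding.

```python
# Hypothetical helper, not bogofilter code. Assumes bogofilter's per-token
# score is Robinson's f(w) with default robs/robx values (an assumption).

ROBS = 0.0178  # assumed default Robinson "s": strength of the prior
ROBX = 0.52    # assumed default Robinson "x": prior score for unseen tokens

def robinson_fw(spam_count, good_count, spam_msgs, good_msgs, s=ROBS, x=ROBX):
    """Return Robinson's smoothed spamminess f(w) for a single token."""
    b = spam_count / spam_msgs    # share of spam messages containing the token
    g = good_count / good_msgs    # share of ham messages containing the token
    p = b / (b + g)               # p(w), normalized by per-category message counts
    n = spam_count + good_count   # amount of evidence seen for this token
    # With n in the tens of thousands, the s*x prior is negligible and f(w) ~= p.
    return (s * x + n * p) / (s + n)

MSG_SPAM, MSG_GOOD = 350984, 120977  # .MSG_COUNT from the table above
for token, spam, good in [("osagesoftware.com", 53543, 11119),
                          ("to:osagesoftware.com", 322413, 39974)]:
    print(f"{token:22s} {robinson_fw(spam, good, MSG_SPAM, MSG_GOOD):.6f}")
```

Because n is so large for both tokens, one more spam or ham message changes b or g by only a few millionths, which is the "inertia" being described.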
> It'll take a lot of messages for their scores to change noticeably. To
> lessen the wordlist's inertia, I may scale the numbers so
> that .MSG_COUNT is 1000/1000 (spam/good) and the other counts are
> correspondingly small. It'll be interesting to see how this affects the ability to
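The arithmetic behind scaling a wordlist down to reduce its inertia can be sketched as follows. This is an illustration under stated assumptions, not bogofilter's own procedure: it assumes Robinson's f(w) with default robs = 0.0178 and robx = 0.52, and it scales each category proportionally so .MSG_COUNT becomes 1000 spam / 1000 good, rounding to integers. The names `fw`, `scale`, and `score_shift` are hypothetical. The example uses the osagesoftware.com counts and then "trains" 100 new spam messages that contain the token.

```python
# Hypothetical sketch, not bogofilter code: proportional scaling of token
# counts, assuming Robinson's f(w) with default robs/robx (an assumption).

def fw(spam, good, spam_msgs, good_msgs, s=0.0178, x=0.52):
    """Robinson's smoothed token score f(w)."""
    b, g = spam / spam_msgs, good / good_msgs
    n = spam + good
    return (s * x + n * b / (b + g)) / (s + n)

def scale(count, msgs, target=1000):
    """Rescale a count so its category's message total becomes `target`.
    A nonzero count is kept at 1 or more so rare tokens are not lost."""
    scaled = round(count * target / msgs)
    return max(scaled, 1) if count > 0 else 0

def score_shift(spam, good, spam_msgs, good_msgs, new_spam=100):
    """Score before and after training `new_spam` spams containing the token."""
    before = fw(spam, good, spam_msgs, good_msgs)
    after = fw(spam + new_spam, good, spam_msgs + new_spam, good_msgs)
    return before, after

SPAM_MSGS, GOOD_MSGS = 350984, 120977
spam, good = 53543, 11119  # osagesoftware.com

full = score_shift(spam, good, SPAM_MSGS, GOOD_MSGS)
small = score_shift(scale(spam, SPAM_MSGS), scale(good, GOOD_MSGS), 1000, 1000)
print("full counts:   %.4f -> %.4f" % full)
print("scaled counts: %.4f -> %.4f" % small)
```

Since b and g are ratios of token count to message count, proportional scaling leaves the starting score nearly unchanged; what changes is how much each newly trained message moves it. The trade-off is that rounding discards some precision for rare tokens, which is why the sketch keeps nonzero counts at 1 or more.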
> Bogofilter mailing list
> Bogofilter at bogofilter.org
Harrisonburg City Public Schools