greg at cambria.com
Wed Mar 3 14:41:57 EST 2004
On 3/3/2004 at 1:49 PM David Relson <relson at osagesoftware.com> wrote:
>True. Ignoring tokens (via ignore lists) is different from ignoring
>lines. What ideas have you on this? So far, "ignore 'X-ABC:' lines"
>and "ignore 1st n ABC:" lines have been suggested. What else?
I have a funny problem with my scoring that an ignore wordlist would probably help with. Email headers (at least with sendmail) always contain the current date. My ham and spam corpora are all from recent email. My spam corpus is updated automatically from spamtrap addresses, with about 1200 new spam every day, so it changes much more frequently than my ham corpus.
The unexpected consequence is that every time the month changes, the abbreviation for the new month instantly gets a very high spam score, since at first it shows up almost only in fresh spam, and it stays that way until I manually throw some more ham at it. Here's one from today:
"rcvd:Mar" 3790 0.000000 0.029003 0.999998 +
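The numbers on that line are roughly what Robinson's smoothed estimate f(w) = (s*x + n*p) / (s + n) would give for a token seen only in spam. Here is a rough Python sketch, assuming the columns are token count, ham frequency, spam frequency, and score. The function name and the values s = 0.0178 and x = 0.52 are assumptions for illustration, not read from any actual configuration.

```python
def robinson_fw(n, p_good, p_bad, s=0.0178, x=0.52):
    """Smoothed spam score for one token.

    n      -- number of times the token has been seen
    p_good -- frequency of the token in ham
    p_bad  -- frequency of the token in spam
    s, x   -- smoothing strength and prior (assumed values)
    """
    total = p_good + p_bad
    # Raw spam probability; fall back to the prior for an unseen token.
    p = p_bad / total if total > 0 else x
    return (s * x + n * p) / (s + n)

# The "rcvd:Mar" line: seen 3790 times, never in ham.
spam_only = robinson_fw(3790, 0.000000, 0.029003)
print(f"{spam_only:.6f}")

# Hypothetical case: the same token once it is equally frequent in ham.
balanced = robinson_fw(3790, 0.029003, 0.029003)
print(f"{balanced:.6f}")
```

With a ham frequency of zero the raw probability is 1, and a count in the thousands leaves the smoothing almost no room to pull the score down. Once the token is as common in ham as in spam, the score drops to about 0.5.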
In this case, an ignore wordlist would probably be more useful than ignoring lines, since the "Received:" lines that contain the date also contain lots of other useful information, like the sender domain and IP address.
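To make the idea concrete, here is a minimal Python sketch of what an ignore wordlist could do, assuming header tokens look like the "rcvd:Mar" example above. The names IGNORE and drop_ignored and the sample tokens are hypothetical; this is not how bogofilter itself is implemented.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical ignore wordlist: month abbreviations from Received: lines.
IGNORE = {f"rcvd:{month}" for month in MONTHS}

def drop_ignored(tokens, ignore=IGNORE):
    """Return the tokens that are not on the ignore wordlist, in order."""
    return [t for t in tokens if t not in ignore]

# Made-up tokens from a single Received: line.
tokens = ["rcvd:mail.example.com", "rcvd:192.0.2.10", "rcvd:Mar", "rcvd:2004"]
kept = drop_ignored(tokens)
print(kept)
```

Only the month token is dropped. The sender domain and IP address from the same line are still available for scoring, which is the advantage over ignoring the whole line.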