relson at osagesoftware.com
Sun Feb 1 09:16:58 EST 2004
On Sun, 1 Feb 2004 14:03:22 +0000
Simon Huggins wrote:
> On Mon, Feb 02, 2004 at 12:00:47AM +1100, Tig wrote:
> > On Sun, 1 Feb 2004 08:33:48 +0000 Simon Huggins <huggie at earth.li>
> > wrote:
> > > On Sat, Jan 31, 2004 at 08:53:29PM -0500, David Relson wrote:
> > > > An interesting trick in the spam below.
> > > I should probably just unsubscribe from the bogofilter list but do
> > > you think it might be an idea to create a bogofilter-new-spam or
> > > something list to discuss spam and post it so that people never
> > > have to corrupt their DBs with messages sent to this list?
> > Is this because you use the -u option on all email? I found that
> > using -u damaged my wordlist.db very quickly. So quickly I went back
> > to not using it.
> "damage" in what way?
> I use the three-way classification and -u and then train on the few
> that get into my spam-unsure folder.
> On Sun, Feb 01, 2004 at 08:32:52AM -0500, David Relson wrote:
> > With bogofilter's use of many tokens from each email in scoring,
> > I've yet to see a problem caused by one or two misclassified tokens.
> Yeah, it's probably fine and more resilient than most people believe. I
> just wondered more if there was an actual need for such a separate
> list to discuss new attacks etc.
I'm sure there are a number of lists discussing spam tricks. I don't
think bogofilter needs a special one.
> > Assuming you're using procmail, maildrop, etc, you could whitelist
> > the mailing list with a simple test. That'd keep list messages out
> > of your wordlist.
> Yup, I do this for an abuse@ address already. Whilst it does receive
> spam, it would be terrible if, for instance, someone attached a spam
> as a complaint and it didn't get dealt with.
> I probably should just succumb and whitelist bogofilter lists.
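For anyone who wants to see the shape of such a whitelist test, here is a minimal sketch in Python rather than as a procmail or maildrop recipe. It only decides whether a message carries a given List-Id header; the list identifier and the function name are made up for illustration, so check the headers of a real list message before relying on it.

```python
from email import message_from_string

# Hypothetical list identifier, for illustration only; look at the
# List-Id header of a real list message to find the right value.
WHITELISTED_LISTS = ("bogofilter.example.org",)


def is_list_mail(raw_message, list_ids=WHITELISTED_LISTS):
    """Return True if the message's List-Id header names a whitelisted list."""
    msg = message_from_string(raw_message)
    list_id = msg.get("List-Id", "").lower()
    return any(lid in list_id for lid in list_ids)


sample = "List-Id: <bogofilter.example.org>\nSubject: new spam trick\n\nbody\n"
print(is_list_mail(sample))  # True
```

Mail for which this returns True would be delivered straight to the list folder and never handed to "bogofilter -u", so it never reaches the wordlist.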
I don't remember many spam samples posted on this list. I see a lot
more spam on gnu.org lists (which have an open to all policy; no
subscription needed) and on the bogofilter-announce list for which I'm
Having run bogotune, I use a high min_dev (in the 0.45 range). That
excludes words that score between 0.05 and 0.95, leaving only the extremely
hammish and spammish words to contribute to the message's score. Using
"bogoutil -H" to generate my wordlist histogram, I've found that a vast
majority of my wordlist's tokens _are_ at one extreme or the other and
also that most of the extreme scorers are used only in ham or in spam.
Summary: the extremes rule!
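
To make the min_dev arithmetic concrete, here is a small Python sketch of the idea: a token counts toward the message score only if its own score is at least min_dev away from the neutral 0.5. This illustrates the filtering rule, not bogofilter's actual code. The token scores are invented, and treating the boundary as inclusive is an assumption.

```python
def significant_tokens(scores, min_dev=0.45):
    """Keep tokens whose score is at least min_dev away from the neutral 0.5.

    Assumption: the boundary is treated as inclusive here; bogofilter's
    own comparison may differ.
    """
    return {tok: p for tok, p in scores.items() if abs(p - 0.5) >= min_dev}


# Invented per-token spamicity scores, for illustration only.
scores = {"viagra": 0.99, "meeting": 0.02, "free": 0.80, "the": 0.50, "patch": 0.04}

kept = significant_tokens(scores)
print(sorted(kept))  # ['meeting', 'patch', 'viagra']
```

With min_dev at 0.45, "free" (0.80) and "the" (0.50) are ignored, and only the strongly hammish and spammish tokens are left to score the message.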