RDMS

David Relson relson at osagesoftware.com
Fri Nov 24 05:23:32 CET 2006


On Thu, 23 Nov 2006 22:12:54 -0500
Tom Allison wrote:

...[snip]...
> Perhaps I haven't been paying much attention to the conversations
> about exclusion lists, but I haven't heard anything in the past to
> say that exclusion lists are going to help the success of a
> statistical filter.  In fact, I would think that the statistical
> filter would be more effective without any exclusion lists simply
> because if it didn't matter then it wouldn't show up as important. If
> it did matter, even if it was a routing path, then it would show up.
> 
> What I've worked out on the tables is essentially three tables:
> user (who the recipient is on the local system)
> user_tokens (many to many table associating each token to a known
> user.  It is here that the counts of good/bad instances would be
> stored for each token)
> tokens (words seen...)
> 
> There might be some optimizations that can be done, but this is where
> I would start with the first normalized tables.
> 
> It would probably add some complexity to the process, but it might
> also be worthwhile.  It's probably a matter of speed of operation
> versus maintenance time/speed...
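
For reference, the three-table layout quoted above could be sketched
roughly as follows.  This is only an illustration using Python's
sqlite3 module; the column names (id, name, word, good, bad) and the
helper functions are assumptions made for the example, not anything
Tom posted or anything bogofilter ships.

```python
import sqlite3

# Hypothetical schema: one row per user, one row per distinct word,
# and a many-to-many link table holding the per-user good/bad counts.
SCHEMA = """
CREATE TABLE user (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE tokens (
    id   INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE
);
CREATE TABLE user_tokens (
    user_id  INTEGER NOT NULL REFERENCES user(id),
    token_id INTEGER NOT NULL REFERENCES tokens(id),
    good     INTEGER NOT NULL DEFAULT 0,
    bad      INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, token_id)
);
"""

def open_db(path=":memory:"):
    """Create the three tables in a new (by default in-memory) database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def record(conn, user, words, is_spam):
    """Bump the good or bad count of each word for one user."""
    col = "bad" if is_spam else "good"  # fixed choice, never user input
    cur = conn.cursor()
    cur.execute("INSERT OR IGNORE INTO user(name) VALUES (?)", (user,))
    uid = cur.execute(
        "SELECT id FROM user WHERE name = ?", (user,)).fetchone()[0]
    for word in words:
        cur.execute("INSERT OR IGNORE INTO tokens(word) VALUES (?)", (word,))
        tid = cur.execute(
            "SELECT id FROM tokens WHERE word = ?", (word,)).fetchone()[0]
        cur.execute(
            "INSERT OR IGNORE INTO user_tokens(user_id, token_id) "
            "VALUES (?, ?)", (uid, tid))
        cur.execute(
            f"UPDATE user_tokens SET {col} = {col} + 1 "
            "WHERE user_id = ? AND token_id = ?", (uid, tid))
    conn.commit()

def counts(conn, user, word):
    """Return (good, bad) for one user and word, or (0, 0) if unseen."""
    row = conn.execute(
        "SELECT ut.good, ut.bad FROM user_tokens ut "
        "JOIN user u ON u.id = ut.user_id "
        "JOIN tokens t ON t.id = ut.token_id "
        "WHERE u.name = ? AND t.word = ?", (user, word)).fetchone()
    return row if row else (0, 0)
```

Each word is stored once in tokens no matter how many users have seen
it, which is the space saving that normalizing buys; the price is the
extra join on every lookup, which is the speed trade-off mentioned at
the end of the quoted text.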

Tom,

FWIW, I've used a small ignore list for quite a while.  Consider a
mailing list that doesn't restrict postings.  Because it's a wanted
mailing list it's got hammy header tokens.  Suppose it also receives a
fair amount of spam.  The combination of hammy headers and spammy
content leads to "unsure" ratings.  By ignoring the header tokens, each
message is scored on the content.  This is the situation that led to
_my_ use of an ignore list.
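
A toy illustration of the effect, in Python.  This is not
bogofilter's scoring algorithm: it simply averages per-token spam
probabilities, and the token names and probabilities are invented
for the example.

```python
def score(tokens, spamicity, ignore=frozenset()):
    """Average the spam probability of the tokens that are not ignored.

    A plain mean stands in for the real statistical combination;
    tokens missing from the spamicity table are skipped.
    """
    kept = [t for t in tokens if t not in ignore and t in spamicity]
    if not kept:
        return 0.5  # nothing to go on, so stay neutral
    return sum(spamicity[t] for t in kept) / len(kept)

# Hypothetical per-token spam probabilities (all values made up).
spamicity = {
    "head:list-id:wanted-list": 0.01,  # hammy header token
    "head:sender:list-owner": 0.01,    # hammy header token
    "cheap": 0.99,                     # spammy content token
    "pills": 0.99,                     # spammy content token
}
message = list(spamicity)
ignore_list = {"head:list-id:wanted-list", "head:sender:list-owner"}

unsure = score(message, spamicity)                  # headers pull it to the middle
content_only = score(message, spamicity, ignore_list)  # content decides
```

With the header tokens counted, the hammy headers and spammy content
cancel out and the message lands near the middle; with them ignored,
the content alone pushes the score close to the spam end.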

Regards,

David


