multiple wordlists

Greg Louis glouis at dynamicro.on.ca
Sun Mar 16 13:08:56 CET 2003


On 20030315 (Sat) at 1729:43 -0500, David Relson wrote:

> 4 - Weight the two lists, i.e. give higher importance to token values in 
> the user list.  The relevant formula would be:
> 
>     p(w,weighted) = (W*p(w,user) + p(w,site))/(W+1)
> 
> I think that option 4 doesn't extend 
> well beyond 2 lists.

Why not?  In the two-list case, it makes sense to give the site list a
weight of 1 and not mention it.  In the case of more than two lists,
assuming there's a reason to combine them with weighting (which there
may or may not be), you generalize:

  p(w,weighted) = sum(i)(W(i) * p(w,i)) / sum(i)(W(i))
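
As a rough sketch of that generalization (not bogofilter code; the
function name, the weights and the probabilities are made up for
illustration), the weighted mean might look like this in Python.  With
two lists and weights W and 1 it reduces to the formula quoted above:

```python
def weighted_probability(probs, weights):
    """Combine one token's per-list probabilities into a weighted mean.

    probs   -- p(w,i), one value per wordlist
    weights -- W(i), one weight per wordlist (hypothetical, admin-chosen)
    """
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per wordlist")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # sum(i)(W(i) * p(w,i)) / sum(i)(W(i))
    return sum(w * p for w, p in zip(weights, probs)) / total


# Two-list case: user list weighted W, site list weighted 1.
W = 3.0
p_user, p_site = 0.9, 0.5
two_list = (W * p_user + p_site) / (W + 1)
general = weighted_probability([p_user, p_site], [W, 1.0])
```

Giving every list the same weight yields the plain average of the
per-list values, so the weights only matter relative to one another.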

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



