multiple wordlists

Greg Louis glouis at dynamicro.on.ca
Tue Mar 18 13:45:06 CET 2003


On 20030317 (Mon) at 0925:15 -0800, elijah wrote:

> Now I will violate my just-made resolution to keep to the facts: I suspect
> that another big problem with shared wordlists is garbage in, garbage out.
> Your average user is probably not going to be very diligent about
> correcting misidentified mail.

You said it!  Even the best of my users just bounce the odd spam to my
spam folder.

The only way I've come up with to run a multi-user list is to appoint a
spam monitor to whom all mail is copied; this person periodically uses
bogofilter to classify the accumulated mail into three folders: spam,
nonspam and unsure.  The next step is to verify the classification
manually; fortunately this can be done, in all but a few cases, by
simply glancing at the subject line.  While checking the unsures, the
monitor sorts them into spam and nonspam.  The results are then used
for training bogofilter.
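
A rough sketch of that workflow in Python, for illustration only; it is
not the script actually used here.  The function names, the
(msg_id, raw_message) pairs and the corrections dict are all
hypothetical.  The one thing taken from bogofilter itself is the exit
status convention in its man page (0 = spam, 1 = nonspam, 2 = unsure),
and I am assuming the installed version follows it.

```python
import subprocess

# Exit codes as documented in the bogofilter man page
# (assumption: the installed version follows this convention).
EXIT_TO_FOLDER = {0: "spam", 1: "nonspam", 2: "unsure"}


def bogofilter_classify(raw_message: bytes) -> str:
    """Classify one message by piping it to bogofilter on stdin."""
    result = subprocess.run(["bogofilter"], input=raw_message)
    if result.returncode not in EXIT_TO_FOLDER:
        raise RuntimeError(f"bogofilter exited with {result.returncode}")
    return EXIT_TO_FOLDER[result.returncode]


def sort_into_folders(messages, classify):
    """Group (msg_id, raw_message) pairs into spam, nonspam and unsure."""
    folders = {"spam": [], "nonspam": [], "unsure": []}
    for msg_id, raw in messages:
        folders[classify(raw)].append(msg_id)
    return folders


def apply_review(folders, corrections):
    """Apply the monitor's manual verdicts (hypothetical msg_id -> folder dict).

    Messages without a correction keep bogofilter's verdict, but every
    unsure message must have been decided by the monitor.
    """
    verified = {"spam": [], "nonspam": []}
    for folder, ids in folders.items():
        for msg_id in ids:
            verdict = corrections.get(msg_id, folder)
            if verdict not in verified:
                raise ValueError(f"message {msg_id} still needs a verdict")
            verified[verdict].append(msg_id)
    return verified
```

The monitor would call sort_into_folders(messages, bogofilter_classify),
review the result, and pass the corrections to apply_review.  The two
verified lists are what gets registered for training, e.g. with
bogofilter -s for spam and bogofilter -n for nonspam.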

This has one very strong positive aspect: the training db is kept
accurate.  There are three drawbacks: (1) No matter how many times you
tell people that an email message is no more private or secure than a
postcard, they passionately desire to believe otherwise, and they have
concerns about the monitor "reading their mail"; (2) as Mike found,
people differ about what they do and don't want to receive, and the
monitor will inevitably get it "wrong" for some users; (3) the volume
of mail a monitor can reasonably handle is limited, so a large
organization like Mike's would need several monitors, possibly
full-time (gaahh, what an awful job).

-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |



