training with unsure [was: religion]

David Relson relson at osagesoftware.com
Wed Jan 22 20:27:18 CET 2003


At 11:52 AM 1/22/03, Matt Armstrong wrote:

>I'm curious about how you deal with the extra complexity.  Unless I am
>missing something, for the way I work I think it adds complexity with
>no value.  Maybe you can suggest an improved way of dealing with
>things:
>
>     (1) I throw all SPAM into a SPAM folder, and all other mail is
>         automatically sorted into a plethora of other folders.
>
>     (2) I periodically read the SPAM folder for false positives and
>         "bogofilter -N" them (a rare occurrence).
>
>     (3) When I run across a SPAM in one of my many non-SPAM
>         folders, I "bogofilter -S" it.
>
>With this system, all messages get trained on with minimal fuss.
>
>Now if bogofilter doesn't train on 'unsure' messages, step 3 becomes
>more complex since I have to decide whether -S or -s is appropriate.
>I also have to somehow come up with a way of training on the 'unsure'
>non-SPAM.

In my environment, email reading is done on the Windoze box to my left and 
access to my mail server is via an rsh session from the Linux box on my right.

Like you I have a SPAM folder and do steps 1, 2, and 3.  In addition, I 
have an UNSURE folder where a dozen or so messages go each day.  When there 
are messages in the UNSURE folder, I use the Linux box to access an 
unsure-bogofilter file on the mail server.  Using emacs, I get the messages 
and put copies into the spam-fixups folder.  The copies get names like 
unsure-good.MMDD.HHMM.txt and unsure-spam.MMDD.HHMM.txt.  Any false 
positives and false negatives get copies made as well (as 
good.MMDD.HHMM.txt or spam.MMDD.HHMM.txt, respectively).  An hourly cronjob 
looks for unsure-spam.*, unsure-good.*, good.*, and spam.* and feeds them to 
bogofilter with the -s, -n, -N, or -S switch, respectively.
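
For anyone wanting to try something similar, here is a rough Python sketch of what such an hourly job might look like.  It is not my actual script.  The switch meanings are the ones used in this thread and may differ in other bogofilter versions.  The function names and the "done" folder (used so a message is not trained on twice) are assumptions made up for the illustration.

```python
import fnmatch
import subprocess
import sys
from pathlib import Path

# Filename pattern -> bogofilter switch, in the order given in the message.
# Patterns are matched against the whole file name, so "spam.*" does not
# match "unsure-spam.*".
SWITCHES = [
    ("unsure-spam.*", "-s"),  # unsure message that was spam
    ("unsure-good.*", "-n"),  # unsure message that was good
    ("good.*", "-N"),         # false positive: good mail marked as spam
    ("spam.*", "-S"),         # false negative: spam marked as good
]


def switch_for(filename):
    """Return the switch for a fixup file name, or None if it is not one."""
    for pattern, switch in SWITCHES:
        if fnmatch.fnmatchcase(filename, pattern):
            return switch
    return None


def train(folder, done_folder):
    """Feed each fixup file to bogofilter on stdin, then move it aside.

    Moving processed files into done_folder is an assumption of this
    sketch, not something described in the message.
    """
    done_folder.mkdir(parents=True, exist_ok=True)
    for path in sorted(folder.iterdir()):
        switch = switch_for(path.name) if path.is_file() else None
        if switch is None:
            continue
        with path.open("rb") as message:
            subprocess.run(["bogofilter", switch], stdin=message, check=True)
        path.rename(done_folder / path.name)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python train_fixups.py spam-fixups spam-fixups/done
    train(Path(sys.argv[1]), Path(sys.argv[2]))
```

Run from cron once an hour, this would pick up whatever copies were dropped into the folder since the last run.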

I could have set up special email addresses for these 4 message groups and 
had the cronjob use the mailboxes as its input source.  However, I'm 
comfortable with the current process and haven't felt a need to change.

The question that's asked is "How do I benefit from unsure?"  The answer is 
that the "odd" messages go into one place for easy inspection.  Messages 
that go into my "spam" folder get a quick (very cursory) author/title scan 
and then are moved to cold storage.  Messages in the "unsure" folder get 
closer attention.  All other messages go into their various folders and, 
since I've come to trust bogofilter's spam/ham/unsure classifications, I 
just read those messages.

Periodically, a spam message shows up, usually in my In box.  Then I look 
at the X-Bogosity: line.  Usually it's missing.  The reason for _that_ is 
that I often put experimental code onto the mail server.  Other than those 
times, bogofilter does well.

Hope the long-winded reply didn't put you to sleep :-)

David




