training with unsure [was: religion]
David Relson
relson at osagesoftware.com
Wed Jan 22 20:27:18 CET 2003
At 11:52 AM 1/22/03, Matt Armstrong wrote:
>I'm curious about how you deal with the extra complexity. Unless I am
>missing something, for the way I work I think it adds complexity with
>no value. Maybe you can suggest an improved way of dealing with
>things:
>
> (1) I throw all SPAM into a SPAM folder, and all other mail is
> automatically sorted into a plethora of other folders.
>
> (2) I periodically read the SPAM folder for false positives and
> "bogofilter -N" them (a rare occurrence).
>
> (3) When I run across a SPAM in one of my many non-SPAM
> folders, I "bogofilter -S" it.
>
>With this system, all messages get trained on with minimal fuss.
>
>Now if bogofilter doesn't train on 'unsure' messages, step 3 becomes
>more complex since I have to decide whether -S or -s is appropriate.
>I also have to somehow come up with a way of training on the 'unsure'
>non-SPAM.
In my environment, email reading is done on the Windoze box to my left and
access to my mail server is via rsh session from the Linux box on my right.
Like you I have a SPAM folder and do steps 1, 2, and 3. In addition, I
have an UNSURE folder where a dozen or so messages go each day. When there
are messages in the UNSURE folder, I use the Linux box to access an
unsure-bogofilter file on the mail server. Using emacs, I get the messages
and put copies into the spam-fixups folder. The copies get names like
unsure-good.MMDD.HHMM.txt and unsure-spam.MMDD.HHMM.txt. Any false
positives and false negatives get copies made as well (as
good.MMDD.HHMM.txt or spam.MMDD.HHMM.txt, respectively). An hourly cronjob
looks for unsure-spam.*, unsure-good.*, good.*, and spam.* and feeds them
to bogofilter with the -s, -n, -N, or -S switch, respectively.
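For anyone who wants to try something similar, here is a rough sketch of what such an hourly job might look like. It is not the actual script; the function names, the folder argument, and the choice to delete each copy after training are all assumptions made for illustration. The prefix-to-switch mapping is the one described above, and the switches may mean something different in other bogofilter versions, so check the man page first.

```python
import subprocess
from pathlib import Path

# Prefix -> bogofilter switch, following the mapping described in the message.
SWITCHES = [
    ("unsure-spam.", "-s"),  # unsure message that turned out to be spam
    ("unsure-good.", "-n"),  # unsure message that turned out to be good
    ("good.", "-N"),         # false positive: was filed as spam, is good
    ("spam.", "-S"),         # false negative: was filed as good, is spam
]

def switch_for(filename):
    """Return the bogofilter switch for a fixup file name, or None."""
    for prefix, switch in SWITCHES:
        if filename.startswith(prefix):
            return switch
    return None

def process_fixups(folder, run=subprocess.run):
    """Feed each recognized file in `folder` to bogofilter on stdin.

    `run` is injectable so the loop can be exercised without bogofilter.
    """
    handled = []
    for path in sorted(Path(folder).iterdir()):
        switch = switch_for(path.name)
        if switch is None or not path.is_file():
            continue
        with path.open("rb") as msg:
            run(["bogofilter", switch], stdin=msg, check=True)
        # Assumption: remove the copy so the next hourly run skips it.
        path.unlink()
        handled.append((path.name, switch))
    return handled
```

Called from cron as something like process_fixups("spam-fixups"), this would train on whatever copies are waiting and leave other files in the folder alone.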
I could have set up special email addresses for these 4 message groups and
had the cronjob use the mailboxes as its input source. However, I'm
comfortable with the current process and haven't felt a need to change.
The question that's asked is "How do I benefit from unsure?" The answer is
that the "odd" messages go into one place for easy inspection. Messages
that go into my "spam" folder get a quick (very cursory) author/title scan
and then are moved to cold storage. Messages in the "unsure" folder get
closer attention. All other messages go into their various folders and,
since I've come to trust bogofilter's spam/ham/unsure classifications, I
just read those messages.
Periodically, a spam message shows up, usually in my In box. Then I look
at the X-Bogosity: line. Usually it's missing. The reason for _that_ is
that I often put experimental code onto the mail server. Other than those
times, bogofilter does well.
Hope the long-winded reply didn't put you to sleep :-)
David