Bogofilter for notification filtering 2
Ed Blackman
ed at edgewood.to
Tue Jan 18 23:25:13 CET 2011
On Tue, Jan 18, 2011 at 06:04:29PM +0100, Florian Lindner wrote:
>Now I want to use it for another filtering task. Everybody knows these
>notification mail from forums, facebook, ... (New reply, New message, ...)
>
>I want to use bogofilter to sort them in a separate folder.
I do something very similar. I call it "archivicity", because I want to
save all my email conversations with humans, but not machines. In ten
years I'll still want to be able to pull up conversations with friends,
if only as a personal reminiscence, but won't want to remember that my
bank statement was available, that disk space was low on the web server,
or that someone wanted me to feed their pigs on Farmville. I'm even
slowly training bogofilter to mark my sister's email as archive-worthy
when we're having a conversation, but not when she's sending me yet
another urban legend or joke collection.
I use a separate config file, rather than another directory. I have the
normal config directives (robs, spam_cutoff, etc) set to values I've
tuned, but the core of the changes is the following set of directives:
    wordlist r,archivewords,archivewords.db,1
    spam_header_name=X-Archive
    spamicity_tags = No, Yes, Unsure
    spamicity_formats = %0.6f, %0.6f, %0.6f
    header_format = %h: %c, tests=bogofilter, archivicity=%p, version=%v
    log_header_format = %h: %c, archivicity=%p, version=%v
I run "bogofilter -c ~/.bogofilter/config.archive -ep" on incoming
messages to classify them according to the existing wordlists that I
originally created from a hand-selected training corpus of ham (messages
that should be archived) and spam (messages that should not). The
wordlist directive stores the statistics in a separate DB from the spam
words.
I have my mail user agent display messages in the message list with
different colors depending on whether they're "X-Archive: No" or "Yes",
and have keybindings set up to run "bogofilter -c
~/.bogofilter/config.archive -s" or "... -n" for misclassified messages.
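For anyone who would rather script the retraining step than bind keys
in a mail client, here is a minimal sketch in Python. It only wraps the
two commands quoted above; the function names and the idea of passing
the message on standard input from a script are assumptions for
illustration, not part of the setup described here. Because "ham" in
this config means "should be archived", -n is the flag for
archive-worthy messages and -s for everything else.

```python
import os
import subprocess

# Path taken from the commands quoted in the message above.
CONFIG = os.path.expanduser("~/.bogofilter/config.archive")


def retrain_command(archive_worthy):
    """Build the bogofilter command line for a misclassified message.

    In this setup "ham" means archive-worthy, so -n (register as
    non-spam) marks a message as worth keeping and -s (register as
    spam) marks it as a notification.
    """
    flag = "-n" if archive_worthy else "-s"
    return ["bogofilter", "-c", CONFIG, flag]


def retrain(raw_message, archive_worthy):
    """Feed one raw message (bytes) to bogofilter on standard input.

    Hypothetical helper: requires bogofilter on PATH.
    """
    subprocess.run(retrain_command(archive_worthy), input=raw_message, check=True)
```

A mail client hook would then call something like retrain(message_bytes,
archive_worthy=True) for a conversation that was wrongly tagged
"X-Archive: No".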
Note that since archivicity is a positive trait versus bogosity being a
negative one, I reversed the normal order of the spamicity_tags: "No,
Yes, Unsure" instead of "Yes, No, Unsure". That way the headers would
express what I wanted.
Before I thought of that, I tried classifying conversations as spam and
notifications as ham, but that was very confusing to me every time I had
to work on it.
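To sort messages into folders rather than just color them, a delivery
script could read the tag back out of the X-Archive header. The sketch
below assumes the header_format shown earlier, where the tag is the
text before the first comma; the folder names and function names are
hypothetical, and a message without the header is treated as "Unsure".

```python
from email import message_from_string

# Hypothetical folder names; only the header name and the
# Yes/No/Unsure tags come from the config directives above.
FOLDERS = {"Yes": "archive", "No": "notifications", "Unsure": "review"}


def archive_verdict(raw_message):
    """Return the tag found in X-Archive, or "Unsure" if it is missing."""
    header = message_from_string(raw_message).get("X-Archive", "")
    # With header_format "%h: %c, tests=bogofilter, ..." the tag is
    # everything before the first comma of the header value.
    tag = header.split(",", 1)[0].strip()
    return tag if tag in FOLDERS else "Unsure"


def folder_for(raw_message):
    """Map a classified message to a (hypothetical) folder name."""
    return FOLDERS[archive_verdict(raw_message)]
```

With the reversed tags, "Yes" reads naturally as "yes, archive this",
so the mapping needs no mental inversion.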
It works very well, with fewer misclassifications than my regular
spam vs ham pass. Probably because spammers have a very good incentive
to come up with ways to try to trick Bayesian classifiers, whereas
Facebook couldn't care less what I do with their notification once I
receive it.
Ed