Bogofilter for notification filtering 2

Ed Blackman ed at edgewood.to
Tue Jan 18 23:25:13 CET 2011


On Tue, Jan 18, 2011 at 06:04:29PM +0100, Florian Lindner wrote:
>Now I want to use it for another filtering task. Everybody knows these
>notification mail from forums, facebook, ... (New reply, New message, ...)
>
>I want to use bogofilter to sort them in a separate folder.

I do something very similar.  I call it "archivicity", because I want to 
save all my email conversations with humans, but not machines.  In ten 
years I'll still want to be able to pull up conversations with friends, 
if only as a personal reminiscence, but won't want to remember that my 
bank statement was available, that disk space was low on the web server, 
or that someone wanted me to feed their pigs on Farmville.  I'm even 
slowly training bogofilter to mark my sister's email as archive-worthy 
when we're having a conversation, but not when she's sending me yet 
another urban legend or joke collection.

I use a separate config file, rather than another directory.  I have the 
normal config directives (robs, spam_cutoff, etc.) set to values I've 
tuned, but the core of the changes are the following directives:

wordlist r,archivewords,archivewords.db,1
spam_header_name=X-Archive
spamicity_tags = No, Yes, Unsure
spamicity_formats = %0.6f, %0.6f, %0.6f
header_format = %h: %c, tests=bogofilter, archivicity=%p, version=%v
log_header_format = %h: %c, archivicity=%p, version=%v
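
If a script rather than the mail client needs the verdict, the header written by the header_format line above can be parsed back out. Below is a minimal Python sketch. It assumes the header value looks like the one in the usage line that follows it (that version number is made up). `parse_archive_header` and `ArchiveVerdict` are hypothetical names, not part of bogofilter:

```python
from typing import NamedTuple


class ArchiveVerdict(NamedTuple):
    tag: str            # first field of the header: "Yes", "No" or "Unsure"
    archivicity: float  # the %p value bogofilter wrote into the header


def parse_archive_header(value: str) -> ArchiveVerdict:
    """Parse an X-Archive header value produced by
    header_format = %h: %c, tests=bogofilter, archivicity=%p, version=%v
    (value is everything after "X-Archive:"; folded whitespace is tolerated)."""
    fields = [f.strip() for f in value.split(",")]
    tag = fields[0]
    params = {}
    for field in fields[1:]:
        key, sep, val = field.partition("=")
        if sep:  # skip anything that is not key=value
            params[key.strip()] = val.strip()
    return ArchiveVerdict(tag, float(params["archivicity"]))
```

Calling `parse_archive_header("Yes, tests=bogofilter, archivicity=0.012345, version=1.2.2")` gives back the tag `'Yes'` and the score `0.012345`.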

I run "bogofilter -c ~/.bogofilter/config.archive -ep" on incoming 
messages to classify them according to the existing wordlists that I 
originally created from a hand-selected training corpus of ham (messages 
that should be archived) and spam (messages that should not).  The 
wordlist directive stores the statistics in a separate DB from the spam 
words.
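
The same classification step can be driven from a delivery script. Below is a minimal Python sketch. It assumes bogofilter is on PATH and the config lives at ~/.bogofilter/config.archive. The folder names and the `classify`/`folder_for` helpers are hypothetical, and actually filing the message is left to whatever delivery agent you use:

```python
import subprocess
from email import message_from_bytes
from pathlib import Path

# Config path as used on the command line above.
CONFIG = Path.home() / ".bogofilter" / "config.archive"

# Hypothetical folder names; adjust to your own mail setup.
FOLDERS = {"Yes": "archive", "No": "notifications", "Unsure": "unsure"}


def classify(raw: bytes, config: Path = CONFIG) -> bytes:
    """Run one message through "bogofilter -c CONFIG -ep": -p passes the
    message through with the X-Archive header added, and -e makes the exit
    status 0 for any successful classification, so non-zero means an error."""
    result = subprocess.run(
        ["bogofilter", "-c", str(config), "-ep"],
        input=raw,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


def folder_for(message: bytes, default: str = "unsure") -> str:
    """Pick a destination folder from the first field of X-Archive."""
    header = message_from_bytes(message).get("X-Archive")
    if header is None:
        return default  # message was never classified
    tag = str(header).split(",")[0].strip()
    return FOLDERS.get(tag, default)
```

A delivery hook would then call something like `folder_for(classify(raw_bytes))` and file the returned message bytes accordingly.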

I have my mail user agent display messages in the message list with 
different colors depending on whether they're "X-Archive: No" or "Yes", 
and have keybindings set up to run "bogofilter -c 
~/.bogofilter/config.archive -s" (register as not archive-worthy) or 
"... -n" (register as archive-worthy) for misclassified messages.
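
Each retraining keybinding boils down to choosing the right flag. Below is a minimal Python sketch built around the hypothetical helper `retrain_argv`. It also handles a message that was already registered the other way, using bogofilter's -S/-N unregister flags in the combined -Sn/-Ns form. Whether you need that case depends on how the message was trained originally:

```python
from pathlib import Path

# Config path as used in the keybindings above.
CONFIG = Path.home() / ".bogofilter" / "config.archive"


def retrain_argv(archive_worthy: bool, undo_previous: bool = False,
                 config: Path = CONFIG) -> list[str]:
    """Build the bogofilter command line for a misclassified message.

    With the reversed tags, "ham" means archive-worthy (-n) and "spam"
    means not archive-worthy (-s). If undo_previous is set, the message
    is first unregistered from the opposite list (-S or -N)."""
    if archive_worthy:
        flag = "-Sn" if undo_previous else "-n"
    else:
        flag = "-Ns" if undo_previous else "-s"
    return ["bogofilter", "-c", str(config), flag]
```

The message itself goes on stdin, e.g. `subprocess.run(retrain_argv(archive_worthy=True), input=raw, check=True)`.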

Note that since archivicity is a positive trait versus bogosity being a 
negative one, I reversed the normal order of the spamicity_tags: "No, 
Yes, Unsure" instead of "Yes, No, Unsure".  That way the headers would 
express what I wanted.
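
Since bogofilter picks the tag by position (spam, ham, unsure), swapping the first two entries is all the reversal takes. Below is a minimal Python sketch of that selection using the hypothetical helper `pick_tag`. It assumes the usual cutoff rules, with ham_cutoff = 0 meaning a two-way split, and the exact boundary comparisons are also an assumption. The number printed as archivicity=%p is still bogofilter's spamicity, so in this setup a value near 1 goes with "No":

```python
def pick_tag(spamicity: float, spam_cutoff: float, ham_cutoff: float = 0.0,
             tags: tuple[str, str, str] = ("No", "Yes", "Unsure")) -> str:
    """Return the tag for a score: tags[0] for spam, tags[1] for ham,
    tags[2] for unsure. ham_cutoff <= 0 is treated as binary mode."""
    if spamicity >= spam_cutoff:
        return tags[0]  # "spam" side: not archive-worthy here
    if ham_cutoff <= 0.0 or spamicity <= ham_cutoff:
        return tags[1]  # "ham" side: archive-worthy here
    return tags[2]      # between the two cutoffs
```

Passing `tags=("Yes", "No", "Unsure")` restores the stock spam-oriented labels, which shows that only the order of the strings changed, not the classifier.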

Before I thought of that, I tried classifying conversations as spam and 
notifications as ham, but that was very confusing to me every time I had 
to work on it.

It works very well, with fewer misclassifications than my regular 
spam vs ham pass.  That's probably because spammers have a very good 
incentive to come up with ways to trick Bayesian classifiers, whereas 
Facebook couldn't care less what I do with their notifications once I 
receive them.

Ed