tri-state classification

Robin Bowes robin-lists at robinbowes.com
Sun Oct 24 17:44:41 CEST 2004


David Relson wrote:
> Greetings,
> 
> There's a basic change in bogofilter's classification output that I
> think should be made.
> 
> In the GETTING.STARTED document, particularly the section on "ongoing
> training", thinking and writing in terms of bogofilter's tri-state
> abilities seemed especially valuable.  Being able to describe message
> scores as "Spam", "Ham", and "Unsure" is much clearer than describing
> them as "X-Bogosity: Yes" and "X-Bogosity: No".
> 
> I propose that bogofilter's default configuration be changed to use
> tri-state classification with a conservative ham cutoff of 0.4 and with
> bogosity tags of "Spam", "Ham", and "Unsure".
> 
> Let me know if you approve/disapprove of this change.
> 
> Regards,
> 
> David
> 
> Note:  Sites wishing to continue using bi-state classification can do
> so by adding options "ham_cutoff=0.0" and "spamicity_tags=Yes,No" to
> bogofilter.cf

David,

I've been thinking about doing this for a while and your message has 
prompted me to think about it some more.

I currently use bogofilter in its default configuration with 
qmail/vpopmail/maildrop. I filter all mail through bogofilter using 
maildrop as follows:

   xfilter "${BOGOFILTER} -e -p -u -d ${BOGODIR}"

where:
   BOGOFILTER is the full path to bogofilter
   BOGODIR is the mail recipient's .bogofilter dir.

I then check for spam using:

   if (/^X-Bogosity: *Yes/)
   {
      ...
   }

Detected Spam is dumped in the users' "Spam" folder, everything else is 
delivered as normal.

Any Spam not detected can be manually dumped into the users' 
Spam/Undetected folder. Any messages wrongly identified as Spam can be 
manually dumped in the users' Spam/Misdetected folder. A cron job 
periodically checks these folders and re-processes the messages.

So, to use tri-state classification, I would need to add a Spam/Unsure 
folder and an additional check, something like:

   if (/^X-Bogosity: *Unsure/)
   {
      ...
   }
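
Putting the two checks together, I imagine the delivery part would end up 
looking roughly like this. It is only a sketch: it assumes the tags become 
"Spam", "Ham" and "Unsure" as you propose, and SPAMDIR and UNSUREDIR are 
made-up variables standing for the users' Spam and Spam/Unsure folders:

   # assumes spamicity tags "Spam", "Ham", "Unsure"
   if (/^X-Bogosity: *Spam/)
   {
      # SPAMDIR is hypothetical: the users' "Spam" folder
      to "${SPAMDIR}"
   }

   if (/^X-Bogosity: *Unsure/)
   {
      # UNSUREDIR is hypothetical: the users' Spam/Unsure folder
      to "${UNSUREDIR}"
   }

   # anything else (Ham) falls through and is delivered as normal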

Is that it?

What cutoff values am I using at the moment with my bi-state 
classification?

How do I get bogofilter to perform tri-state classification?
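
Going by your note, my guess is that it would be something like the 
following in bogofilter.cf. This is an assumption on my part, built from the 
option names in your note and the values in your proposal, not something I 
have tested:

   # guess: the tri-state counterpart of the bi-state options in your note
   ham_cutoff=0.4
   spamicity_tags=Spam,Ham,Unsure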

What are the default ham and spam cutoff values?

Thanks,

R.
-- 
http://robinbowes.com


