tri-state classification

David Relson relson at osagesoftware.com
Sun Oct 24 18:20:43 CEST 2004


On Sun, 24 Oct 2004 16:44:41 +0100
Robin Bowes wrote:

> David,
> 
> I've been thinking about doing this for a while and your message has 
> prompted me to think about it some more.
> 
> I currently use bogofilter in its default configuration with 
> qmail/vpopmail/maildrop. I filter all mail through bogofilter using 
> maildrop as follows:
> 
>    xfilter "${BOGOFILTER} -e -p -u -d ${BOGODIR}"
> 
> where:
>    BOGOFILTER is the full path to bogofilter
>    BOGODIR is the mail recipient's .bogofilter dir.
> 
> I then check for spam using:
> 
>    if (/^X-Bogosity: *Yes/)
>    {
>       ...
>    }
> 
> Detected Spam is dumped in the users' "Spam" folder, everything else
> is delivered as normal.
> 
> Any Spam not detected can be manually dumped into the users' 
> Spam/Undetected folder. Any messages wrongly identified as Spam can be
> manually dumped in the users' Spam/Misdetected folder. A cron job 
> periodically checks these folders and re-processes the messages.
> 
> So, to use the tri-state classification I would need to add an 
> additional Spam/Unsure folder and add an additional check something
> like:
> 
>    if (/^X-Bogosity: *Unsure/)
>    {
>       ...
>    }
> 
> Is that it?
> 
> What cutoff values will I be using at the moment with my bi-state 
> classification?
> 
> How do I get bogofilter to perform tri-state classification?
> 
> What are the default ham and spam cutoff values?
> 
> Thanks,
> 
> R.

Hi Robin,

Bogofilter generates Yes/No results if only SPAM_CUTOFF is set and
Yes/No/Unsure results if both SPAM_CUTOFF and HAM_CUTOFF are set.

All you need to do is modify the "HAM_CUTOFF=0.45" line in
/etc/bogofilter.cf by removing the leading hash marks.
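As a rough illustration of the rule above, here is a minimal Python sketch of the decision, assuming the default SPAM_CUTOFF of 0.99 and the HAM_CUTOFF of 0.45 from bogofilter.cf. The function name classify() is hypothetical, and whether bogofilter compares with >= or > at each boundary is an assumption, not something taken from its source.

```python
from typing import Optional

# Cutoff values quoted in this thread (bogofilter.cf defaults).
SPAM_CUTOFF = 0.99
HAM_CUTOFF = 0.45


def classify(spamicity: float,
             spam_cutoff: float = SPAM_CUTOFF,
             ham_cutoff: Optional[float] = None) -> str:
    """Hypothetical sketch: map a spamicity score to Yes/No/Unsure.

    ham_cutoff=None mimics bi-state mode (only SPAM_CUTOFF set);
    passing a ham_cutoff mimics tri-state mode. The >= / <= boundary
    choices below are assumptions.
    """
    if not 0.0 <= spamicity <= 1.0:
        raise ValueError("spamicity must lie in [0, 1]")
    if spamicity >= spam_cutoff:
        return "Yes"          # spam
    if ham_cutoff is None or spamicity <= ham_cutoff:
        return "No"           # ham (everything below SPAM_CUTOFF in bi-state)
    return "Unsure"           # between the two cutoffs in tri-state


if __name__ == "__main__":
    for score in (0.10, 0.60, 0.995):
        print(score, classify(score), classify(score, ham_cutoff=HAM_CUTOFF))
```

With ham_cutoff left unset, a score of 0.60 comes out No; with ham_cutoff=0.45 the same score lands in the Unsure band between the two cutoffs.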

The default SPAM_CUTOFF value is 0.99.  That high value minimizes the
odds of false positives (ham classified as spam).  If you have more false
negatives (spam classified as ham) than you like, you may wish to use a
lower value than 0.99.

To change tags from Yes/No/Unsure to Spam/Ham/Unsure, remove the hash
marks from the "spamicity_tags = Spam, Ham, Unsure" line.
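Changing the tags also changes the header text your maildrop rules see, so a pattern like /^X-Bogosity: *Yes/ would presumably stop matching once spam is tagged "Spam". The Python sketch below picks a folder from the X-Bogosity header for either tag set; route() and the folder names are hypothetical, and the sample headers are illustrative only.

```python
import re

# bogofilter's default tags and the alternative set from bogofilter.cf.
DEFAULT_TAGS = ("Yes", "No", "Unsure")
SPAMICITY_TAGS = ("Spam", "Ham", "Unsure")


def route(headers: str, tags=DEFAULT_TAGS) -> str:
    """Hypothetical sketch: choose a folder from the X-Bogosity header.

    Folder names ("Spam", "Spam/Unsure", "INBOX") are assumptions that
    follow the folder layout described earlier in the thread.
    """
    spam_tag, _ham_tag, unsure_tag = tags
    # Same idea as the maildrop test /^X-Bogosity: *Yes/, but capturing the tag.
    match = re.search(r"^X-Bogosity: *(\w+)", headers, re.MULTILINE)
    if match is None:
        return "INBOX"        # no verdict: deliver normally
    tag = match.group(1)
    if tag == spam_tag:
        return "Spam"
    if tag == unsure_tag:
        return "Spam/Unsure"
    return "INBOX"            # ham, or a tag this tag set doesn't know
```

Passing the wrong tag set is the failure mode to watch for: with DEFAULT_TAGS, a header reading "X-Bogosity: Spam" is delivered to INBOX.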

Note that enabling tri-state mode affects the "-u" (autoupdate) option
-- "Unsures" aren't fed back into the wordlist.  This means that your
cron job needs to behave a bit differently for "Unsure" than for "Yes"
("Spam").
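One way the cron job might choose its re-training options is sketched below in Python, using bogofilter's -s/-n (register as spam/ham) and -S/-N (unregister spam/ham) options. The helper correction_flags() is hypothetical, and it assumes every Yes/No verdict was already registered by -u while Unsure verdicts were not.

```python
from typing import List


def correction_flags(verdict: str, actual_spam: bool) -> List[str]:
    """Hypothetical sketch: bogofilter options to re-train one message.

    verdict is the original X-Bogosity tag (either tag set). Assumes the
    message was scored with -u, so Yes/No (Spam/Ham) verdicts were already
    registered in the wordlist but Unsure verdicts were not.
    """
    if verdict in ("Yes", "Spam"):
        # Registered as spam: undo that, then register as ham if wrong.
        return [] if actual_spam else ["-S", "-n"]
    if verdict in ("No", "Ham"):
        # Registered as ham: undo that, then register as spam if wrong.
        return ["-N", "-s"] if actual_spam else []
    if verdict == "Unsure":
        # Never registered, so there is nothing to undo.
        return ["-s"] if actual_spam else ["-n"]
    raise ValueError(f"unknown verdict: {verdict!r}")
```

The returned options, plus -d ${BOGODIR}, would then be passed to bogofilter with the message on standard input. For an Unsure message there is nothing to undo, so only a plain registration is needed.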

HTH,

David


