New header token tagging

Greg Louis glouis at dynamicro.on.ca
Fri Sep 26 13:07:12 CEST 2003


On 20030926 (Fri) at 0917:02 +0200, Boris 'pi' Piwinger wrote:

> >The fp rate did go through the roof for exactly one type of mail:
> >valid, short messages from mailing lists on which spam is frequently
> >posted.  In addition, the fn rate increased sharply; the spam not
> >recognized tended to be short and not egregiously spammy-looking. 
> 
> Well, if the message is almost all header and that is not
> understood ...

Precisely.  Take a gold star ;)

> Here is what you could do: Take your current database to
> decide if something is spam or not (not using header
> tagging). Use those new mails (now with header tagging) to
> build a new database. For a while you'll have to correct
> errors in both databases, but after not too long you can
> just switch the database.

David and I are testing a facilitating hack: for now one can register
with tagging, but for classification the code will combine tagged and
untagged counts for the tagged tokens.  This allows the user to keep
getting results no worse than before, while building up the counts of
tagged header tokens within the existing db.  When there are enough
tagged header tokens, we disable combination and (one hopes) reap the
benefit of the tagging.
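
To make the combining step concrete, here is a minimal Python sketch of the
idea, not the actual bogofilter code. The "head:" prefix, the WordList class
and the combine flag are all hypothetical names chosen for illustration; the
real tag strings and data structures may well differ.

```python
from collections import Counter

# Hypothetical tag prefix marking a token that came from a header line.
TAG = "head:"


class WordList:
    """Toy word list holding separate spam and non-spam token counts."""

    def __init__(self):
        self.spam = Counter()
        self.good = Counter()

    def register(self, tokens, is_spam):
        """Add each token occurrence to the spam or non-spam counts."""
        target = self.spam if is_spam else self.good
        target.update(tokens)

    def counts(self, token, combine=True):
        """Return (spam_count, good_count) for one token.

        While combine is True, a tagged token also picks up the counts
        stored under its untagged form, so the old untagged data keeps
        contributing during the transition. Untagged tokens are never
        altered.
        """
        spam = self.spam[token]
        good = self.good[token]
        if combine and token.startswith(TAG):
            bare = token[len(TAG):]
            spam += self.spam[bare]
            good += self.good[bare]
        return spam, good
```

Registration always stores the tagged form, so the tagged counts grow in the
existing db; switching combine to False later corresponds to disabling
combination once those counts are large enough.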

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |

Header information for this message:
Subject: Re: New header token tagging
     To: bogofilter <bogofilter at aotto.com>
   From: Greg Louis <glouis at dynamicro.on.ca>