Using casefolded wordlists
    Greg Louis 
    glouis at dynamicro.on.ca
       
    Fri May 30 13:09:45 CEST 2003
    
    
  
On 20030530 (Fri) at 0900:12 +0100, Peter Bishop wrote:
> If I use the old casefolded wordlist it clearly will not recognise mixed 
> case words like FREE OFFER. Ditto for ham email, there will be more 
> unrecognised words.
> 
> email   spamicity (robinson)
> ham     0.35 (Pi)    0.47 (PI)
> ham2    0.37 (Pi)    0.47 (PI)
> spam    0.62 (Pi)    0.63 (PI)
> spam2   0.61 (Pi)    0.61 (PI)
> 
> From my unscientific sample it appears that ham is affected more than spam 
> and the effect could be to increase false positives until the wordlists get 
> updated with mixed case words.
That could be so.  I suggested advising people who can't rebuild their
training databases to classify with -Pi but train with -PI for a couple
of months.  Alternatively, one can speed up the process (as I did at
work, where I can't rebuild) by training, with -PI, on a large batch of
new messages (roughly equal numbers of spam and nonspam) right after
doing the upgrade.
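
To make the effect concrete, here is a toy sketch of why a wordlist built
with case folding misses tokens once folding is turned off, and why a
batch of case-preserving training closes the gap.  It is not bogofilter's
tokenizer or database code: the function names, the set-based wordlist and
the sample tokens are all invented for illustration, and I am assuming
that fold_case=True corresponds to -Pi and fold_case=False to -PI, as the
scores quoted above suggest.

```python
# Toy illustration only; this is NOT bogofilter's tokenizer or wordlist format.
# Assumption: fold_case=True stands in for -Pi, fold_case=False for -PI.

def known_fraction(tokens, wordlist, fold_case):
    """Return the fraction of tokens that are present in the wordlist."""
    if not tokens:
        return 0.0
    hits = 0
    for tok in tokens:
        key = tok.lower() if fold_case else tok
        if key in wordlist:
            hits += 1
    return hits / len(tokens)


def train(tokens, wordlist, fold_case):
    """Add tokens to the wordlist, folding case only if requested."""
    for tok in tokens:
        wordlist.add(tok.lower() if fold_case else tok)


# Hypothetical old wordlist, built while case folding was on:
# every key is lower case.
wordlist = {"free", "offer", "meeting", "agenda"}

# Hypothetical tokens from a new message.
message = ["FREE", "OFFER", "Meeting", "agenda"]

# Classifying with folding: every token is found in the old wordlist.
with_folding = known_fraction(message, wordlist, fold_case=True)

# Classifying without folding: only the already-lower-case token is found.
without_folding = known_fraction(message, wordlist, fold_case=False)

# Training without folding adds the mixed-case forms to the wordlist...
train(["FREE", "OFFER", "Meeting"], wordlist, fold_case=False)

# ...so the same message is fully recognised afterwards, even unfolded.
after_training = known_fraction(message, wordlist, fold_case=False)

print(with_folding, without_folding, after_training)
```

The point of classifying with folding while training without it is that
scoring keeps using the old lower-case entries while the case-preserving
entries accumulate in the background.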
-- 
| G r e g  L o u i s          | gpg public key: finger     |
|   http://www.bgl.nu/~glouis |   glouis at consultronics.com |
| http://wecanstopspam.org in signatures fights junk email |
    
    