My wordlist doesn't detect spam very well anymore
Jonathan Kamens
jik at kamens.us
Sun Feb  9 20:42:57 CET 2020
  
With the numbers you mentioned in your .MSG_COUNT, I doubt "several 
hundred" messages of either ham or spam are going to be enough to 
generate accurate training.

The rolling corpus of messages that I save to do my monthly training 
currently has 16126 ham and 5519 spam messages in it.

Note, also, that it's important to do your training on /recent/ ham and 
spam messages, because the relative frequency of various words will 
change in both over time, so old messages will produce worse training.

   jik
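
One way to keep a corpus "rolling" as described above is to drop messages
older than some cutoff before each training run. The message does not say
how the corpus is stored or pruned, so the following is only a sketch under
assumptions: each message is a separate file in a directory (for example
one directory for ham and one for spam), file modification time stands in
for message age, and the function names and the 180-day window in the usage
note are hypothetical.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def split_by_age(corpus_dir, max_age_days, now=None):
    """Return (recent, stale) lists of message files in corpus_dir.

    Assumption: one message per file, and the file's modification time
    is a usable proxy for how old the message is.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    recent, stale = [], []
    for path in sorted(Path(corpus_dir).iterdir()):
        if not path.is_file():
            continue
        # Files modified at or after the cutoff count as recent.
        (recent if path.stat().st_mtime >= cutoff else stale).append(path)
    return recent, stale


def prune_corpus(corpus_dir, max_age_days, now=None):
    """Delete stale message files and return how many recent ones remain."""
    recent, stale = split_by_age(corpus_dir, max_age_days, now)
    for path in stale:
        path.unlink()
    return len(recent)
```

Calling prune_corpus("ham", 180) and prune_corpus("spam", 180) before a
training run would leave only the last 180 days of messages in each
directory and report how many remain, which makes it easy to check whether
the corpus is still large enough to train on. The directory names and the
window length are placeholders, not values taken from this thread.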
On 2/9/20 2:17 PM, Teemu Likonen wrote:
> Jonathan Kamens [2020-02-09T07:47:37-05] wrote:
>
>> In my experience, you need to save a big corpus of known, recent spam
>> and ham messages and periodically run bogotune to determine the
>> currently optimal parameters for your .bogofilter.cf file.
> Ok, thanks. So I need to collect at least several hundred spam messages
> before running bogotune. It will take some months before I have enough
> but I will do that.
>
    
    