My wordlist doesn't detect spam very well anymore

Jonathan Kamens jik at kamens.us
Sun Feb 9 20:42:57 CET 2020


Given the numbers you mentioned in your .MSG_COUNT, I doubt "several 
hundred" messages of either ham or spam are going to be enough to 
produce accurate training.

The rolling corpus of messages that I save to do my monthly training 
currently has 16126 ham and 5519 spam messages in it.

Note, also, that it's important to do your training on /recent/ ham and 
spam messages, because the relative frequency of various words will 
change in both over time, so old messages will produce worse training.
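
As a rough illustration of keeping only /recent/ messages in the 
training corpus, here is a minimal Python sketch. It assumes one message 
per file (maildir-style) and that each file's modification time roughly 
matches when the message arrived. The function name, the directory 
names and the 90-day window are hypothetical, not something bogofilter 
or bogotune prescribes.

```python
import os
import time


def recent_messages(directory, max_age_days, now=None):
    """Return sorted paths of files in `directory` modified within
    the last `max_age_days` days.

    Assumption: one message per file, and the file's mtime is a
    reasonable stand-in for the message's arrival time.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    recent = []
    for entry in os.scandir(directory):
        # Skip subdirectories; keep only files at or after the cutoff.
        if entry.is_file() and entry.stat().st_mtime >= cutoff:
            recent.append(entry.path)
    return sorted(recent)


if __name__ == "__main__":
    # Hypothetical corpus directories; adjust to your own layout.
    for name in ("corpus/ham", "corpus/spam"):
        if os.path.isdir(name):
            print(name, len(recent_messages(name, 90)))
```

Run before each training pass, this prints how many ham and spam 
messages fall inside the window, which shows whether the recent corpus 
is large enough to train on.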

   jik

On 2/9/20 2:17 PM, Teemu Likonen wrote:
> Jonathan Kamens [2020-02-09T07:47:37-05] wrote:
>
>> In my experience, you need to save a big corpus of known, recent spam
>> and ham messages and periodically run bogotune to determine the
>> currently optimal parameters for your .bogofilter.cf file.
> Ok, thanks. So I need to collect at least several hundred spam messages
> before running bogotune. It will take some months before I have enough
> but I will do that.
>

