switching between different databases - in 1.3.0.rc1

Rob McEwen rob at invaluement.com
Mon Jun 9 03:30:23 CEST 2025


 >>could you provide me a sample message

Matthias,

https://www.invaluement.com/public_evidence/bogotest-spam.zip
(this was a spam sent directly to me - so it's safe to share publicly - 
please download for long-term usage - this link might not be available 
in the future)

(1) So basically I had a hard time finding this example, because when 
I'm just manually searching for one like this, the vast majority of the 
messages I see are not that different between versions. So I think 
examples like this might be somewhat anomalous?

(2) Yet meanwhile my overall database size massively bloats in 
1.3.0.rc1 (compared to 1.2.5) when training a fresh wordlist.db against 
a large number of messages (many tens of thousands).

So these two things (1 and 2) SEEM to contradict each other. So here is 
my theory as to how they can both be true at the same time:

(A) When it does do this bloating, it can be very excessive - just like 
this example - 191 tokens vs 37K tokens. That's a massive difference! So 
even a tiny percentage of messages like this can still make a huge 
difference in the overall size of the database.
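
As a rough illustration of (A), here is a back-of-the-envelope sketch in 
Python. Only the 191 and 37K token counts come from the example message 
above; the corpus size (50,000) and the 1% bloat rate are made-up 
assumptions, and the function name is hypothetical. It also counts token 
occurrences rather than distinct tokens, so it is at best an upper bound 
on how the wordlist itself would grow, not a measurement.

```python
# Hypothetical illustration: how a small fraction of bloated messages
# can dominate the total token count of a training corpus.

NORMAL_TOKENS = 191       # smaller token count from the example message
BLOATED_TOKENS = 37_000   # larger ("37K") token count from the same example


def total_tokens(messages: int, bloated_fraction: float) -> int:
    """Total tokens if `bloated_fraction` of the messages are bloated."""
    bloated = round(messages * bloated_fraction)
    normal = messages - bloated
    return normal * NORMAL_TOKENS + bloated * BLOATED_TOKENS


MESSAGES = 50_000                          # assumed corpus size
baseline = total_tokens(MESSAGES, 0.0)     # no message bloats
with_bloat = total_tokens(MESSAGES, 0.01)  # assume 1% of messages bloat
ratio = with_bloat / baseline

print(f"baseline:   {baseline:,} tokens")
print(f"with bloat: {with_bloat:,} tokens ({ratio:.1f}x)")
```

Under these assumed numbers, 1% of the messages would account for 
roughly two thirds of all tokens seen during training.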

(B) This bloating of tokens tends to happen more with certain 
attachments. Messages that have attachments and little else to 
distinguish ham from spam are ones that bogo has been somewhat weak in 
dealing with, so over the years, as a direct result, I've trained MORE 
of those types of emails, since my training focuses on what NEEDS 
training. Meanwhile, the messages that bloat have a LARGER overlap with 
the ones I just described that I trained more often. THEREFORE, my 
ham/spam folders for bogofilter training contain a disproportionately 
larger share of these bloating messages than most batches of messages 
would. That would explain why my overall database is much larger than 
when I'm using 1.2.5, even though finding such examples is difficult.

But now I'm thinking that this bloating might be more like an 
unintended/intermittent bug?

PS - Also, Matthias, thanks for your replies to my other two recent 
messages. I read all of them carefully and found them helpful. Sorry I 
didn't reply to those sooner to let you know that.

Rob McEwen, invaluement

