Tom Allison tallison at
Fri Jul 9 14:49:09 CEST 2004

Tom Anderson wrote:
> On Fri, 2004-07-09 at 04:12, Andreas Pardeike wrote:
>>On 2004-07-09, at 08.58, Peter Bishop wrote:
>>Then what happens tO tExT LiKe tHiS?
> I'd imagine it'd be ignored completely since it doesn't meet the minimum
> token length.  This isn't actually a terrible idea since it's not very
> readable text anyway, and there should be sufficient other tokens to
> make the message spammy.  However, perhaps bogofilter could score both
> ways... with and without breaking on the case changes.  But now we're
> getting more complicated.
> Tom


And then there are those Java and MSFT discussion lists where every 
variable is written like this by tradition: MyDocuments, FirstLogin, 
MyMusic....  AOLuser...
They are all, to the brain, a unique token with a meaning separate from 
that of their split tokens.  If anyone sends me email about MyDocuments 
it's likely to be spam.

I suppose the statistics of the variant configurations will answer the 
question, but I think tokenizing the words differently than they were 
intended to be read (that is, by people) might backfire.
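
To make the two configurations concrete, here is a rough Python sketch of 
tokenizing with and without breaking on case changes.  It is not 
bogofilter's lexer; the function name `tokens`, the regular expressions, 
and the minimum token length of 3 are all assumptions made up for 
illustration.

```python
import re

# Assumed minimum token length, chosen only for this illustration.
MIN_TOKEN_LEN = 3

# Alphabetic words only; a real lexer handles far more than this.
WORD = re.compile(r"[A-Za-z]+")

# A run of capitals followed by lowercase letters, or a lowercase run.
# This breaks a word wherever a lowercase letter is followed by a capital.
CASE_PART = re.compile(r"[A-Z]+[a-z]*|[a-z]+")


def tokens(text, split_case):
    """Return the tokens of text, optionally breaking words on case changes.

    Tokens shorter than MIN_TOKEN_LEN are dropped in both modes.
    """
    result = []
    for word in WORD.findall(text):
        parts = CASE_PART.findall(word) if split_case else [word]
        result.extend(p for p in parts if len(p) >= MIN_TOKEN_LEN)
    return result


if __name__ == "__main__":
    for sample in ("tO tExT LiKe tHiS", "MyDocuments FirstLogin AOLuser"):
        print(sample)
        print("  whole:", tokens(sample, split_case=False))
        print("  split:", tokens(sample, split_case=True))
```

Under these assumptions, splitting "tO tExT LiKe tHiS" leaves no tokens at 
all, as Tom Anderson guessed, while MyDocuments stops being a token of its 
own and is seen only as Documents.  Scoring both ways would mean feeding 
the classifier both lists.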
