tallison at tacocat.net
Fri Jul 9 14:49:09 CEST 2004
Tom Anderson wrote:
> On Fri, 2004-07-09 at 04:12, Andreas Pardeike wrote:
>>On 2004-07-09, at 08.58, Peter Bishop wrote:
>>Then what happens tO tExT LiKe tHiS?
> I'd imagine it'd be ignored completely since it doesn't meet the minimum
> token length. This isn't actually a terrible idea since it's not very
> readable text anyway, and there should be sufficient other tokens to
> make the message spammy. However, perhaps bogofilter could score both
> ways... with and without breaking on the case changes. But now we're
> getting more complicated.
And then there are those Java and MSFT discussion lists where every
variable is written like this by tradition: MyDocuments, FirstLogin.
They are all, to the brain, unique tokens with meaning separate from
their split tokens. If anyone sends me email about MyDocuments it's
likely to be spam.
I suppose the statistics of variant configurations will answer the
question, but I think tokenizing words differently from how people are
meant to read them might backfire.
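
To make the trade-off concrete, here is a rough Python sketch of the two
tokenizations being compared. It is not bogofilter's lexer: the names
split_on_case and tokens, and the minimum token length of 3, are
assumptions made up for illustration only.

```python
import re

# Assumed minimum token length for this sketch; not bogofilter's real setting.
MIN_TOKEN_LEN = 3

def split_on_case(word):
    """Split a word at every lowercase-to-uppercase boundary."""
    # Insert a space wherever a lowercase letter is followed by an uppercase one.
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", word).split()

def tokens(text, split_case):
    """Return alphabetic tokens, optionally split on case changes,
    dropping any piece shorter than MIN_TOKEN_LEN."""
    out = []
    for word in re.findall(r"[A-Za-z]+", text):
        pieces = split_on_case(word) if split_case else [word]
        out.extend(p for p in pieces if len(p) >= MIN_TOKEN_LEN)
    return out

if __name__ == "__main__":
    for sample in ("tExT LiKe tHiS", "MyDocuments FirstLogin"):
        print(sample)
        print("  whole:", tokens(sample, split_case=False))
        print("  split:", tokens(sample, split_case=True))
```

With splitting on, "tExT LiKe tHiS" yields no tokens at all, while
"MyDocuments" loses its identity as a single token and leaves only
"Documents" behind. Scoring both ways would mean feeding both lists to
the classifier.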