Month Abbreviations as Stopwords

Suzanne Skinner tril at
Wed Jan 8 20:54:08 CET 2003

I think it would be a good idea to add the month abbreviations Jan-Dec (as
found in mail headers) to the default stopwords in lexer.l. I recently noticed
that the scoring for these words was somewhat lopsided here because of the way
my spam intake has increased over the past year. "Feb" was the worst--for some
reason, February 2002 is entirely missing from my otherwise year-spanning
corpus, resulting in a spam probability of 0.01 for that token! Fortunately, I
noticed this before Feb 2003 rolled around :-)
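
As a rough illustration of the suggestion, here is a minimal C sketch of a
month-abbreviation check. It is not the actual lexer.l code; in lexer.l itself
this would presumably be expressed as a lexer rule rather than a C function.
The names month_stopwords and is_month_stopword are hypothetical, and the
case-sensitive match (capitalized, as in header dates) is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* Month abbreviations as they appear in mail header dates, e.g. "Wed Jan 8". */
static const char *const month_stopwords[] = {
    "Jan", "Feb", "Mar", "Apr", "May", "Jun",
    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
};

/*
 * Hypothetical helper, not part of bogofilter.
 * Returns 1 if token is exactly one of the twelve abbreviations, else 0.
 * The comparison is case-sensitive; that is an assumption about how
 * tokens look when they reach this check.
 */
int is_month_stopword(const char *token)
{
    size_t i;
    size_t n = sizeof month_stopwords / sizeof month_stopwords[0];

    if (token == NULL)
        return 0;
    for (i = 0; i < n; i++) {
        if (strcmp(token, month_stopwords[i]) == 0)
            return 1;
    }
    return 0;
}
```

A tokenizer would call is_month_stopword() on each candidate token and skip
the ones that match, so that "Feb" and friends never get a score at all.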


tril at -

A Pope has a Water Cannon.                               It is a Water Cannon.
He fires Holy-Water from it.                        It is a Holy-Water Cannon.
He Blesses it.                                 It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it.          It is a Wholly Holy Holy-Water Cannon.
He has it pierced.                It is a Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive.                                       He shoots them.
                                    -- Principia Discordia

More information about the bogofilter-dev mailing list