Month Abbreviations as Stopwords
tril at igs.net
Sun Jan 12 23:52:12 EST 2003
David Relson wrote:
> Evidently, you've not been looking at the mime branch of development
> (where the new mime parsing code presently exists).
Er, nope. I actually hadn't noticed that a branch had been made in CVS.
> The current processing of html discards all text within tags.
Hmm... is that going to be made an option? Stuff like the FF0000 in
'font color=#FF0000' can be really useful as spam indicators :-)
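If tag contents were kept rather than discarded, a tokenizer could pull attribute values such as the FF0000 in 'font color=#FF0000' out as ordinary tokens. Below is a minimal Python sketch, assuming a crude regex pass over raw HTML; the function name `tag_tokens` and the token rule (runs of three or more letters or digits) are hypothetical and are not how the mime branch's lexer works.

```python
import re

# Anything between '<' and the next '>' (a crude notion of a tag).
TAG_RE = re.compile(r"<([^>]*)>")
# Hypothetical token rule: runs of 3+ ASCII letters or digits.
WORD_RE = re.compile(r"[A-Za-z0-9]{3,}")

def tag_tokens(html):
    """Return lowercased tokens found *inside* HTML tags, in order."""
    tokens = []
    for inner in TAG_RE.findall(html):
        tokens.extend(word.lower() for word in WORD_RE.findall(inner))
    return tokens

print(tag_tokens('<font color=#FF0000>BUY NOW</font>'))
# ['font', 'color', 'ff0000', 'font']
```

The visible text ("BUY NOW") is deliberately excluded here; it would still come from the normal body tokenizer.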
> A project that has come up from time to time is to implement an "ignore"
> list, i.e., a list of words that should be ignored when scoring
Maybe separate ignore lists for each of header, text body, and HTML body?
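Separate lists per message part could be as simple as a mapping from section name to a set of words. This is a sketch assuming each token is already labeled with the section it came from; the section names, the `IGNORE` table, and `filter_tokens` are hypothetical, and the month abbreviations are only sample entries echoing this thread's subject.

```python
# Hypothetical per-section ignore lists; entries are illustrative only.
IGNORE = {
    "header": {"jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"},
    "body": set(),
    "html": {"font", "color"},
}

def filter_tokens(section, tokens):
    """Drop tokens that appear in the ignore list for `section`."""
    ignored = IGNORE.get(section, set())
    return [t for t in tokens if t.lower() not in ignored]

print(filter_tokens("header", ["Date", "Sun", "Jan", "2003"]))
# ['Date', 'Sun', '2003']
```

Keeping the lists separate means a word like "may" can be ignored in Date: headers without also being ignored in body text, where it is an ordinary English word.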
> The idea was to have the list be easily maintainable by a
> user. Using a plain text list would allow maintenance with any old text
> editor. If you're looking for a project, I can send you a partially
> completed version of an ignore list implementation :-)
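Because the list is meant to be maintained with any old text editor, one plausible file format is one word per line, with blank lines and '#' comments skipped. The sketch below assumes that format; both the format and `load_ignore_list` are hypothetical and are not the partially completed implementation offered above.

```python
import io

def load_ignore_list(stream):
    """Read one word per line; skip blank lines and '#' comments."""
    words = set()
    for line in stream:
        word = line.split("#", 1)[0].strip()  # drop comment, whitespace
        if word:
            words.add(word.lower())
    return words

sample = io.StringIO("# month abbreviations\nJan\nFeb\n\nmar  # trailing comment\n")
print(sorted(load_ignore_list(sample)))
# ['feb', 'jan', 'mar']
```

With a real file, the same function takes an open handle: `with open(path) as f: words = load_ignore_list(f)`.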
tril at igs.net - http://www.igs.net/~tril/
A Pope has a Water Cannon. It is a Water Cannon.
He fires Holy-Water from it. It is a Holy-Water Cannon.
He Blesses it. It is a Holy Holy-Water Cannon.
He Blesses the Hell out of it. It is a Wholly Holy Holy-Water Cannon.
He has it pierced. It is a Holey Wholly Holy Holy-Water Cannon.
Batman and Robin arrive. He shoots them.
-- Principia Discordia
More information about the Bogofilter-dev