HTML comments

David Relson relson at osagesoftware.com
Sun Jan 18 19:43:50 CET 2004


On Sun, 18 Jan 2004 18:26:49 +0100
"Burkhard Kaas" <bkaas at geva.de> wrote:

> What to do with mails like this one?
> 
> The amount of spam, which bogofilter can't catch, increases ...
> 
> Burkhard

Hi Burkhard,

I don't see anything difficult about the mail in your message.  It's
marked as text/html so bogofilter will remove the comments.  As a sample
of what you quoted, "te<!--106 -->-->il an ei<!--545 -->ner" will be parsed
as "teil an einer".  If you run "bogolexer -p < msg" you'll see the
tokens that bogofilter finds.  If you want to see the original text and
the tokens intermixed run "bogolexer -x l -vv -p < msg".
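
To illustrate the idea only (this is not bogofilter's actual lexer, just
a minimal Python sketch with hypothetical helper names), removing the
comments before splitting on whitespace rejoins the word fragments.  The
sample string below is a simplified form of the quoted text, and the
regular expression is an assumption about what counts as a comment:

```python
import re

# Hypothetical helpers for illustration; bogofilter's real parser is
# more involved than a single regular expression.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_html_comments(text):
    """Remove every <!-- ... --> span, leaving the surrounding text joined."""
    return COMMENT_RE.sub("", text)


def tokens(text):
    """Split the comment-free text on whitespace."""
    return strip_html_comments(text).split()


sample = "te<!--106 -->il an ei<!--545 -->ner"
print(tokens(sample))  # ['teil', 'an', 'einer']
```

The point is that the comments hide nothing from the classifier: once
they are gone, the tokens are ordinary words.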

If you want to see how bogofilter scores the tokens of the message, use
"-vvv" (as in "bogofilter -vvv < msg").

Is it possible that the text of the message (after comments are removed)
simply doesn't match the "good" words in your wordlist?

What version of bogofilter are you running?

Hope this helps.

David




