[cvs] Potential for error?
David Relson
relson at osagesoftware.com
Tue Oct 22 05:34:43 CEST 2002
At 11:02 PM 10/21/02, Allyn Fratkin wrote:
>> > Also, I noticed that there were a lot of words in my lists that weren't
>> > words. Things like ab34af127 would be listed, but only once. Based on
>> > this, eventually the list files will bloat to infinity.
>
>
>are you possibly training bogofilter using mailboxes from microsoft
>windows, that use CRLF as line endings? bogofilter up through 0.7.5 is not
>recognizing and discarding base64 attachments correctly with CRLF (the CR
>is throwing it off). it is treating them as normal text and parsing the
>base64 data as words. i submitted a fix for this but it didn't make it
>into 0.7.5.
Would you please resubmit the fix?
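To illustrate the CRLF problem: the regular expression bogofilter uses is not quoted in this thread, so the pattern below is an assumed stand-in for "a line of base64 characters". It is enough to show how a trailing CR defeats a shape test, and how stripping the full line ending first restores the match. `BASE64_LINE` and `is_base64_line` are hypothetical names, not bogofilter code.

```python
import re

# Hypothetical shape test for one line of base64 data; bogofilter's real
# regular expression is not quoted in this thread.
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def is_base64_line(line: str) -> bool:
    """Return True if the line looks like base64 once its line ending is removed."""
    # Remove "\n" and also the "\r" that a CRLF (Windows) mailbox leaves behind.
    return BASE64_LINE.match(line.rstrip("\r\n")) is not None


# Matching the raw line fails: "\r" is not a base64 character, so the
# attachment would be tokenized as ordinary words.
print(BASE64_LINE.match("QUJDRA==\r\n") is not None)  # False
print(is_base64_line("QUJDRA==\r\n"))                 # True
```

The fix in a sketch like this is simply to normalize line endings before any pattern is applied, so LF and CRLF mailboxes are treated identically.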
>my good word db went from 50MB to 3MB after i figured out and fixed this
>problem. i guess i get a lot of attachments. :-)
>
>by the way, it occurs to me that bogofilter will think any single word
>on a line is base64 and discard it, based on the regexp it uses to
>"recognize" base64. i guess this is not too serious until spammers
>start sending messages with only one word per line. :-)
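A small sketch of the one-word-per-line concern, again using an assumed pattern rather than bogofilter's actual one: any line consisting of a single alphanumeric word passes a shape-only base64 test. One alternative, shown here as an assumption and not as anything bogofilter does, is to skip a MIME part's body only when its `Content-Transfer-Encoding` header declares base64. `looks_like_base64` and `lines_to_tokenize` are hypothetical helpers.

```python
import re

# Same hypothetical shape-only pattern: one unbroken run of base64 characters.
BASE64_LINE = re.compile(r"^[A-Za-z0-9+/]+={0,2}$")


def looks_like_base64(line: str) -> bool:
    """Shape-only guess: any line that is a single alphanumeric word passes."""
    return BASE64_LINE.match(line.rstrip("\r\n")) is not None


def lines_to_tokenize(lines: list[str], transfer_encoding: str) -> list[str]:
    """Header-driven alternative (an assumption, not bogofilter's code):
    drop a MIME part's body only when its Content-Transfer-Encoding says base64."""
    if transfer_encoding.strip().lower() == "base64":
        return []
    return list(lines)


spammy = ["Buy\n", "cheap\n", "pills\n"]                # one word per line
print([looks_like_base64(line) for line in spammy])      # [True, True, True]
print(lines_to_tokenize(spammy, "7bit"))                 # ['Buy\n', 'cheap\n', 'pills\n']
```

Deciding from the declared encoding rather than from what a line looks like means ordinary text is never discarded just because it happens to fit the base64 alphabet.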
>
>>Similarly, one could periodically discard any tokens whose good+spam
>>count is 1.
>
>did you mean good=spam? i think you would definitely
>want to keep a word that only appeared in one of the lists.
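The two pruning rules being discussed can be compared on a toy word list. This is a hypothetical in-memory stand-in (token mapped to a `(good, spam)` pair), not bogofilter's database format; treating `good == spam` tokens as uninformative also assumes the two lists come from similarly sized corpora.

```python
# Hypothetical in-memory word list: token -> (good_count, spam_count).
Counts = dict[str, tuple[int, int]]


def prune_singletons(counts: Counts) -> Counts:
    """Drop tokens seen only once across both lists (good + spam == 1)."""
    return {tok: (g, s) for tok, (g, s) in counts.items() if g + s > 1}


def prune_balanced(counts: Counts) -> Counts:
    """Drop tokens with good == spam, which carry little information
    if both lists come from similarly sized corpora."""
    return {tok: (g, s) for tok, (g, s) in counts.items() if g != s}


counts = {"ab34af127": (1, 0), "meeting": (12, 1), "mortgage": (0, 40), "the": (30, 30)}
print(sorted(prune_singletons(counts)))  # ['meeting', 'mortgage', 'the']
print(sorted(prune_balanced(counts)))    # ['ab34af127', 'meeting', 'mortgage']
```

Under the `good + spam == 1` rule, a one-off token such as ab34af127 is discarded even though it appeared in only one list, which is exactly the junk the original poster wanted gone; tokens seen many times in a single list (mortgage above) survive either rule.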
>--
>Allyn Fratkin allyn at fratkin.com
>Escondido, CA http://www.fratkin.com/
>
>
>---------------------------------------------------------------------
>FAQ: http://bogofilter.sourceforge.net/bogofilter-faq.html
>To unsubscribe, e-mail: bogofilter-unsubscribe at aotto.com
>For summary digest subscription: bogofilter-digest-subscribe at aotto.com
>For more commands, e-mail: bogofilter-help at aotto.com