[cvs] Potential for error?

David Relson relson at osagesoftware.com
Tue Oct 22 05:34:43 CEST 2002


At 11:02 PM 10/21/02, Allyn Fratkin wrote:

>> > Also, I noticed that there were a lot of words in my lists that weren't
>> > words.  Things like ab34af127 would be listed, but only once.  Based on
>> > this, eventually the list files will bloat to infinity.
>
>
>are you possibly training bogofilter using mailboxes from microsoft
>windows, that use CRLF as line endings?  bogofilter up through 0.7.5 is not
>recognizing and discarding base64 attachments correctly with CRLF (the CR
>is throwing it off).  it is treating them as normal text and parsing the
>base64 data as words.  i submitted a fix for this but it didn't make it
>into 0.7.5.

Would you please resubmit the fix?
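
For anyone following along, here is a minimal sketch of the failure mode described above. It is not bogofilter's code (bogofilter is written in C, and its actual regexp is not shown in this thread). The pattern and both function names are assumptions made up for illustration; the only point is that a stray CR left at the end of a line can stop an anchored base64 pattern from matching.

```python
import re

# Hypothetical pattern, NOT bogofilter's actual regexp: a line made up
# entirely of base64 alphabet characters, with optional '=' padding.
BASE64_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def looks_like_base64_lf_only(line: str) -> bool:
    """Strip only a trailing LF before matching (the failure mode)."""
    return BASE64_LINE.fullmatch(line.rstrip("\n")) is not None


def looks_like_base64(line: str) -> bool:
    """Strip both CR and LF before matching, so CRLF mailboxes work too."""
    return BASE64_LINE.fullmatch(line.rstrip("\r\n")) is not None


if __name__ == "__main__":
    crlf_line = "QUJDREVGRw==\r\n"
    # With only the LF removed, the leftover CR defeats the match, so the
    # line would be tokenized as ordinary text.
    print(looks_like_base64_lf_only(crlf_line))
    print(looks_like_base64(crlf_line))
```

With something like this, a mailbox using LF line endings behaves the same either way; only CRLF input shows the difference.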


>my good word db went from 50MB to 3MB after i figured out and fixed this
>problem.  i guess i get a lot of attachments.  :-)
>
>by the way, it occurs to me that bogofilter will think any single word
>on a line is base64 and discard it, based on the regexp it uses to
>"recognize" base64.  i guess this is not too serious until spammers
>start sending messages with only one word per line.  :-)
>
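
To make the single-word concern concrete, here is a rough sketch. The regexp is a stand-in, not the one bogofilter uses, and the stricter check (a minimum length plus a length that is a multiple of 4) is only one possible mitigation that nobody in this thread has proposed; `min_len=60` is an arbitrary assumed threshold.

```python
import re

# Stand-in pattern (an assumption, not bogofilter's regexp): one run of
# base64 alphabet characters with optional '=' padding.
BASE64_LINE = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def naive_is_base64(line: str) -> bool:
    """Treat any line holding a single base64-alphabet run as base64."""
    return BASE64_LINE.fullmatch(line.strip()) is not None


def stricter_is_base64(line: str, min_len: int = 60) -> bool:
    """Also require a long line whose length is a multiple of 4.

    min_len is a hypothetical threshold chosen for this example.
    """
    text = line.strip()
    return (
        len(text) >= min_len
        and len(text) % 4 == 0
        and BASE64_LINE.fullmatch(text) is not None
    )


if __name__ == "__main__":
    word = "Viagra"
    encoded = "QUJD" * 19  # 76 characters of valid base64
    print(naive_is_base64(word), stricter_is_base64(word))
    print(naive_is_base64(encoded), stricter_is_base64(encoded))
```

The trade-off is that the last line of a real attachment is usually shorter than the rest, so a length threshold would let that one line through as text. Tracking the MIME Content-Transfer-Encoding header would avoid guessing from line shape at all.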
>>Similarly, one could periodically discard any tokens whose good+spam
>>count is 1.
>
>did you mean good=spam?  i think you would definitely
>want to keep a word that only appeared in one of the lists.
>--
>Allyn Fratkin             allyn at fratkin.com
>Escondido, CA             http://www.fratkin.com/
>
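
The two readings of the pruning rule give quite different results, so here is a small sketch of both. The word lists are modelled as a plain dict mapping each token to a (good, spam) count pair; that layout, the function names, and the sample counts are all assumptions for illustration, not bogofilter's on-disk format.

```python
from typing import Dict, Tuple

Counts = Dict[str, Tuple[int, int]]  # token -> (good count, spam count)


def prune_singletons(counts: Counts) -> Counts:
    """Reading 1: drop tokens whose good + spam count is exactly 1."""
    return {tok: (g, s) for tok, (g, s) in counts.items() if g + s != 1}


def prune_balanced(counts: Counts) -> Counts:
    """Reading 2: drop tokens whose good count equals their spam count."""
    return {tok: (g, s) for tok, (g, s) in counts.items() if g != s}


if __name__ == "__main__":
    # Made-up sample data.
    sample: Counts = {
        "ab34af127": (1, 0),   # one-off junk token
        "viagra": (0, 40),     # appears in only one list, many times
        "the": (500, 500),     # equally common in both lists
        "meeting": (30, 1),
    }
    print(sorted(prune_singletons(sample)))
    print(sorted(prune_balanced(sample)))
```

Reading 1 removes the one-off junk token and keeps "viagra", since a token seen many times in only one list has a total well above 1. Reading 2 removes "the" but leaves the junk token in place, so it would not help with the bloat described at the top of the thread.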
>




