Naive Bayes classifier derived from bogofilter-0.7
Greg Louis
glouis at dynamicro.on.ca
Tue Nov 26 12:45:34 CET 2002
On 20021125 (Mon) at 1931:20 -0500, Scott Lenser wrote:
> > With 0.6.0 I can't actually get your bogofilter to complete its runs;
> > I get gadzillions of gmime-WARNING messages, some of which are utterly
> > bogus, eg
> > gmime-WARNING **: No domain in email address: "Attila
> > =?iso-8859-1?q?Szov=E1thy=22?= <aszovathy at gw.cdk.bme.hu>
> >
> > and there is one email in the second of three test runs I scripted that
> > causes bogofilter_srl to hang, eating cpu but effecting nothing.
> > Unless I can get around that, I won't be able to do any worthwhile
> > comparisons.
Quite a while after I sent this, the program did indeed terminate on
that message and went on to complete the run.
> I get a lot of the stupid gmime-WARNING messages as well. I usually just
> redirect stderr to /dev/null to ignore them.
Unfortunately it seems the gmime-CRITICAL ones cause the program to
quit without reporting a result; this happens about 0.3% of the time.
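
The workaround and its failure mode can be sketched together: discard stderr, but still notice the runs that return no result. The following is a minimal sketch only; `classify` and `run_batch` are hypothetical helper names, and the command line is supplied by the caller rather than assumed from any bogofilter release.

```python
import subprocess

def classify(cmd, message_bytes):
    """Run a classifier command on one message, discarding stderr.

    Returns (exit_code, stdout_text). An empty stdout_text means the
    program printed no result, e.g. because it quit early.
    """
    proc = subprocess.run(
        cmd,
        input=message_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null in a shell
    )
    text = proc.stdout.decode("utf-8", errors="replace").strip()
    return proc.returncode, text

def run_batch(cmd, messages):
    """Classify each message; record the indices that gave no result."""
    results, failed = [], []
    for i, msg in enumerate(messages):
        code, out = classify(cmd, msg)
        if out:
            results.append((i, code, out))
        else:
            failed.append(i)
    return results, failed
```

Keeping the list of failed indices makes it possible to report how many messages were dropped, so a comparison between filters is not silently skewed by the ones that crashed.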
> I've noticed a problem while doing some further testing. Since I am relying
> on gmime to get rid of mime and I bumped up the MAXWORDLEN so that I could
> store tokens like 'HF:User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.8) Gecko/20020214',
> if the email includes quoted parts that used to be mime encodings, I'll end
> up encoding a whole bunch of "words" out of the base64 encoded cruft. Basically
> messages like
>
> > <base64 stuff>
> > <base64 stuff>
> > <base64 stuff>
>
> will cause it to take a long time on that message. I've never seen it not terminate
> but sometimes it takes a long time. You should be able to fix that particular
> problem by putting in a base64 encoding filter in lexer_text_plain.l and lexer_text_html.l.
> The current one in bogofilter-0.9 is suitable if you remove the ^ from the beginning
> (and maybe the $ from the end, though that is probably not needed).
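
The idea of an unanchored base64 filter can be illustrated outside the lexer. This is a rough sketch, not the pattern from bogofilter-0.9: the regular expression, the 40-character threshold, and the function names are all assumptions chosen for illustration. It accepts optional `>` quote markers before the base64 run, which is the effect that dropping the leading `^` is meant to achieve.

```python
import re

# Hypothetical pattern: optional whitespace and quote markers, then a long
# run of base64-alphabet characters, optional '=' padding, and nothing else.
B64_RUN = re.compile(r"\s*(?:>\s*)*[A-Za-z0-9+/]{40,}={0,2}\s*$")

def looks_like_base64(line):
    """True if the line is a (possibly quoted) run of base64 text."""
    return B64_RUN.match(line) is not None

def strip_base64_lines(text):
    """Drop lines that look like base64 so they are never tokenized."""
    kept = [ln for ln in text.splitlines() if not looks_like_base64(ln)]
    return "\n".join(kept)
```

Filtering whole lines before tokenizing keeps the long-token setting usable for header tokens while avoiding a flood of meaningless "words" from quoted attachments.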
I was able, with patience, to complete the run and will be doing the
data reduction this morning. Further report to follow.
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |