Messages that slow bogofilter down (was: profiling)
Greg Louis
glouis at dynamicro.on.ca
Thu Feb 20 23:21:32 CET 2003
On 20030220 (Thu) at 1656:47 -0500, David Relson wrote:
> Message 3.txt is essentially 100,000 x's and 4.txt is 600,000 x's. Both of
> these messages have an initial group of tokens followed by one monstrously
> long token. They could be considered extreme, pathological cases.
Unquestionably. But legal.* The sort of thing we must be able to
handle in the real world in case someone attempts a DoS. (Or, as in
this case, fudges up a big attachment because there's a suspicion big
attachments are going astray and he wants to test mail handling.)
> Message 2.txt is 5MB long, which is certainly bigger than average.
It's bigger than our average, anyway, because I refuse messages larger
than 5 megabytes at work (personally I refuse anything over 1.6 MB). The
tendency these days is to press for higher limits, however, and we
should be designing for such limits (I _HATE_ pontificating like this
when I can't contribute meaningfully to the code, but someone's gotta
say it).
> I'm running another test set - with optimization turned on (rather than the
> previous unoptimized, debug code). The optimized times are much better
> (10.90s for 2.txt, 6.66s for 3.txt, 151.13s for 4.txt).
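A rough way to read the two "x" timings is to compare how much the input grew with how much the run time grew. The sketch below uses only the figures quoted above. The x counts are approximate ("essentially" 100,000 and 600,000), both files also begin with an ordinary group of tokens, and two data points cannot establish a complexity class, so the fitted exponent is only a hint that cost grows faster than linearly with the length of the long token.

```python
import math

# Optimized-build timings quoted above: (approximate number of x's, seconds).
SMALL = (100_000, 6.66)    # 3.txt
LARGE = (600_000, 151.13)  # 4.txt


def growth_ratios(small, large):
    """Return (input-size ratio, run-time ratio) between two test messages."""
    size_ratio = large[0] / small[0]
    time_ratio = large[1] / small[1]
    return size_ratio, time_ratio


size_ratio, time_ratio = growth_ratios(SMALL, LARGE)

# Fit t = c * n**k through the two points: k = log(time ratio) / log(size ratio).
k = math.log(time_ratio) / math.log(size_ratio)

print(f"input grew {size_ratio:.1f}x, time grew {time_ratio:.1f}x")
print(f"linear would give ~{size_ratio:.0f}x, quadratic ~{size_ratio ** 2:.0f}x")
print(f"two-point exponent k = {k:.2f}")
```

With these numbers the time ratio (about 22.7x) falls between the linear (6x) and quadratic (36x) cases, with k near 1.74.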
That is better, yes. Still over two and a half minutes for the
six-hundred-thousand-x file. If you send a mere hundred of those to a
uniprocessor (UP) mail server that normally handles 1000 emails an hour (not a big
load) with the MTA feeding each message through bogofilter, by how much
do you slow his mail handling down?
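The question can be answered with back-of-the-envelope arithmetic. This sketch assumes a single CPU that filters one message at a time, takes the 151.13s optimized figure for 4.txt as the cost of each bad message, and ignores the time needed to filter the normal messages themselves. The function name is illustrative only.

```python
NORMAL_RATE_PER_HOUR = 1000       # messages/hour the server normally handles
BAD_MESSAGES = 100                # copies of the 600,000-x message
SECONDS_PER_BAD_MESSAGE = 151.13  # optimized bogofilter time for 4.txt


def backlog_estimate(bad_messages, secs_each, normal_rate_per_hour):
    """Return (hours spent filtering the bad messages, normal messages arriving meanwhile)."""
    busy_seconds = bad_messages * secs_each
    busy_hours = busy_seconds / 3600
    queued_normal = busy_hours * normal_rate_per_hour
    return busy_hours, queued_normal


busy_hours, queued = backlog_estimate(
    BAD_MESSAGES, SECONDS_PER_BAD_MESSAGE, NORMAL_RATE_PER_HOUR
)
print(f"{busy_hours:.1f} h of filtering; ~{queued:.0f} normal messages arrive meanwhile")
```

Under those assumptions the hundred messages tie up the filter for roughly 4.2 hours, during which about 4,200 ordinary messages arrive, against a normal budget of 3.6 seconds per message.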
(*) This was a plain-text attachment with a line longer than 1000
characters, so strictly speaking it is _not_ "legal." That, however,
wouldn't stop most MTAs these days from passing it on.
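For reference, RFC 2822 (section 2.1.1) says each line MUST be no more than 998 characters, excluding the CRLF, which is where the 1000-character figure comes from. A minimal sketch of a check for such lines follows; `overlong_lines` is a hypothetical helper, not part of bogofilter or of any MTA, and it simply counts octets per line after removing a trailing CR.

```python
MAX_LINE = 998  # RFC 2822 section 2.1.1: at most 998 characters, excluding CRLF


def overlong_lines(message: bytes, limit: int = MAX_LINE):
    """Yield (line_number, length) for each line longer than `limit` octets."""
    for number, line in enumerate(message.split(b"\n"), start=1):
        if line.endswith(b"\r"):
            line = line[:-1]  # don't count the CR of a CRLF line ending
        if len(line) > limit:
            yield number, len(line)


msg = b"Subject: test\r\n\r\n" + b"x" * 100_000 + b"\r\n"
print(list(overlong_lines(msg)))
```

For a 3.txt-style message like `msg`, the single body line of x's is reported as line 3 with a length of 100000.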
> I suspect that the way to pursue this issue is to post info to the mailing
> list and see what responses appear.
Quod feci (which I have done).
--
| G r e g L o u i s | gpg public key: |
| http://www.bgl.nu/~glouis | finger greg at bgl.nu |
| Help free our mailboxes. Include |
| http://wecanstopspam.org in your signature. |