breaking the training db

Matthias Andree matthias.andree at gmx.de
Tue Sep 23 01:26:41 CEST 2003


On Mon, 22 Sep 2003, Peter Bishop wrote:

> On 22 Sep 2003 at 20:00, Matthias Andree wrote:
> 
> > True spoken, but nobody will care for such cruft code a few months from
> > now. Such degeneration or fallback code must be written, debugged,
> > integrated, only to be removed a month later, again with debugging,
> > de-integration and other tests. That's a lot of work in a beta version.
> 
> But on the other hand, it does make transitions more painless for dinosaurs
> like me who are well behind the leading edge and do not maintain spam 
> archives

I see the problem, and I share part of it: I receive more than 20 spam
mails per day, not counting the virus junk, which I can only handle by
bulk erasing and bouncing anything that remotely looks like Windows
active content. (I thought W32.Sobig.? had been nasty, but then came
Swen, and it's really close to DoS: 30 worms received per hour. Without
size limits on my mailbox, it's unusable (quota exhausted) in 2
hours...) The trouble is, there are so many users that you can't
possibly summon some big brother (read: regional court) to sue them to
the edge of bankruptcy; the legal system would falter the moment you
tried that. Running insecure computers should be considered a
misdemeanor, just like driving a car with non-working brakes or slick
tyres.

Back to the problem: I cannot possibly keep all spam, but one month's
worth of spam (not counting viruses) is about 3 or 4 MB here and gives a
good start in training from scratch. Could your hard disks accommodate
that? If you stored that mail in a "Maildir/" or MH format folder
(bogofilter can now read these when training, thanks to the new built-in
bogoread* modules -- just use -B and name the Maildir), it would be
trivial to age out the folder once it grows too large, or to dispose of
all spam older than 30 days:

find Maildir/.spam/{cur,new,tmp} -mtime +30 -type f -exec rm '{}' ';'

That's all you need to flush old mail from Maildir/.spam.
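
If you want to run this from cron, the same idea can be wrapped in a
small shell function. This is only a sketch: the name prune_maildir and
its two arguments are made up for illustration and are not part of
bogofilter. It assumes a POSIX shell and find, skips subdirectories that
do not exist, and defaults to 30 days.

```shell
# Hypothetical helper, not part of bogofilter: prune_maildir DIR [DAYS]
# Removes regular files older than DAYS days (default 30) from the
# cur, new and tmp subdirectories of the Maildir folder DIR.
prune_maildir() {
    dir=$1
    days=${2:-30}
    for sub in cur new tmp; do
        # A fresh folder may lack some subdirectories; skip those.
        [ -d "$dir/$sub" ] || continue
        find "$dir/$sub" -type f -mtime +"$days" -exec rm -f '{}' ';'
    done
}
```

Call it as "prune_maildir Maildir/.spam 30". To see what would be
deleted before trusting it, replace the -exec clause with -print.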



