TODO for 1.0

Chris Wilkes cwilkes-bf at ladro.com
Mon Jan 13 21:44:38 CET 2003


On Mon, Jan 13, 2003 at 09:30:35PM +0100, Matthias Andree wrote:
> Chris Wilkes <cwilkes-bf at ladro.com> writes:
> 
> > I see the use in an MD5 of the body as being useful as then you can keep
> > an accurate track of if an email has already been seen and when it was
> > categorized.  I only use the body as if you bounce the mail back to the
> > server for a check the headers should be ignored.
> 
> body checksums are no good. Spammers send the same mail with just a
> unique tag -- this breaks your MD5 and gets the message re-registered.

My main reason for writing this was so that you'll have a track record
of each email you've received.  You could bounce a message back to the
server asking it what classification it had on the email and it could
respond that it was added to the spam list on a certain date.
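
A minimal sketch of the body-only checksum, written in Python rather
than bogofilter's C. The helper name body_md5 and the sample messages
are made up for illustration; nothing here is an existing bogofilter
feature. It hashes only the body, so a bounced copy with different
headers maps to the same key, and it also shows the objection quoted
above: one unique tag in the body gives a different digest.

```python
import hashlib
from email import message_from_string


def body_md5(raw_message: str) -> str:
    """Return the MD5 hex digest of the message body, ignoring all headers."""
    msg = message_from_string(raw_message)
    payload = msg.get_payload()
    # Multipart messages return a list of parts; this sketch only handles
    # simple single-part text messages.
    if not isinstance(payload, str):
        raise ValueError("multipart messages are not handled in this sketch")
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


# Hypothetical sample messages: same body, different headers, and a tagged copy.
original = "From: a@example.com\nSubject: hi\n\nBuy now!\n"
bounced = "From: b@example.com\nSubject: Fwd: hi\nX-Bounced: yes\n\nBuy now!\n"
tagged = "From: a@example.com\nSubject: hi\n\nBuy now! id=8271\n"

same_body = body_md5(original) == body_md5(bounced)        # headers ignored
tag_changes_hash = body_md5(original) != body_md5(tagged)  # tag defeats the hash
```

So the checksum is fine for answering "have I seen exactly this mail",
but it will not catch a spammer's tagged variants.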

Right now there isn't a way to find out if a message has been run
through bogofilter.  You could run the email again through BF to see
what classification it has now, which could be different from what it
was last week due to further learning by BF.

> > could move away from the -S and -N switches which un-registered a
> > message as one type and re-registered it as another and fold that into
> > -u, which automatically registers spam in the right database as you'll
> > know if you've seen the mail before.
> 
> You'd need to store the full token set for the message alongside the
> MD5. Talk about making the data base files big.

Not really: keep a 2nd set of datafiles that just stores the MD5 hash
and the date added or updated.  For this you don't really care what
words are in the (good|spam)list.db or how they relate to an individual
email.
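
Roughly what such a second datafile could look like, as a Python
sketch. The SeenStore class, its table layout, and the use of SQLite
are all assumptions made for illustration; bogofilter does not ship
anything like this. It stores only the body hash, the current
classification, and the date of the last update, with no tokens.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


class SeenStore:
    """Hypothetical side database: body MD5 -> (classification, date)."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS seen ("
            "md5 TEXT PRIMARY KEY, label TEXT NOT NULL, updated TEXT NOT NULL)"
        )

    @staticmethod
    def key(body: str) -> str:
        return hashlib.md5(body.encode("utf-8")).hexdigest()

    def register(self, body, label, when=None):
        """Record or overwrite the classification for this body."""
        when = when or datetime.now(timezone.utc)
        self.db.execute(
            "INSERT OR REPLACE INTO seen (md5, label, updated) VALUES (?, ?, ?)",
            (self.key(body), label, when.isoformat()),
        )
        self.db.commit()

    def lookup(self, body):
        """Return (label, updated) for a seen body, or None if never seen."""
        return self.db.execute(
            "SELECT label, updated FROM seen WHERE md5 = ?", (self.key(body),)
        ).fetchone()


store = SeenStore()
store.register("Buy now!\n", "spam")
first = store.lookup("Buy now!\n")
store.register("Buy now!\n", "good")  # re-registered as the other type
second = store.lookup("Buy now!\n")
unseen = store.lookup("Hello\n")
```

Each record is one hash, one label, and one timestamp, so the file
grows with the number of messages, not with the number of tokens.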

> You might want to look at spamprobe (also hosted at sourceforge) to
> figure if /that/ does something; it somehow keeps track of the messages
> it's seen.

I'll look into that, and I didn't know about the ./contrib/ section of
BF.  I'll keep quiet now about my idea ;)

Chris



