RDBMS

Matthias Andree matthias.andree at gmx.de
Thu Nov 23 15:15:55 CET 2006


Tom Allison wrote:
> will bogofilter ever do an RDBMS instead of Berkeley?

It already does support SQLite3. Porting the sqlite3 interface to
PostgreSQL, for instance, shouldn't be too much of an effort for
somebody who is acquainted with PGSQL and C. However...

> I'm trying to set up bogofilter for a slightly bigger email system than I've 
> been using in the past.  While it's been very effective, I just can't seem to 
> rationalize the complexity of maintaining all these separate files for users' 
> preferences and word lists.

...for your setup, we'd need some more interfacing code in bogofilter
that allows you more flexibility in the queries and table layout
(currently, queries and schema are hard-wired into the sqlite3 driver),
and probably a way to provide credentials, or at least to tell the
database which user account to select.
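
To illustrate what "not hard-wired" could look like, here is a minimal
sketch in Python, not bogofilter code. The table name, the column names
and the per-user key are all hypothetical and do not reflect the schema
the sqlite3 driver actually uses; the point is only that the SQL is
built from site configuration and that every query is scoped to a user
account.

```python
import re
import sqlite3

# Hypothetical site configuration: the administrator names the table and
# columns instead of the driver hard-wiring them.
CONFIG = {
    "table": "wordlist",
    "user_col": "account",
    "token_col": "token",
    "spam_col": "spam_count",
    "ham_col": "ham_count",
}

def _ident(name):
    # Identifiers cannot be bound as SQL parameters, so validate them strictly.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name

def build_queries(cfg):
    """Build the SQL statements once from the configured names."""
    c = {key: _ident(value) for key, value in cfg.items()}
    return {
        "create": ("CREATE TABLE IF NOT EXISTS {table} ("
                   "{user_col} TEXT NOT NULL, {token_col} BLOB NOT NULL, "
                   "{spam_col} INTEGER NOT NULL DEFAULT 0, "
                   "{ham_col} INTEGER NOT NULL DEFAULT 0, "
                   "PRIMARY KEY ({user_col}, {token_col}))").format(**c),
        "insert": ("INSERT OR IGNORE INTO {table} ({user_col}, {token_col}) "
                   "VALUES (?, ?)").format(**c),
        "update": ("UPDATE {table} SET {spam_col} = {spam_col} + ?, "
                   "{ham_col} = {ham_col} + ? "
                   "WHERE {user_col} = ? AND {token_col} = ?").format(**c),
        "lookup": ("SELECT {spam_col}, {ham_col} FROM {table} "
                   "WHERE {user_col} = ? AND {token_col} = ?").format(**c),
    }

def record(conn, queries, user, token, spam=0, ham=0):
    """Add spam/ham counts for one token of one user account."""
    conn.execute(queries["insert"], (user, token))
    conn.execute(queries["update"], (spam, ham, user, token))

def lookup(conn, queries, user, token):
    """Return (spam, ham) counts; unknown tokens count as (0, 0)."""
    row = conn.execute(queries["lookup"], (user, token)).fetchone()
    return row if row is not None else (0, 0)
```

A real driver would additionally need the credentials handling mentioned
above, which this sketch leaves out entirely.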

Unfortunately, this is going to add a lot of complexity, so we shouldn't
do an ad-hoc solution, and we're standing in front of a blank whiteboard
- we need to draw up a decent concept of how things are supposed to
happen and which part of the whole installation takes on which
responsibilities.

This looks interesting, but I haven't got much (if any) time before
early 2007.

> I know SpamAssassin did this with their Bayes wordlists and found that initially 
> it was very painful, but as the number of inserts decays over time the 
> performance comes back.  But I am violently opposed to the hacked up methods of 
> SA.  They have large lists of exclusions in their bayesian filter objects that 
> now have to be maintained along with everything else.  Lots of work.

I think that ignore lists wouldn't be all that bad for bogofilter
either. For instance, if you have several inbound paths for messages (=
accounts), you may not want to "penalize" messages towards spam just
because they've taken a route with less efficient or nonexistent
pre-filtering upstream, in other words, a route that is haunted by more
spam. In such cases, headers specific to that route might be eliminated
so that they do not contribute to the spamicity/bogosity (the metric).
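
As a rough illustration of such an ignore list (a sketch in Python, not
an existing bogofilter feature), route-specific headers could be stripped
from the message before it is handed to the filter. The header names
below are made up for the example.

```python
from email import message_from_string

# Hypothetical ignore list: headers that only one inbound route adds.
IGNORED_HEADERS = ("X-Route-Gateway", "X-Upstream-Filter")

def strip_route_headers(raw_message, ignored=IGNORED_HEADERS):
    """Return the message text with all ignored headers removed."""
    msg = message_from_string(raw_message)
    for name in ignored:
        # Deleting a header removes every occurrence, case-insensitively,
        # and is a no-op if the header is absent.
        del msg[name]
    return msg.as_string()
```

The stripped text would then be scored or trained on in place of the
original, so tokens from those headers never reach the word list.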

As to the speed issue, SpamAssassin's Bayes implementation is unbearable
for me. It is an order of magnitude, if not more, too slow - personally,
I've got use_bayes 0 for all my SpamAssassin setups.

HTH,
Matthias


