cdb support

Matthias Andree matthias.andree at gmx.de
Wed Jul 9 15:27:23 CEST 2003


WEEE -- ezmlm-idx removed my text/plain and left the application/x-perl in.

Adrian, is the MIME filtering on this list misconfigured?


Gyepi SAM <gyepi at praxis-sw.com> writes:

> On Sat, Jul 05, 2003 at 05:53:28PM -0400, Greg Louis wrote:
>> I'd just started thinking a few days ago about maybe supporting cdb, at
>> least for those of us who don't use -u.  The idea would be to
>> accumulate separate spam and nonspam register_me mailboxes, and then
>> run a registration tool daily (or every six hours, or whatever) that
>> would update the counts, build a new .cdb file and mv it into place
>> (I'm using the one-list patch, but pluralize as appropriate).  What do
>> you think?
>
> Sure, that's doable. There are a couple of issues to keep in mind:
>
> I would use a perl script to build the
> databases, using bogolexer and bogoutil, and just add read-only cdb support
> to bogofilter proper.
>
> Rebuilding the databases from scratch may take a long time though,
> depending on the message counts and file sizes so it may not be worth
> it. Please try it and see if the speed is acceptable.
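
The "build a new .cdb file and mv it into place" step Greg describes can
be sketched roughly as follows. This is only an illustration: the function
name install_atomically and the ".wordlist-" temp prefix are made up here,
and it assumes the temporary file lives on the same file system as the
target, so that the rename is atomic on POSIX systems.

```python
import os
import tempfile


def install_atomically(data: bytes, target: str) -> None:
    """Write data to a temp file next to target, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(target))
    # Hypothetical naming scheme; any name in the same directory would do.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".wordlist-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # get the contents onto disk before renaming
        # Readers see either the old file or the new one, never a partial file.
        os.replace(tmp_path, target)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that already has the old file open keeps reading the old data
until it reopens the file, which fits a read-only lookup scheme.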

I have a Perl script to dump the current Berkeley .db files in a format
that cdbmake understands.
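
For readers who have not seen it, the format cdbmake reads is one
"+klen,dlen:key->data" line per record, with an extra newline marking the
end of the data. The sketch below is not the attached Perl script; it only
shows the output side, and it assumes the records arrive as (key, value)
byte-string pairs from whatever reads the Berkeley DB file. The function
name write_cdbmake_input is made up for this example.

```python
import io


def write_cdbmake_input(records, out):
    """Write (key, value) byte-string pairs in the format cdbmake reads."""
    for key, value in records:
        # Lengths are byte counts, so keys and values may contain any bytes.
        out.write(b"+%d,%d:" % (len(key), len(value)))
        out.write(key + b"->" + value + b"\n")
    out.write(b"\n")  # an empty line terminates the input


if __name__ == "__main__":
    buf = io.BytesIO()
    write_cdbmake_input([(b"one", b"Hello"), (b"two", b"Goodbye")], buf)
    print(buf.getvalue().decode("ascii"), end="")
```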

To test, I use a .db file (goodlist.db, btree format, DB 4.0) that
has 551,633 entries and is 29,580 kByte in size.

To avoid any disk seeks distorting the result, I prime the caches by
doing cat goodlist.db >/dev/null

It takes 20.5 s (18.0 user 0.7 sys) to dump my BerkeleyDB to cdbdump
format. The resulting file is 14,758 kByte.

It takes 4.2 s (1.0 user 0.4 sys) to run cdbmake to convert the cdbdump
format to .cdb format. Most of the wallclock time is spent in
fsync(). The .cdb file is 23,060 kByte.

It takes 1.2 s (0.9 user 0.3 sys) to run cdbdump to convert the .cdb
file back to cdbdump format.
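
Figures of the "wall clock (user, sys)" kind can be collected with
something like the sketch below. It is not how the numbers above were
obtained; time_command is a made-up helper, and it relies on the
Unix-only resource module to read the CPU time of child processes.

```python
import resource
import subprocess
import sys
import time


def time_command(argv, stdin_path=None):
    """Run argv and return (wall, user, sys) seconds for the child process."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    stdin = open(stdin_path, "rb") if stdin_path else None
    try:
        subprocess.run(argv, stdin=stdin, check=True)
    finally:
        if stdin:
            stdin.close()
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return wall, after.ru_utime - before.ru_utime, after.ru_stime - before.ru_stime


if __name__ == "__main__":
    # Stand-in command; for the cdbmake step one would pass something like
    # ["cdbmake", "goodlist.cdb", "goodlist.tmp"] with stdin_path set to the
    # dump file (those file names are placeholders).
    wall, user, system = time_command([sys.executable, "-c", "pass"])
    print("%.1f s (%.1f user %.1f sys)" % (wall, user, system))
```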

The missing part is how .cdb would fare in bogofilter. Someone eager to
code this?
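
To give an idea of what read-only lookup involves, here is a small sketch
of the cdb file layout as publicly documented: a 2048-byte header of 256
(position, slot count) pairs, the records, then 256 hash tables. It is
illustrative Python, not bogofilter code; the names cdb_hash, cdb_build and
cdb_get are made up, and the builder works in memory only so that the
example is self-contained.

```python
import struct


def cdb_hash(key: bytes) -> int:
    """The cdb hash: start at 5381, then h = (h * 33) ^ c, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def cdb_build(records) -> bytes:
    """Build an in-memory cdb image from (key, value) byte-string pairs."""
    body = bytearray()
    buckets = [[] for _ in range(256)]
    pos = 2048  # records start right after the fixed-size header
    for key, value in records:
        h = cdb_hash(key)
        buckets[h & 255].append((h, pos))
        rec = struct.pack("<II", len(key), len(value)) + key + value
        body += rec
        pos += len(rec)
    header = bytearray()
    tables = bytearray()
    for bucket in buckets:
        nslots = 2 * len(bucket)  # tables are kept half empty
        header += struct.pack("<II", pos, nslots)
        slots = [(0, 0)] * nslots
        for h, rec_pos in bucket:
            i = (h >> 8) % nslots
            while slots[i][1] != 0:  # linear probing
                i = (i + 1) % nslots
            slots[i] = (h, rec_pos)
        for h, rec_pos in slots:
            tables += struct.pack("<II", h, rec_pos)
        pos += 8 * nslots
    return bytes(header + body + tables)


def cdb_get(image, key: bytes):
    """Return the value stored for key, or None if the key is absent."""
    h = cdb_hash(key)
    table_pos, nslots = struct.unpack_from("<II", image, 8 * (h & 255))
    if nslots == 0:
        return None
    start = (h >> 8) % nslots
    for probe in range(nslots):
        slot = table_pos + 8 * ((start + probe) % nslots)
        slot_hash, rec_pos = struct.unpack_from("<II", image, slot)
        if rec_pos == 0:  # empty slot: the key is not in the file
            return None
        if slot_hash == h:
            klen, dlen = struct.unpack_from("<II", image, rec_pos)
            key_start = rec_pos + 8
            if image[key_start:key_start + klen] == key:
                return image[key_start + klen:key_start + klen + dlen]
    return None


if __name__ == "__main__":
    image = cdb_build([(b"spam", b"12"), (b"ham", b"7")])
    print(cdb_get(image, b"spam"), cdb_get(image, b"eggs"))
```

In this layout every record carries 24 bytes of overhead (8 bytes of
lengths plus two 8-byte hash slots) on top of the fixed 2048-byte header.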

This all happens on a Linux 2.4 box with SYM53C875 SCSI, Fujitsu MAH
drive, ext3 file system, plenty of RAM and 700 MHz Duron CPU.

Note that you need plenty of disk space to build the .cdb: twice the
size of the .cdb file plus the input data. In my case, I need 60 MB
(2 * 23 + 14) to use cdb, as opposed to 29 MB with BDB.
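
The same rule of thumb, spelled out with the exact file sizes from the
test above. The helper name cdb_build_space_kbyte is made up, and the
"twice the .cdb plus the input" formula is just the estimate stated above,
not a measured peak.

```python
def cdb_build_space_kbyte(cdb_kbyte: int, input_kbyte: int) -> int:
    """Estimated disk space to build a .cdb: twice the .cdb plus the input."""
    return 2 * cdb_kbyte + input_kbyte


if __name__ == "__main__":
    # 23,060 kByte .cdb file and 14,758 kByte cdbdump-format input.
    print(cdb_build_space_kbyte(23060, 14758), "kByte")  # 60878 kByte
```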

The Perl script is attached. Documentation (including a usage example)
and the modified BSD license are inside the script.

NB: DJB's original cdb isn't Free or Open Source software. "You may
distribute unmodified copies of the cdb package."

However, there are two alternatives: Debian ships a freecdb (I haven't
looked at it), and Michael Tokarev has a public domain "tinycdb" that
shares the file format with DJB's cdb. I presume tinycdb is also
available on Debian Linux.

http://www.corpit.ru/ftp/tinycdb/changelog
http://www.corpit.ru/ftp/tinycdb/tinycdb-0.72.tar.gz

Perl script, gzipped and uuencoded, just in case:

begin 644 bdbtocdb.pl.gz
M'XL("`<3##\"`V)D8G1O8V1B+G!L`*U6?V_C1!#]FWR*(1=$`VX;.'Z(E#O)
MB;?M2JD=;*>]"A!R[$VSJF,'KYU2`=^=-^LD30L'0E!%Z69GY\V\M[.S^^I#
M.FU,=3K7Q:DJ-K165=[IO*)Y-J_+-)N?K',ZIH1,6NEU375):5EL5%5C;J2J
M>Y6K1_)&UI#-5\F](EVLFYH69;5*ZLXK8(W+]6.E[Y8U'8W[]/E@\)KFCW25
MU/52)X;<(JN4PCHWS\FN,U0IHZJ-RDXL@">B<2BGL0S\(1W_QS^+&"\U at C29
M7JFB3JK''3_,FF:]+HW*F%*EDNP%T:3(:%WI`FOQ*2@!6LN5ZB6^=BHT1:8J
M4V.Y:4E(BUZ4-4&^QU;&0B,\!VJ,<AA0_;)6*?C72T5%LE)4+NSXR"AUG\QS
MU:<LJ3GF/#&*H-Y"(PHEU5W#5!R;G[XK2BA("00MX5[M[4^YF&79Y!G-%9,T
M90%PJ)`LU#Z=>5/;V!RP#5>S:ENE2E1*4BL#M+*@58,DF!OPTF52W$&_AZ7.
MG_M436%.6NVW,RP5BT)J`1%K1D/X=9ZDBO(RO=_F*]ZY5].)H."<9I&@H\08
MT#$T#2+YKO\O:\(BI at F+G:S6N3K)YO3V-%.;TZ*!8."SKC4$R1VD!TW6"F2:
M]98Q+'#7"WM2GAV30[C=N$H>Z(Q59"?:%\>WA_;=&,;]N%ZMN:XTOJH5'2_H
MKQSL(JR8R+'P(_'?C\9>G5!EVM251 at F`KZTIE`27NRF;"GO#,V at 9?'*X^(U#
M#[I>4EG9_V6#@T^K,M,+G5K)4)>58LE6NJX5GZ!RHS,^9,NDK;)%F>?E at R[N
M^&1DFIV,=5JI>FB3^N1%6H8/QS:?M,Q46X25JA/DR9#)O-RP:==]4)\Z16%S
M20(O!Q9#',8KLA?)(&":)V at 3U<E[DD"P`R5V28!@UJ3J[_+ at 6N>C\"_SH"V[
MK$SMD=Z5)'Q.H7][WM&.5*63W#PI;3?(.AZDOR/E*VW]7O:=%TT:J3^ML?HS
M`23>`I85Z\J'!FV at V?9056285UP;R&55UHI:<=#FT"(UNCPM8-AVBG)1/_"V
M/]41#F#*A<1MEPNLXA(JVF(R9D\BOI011<%Y?..&@C">AL&U](1'HUL8!8V#
MZ6TH+RYCN at PFG@@C<GT/LWX<RM$L#C#1=2-X=ODN at LGU;]%YIJ&((@I"DFA!
M$G#`#UT_EB)R2/KCR<R3_H5#@"`_B'$:KV2,97'@<%CNMG]RY$9V)<+Q)7ZZ
M(SF1\:V->"YCGZ.=(YQ+4S>,Y7 at V<4.:SD+T. at 9C<IZ,QA-77 at GO!!D@*HEK
MX<<47;J3R0NNP8TO0D[_&=&1L'W#':&IVF"@ZLE0C&/F]#0:0T"D.'$HFHJQ
MY(%X)\#'#6\=H/+M'J#W?#?#,IC)<Z_<"Q`\^@=EL#GC62BN..W@'##1;!3%
M,I[%@BZ"P+.*1R*\1F^+SF at 21-&V^SN($;L<G$&@&,P8CV:19.U8;S\683BS
M[X4^-OL&ZH"_"V?/RASXEC"$"L);AF4E["XX=',I,(_=]EMN<>BR%!&T&\>'
M"Q$34L8'3,D7%Q-Y(?RQ8&O`.#<R$GV[;3+B);(-?>,B[HRIV^WB2\T.#TK8
ML9M*\IQ<[UIRZNUB@*$4(KDM&TQ&L_'E5OCM]<Z7)4OK_@]/)HO8D]GP\+)S
M-C0X>6W?<J>#KT\'W]!GKX>#+X>#`:G5*B'QRYIZ\.1;0P;#X26Z4Z[.[&]O
M]-.YYA^=3"OJXO+G;M+%BRE7AGO"0A?H"3TWO+C^?O`CUO4\>D,U%G^T3,S2
MH8^W$!\[^U4.#?#YZHLO,`7K*`Z%Z.#&Y;9CPZ1)P>\31D%3VKD-J?=A%Q'0
MNNFHA^=:W1C$ZGG';XWZ^:AWKQX!N$GR!O=&^-.Y#*.X?V:!:;_\#0WHY=Q[
M(7SQ+N[W.[_:]>U#LOMIUZ%<%7?UTB[O.]1U#J:L+T\.V7K\MIWXH>AV/GC!
MCGNCY8?.B4?XEMWOG0[>*WMZW]*@3[\^\\.E66FU:=]ZK1?!JZ?+-X5Z>+:#
MAVYX$16TM(;]]L'G^.TB8]/1`GM4E$=1[*'6F<!#M[_;]BW$(F_,TD;\T_Y;
J(#8_#[HPCT7Z?@^V(D93[.OEK-/[#1ORV5EG*S>4.^O\`4[0#7OY#```
`
end

-- 
Matthias Andree



