[cvs] bogofilter/src word.h,1.17,1.18

Matthias Andree matthias.andree at gmx.de
Mon Mar 14 03:04:57 CET 2005


David Relson <relson at osagesoftware.com> writes:

> There's nothing gratuitous about the strlen() call in word_new().  It's
> part of the (unwritten) semantics of the function.  I'll be glad to
> document the feature and make it official.

Rather not make the interface so wide, all this automation is killing
us, and it isn't consistently used across the code either, in fact,
there is ZERO user except the bogus macro left.

> At the moment, rather than have more tweaks for word_t, I'd rather have
> the new bogoutil capabilities so I can build/release 0.94.1

Wow! I've tried just to add the --db-checkpoint option, which turned out
to be a major hassle.

I've needed to touch two files for the low-level implementation (quite
easy), two unrelated files for the stubs, two files for the ds_*
interface, then four files for the option, and spent half an hour
debugging two bugs with recovery that warrant further analysis as to
where these came from.

I must assess that we have

- simple code (option parsing, help, option definitions) spread over way
  too many files

- an absolutely inconcise and twisted database_db* implementation over
  half a dozen files, partially from trying OOP in C which is just
  painful as it cannot be articulated properly which left two groups of
  other implementations behind, 1. sqlite, 2. qdbm + tdb

- way too many #ifdef hacks all over the code

- blurry interfaces. I don't like to overload arguments when we want in fact
  be overloading functions, word_new() being the most recent example.

  I've created facts here now so we have a simple and well-defined
  interface without "if"s that add complexity. The cleanup removed 15
  lines from the word.c/word.h pair, which is a low-level data shuffling
  function. We could split word_new to word_new as creator and word_neww
  as copy constructor.

- too much dead code all over the place that need to be remembered and
  touched for trivial changes, switch() constructs in option parsing for
  instance

- still too much code duplication down deep, with dbe_cleanup_lite
  vs. dbx_common_close,

- come close to the point where the code is unmaintainable.

No matter what is on our schedule, we _MUST_ clean up, and the key
points to clean up are:

- option parsing. This must be in once central location per
  executable. M_* mode sets here, O_DB_* sets in half a dozen files,
  with callbacks - awful.

  We want something that keeps --help output, .xml for the man pages,
  the actual option and the parsing all in one place. I'll look at
  AutoOpts et al. next week to see if it can help us

- database backends. we need to kill one layer of indirection as it's
  only ballast without serving a real purpose, datastore.? will have to
  disappear before 0.96, and the whole probe/init stuff will need to be
  redone, streamlined. It has too little power in its current shape.

- database responsibilities. I'm not clear what I mean here, I dislike
  the idea that we hack around ifdefs, fTransaction and such, rather
  than just keeping a state variable (struct perhaps) that has
  information on what the back-end can do, and who's responsible for
  cleaning up, resizing things and so on.

- layers. option parsing will move out of data* functions for sure.

- perhaps stubs. a stub library to mask the missing data* functions,
  rather than having one set of stubs in sqlite, one for TDB/QDBM, might
  be easier. I started such but it became blurry when sqlite3 was added
  and the DB-TXN+DB-NOTXN-unification took place.

- dead or optional code. I've never liked the idea of serving complaints
  about 1,000 bytes in code size with #ifdef'ing code out. It is
  miniscule compared to the lexer, and causes maintenance headaches, and
  diversity for support, which luckily hasn't struck ourselves too hard,
  but diversity is awful since it clutters documentation and support
  mails with too many ifs and buts, which became somewhat obvious when I
  rewrote doc/README.db last week, which I've done without spending more
  than a subordinate clause on the option to disable TXN or non-TXN
  modes at compile-time.

-- 
Matthias Andree
_______________________________________________
Bogofilter-dev mailing list
Bogofilter-dev at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter-dev



More information about the bogofilter-dev mailing list