bogotune [was Re: bogotrain]

Dave Lovelace dave at firstcomp.biz
Mon Dec 22 14:34:01 CET 2003


David Relson wrote, in part:
> Message counts can be determined with:
> 
>    cat ham*.mbx | grep -c "^From "
>    cat spam*.mbx | grep -c "^From "
> 
But the error I got referred to the wordlists, not to the mail files.
Again, how do I find out how many messages are in the wordlists?  And
why can't bogotune check that before starting other, very lengthy,
procedures?  The mail that was used to produce the wordlists is not
available for counting - it's now mixed with other mail.
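
For reference, the quoted commands can be wrapped in a small helper. This is only a sketch: `count_msgs` and the sample file are hypothetical names made up for illustration, and it counts messages in mbox files only. It does not answer the question above about counts stored in the wordlists. It also assumes that body lines beginning with "From " have been escaped as ">From ", as mbox writers normally do.

```shell
# Count messages in one or more mbox files by counting the
# "From " separator lines that begin each message.
count_msgs() {
    cat "$@" | grep -c '^From '
}

# Demo on a throwaway two-message mbox (hypothetical sample data).
tmpdir=$(mktemp -d)
printf 'From a@example.com Mon Dec 22 14:00:00 2003\nSubject: one\n\nhello\n\nFrom b@example.com Mon Dec 22 14:05:00 2003\nSubject: two\n\n>From here on, escaped\n' > "$tmpdir/ham1.mbx"

ham_count=$(count_msgs "$tmpdir"/ham*.mbx)
echo "ham messages: $ham_count"

rm -rf "$tmpdir"
```

Running the same helper over the spam files gives the second number needed to compare the two counts by hand.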

> Converting the input messages to the message count format is a good
> thing to do.  It takes a hunk of time, but will save time when bogotune
> is run.
> 
The man page refers to msg-count files but doesn't say what they are.

> When you learn more about what bogotune was doing during those hours,
> let me know.  I _may_ be able to suggest something that will speed
> things for you.
> 
This system is slow (by current standards) and does not have much memory.
I can't justify running something that bogs things down for well over
12 hours, apparently without having started whatever work it does,
just in the hope that it will give me information about what went
wrong so that I can run it some more.  If it has to load the wordlists
and the mail files into memory, it's useless for us; there's just not
enough memory there, and swapping that much is not feasible.

BTW, the bogotune man page has an error.  Under synopsis, it indicates
that the argument to -d is a wordlist file, but it actually must be the
directory containing the wordlist.  It is also confusing (to me, anyway)
in saying "the ratio of spam to non-spam should be in the range ...".
Is that a ratio of messages, or tokens, or what?  I understood it to refer
to the message files, not the wordlist, but the errors I got indicated
that it applies to the wordlist - and, again, I have no way of determining
what the values for the wordlist are.

-- 
- Dave Lovelace
  dave at firstcomp.biz
  davel at cyberspace.org



