contributing datasets, was: Is bogotune helpful?

Greg Louis glouis at dynamicro.on.ca
Wed Dec 3 13:11:26 CET 2003


On 20031202 (Tue) at 2330:38 -0800, Bill Wohler wrote:

> I've got 40,000 spam and hams roughly 50/50. Let me know how I can
> create and get the datasets to you.

I'm not sure if any message-count converter is supplied with bogofilter
these days, but running a command of the form

    formail -s bogol dbdir <mboxfile >messagecountfile

where dbdir (optional, default ~/.bogofilter) is where your training
database is stored, will do the job if you put this in file "bogol":

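#! /bin/sh
# bogol: read one message on stdin, print counts for its tokens
db=~/.bogofilter
# An optional first argument overrides the default database directory
test "x$1" = "x" || db=$1
# Emit .MSG_COUNT plus the message's unique tokens, then look them up
( echo .MSG_COUNT; bogolexer -p | sort -u ) | \
    bogoutil -w "$db" | \
    awk 'NF == 3 {printf("\"%s\" %s %s\n", $1, $2, $3)}'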
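To see what the last stage of that pipeline does, here is a small sketch that runs the same awk filter on its own. The input lines are invented for illustration (real input would come from bogoutil), and the function name format_counts is hypothetical.

```shell
#!/bin/sh
# Stand-in for the final awk stage of the bogol pipeline.
# The sample input below is made up; it is not real bogoutil output.
format_counts() {
    # Keep only lines with exactly three fields and quote the first one.
    awk 'NF == 3 {printf("\"%s\" %s %s\n", $1, $2, $3)}'
}

printf '%s\n' \
    '.MSG_COUNT 20000 20000' \
    'free 1500 12' \
    'a line with too many fields' | format_counts
```

The first two sample lines come out with the token in double quotes followed by the two counts; the third line does not have exactly three fields, so the filter drops it.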

Once you have the message-count files, the next step is to roll a tarball
with those and the training db (.bz2 preferred).  Then, if you have
access to somewhere from which we can pull it, you could put the
tarball there and send us the URL; otherwise, you could ftp it to
ftp://ftp.consultronics.com/incoming and send me mail to let me know
it's there.  (That directory is write-only and the file will
automatically be moved out of the ftp directory tree, but I'll be able
to retrieve it.)
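Rolling the tarball could look something like the sketch below. The function name roll_tarball and all file names are assumptions for illustration only. Since .bz2 is preferred rather than required, the sketch falls back to gzip when bzip2 is not installed.

```shell
#!/bin/sh
# Sketch: bundle message-count files and the training db directory.
# roll_tarball and every file name used with it are hypothetical.

# Usage: roll_tarball BASENAME DBDIR FILE...
# Creates BASENAME.tar.bz2 (or BASENAME.tar.gz if bzip2 is missing)
# and prints the name of the archive that was written.
roll_tarball() {
    base=$1
    dbdir=$2
    shift 2
    # Archive the message-count files together with the db directory.
    tar -cf "$base.tar" "$@" "$dbdir" || return 1
    if command -v bzip2 >/dev/null 2>&1; then
        bzip2 -f "$base.tar" || return 1
        echo "$base.tar.bz2"
    else
        gzip -f "$base.tar" || return 1
        echo "$base.tar.gz"
    fi
}
```

For example, with message-count files named spam.mc and ham.mc (made-up names), the call would be: roll_tarball bogofilter-data "$HOME/.bogofilter" spam.mc ham.mc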

-- 
| G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
|  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
|  http://wecanstopspam.org in signatures helps fight junk email. |



