[PATCH] Better tagging.
michael at optusnet.com.au
Sun Sep 14 07:06:44 CEST 2003
David Relson <relson at osagesoftware.com> writes:
[...]
> > No, haven't used bogotune. My working set is about 200k messages,
> > and bogotune looked like it was going to take weeks to finish. :)
> >
> > I'm using pure defaults (mindev 0.1).
> >
> > Michael.
[...]
> With your 200k messages, bogotune could conceivably take weeks. It
> might be useful to break the 200k into 10 or 20 equal groups and then
> pick several groups and run bogotune. That would reduce the compute
> time a whole lot and it would be interesting to see if bogotune found
> the tested groups to be comparable, i.e. produced similar results.
That would be interesting...
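For what it's worth, the group-sampling idea could be sketched roughly
as below. This is only an illustration in Python, not anything bogotune
does itself: the helper names (split_into_groups, pick_groups), the
fixed seed and the use of integer message ids are all my own
assumptions. The messages are shuffled first so that each group is a
random sample rather than a slice of one time period.

```python
import random

def split_into_groups(items, n_groups, seed=0):
    """Shuffle items and deal them into n_groups near-equal groups.

    Hypothetical helper: group sizes differ by at most one.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Striding by n_groups deals the shuffled items out like cards.
    return [shuffled[i::n_groups] for i in range(n_groups)]

def pick_groups(groups, k, seed=0):
    """Pick k distinct groups at random to tune on (hypothetical helper)."""
    return random.Random(seed).sample(groups, k)

# Stand-in for 200k message ids, split into 20 groups of 10k each.
message_ids = range(200_000)
groups = split_into_groups(message_ids, 20)
chosen = pick_groups(groups, 3)
print(len(groups), [len(g) for g in chosen])
```

Each chosen group would then be tuned separately and the resulting
parameters compared across groups.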
Note that I'm more interested in the relative improvement than the
absolute. I don't believe that bogotune will spit out parameters
that turn an improvement into a detriment.
My task is made harder because I'm working with the mail for
a large set of users, not just my mail. So the 'ham' corpus
is much less distinguished than it would commonly be.
PS: Interestingly, using word-pairs gives a HUGE improvement in
accuracy. Without using word-pairs as tokens, I get 43,705 false
negatives (in a 400k message corpus), and with word-pairs I get
27,596. Very nice.
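To make "word-pairs as tokens" concrete, here is a minimal sketch in
Python. It is not the patch itself and not bogofilter's lexer: the
regular expression, the "*" separator and the function names
(unigram_tokens, with_word_pairs) are assumptions chosen for
illustration. The idea is simply that every adjacent pair of words is
added as an extra token alongside the single words.

```python
import re

def unigram_tokens(text):
    """Split text into lowercase word tokens (simplified, assumed rule)."""
    return re.findall(r"[a-z0-9$'-]+", text.lower())

def with_word_pairs(tokens, sep="*"):
    """Return the single tokens plus one token per adjacent word pair."""
    pairs = [f"{a}{sep}{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + pairs

tokens = unigram_tokens("Buy cheap pills now")
print(with_word_pairs(tokens))
# ['buy', 'cheap', 'pills', 'now', 'buy*cheap', 'cheap*pills', 'pills*now']
```

Going by the figures above, that is 16,109 fewer false negatives, about
a 37% reduction.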
Michael.