Is bogofilter classifying algorithm "symmetrical"
David Relson
relson at osagesoftware.com
Thu May 26 13:10:10 CEST 2005
On Thu, 26 May 2005 07:24:36 +0200
Marek Zachara wrote:
> On Thursday 26 of May 2005 00:51, David Relson wrote:
> >
> > The algorithm is _almost_ symmetrical. There are some special scoring
> > factors that break the symmetry. Initially the goal was to bias
> > messages towards ham scores (on the theory that false negatives are
> > preferable to false positives).
> >
>
> Thanks for the thorough answer.
> Actually I plan to set up an experiment to use bogofilter for
> classification of unknown text files (not e-mails), so I needed to know if I
> should expect such behaviour. If the asymmetry is not too high, it shouldn't
> matter; otherwise I'll set up two instances of bogofilter trained in the
> "opposite way" and compute the result as (result1 + (1 - result2))/2
>
> Marek
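
For reference, the averaging Marek describes can be sketched in a few lines of Python. This is only an illustration: the function name combined_score is made up here, and it assumes result1 and result2 are the scores (each between 0 and 1) reported by the two oppositely trained instances for the same file. How the scores are obtained from bogofilter is left out.

```python
def combined_score(result1: float, result2: float) -> float:
    """Average one instance's score with the inverted score of the other.

    result1: score from the instance trained the "normal" way.
    result2: score from the instance trained the "opposite way".
    Both are assumed to lie in [0, 1]; this is an assumption of the sketch.
    """
    for name, value in (("result1", result1), ("result2", result2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Invert the second score so both point in the same direction,
    # then take the plain average: (result1 + (1 - result2)) / 2.
    return (result1 + (1.0 - result2)) / 2.0


if __name__ == "__main__":
    # The two instances agree (0.75 vs. inverted 0.25), so the average is 0.75.
    print(combined_score(0.75, 0.25))
```

If the two instances were perfectly symmetrical, result2 would equal 1 - result1 and the average would simply return result1; any bias shared by both instances shifts the two terms in opposite directions and tends to cancel.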
Using bogofilter to separate text files into two groups should work
fine. However, bogofilter has some email-specific behavior that may
cause difficulty. One thing to do is to use the "-H" switch to turn off
tagging of header lines. Also, bogofilter gives special treatment to
lines beginning with "From ". Changing that behavior requires a change
to lexer3.l and (possibly) other parts.
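
If patching the lexer is not appealing, one possible workaround (not something bogofilter provides, and untested against it) is to preprocess the text files so that no line begins with "From ". The Python sketch below borrows the ">From " quoting convention used by some mbox writers; the function name escape_from_lines is hypothetical, and whether the escaped lines are then tokenized as ordinary text is an assumption that should be checked before relying on it.

```python
def escape_from_lines(text: str) -> str:
    """Prefix every line that starts with "From " with ">".

    This only rewrites the input text; it does not change bogofilter itself.
    """
    # keepends=True preserves the original line endings exactly.
    lines = text.splitlines(keepends=True)
    return "".join(
        ">" + line if line.startswith("From ") else line
        for line in lines
    )


if __name__ == "__main__":
    sample = "From here on the text continues.\nNothing special here.\n"
    print(escape_from_lines(sample), end="")
```

The same transformation has to be applied both when training and when classifying, otherwise the two sets of tokens will not match.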
Good luck!
David