Is bogofilter classifying algorithm "symmetrical"

David Relson relson at osagesoftware.com
Thu May 26 13:10:10 CEST 2005


On Thu, 26 May 2005 07:24:36 +0200
Marek Zachara wrote:

> On Thursday 26 of May 2005 00:51, David Relson wrote:
> >
> > The algorithm is _almost_ symmetrical.  There are some special scoring
> > factors that break the symmetry.  Initially the goal was to bias
> > messages towards ham scores (on the theory that false negatives are
> > preferable to false positives).
> >
> 
> Thanks for the thorough answer.
> Actually, I plan to set up an experiment using bogofilter for
> classification of unknown text files (not e-mails), so I needed to know
> whether I should expect such behaviour. If the asymmetry is not too high,
> it shouldn't matter; otherwise I'll set up two instances of bogofilter
> trained in "opposite ways" and combine the results as, e.g.,
> (result1 + (1 - result2))/2
> 
> Marek

Using bogofilter to separate text files into two groups should work
fine.  Bogofilter does have some email-specific behavior that may cause
some difficulty.  Using the "-H" switch to turn off tagging of header
lines is one thing to do.  Also, bogofilter gives special treatment to
lines beginning with "From ".  Changing that behavior requires a change
to lexer3.l and (possibly) other parts.
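
For what it's worth, the two-instance averaging you describe could be
sketched roughly as below.  This is only an illustration, not something
bogofilter does itself: the function name combine_scores and its
arguments are made up for the example, and it assumes each instance
reports a score between 0 and 1, with the second instance trained on the
swapped groups.

```python
def combine_scores(result1: float, result2: float) -> float:
    """Average the scores of two oppositely trained classifiers.

    result1: score from the instance trained the normal way.
    result2: score from the instance trained with the two groups swapped,
             so (1 - result2) is its estimate on the same scale as result1.
    Both names are hypothetical; they mirror the formula in the thread.
    """
    for name, value in (("result1", result1), ("result2", result2)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    # (result1 + (1 - result2)) / 2, as proposed in the quoted message
    return (result1 + (1.0 - result2)) / 2.0
```

If the two instances were perfectly symmetric, result2 would equal
1 - result1 and the combined value would simply be result1.  The gap
between result1 and 1 - result2 gives a feel for how large the asymmetry
is on a given file.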

Good luck!

David



