FAQ: How to train

Peter Bishop pgb at adelard.com
Wed Jul 30 20:47:14 CEST 2003


Could you also train on "near errors"?
That is, if a spam message scores close to the spam cut-off point, add it
to the database even though it was classified correctly (and likewise for ham).
This might make the database less sensitive to messages added later.

This might be easier to do with the Robinson algorithm, which has a more
linear spamicity range (e.g. spam cutoff = 0.54, add spam if < 0.60).
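
A rough sketch of the rule I have in mind, in Python. The function name,
the single cut-off, and the symmetric margin for ham are my own assumptions
for illustration; this is not bogofilter code, and the numbers are just the
example values above.

```python
def should_train(score, is_spam, spam_cutoff=0.54, margin=0.06):
    """Decide whether a message should be added to the database.

    score       -- spamicity reported by the filter (0.0 .. 1.0)
    is_spam     -- the true class of the message, as judged by the user
    spam_cutoff -- scores at or above this are classified as spam
    margin      -- hypothetical width of the "near to error" band
    """
    if is_spam:
        # Train on misclassified spam (score below the cut-off) and on
        # spam that was caught, but only just (score < cutoff + margin).
        return score < spam_cutoff + margin
    # Ditto for ham: train when it was misclassified, or when it scored
    # only just below the cut-off (assumed symmetric margin).
    return score > spam_cutoff - margin
```

With these values a spam scoring 0.58 would be added even though it was
classified correctly, while one scoring 0.95 would be left alone.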

On 29 Jul 2003 at 11:14, Boris 'pi' Piwinger wrote:

> <p>The smaller your database the greater the risk of this
> training to have an adverse effect on the other side. When
> you train with another spam message this might make some
> other ham message look more spammish and vice versa.</p>
> 
> <p>If you use method 3 above you can compensate this effect,
> by again doing the training with your complete training
> collection (don't forget to add the new messages to that
> collection). This will add messages to the database which
> show that adverse effect on both sides until you have a new
> equilibrium.</p>


-- 
Peter Bishop 
Adelard and Centre for Software Reliability, City University
Drysdale Building, 10 Northampton Square, London, EC1V 0HB
Tel: +44-20-7490-9467, Fax: +44-20-7490-9451
pgb at adelard.com, http://www.adelard.com/
pgb at csr.city.ac.uk, http://www.city.ac.uk/





More information about the Bogofilter mailing list