junk test

John McCain jmccain at layer3al.com
Wed May 28 21:30:07 CEST 2003


Fascinating.  I was able to confirm this by creating a test message and then 
adding a number of junk tokens to it.  The spamicity score was unchanged.
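
A minimal sketch of why that happens, assuming the rule David describes in 
the quoted message below: unseen tokens get the robx score of 0.415, and 
anything within min_dev (0.100) of 0.500 is left out of the score.  The 
function names and sample scores are made up for illustration; this is not 
bogofilter's actual code, and it only shows which tokens get counted, not how 
the final spamicity is combined.

```python
ROBX = 0.415       # score given to a never-before-seen token (from the quoted message)
MIN_DEV = 0.100    # tokens closer than this to EVEN_ODDS are ignored
EVEN_ODDS = 0.500


def token_score(token, known_scores):
    """Return the stored score for a token, or ROBX if it was never seen."""
    return known_scores.get(token, ROBX)


def scoring_tokens(tokens, known_scores):
    """Keep only tokens whose score deviates from EVEN_ODDS by at least MIN_DEV.

    Assumption: a deviation of exactly MIN_DEV is kept; the quoted message
    does not say how the boundary is handled.
    """
    return [
        t for t in tokens
        if abs(token_score(t, known_scores) - EVEN_ODDS) >= MIN_DEV
    ]


# Hypothetical per-token scores for two known words.
known = {"viagra": 0.95, "meeting": 0.05}
message = ["viagra", "meeting"]
junk = ["webwebg", "qzxv"]          # never seen, so each scores ROBX

before = scoring_tokens(message, known)
after = scoring_tokens(message + junk, known)
```

Here `before` and `after` hold the same tokens, since |0.415 - 0.500| = 0.085 
is below min_dev, so the unseen junk never reaches the score calculation.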

If I understand this correctly, it should be possible to purge every token 
that has been seen only once from the database with no impact on accuracy, 
assuming that any statistically significant token would recur within some 
period X.  Does this sound reasonable?
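
To make the idea concrete, here is a rough sketch of the purge I have in 
mind.  The database layout (token mapped to spam count, ham count, and 
last-seen date) and the function name are hypothetical, not bogofilter's real 
format or tooling.  It only shows the mechanics; whether accuracy really is 
unaffected is the open question.

```python
from datetime import date, timedelta


def prune_singletons(db, today, max_age_days):
    """Drop tokens seen exactly once whose last sighting is older than the cutoff.

    db maps token -> (spam_count, ham_count, last_seen); this layout is an
    assumption made for illustration.
    """
    cutoff = today - timedelta(days=max_age_days)
    return {
        token: (spam, ham, last_seen)
        for token, (spam, ham, last_seen) in db.items()
        if not (spam + ham == 1 and last_seen < cutoff)
    }


# Made-up sample data.
db = {
    "webwebg": (0, 1, date(2003, 1, 1)),   # seen once, long ago: pruned
    "asdf": (3, 2, date(2003, 1, 1)),      # old but seen repeatedly: kept
    "fresh": (1, 0, date(2003, 5, 27)),    # seen once, but recently: kept
}
pruned = prune_singletons(db, today=date(2003, 5, 28), max_age_days=30)
```

With this sample data only "webwebg" is dropped; "asdf" survives because it 
has been seen more than once, and "fresh" because it is newer than the cutoff.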


On Wednesday 28 May 2003 02:03 pm, David Relson wrote:
> At 02:48 PM 5/28/03, John McCain wrote:
> >So is the natural behavior of Bogofilter going to be to tend to increase
> >spamminess score based on the number of junk tokens?
>
> John,
>
> A new, never-before-seen token gets a score of 0.415 (from the robx
> parameter).  Tokens within 0.100 (min_dev) of 0.500 (EVEN_ODDS) are not
> included in the score.  So "webwehg" probably doesn't affect the score of
> _this_ message, but "asdf" might have an effect.
>
> If I send you a second copy of this message, "webwebg" will be recognized
> as hammish (I presume).
>
> Junk tokens will increase database size.  Whenever they're reused, they'll
> be recognized.
>
> David