New software uploaded [was: Problems with Asian Spam]

David Relson relson at osagesoftware.com
Wed Nov 22 13:54:27 CET 2006


On Wed, 22 Nov 2006 07:41:30 -0500
dhottinger at harrisonburg.k12.va.us wrote:

> Arrrrgh, I refuse to live with spam.  There must be an answer.  The
> ones that have me concerned are the spams with inline images and
> random text behind the images.  Bogofilter doesn't seem to catch
> those. Furthermore, I'm not so sure that the random text (usually
> snippets from some book) isn't skewing my wordlist.  I'm getting
> a few more messages caught as spam that shouldn't be, although all of
> these are ticket reminders, etc., which shouldn't be sent to my mail
> server by users anyway.  If anyone has a good way to kill these spam
> messages, that would be great.

My main source of "unsures" is Asian spam sent to a mailing list.

My main group of false negatives is messages titled "New software
uploaded by ... on ...date...time..."  

The software messages contain a hunk of software-related text.  I've
already received 70 of them this month.  Is anybody else seeing these?

I suspect part of the reason bogofilter is slow to learn that these
are spam is that my wordlist was built from approx. 500,000 messages,
so each newly registered message shifts the counts only slightly.

I'm thinking of adding a "--scale" option to bogoutil that would allow
counts to be scaled.  For example, scaling to 10,000 would scale counts
from 1..N to 1..10000.
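
The scaling idea above can be sketched in a few lines of Python.  This
is only an illustration, not bogoutil code: the dict layout, the
function name scale_counts, and the choice to scale by the larger of
the two message counts are all assumptions, since "--scale" does not
exist yet.

```python
def scale_counts(tokens, msg_counts, target=10_000):
    """Scale token and message counts so the largest message count is `target`.

    tokens:     dict mapping token -> (spam_count, ham_count)   (assumed layout)
    msg_counts: (spam_messages, ham_messages)
    """
    largest = max(msg_counts)
    if largest <= target:
        return dict(tokens), tuple(msg_counts)  # already small enough

    def scale(n):
        if n == 0:
            return 0
        # Integer round-half-up; a token that was seen keeps a count of at least 1.
        return max(1, (n * target + largest // 2) // largest)

    scaled = {tok: (scale(s), scale(h)) for tok, (s, h) in tokens.items()}
    return scaled, (scale(msg_counts[0]), scale(msg_counts[1]))


# Made-up counts for a wordlist built from 500,000 messages.
tokens = {"software": (100, 30000), "uploaded": (50, 40000), "rare": (1, 0)}
scaled, counts = scale_counts(tokens, (200000, 300000))
print(scaled, counts)
```

Keeping every seen token at a count of at least 1 is one possible
design choice; dropping tokens that scale to zero would shrink the
wordlist further but would lose information about rare tokens.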

Whether this helps can be tested by registering a bunch of false
negatives with the old wordlist and again with the scaled wordlist,
then seeing whether the message scores are more appropriate.
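
A back-of-the-envelope version of that comparison, for a single token,
might look like the sketch below.  It uses a Robinson-style per-token
score; the parameter values ROBS and ROBX, the function names, and all
the counts are assumptions chosen for illustration, and a real message
score combines many tokens, so this shows only the direction of the
effect.

```python
ROBS = 0.0178  # assumed weight given to the prior
ROBX = 0.52    # assumed score for a token with no history


def token_score(spam, ham, spam_msgs, ham_msgs, s=ROBS, x=ROBX):
    """Robinson-style score: near 0 means hammy, near 1 means spammy."""
    n = spam + ham
    if n == 0:
        return x
    spam_rate = spam / spam_msgs
    ham_rate = ham / ham_msgs
    p = spam_rate / (spam_rate + ham_rate)
    return (s * x + n * p) / (s + n)


def score_after_registering(k, spam, ham, spam_msgs, ham_msgs):
    """Score of a token after k more spam messages containing it are registered."""
    return token_score(spam + k, ham, spam_msgs + k, ham_msgs)


# Hypothetical token seen in 1% of ham and never in spam, then
# registered as spam 70 times.
old = score_after_registering(70, 0, 3000, 200000, 300000)  # original counts
new = score_after_registering(70, 0, 100, 6667, 10000)      # counts scaled by 1/30
print(f"old wordlist: {old:.3f}  scaled wordlist: {new:.3f}")
```

With these made-up numbers the token stays near 0.03 with the old
counts but moves to about 0.51 with the scaled counts, which is the
kind of difference the test is meant to reveal.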

David




