New software uploaded [was: Problems with Asian Spam]

Nigel Henry cave.dnb at tiscali.fr
Wed Nov 22 21:15:47 CET 2006


On Wednesday 22 November 2006 17:40, Nigel Henry wrote:
> On Wednesday 22 November 2006 13:54, David Relson wrote:
> > On Wed, 22 Nov 2006 07:41:30 -0500
> >
> > dhottinger at harrisonburg.k12.va.us wrote:
> > > Arrrrgh, I refuse to live with spam.  Must be an answer.  The ones
> > > that have me concerned are the spams with the inline images and the
> > > random text behind the images.  Bogofilter doesn't seem to catch
> > > those. Furthermore, I'm not so sure that the random text (usually
> > > snippets from some book) isn't skewing my wordlist.  I'm getting
> > > a few more messages caught as spam that shouldn't be, although all of
> > > these are ticket reminders, etc., which shouldn't be sent to my mail
> > > server by users anyway.  If anyone has a good way to kill these spam
> > > messages, that would be great.
> >
> > My main source of "unsures" is Asian spam sent to a mailing list.
> >
> > My main group of false negatives is messages titled "New software
> > uploaded by ... on ...date...time..."
> >
> > The software messages have a hunk of software related text.  I've
> > already received 70 of them this month.  Anybody else seeing these?
>
> Hi David. I've been getting these turning up in the unsure box, and have
> been training with them. I'm not sure if any are now being correctly
> identified as spam, as I've had bogofilter sending the spam directly to the
> wastebin for a while, but I've just recreated a spamcheck folder to check
> it before trashing it. I'll see what the results are in the next few days.
> My wordlist only has 1824 spam by the way.
>
> > I suspect bogofilter is slow to learn that these are spam partly
> > because my wordlist has approx. 500,000 messages in it, so each
> > newly registered message shifts the counts very little.
> >
> > I'm thinking of adding a "--scale" option to bogoutil that would allow
> > counts to be scaled.  For example, scaling to 10,000 would scale counts
> > from 1..N to 1..10000.
> >
> > Whether this helps can be tested by registering a bunch of false
> > negatives with the old wordlist and again with the scaled wordlist,
> > and seeing if the message scores are more appropriate.
> >
> > David
>
> The worst spams I've been getting recently are those sent to mailing lists
> that I'm on, the worst being the video4linux list. Most end up in the
> unsure box now, but I'm still getting the odd one in the inbox.
>
> I'm using bogofilter-1.0.2 on FC2, directly processing mail downloaded to
> KMail.
>
> Nigel.

Hi David. Just an update. I've just received one of these "New software
uploaded" messages straight into my new "spamcheck" box, so since training,
this one at least is ending up in the spam box rather than the unsure box. I
don't know how much use this is, but bogofilter seems to be doing its job.

Nigel.
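
As a rough illustration of the "--scale" idea quoted above, here is a minimal
Python sketch. It is not bogoutil code, and no such option is known to exist
yet. It assumes a toy wordlist held in a dict of token -> (spam count, ham
count), that scaling is linear by target/N where N is the larger of the two
message counts, and that nonzero counts are kept at 1 or more. The names
scale_count and scale_wordlist are hypothetical.

```python
def scale_count(count, target, largest):
    """Scale one count linearly; a nonzero count never drops below 1 (assumption)."""
    if count == 0:
        return 0
    return max(1, round(count * target / largest))


def scale_wordlist(tokens, msg_counts, target):
    """Scale a toy wordlist so the larger message count becomes `target`.

    tokens:     dict mapping token -> (spam_count, ham_count)
    msg_counts: (spam_messages, ham_messages)
    Returns (scaled_tokens, scaled_msg_counts). A wordlist that is already
    at or below the target is returned unchanged.
    """
    largest = max(msg_counts)
    if largest <= target:
        return dict(tokens), tuple(msg_counts)
    scaled_tokens = {
        tok: (scale_count(s, target, largest), scale_count(h, target, largest))
        for tok, (s, h) in tokens.items()
    }
    scaled_msgs = tuple(scale_count(c, target, largest) for c in msg_counts)
    return scaled_tokens, scaled_msgs


if __name__ == "__main__":
    # Made-up counts, purely for illustration.
    tokens = {"software": (50000, 100), "uploaded": (20, 0)}
    print(scale_wordlist(tokens, (500000, 250000), 10000))
```

With smaller counts, each newly registered message is a larger fraction of
the total, which is the effect the proposal is after. Whether it actually
improves scores would still need the old-versus-scaled comparison described
above.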
