multiple types of spam

Dave Lovelace dave at firstcomp.biz
Thu Jul 3 15:32:50 CEST 2003


John McCain wrote:
> 
> 419 letters tend to be quite long for current-generation spam.  Maybe there's 
> an issue with your training corpus size.  How large is your training corpus?
> 
> On Thursday 03 July 2003 07:41 am, Dave Lovelace wrote:
> > David Relson wrote, in part:
> > > I haven't been keeping close track of Nigerian scam messages.  I know
> > > that bogofilter is catching some and missing some.  It _would_ make an
> > > interesting experiment to create wordlists to see if they're detectable.
> >
> > At times I seem to get at least one or two a day, & bogofilter never
> > seems to catch them.  I retrain with them, but I can't help wondering
> > what it is that enables them to get through so consistently.
> >
> > I admit, they aren't cookie-cutter jobs like so much spam, though.
> 

I don't know what a "419 letter" is or why it's called that.
The training corpus is pretty large, from my POV.  I run with -u and retrain
whenever a message is misclassified, so it contains all available messages
for a couple of people over several months, maybe a year (I started with
a lot of saved mail, both spam & ham).  I don't think I could do anything
to make it larger.

-- 
- Dave Lovelace
  dave at firstcomp.biz
  davel at cyberspace.org




More information about the Bogofilter mailing list