multiple types of spam
Dave Lovelace
dave at firstcomp.biz
Thu Jul 3 15:32:50 CEST 2003
John McCain wrote:
>
> 419 letters tend to be quite long for current generation spam. Maybe there's
> an issue with your training corpus size. How large is your training corpus?
>
> On Thursday 03 July 2003 07:41 am, Dave Lovelace wrote:
> > David Relson wrote, in part:
> > > I haven't been keeping close track on Nigerian scam messages. I know
> > > that bogofilter is catching some and missing some. It _would_ make an
> > > interesting experiment to create wordlists to see if that's detectable.
> >
> > At times I seem to get at least one or two a day, & bogofilter never
> > seems to catch them. I retrain with them, but I can't help wondering
> > what it is that enables them to get through so consistently.
> >
> > I admit, they aren't cookie-cutter jobs like so much spam, though.
>
I don't know what a "419 letter" is or why it's called that.
The training corpus is pretty large, from my POV: I run bogofilter with -u
and retrain whenever a message is misclassified, so the corpus covers all
available mail for a couple of people over several months, maybe a year
(I started with a lot of saved mail, both spam and ham). I don't think I
could do anything to make it larger.
--
- Dave Lovelace
dave at firstcomp.biz
davel at cyberspace.org