html comment processing
David Relson
relson at osagesoftware.com
Sun Mar 30 17:12:31 CEST 2003
At 09:58 AM 3/30/03, Greg Louis wrote:
>On 20030330 (Sun) at 0912:55 -0500, David Relson wrote:
> > At 07:58 AM 3/30/03, Greg Louis wrote:
> >
> > >and make sure there's an improvement. I've got two more s/mindev scans
> > >going at present, but after that I can find clock cycles for this
> > >purpose.
>
> > Of course, an experiment would be of value. Can you crank up your fast
> > machine?
> >
>It should have finished the current run by Tuesday night, I hope. What I
>don't know is whether I've actually got spams in the kitty that contain
>the bogus tags, nor how many. It would be convenient to create a
>corpus consisting of nothing else, split it between training and test
>in the same proportion as the rest of the spam, and then see how well
>the strict processing measures up against the loose. Has anyone a
>supply of these that they could zip up for me to fetch with anon ftp?
>
>I too found 0.10.1 to work well, but the population of spam has changed
>somewhat since then (more circumvention attempts), so I don't know how
>relevant that finding is.
It's easy to convert valid comments to the other kind:
sed 's@-->@>@' < valid > invalid
If you only want some of the tags changed, it's a bit harder to do.
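For the partial case, one possible approach is to count the comment terminators and rewrite only every Nth one. The sketch below is untested against real spam and rests on assumptions: the function name convert_some and its argument n are made up for illustration, and it treats every "-->" in the input as a comment terminator without parsing the HTML.

```shell
# Hypothetical helper: rewrite every n-th "-->" as ">" and leave the rest alone.
# n defaults to 2 (every other terminator); both names are illustrative only.
convert_some() {
    awk -v n="${1:-2}" '{
        out = ""
        # Walk the line one terminator at a time.
        while ((i = index($0, "-->")) > 0) {
            count++
            if (count % n == 0)
                out = out substr($0, 1, i - 1) ">"   # drop the "--", keep ">"
            else
                out = out substr($0, 1, i + 2)       # keep "-->" unchanged
            $0 = substr($0, i + 3)                   # continue after the match
        }
        print out $0
    }'
}
```

It would be used the same way as the sed command, for example: convert_some 3 < valid > mixed. The count carries across lines, so the proportion changed is about 1 in n over the whole file.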