Markup.
David Relson
relson at osagesoftware.com
Sat May 10 13:23:42 CEST 2003
At 01:03 AM 5/10/03, michael at optusnet.com.au wrote:
>David Relson <relson at osagesoftware.com> writes:
> > Michael,
> >
> > Nice results! It looks like your additional symbols _are_ of value.
> >
> > I'll see about adding your changes to bogofilter. If you don't mind,
> > I'll call the option "html_markup" and create tokens in the form
> > "html:comment:4".
>
>It might be an idea to leave it at just 'markup'. I know
>that I started with just the html tags, but the next step
>is to do things like notice if the subject line has
>extended whitespace, or the email address, etc etc. Things
>that don't have much to do with html.
I'll consider it.
>The other thing I struggled with slightly was being able to
>insert tokens when the message ends. I wasn't able to find
>some place that noticed the end of an email that wasn't
>a reset point.
One of the TODO items is tokenizing items within html comments. For example
"th<!--junk-->is" would return "this" and "junk". A way of doing this is
to extract the comments and process them at the end. I'll take a look and
see if I can figure out what's needed. We already have two birds to kill
with the EOF stone.
>(What I'm looking to do here is collect statistics over the course
>of an email, and at the end check them and insert appropriate tokens.
>Didn't seem easily do-able tho).
I'm sure there's a way. Currently, the function is_from() in lexer.c provides
a special check for "^From ". The hook needs to go in or near that function.
> > Like you, I wouldn't worry too much about it. The benefits seem
> > pretty clear and there's always the occasional message that's
> > virtually impossible to classify - even for a human. I see some
> > computer related messages, for example WinXPnews and TigerDirect, that
>
>Oh, is WinXPnews spam? I've been calling it ham! *sigh*.
>(I'm using spam in this sense to mean "any auto-generated
>email that the user didn't ask for").
>
> > would be ham if directed to me. For whatever reason, they're sent to
> > my 10 yr old. Because of that, I classify them as spam.
I'm sure she never asked for it.