Markup.

David Relson relson at osagesoftware.com
Sat May 10 13:23:42 CEST 2003


At 01:03 AM 5/10/03, michael at optusnet.com.au wrote:

>David Relson <relson at osagesoftware.com> writes:
> > Michael,
> >
> > Nice results!  It looks like your additional symbols _are_ of value.
> >
> > I'll see about adding your changes to bogofilter.  If you don't mind,
> > I'll call the option "html_markup" and create tokens in form
> > "html:comment:4".
>
>It might be an idea to leave it at just 'markup'. I know
>that I started with just the html tags, but the next step
>is to do things like notice if the subject line has
>extended whitespace, or the email address, etc etc. Things
>that don't have much to do with html.

I'll consider it.

>The other thing I struggled with slightly was being able to
>insert tokens when the message ends. I wasn't able to find
>some place that noticed the end of an email that wasn't
>a reset point.

One of the TODO items is tokenizing items within html comments.  For example
"th<!--junk-->is" would return "this" and "junk".  A way of doing this is 
to extract the comments and process them at the end.  I'll take a look and 
see if I can figure out what's needed.  We already have two birds to kill 
with the EOF stone.
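
A minimal sketch of the extract-and-defer idea, in C.  The helper name
split_comments() is hypothetical and is not taken from bogofilter; it only
shows how "th<!--junk-->is" could be separated into the visible text and the
comment text, so that the comment text can be tokenized at the end.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not bogofilter code.  Copies everything outside
 * <!-- ... --> into text and the comment bodies into comments, separated
 * by single spaces.  Both output buffers must hold strlen(in) + 1 bytes. */
static void split_comments(const char *in, char *text, char *comments)
{
    size_t t = 0, c = 0;

    while (*in != '\0') {
        if (strncmp(in, "<!--", 4) == 0) {
            const char *body = in + 4;
            const char *end = strstr(body, "-->");
            /* An unterminated comment swallows the rest of the input. */
            size_t len = (end != NULL) ? (size_t)(end - body) : strlen(body);

            if (c > 0)
                comments[c++] = ' ';    /* separate successive comments */
            memcpy(comments + c, body, len);
            c += len;
            in = (end != NULL) ? end + 3 : body + len;
        } else {
            text[t++] = *in++;          /* ordinary character: keep it */
        }
    }
    text[t] = '\0';
    comments[c] = '\0';
}
```

Calling split_comments("th<!--junk-->is", text, comments) leaves "this" in
text and "junk" in comments.  The deferred comments buffer would then be fed
to the tokenizer once the end of the message is reached.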

>(What I'm looking to do here is collect statistics over the course
>of an email, and at the end check them and insert appropriate tokens.
>Didn't seem easily do-able tho).

I'm sure there's a way.  Currently, the function is_from() in lexer.c provides a
special check for "^From ".  The hook needs to go in/near that function.
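
A rough sketch, in C, of what such a hook might look like.  Everything here
is an assumption made for illustration: struct msg_stats, scan_line(),
starts_new_message() and flush_stats() are hypothetical names, not functions
from lexer.c, and the token uses the "html:comment:4" form mentioned earlier
in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-message counters; bogofilter has no such struct. */
struct msg_stats {
    int html_comments;      /* number of "<!--" seen in this message */
};

/* The same kind of test described for is_from(): a line matching "^From ". */
static int starts_new_message(const char *line)
{
    return strncmp(line, "From ", 5) == 0;
}

/* Count comment openers on one line of the message. */
static void scan_line(struct msg_stats *st, const char *line)
{
    const char *p = line;

    while ((p = strstr(p, "<!--")) != NULL) {
        st->html_comments++;
        p += 4;
    }
}

/* Called at a message boundary and again at EOF: writes the summary
 * token into out and resets the counters for the next message.
 * Returns 1 if a token was written, 0 if there was nothing to report. */
static int flush_stats(struct msg_stats *st, char *out, size_t out_sz)
{
    int n = st->html_comments;

    st->html_comments = 0;
    if (n == 0)
        return 0;
    snprintf(out, out_sz, "html:comment:%d", n);
    return 1;
}
```

The reading loop would call flush_stats() whenever starts_new_message()
matches, call scan_line() on every other line, and call flush_stats() once
more at EOF so the last message in the input also gets its tokens.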

> > Like you, I wouldn't worry too much about it.  The benefits seem
> > pretty clear and there's always the occasional message that's
> > virtually impossible to classify - even for a human.  I see some
> > computer related messages, for example WinXPnews and TigerDirect, that
>
>Oh, is WinXPnews spam? I've been calling it ham! *sigh*.
>(I'm using spam in this sense to mean "any auto-generated
>email that the user didn't ask for").
>
> > would be ham if directed to me.  For whatever reason, they're sent to
> > my 10 yr old.  Because of that, I classify them as spam.

I'm sure she never asked for it.




