mass processing with mutt and Fcc

Jesse Meyer meyer at btinet.net
Wed Apr 2 18:49:47 CEST 2003


On Tue, Apr 01, 2003 at 10:23:32PM +0200, Boris 'pi' Piwinger wrote:
> David Relson <relson at osagesoftware.com> wrote:
> 
> >Actually it takes extra work to recognize html tags (and comments) and 
> >throw them away.  When processing normal text, pretty much all that's kept 
> >is letters and digits and a few special characters like period, hyphen, 
> >underscore, apostrophe, etc.  It's trivially easy to apply the normal text 
> >mode to html.
> 
> The problem is that we need HTML processing to avoid the
> spammers' tricks with tags in the middle of words. So it
> would be nice to do that and also evaluate the content of
> the tags.

Wouldn't it be rather easy (although probably not very elegant) to 
make a short script that runs any html message through lynx -dump 
first, then gives the result to bogofilter to analyse, and, if that 
succeeds, passes the original message through?
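
Something along these lines might do as a starting point. It is only a
rough, untested-in-production sketch in Python: the function names
(lynx_dump, text_for_filter, main) are made up for illustration, it
assumes lynx and bogofilter are on the PATH, and it assumes "succeeds"
simply means bogofilter ran, with its exit status handed back to the
caller to act on. Badly malformed messages or unknown charsets may
well need extra error handling.

```python
import email
import subprocess
import sys
from email import policy
from typing import Callable


def lynx_dump(html: str) -> str:
    """Render HTML to plain text by piping it through `lynx -dump`."""
    result = subprocess.run(
        ["lynx", "-dump", "-stdin", "-force_html", "-nolist"],
        input=html.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def text_for_filter(raw: bytes,
                    render: Callable[[str], str] = lynx_dump) -> bytes:
    """Return a copy of the message with each text/html part replaced
    by its rendered plain text. Headers are kept so the filter still
    sees them."""
    msg = email.message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            # set_content() drops the old Content-* headers and stores
            # the rendered text as text/plain.
            part.set_content(render(part.get_content()))
    return msg.as_bytes()


def main() -> int:
    """Read a message on stdin, let bogofilter classify the rendered
    copy, and write the original message unchanged to stdout."""
    raw = sys.stdin.buffer.read()
    verdict = subprocess.run(["bogofilter"], input=text_for_filter(raw))
    sys.stdout.buffer.write(raw)
    # Assumption: the caller (procmail or similar) interprets the
    # exit status as documented in the bogofilter man page.
    return verdict.returncode
```

Ending the script with sys.exit(main()) and piping each message
through it from the delivery agent would be one way to wire it in.
Because the renderer is passed in as an argument, text_for_filter can
be tried out without lynx installed by handing it any function that
maps HTML to text.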

-- 
        ...crying "Tekeli-li! Tekeli-li!"... ~ HPL
 icq : 34583382              |     === ascii ribbon campaign ===
 msn : dasunt at hotmail.com    |  ()  - against html mail
 yim : tsunad                |  /\  - against proprietary attachments



