ignore text/plain part of multipart/alternative messages?

David Relson relson at osagesoftware.com
Wed Aug 13 01:42:19 CEST 2003


At 07:20 PM 8/12/03, Matthias Andree wrote:
>On Tue, 12 Aug 2003, David Flanagan wrote:
>
> >
> > > Would you care to send me one of those "text/plain is book excerpt,
> > > text/html is UCE" mails so I can have a look? If so, please save the
> > > whole mail to a file ("export") and zip it before you attach, so it
> > > doesn't get filtered out here. You can omit non-MIME headers if you want
> > > to protect your privacy, all I need are MIME-Version: and Content-*:
> > > headers.
> >
> > I use Emacs RMAIL, which is not mime-aware, so attaching zipped files is
> > a pain.  Instead, I've posted a sample spam here:
> >
> >     http://www.djf.net/spam.txt
>
>This document scored at 0.500004 on the first run, and after training on
>that message (bogofilter -s), it scored 1.000000 -- so apparently
>bogofilter learns this quite fast. On my machine, it has some distinct
>options, among them (top 7). This is with a pre-0.14.4 CVS version of
>bogofilter (0.14.4 is hardly different) with mostly default settings and
>huge data base that didn't have such spam yet.
>
>"Thank"                              9  0.000000  0.002283  0.999143 +
>"seminars"                          10  0.000000  0.002536  0.999229 +
>"Verdana"                           16  0.000000  0.004058  0.999518 +
>"Helvetica"                         21  0.000000  0.005326  0.999633 +
>"Lines"                             41  0.000000  0.010398  0.999812 +
>"valued"                            77  0.000000  0.019528  0.999900 +
>"payment"                          188  0.000000  0.047679  0.999959 +
>
>Can you post the output of "bogofilter -vvv <spam.txt", too?

Actually, it would be good to exclude the lines ending with "-" (those are 
the tokens left out of the score because of min_dev).  With that exclusion, 
you'd still get all the tokens actually used (their lines end with "+"), 
and you'd get the header and trailer lines, too.
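
Something like the following would do the filtering.  It's only a rough 
sketch: it assumes the -vvv token lines look like the ones Matthias quoted 
above, with the "+" or "-" marker as the last character on the line.  The 
function name, the header line and the "placeholder" token in the sample 
are made up for illustration; they are not real bogofilter output.

```python
def drop_excluded_tokens(lines):
    """Return all lines except those that end with '-'.

    Assumption: a trailing '-' marks a token that was excluded because of
    min_dev. Lines ending with '+', and any other lines (headers,
    trailers), pass through unchanged.
    """
    return [line for line in lines if not line.rstrip().endswith("-")]


# Two token lines are copied from the quoted output above; the header
# line and the "placeholder" token are hypothetical.
sample = [
    "HYPOTHETICAL-HEADER: stands in for a header line",
    '"Thank"                              9  0.000000  0.002283  0.999143 +',
    '"placeholder"                        1  0.500000  0.500000  0.500000 -',
    '"payment"                          188  0.000000  0.047679  0.999959 +',
]

kept = drop_excluded_tokens(sample)
for line in kept:
    print(line)
```

To run it on real output, pass the lines produced by 
"bogofilter -vvv <spam.txt" (for example, by reading sys.stdin) instead of 
the sample list.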





