hitzinger at phobos.fphil.uniba.sk
Thu Mar 4 04:22:27 EST 2004
On Thu, 3 Mar 2004, Tom Anderson wrote:
> When I brought up this subject about two weeks ago in my "headers"
> discussion, I advocated bogofilter ignoring certain headers and
> emphasizing others. I've since changed my mind. Bogofilter is, and
> should remain, a statistical filter without ad hoc heuristics. Instead,
> I now believe that any massaging of headers should be done prior to
> bogofilter, such that bogofilter is just a highly tuned member of an
> email assembly line which happens to include other steps.
I'd like bogofilter to learn and classify based on messages without headers
(only Subject preserved), but then I need the _original_ message to get
the X-Bogosity header. Something like:
cat message | bogofilter -p --only-subject | maildrop
In your scenario, I'd need to "fork" the message, strip one copy and let
bogofilter chew on it; then somehow get the result back and add the
header to original.
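A minimal sketch of that external "fork", written in Python for illustration only. It assumes LF line endings, that bogofilter is on the PATH, and that its exit status follows the documented convention (0 = spam, 1 = non-spam, 2 = unsure, 3 = error). The function names and the simplified X-Bogosity value are hypothetical; the header bogofilter itself writes with -p carries more fields.

```python
import subprocess

# Exit statuses documented in the bogofilter man page; 3 (error) is left out on purpose.
VERDICTS = {0: b"Spam", 1: b"Ham", 2: b"Unsure"}


def strip_headers(raw, keep=(b"subject",)):
    """Return a copy of the message with only the headers named in `keep`."""
    head, sep, body = raw.partition(b"\n\n")
    kept = []
    keeping = False
    for line in head.split(b"\n"):
        if line[:1] in (b" ", b"\t"):
            # Folded continuation line: it belongs to the previous header.
            if keeping:
                kept.append(line)
            continue
        name = line.split(b":", 1)[0].strip().lower()
        keeping = name in keep
        if keeping:
            kept.append(line)
    return b"\n".join(kept) + sep + body


def add_header(raw, header_line):
    """Append one header line to the end of the header block of `raw`."""
    head, sep, body = raw.partition(b"\n\n")
    return head.rstrip(b"\n") + b"\n" + header_line + (sep or b"\n\n") + body


def bogofilter_exit_code(stripped):
    """Feed the stripped copy to bogofilter and return its exit status."""
    proc = subprocess.run(["bogofilter"], input=stripped)
    return proc.returncode


def classify_and_tag(raw, classify=bogofilter_exit_code):
    """Classify the stripped copy, then tag the untouched original."""
    code = classify(strip_headers(raw))
    label = VERDICTS.get(code)
    if label is None:
        raise RuntimeError("classifier failed with status %r" % code)
    # Simplified header value (an assumption of this sketch).
    return add_header(raw, b"X-Bogosity: " + label)
```

A delivery script would read the message from standard input as bytes, pass it to classify_and_tag(), and write the result to maildrop. Everything here duplicates buffering that bogofilter -p already does, which is the overhead in question.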
bogofilter already does this "fork" internally (has to keep the whole
message while content is checked, then adds the header and outputs
original message - the -p option) so adding the header filter just before
the data get sliced into tokens seems a better approach than creating all
this again outside of bogofilter.
This applies to any pre-filtering, so providing a hook for such a filter may
be nice, although it means running another process - internal filter would
> These other steps may be SpamAssassin, virus scanners, procmail recipes,
> etc. Bogofilter should not try to be the end-all-be-all solution for
> filtering email, but just a tool that may be used toward that end.
> This is the Unix way.
> In this vein, I'm currently building a program which will strip out
> x-headers, dates, etc., and add emphasis to important bits. This will
> sit right in front of bogofilter. You could use SpamAssassin or other
> rule-based filters in a similar way. But adding this directly to
> bogofilter will just clutter the code and the purpose of the project.
> That is the Microsoft way, which I assume we want to avoid. Let's push
> feature creep into separate sub-projects.
So do you deliver messages with headers stripped, emphasis added and
suchlike, or how have you overcome this issue? I'm not saying you're doing
something bad, I'm just hoping maybe you found an easy way around this.