Teaching bogofilter by forwarding messages

Michal Wieja mwieja at poczta.onet.pl
Fri Dec 19 17:02:19 CET 2003


On Thursday 18 December 2003 22:34, David Relson wrote:
> On Thu, 18 Dec 2003 22:39:10 +0100
> Michal Wieja <mwieja at poczta.onet.pl> wrote:
>
> ...[snip]...
>
> > 	Yes, but remember that e-mail software also puts 'FW:' or 'Fwd:'
> > into the subject line to mark forwarded messages, and some clients
> > enclose the whole subject in brackets '[' ']'.
> >
> > 	Actually, as far as I understand, bogofilter is a statistical
> > analysis tool, so in theory words like FW and Fwd should occur in
> > both spam and ham, and their weight shouldn't contribute much to the
> > final score.
> >
> > --
> > Mike
>
> Mike,
>
> Your comments about statistical analysis are correct.  Bogofilter gets a
> lot of its information from a message's headers.  Forwarding a message
> puts a new set of headers in place and it's important to remove those
> headers before passing the message to bogofilter for training.
>
> David
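To illustrate why a token such as "Fwd" that appears about equally often in spam and ham carries little weight, here is a simplified sketch of Gary Robinson's smoothed per-token probability f(w), the kind of estimate bogofilter's scoring is built on. This is not bogofilter's code: the function name is hypothetical, and the strength s and prior x are illustrative assumptions rather than bogofilter's configured defaults. The real scorer also combines many tokens (Robinson-Fisher) and ignores tokens whose f(w) lies close to 0.5.

```python
def token_spamicity(bad_count, good_count, n_bad_msgs, n_good_msgs,
                    s=1.0, x=0.5):
    """Simplified Robinson-style smoothed spam probability f(w) for one token.

    bad_count / good_count: spam / ham training messages containing the token.
    s (strength of the prior) and x (assumed prior for rare tokens) are
    illustrative values only, not bogofilter's defaults.
    """
    n = bad_count + good_count
    if n == 0:
        return x  # token never seen in training: fall back to the prior
    bad_ratio = bad_count / n_bad_msgs
    good_ratio = good_count / n_good_msgs
    p = bad_ratio / (bad_ratio + good_ratio)  # raw estimate from frequencies
    return (s * x + n * p) / (s + n)          # shrink toward x when n is small


# "Fwd" seen in 50 of 1000 spams and 50 of 1000 hams: neutral.
print(token_spamicity(50, 50, 1000, 1000))  # → 0.5
# A token seen in 300 of 1000 spams but only 10 of 1000 hams: strongly spammy.
print(token_spamicity(300, 10, 1000, 1000))
```

A token that scores exactly 0.5 pushes the final verdict neither way, which is why forwarding markers in the subject matter much less than the forwarder's replacement headers do.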

David,

        Thanks a lot, I had forgotten about removing the user's headers from
forwarded e-mails. I've prepared a Python script that extracts the attached
message from a forwarded e-mail:

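        #!/usr/bin/python3
        # Write the message(s) forwarded as attachments (message/rfc822
        # parts) to stdout, dropping the forwarder's own headers and note.
        import sys
        from email.parser import BytesParser

        # Parse raw bytes: spam often contains 8-bit junk that is not valid UTF-8.
        message = BytesParser().parse(sys.stdin.buffer)
        if message.is_multipart():
                for part in message.get_payload():
                        if part.get_content_type() == 'message/rfc822':
                                # The payload of a message/rfc822 part is a
                                # list holding the single embedded message.
                                sys.stdout.buffer.write(part.get_payload(0).as_bytes())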

        I believe that this is enough.

        When users forward messages as attachments, bogofilter gets only the
attached message, with all of the spam's original headers.
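The claim can be checked with a small, self-contained sketch using Python 3's standard email package. The addresses, subjects and the helper name extract_forwarded are made up for illustration. The sketch builds a spam message, forwards it as a message/rfc822 attachment, then pulls the attachment back out and confirms that only the spam's own headers remain.

```python
from email.message import EmailMessage
from email.parser import BytesParser


def extract_forwarded(raw: bytes) -> list:
    """Return each top-level message/rfc822 attachment of `raw` as bytes."""
    outer = BytesParser().parsebytes(raw)
    if not outer.is_multipart():
        return []  # inline forward or plain message: nothing attached
    return [part.get_payload(0).as_bytes()  # the single embedded message
            for part in outer.get_payload()
            if part.get_content_type() == "message/rfc822"]


# Hypothetical spam, as it arrived in the user's mailbox.
spam = EmailMessage()
spam["From"] = "spammer@example.com"
spam["To"] = "user@example.org"
spam["Subject"] = "Cheap meds"
spam.set_content("Buy now!\n")

# The user forwards it as an attachment to a (hypothetical) training address.
fwd = EmailMessage()
fwd["From"] = "user@example.org"
fwd["To"] = "spam-trainer@example.org"
fwd["Subject"] = "Fwd: Cheap meds"
fwd.set_content("Please train this as spam.\n")
fwd.add_attachment(spam)  # attached as a message/rfc822 part

extracted = extract_forwarded(fwd.as_bytes())
original = BytesParser().parsebytes(extracted[0])
print(original["From"], "|", original["Subject"])  # spammer@example.com | Cheap meds
```

The bytes in `extracted` are what would be fed to bogofilter for training, for example with `bogofilter -s` (register as spam) or `bogofilter -n` (register as ham). Only top-level attachments are examined; a client that nests the forwarded message deeper would need a recursive search.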

--
Michal Wieja





More information about the Bogofilter mailing list