Teaching bogofilter by forwarding messages
Michal Wieja
mwieja at poczta.onet.pl
Fri Dec 19 17:02:19 CET 2003
On Thursday 18 December 2003 22:34, David Relson wrote:
> On Thu, 18 Dec 2003 22:39:10 +0100
> Michal Wieja <mwieja at poczta.onet.pl> wrote:
>
> ...[snip]...
>
> > Yes, but remember that e-mail software also puts 'FW:' or 'Fwd:' into
> > the subject line to mark forwarded messages, and some clients wrap the
> > whole subject in brackets '[' ']'.
> >
> > Actually, as far as I understand, bogofilter is a statistical analysis
> > tool, so in theory the words 'FW' and 'Fwd' should occur in both spam
> > and ham, and their weight shouldn't contribute much to the final score.
> >
> > --
> > Mike
>
> Mike,
>
> Your comments about statistical analysis are correct. Bogofilter gets a
> lot of its information from a message's headers. Forwarding a message
> puts a new set of headers in place and it's important to remove those
> headers before passing the message to bogofilter for training.
>
> David
David,
thanks a lot, I had forgotten about removing the user's headers from
forwarded e-mails. I've prepared a Python script that extracts the
attachments from a forwarded e-mail.
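#!/usr/bin/env python3
# Ported to Python 3: the old email.Parser module is now email.parser.
import email
import sys

# Parse the forwarding e-mail read from standard input.
message = email.message_from_string(sys.stdin.read())
# For a multipart message, get_payload() returns the list of MIME parts.
for part in message.get_payload():
    sys.stdout.write(str(part))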
I believe that this is enough: when users forward messages as attachments,
bogofilter gets only the attachment, with all of the spam's original headers.
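One caveat with the script above: get_payload() on a multipart message returns every part, so the forwarder's own note and the MIME part headers are written out as well. A stricter variant, offered only as a sketch, keeps just the parts typed message/rfc822. It assumes the mail client attaches the original that way; the function name extract_forwarded and the SAMPLE message are made up for illustration.

```python
import email


def extract_forwarded(raw):
    """Return the messages attached as message/rfc822 parts of `raw`."""
    outer = email.message_from_string(raw)
    if not outer.is_multipart():
        # Inline forwards have no attachment to recover.
        return []
    originals = []
    for part in outer.get_payload():
        if part.get_content_type() == "message/rfc822":
            # The payload of such a part is a list holding the original
            # message, with its own headers intact.
            originals.extend(m.as_string() for m in part.get_payload())
    return originals


# Hypothetical forwarded message, used only to exercise the function.
SAMPLE = (
    "From: user@example.com\n"
    "Subject: Fwd: cheap pills\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "This is spam, please train on it.\n"
    "--XX\n"
    "Content-Type: message/rfc822\n"
    "\n"
    "From: spammer@example.net\n"
    "Subject: cheap pills\n"
    "\n"
    "Buy now!\n"
    "--XX--\n"
)

for original in extract_forwarded(SAMPLE):
    print(original)
```

To use it as a filter, feed it sys.stdin.read() and write the joined result to sys.stdout. Messages forwarded inline rather than as attachments yield an empty list and would still need their headers stripped by other means.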
--
Michal Wieja