Idea for improving the learning stage

Andrew aremo at
Thu Sep 6 15:23:37 CEST 2007

Hello, I would like to submit an idea which I think would improve the 
accuracy and the learning stage of any statistical spam filter.

The concept: learn where the "giveaway" is by watching user behaviour.

It basically comes down to having the filter record one thing: did the 
user need to open the email before flagging it as spam?

If the answer is "no", then the subject line alone gave the email away: 
concentrate your stats on the subject and ignore the body (which might 
be full of random words used by the spammer to pollute the filter's 
database).

If the answer is "yes", the reverse applies: ignore the subject, which 
must have looked "legitimate" to the user, and concentrate on the body, 
which is what clued the user in about the email being spam.

By learning from only the subject OR the body, the filter trains on the 
part that actually looked like spam to the user, and ignores the part of 
the email that is there to deceive.
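
To make the idea concrete, here is a rough sketch in Python of how the 
learning step might choose its tokens. It is not based on bogofilter's 
actual code: the names tokens_to_learn, learn_spam and the 
opened_before_flagging flag are made up for illustration, the tokenizer 
is deliberately naive, and it assumes the mail client can tell the 
filter whether the message was opened before it was flagged.

```python
import re
from collections import Counter
from email import message_from_string

# Naive tokenizer, for illustration only: runs of 3+ word-like characters.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]{3,}")


def tokenize(text):
    """Split text into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def tokens_to_learn(raw_message, opened_before_flagging):
    """Return the tokens to train on for a message the user flagged as spam.

    opened_before_flagging is a hypothetical signal supplied by the mail
    client: True if the user opened the message before flagging it.
    """
    msg = message_from_string(raw_message)
    if opened_before_flagging:
        # The subject looked legitimate, so the body was the giveaway.
        payload = msg.get_payload()
        # Simplification: multipart messages are skipped in this sketch.
        text = payload if isinstance(payload, str) else ""
    else:
        # The subject alone was enough, so ignore the (possibly padded) body.
        text = msg.get("Subject", "")
    return tokenize(text)


def learn_spam(spam_counts, raw_message, opened_before_flagging):
    """Add the selected tokens to a spam token counter."""
    spam_counts.update(tokens_to_learn(raw_message, opened_before_flagging))


if __name__ == "__main__":
    raw = "Subject: Cheap pills now\n\nHello friend, random words lottery\n"
    counts = Counter()
    learn_spam(counts, raw, opened_before_flagging=False)
    print(counts)
```

In the example at the bottom, the message was flagged without being 
opened, so only the subject words are counted and the body's filler 
words never reach the database.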

What do you think?


More information about the bogofilter-dev mailing list