Idea for improving the learning stage
Andrew
aremo at ngi.it
Thu Sep 6 15:23:37 CEST 2007
Hello, I would like to submit an idea which I think would improve the
accuracy and the learning stage of any statistical spam filter.
The concept: learn where the "giveaway" is by watching user behaviour.
It comes down to having the filter record one thing: did the user need
to open the email before flagging it as spam?
If the answer is "no", then concentrate your stats on the subject line
and ignore the body (which might be full of random words used by the
spammer to pollute the filter's database).
If the answer is "yes", the reverse applies: ignore the subject, which
must have looked "legitimate" to the user, and concentrate on the body,
which is what clued the user in about the email being spam.
By analyzing only the subject or only the body, the filter learns from
the part that actually looked like spam to the user, and ignores the
parts of the email that are there to deceive.
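
To make the rule concrete, here is a rough sketch in Python of how the
learning step could pick its tokens. Everything in it is an assumption
for illustration: the names (tokenize, spam_training_tokens, learn_spam),
the simple regex tokenizer, and above all the opened_before_flagging
flag, which the mail client would have to report to the filter somehow.
It is not how bogofilter works today, and it ignores multipart messages.

```python
import re
from collections import Counter
from email import message_from_string

# Hypothetical tokenizer: runs of 3+ word-like characters, lowercased.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'!-]{3,}")


def tokenize(text):
    return [t.lower() for t in TOKEN_RE.findall(text)]


def spam_training_tokens(raw_message, opened_before_flagging):
    """Return the tokens to learn from for a message the user flagged as spam.

    opened_before_flagging is assumed to come from the mail client:
    True  -> the subject looked legitimate, so learn from the body only.
    False -> the subject gave it away, so learn from the subject only.
    """
    msg = message_from_string(raw_message)
    if opened_before_flagging:
        payload = msg.get_payload()
        # Sketch only: multipart messages (payload is a list) are skipped.
        body = payload if isinstance(payload, str) else ""
        return tokenize(body)
    return tokenize(msg.get("Subject", ""))


# Hypothetical in-memory spam word counts standing in for the real database.
spam_counts = Counter()


def learn_spam(raw_message, opened_before_flagging):
    spam_counts.update(spam_training_tokens(raw_message, opened_before_flagging))
```

For a message with the subject "Cheap pills now" and a body of random
filler words, learn_spam(raw, False) would count only the subject
tokens, so the filler never reaches the database.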
What do you think?
Regards,
Andrew
More information about the bogofilter-dev mailing list