What is spam? (was: [bogofilter] ESF and redundancy)

Tue May 11 22:58:53 CEST 2004

From: "David Relson" <relson at osagesoftware.com>
> Sorry Tom, bogofilter doesn't check the contents of X-Bogosity: when
> registering messages.  There's no way to tell if the X-Bogosity line is
> legit or spoofed.  With a low thresh_update value like 0.01, which
> excludes messages scoring 0.01 and below or scoring 0.99 and above, the
> need to correct is really, really low.

If a person is sending all of their email through bogofilter, and bogofilter
strips out any X-Bogosity line before classifying, and bogofilter adds a new
X-Bogosity line before delivering, then it follows that any email in a
person's inbox has a legitimate X-Bogosity line.  If they subsequently try
to correct a classification, then bogofilter should be able to assume that
an X-Bogosity line present was in fact put there by bogofilter.  Why would
you ever use a -S or -N switch if you're not correcting a previous
registration?  You wouldn't.  Therefore, if an X-Bogosity line is present,
and the spamicity in that line falls in the range that thresh_update
excludes (so the message was never registered), then any -S or -N switch
should be ignored.  If the X-Bogosity line was fake, there would be no
adverse effects from ignoring these switches anyway, since the email never
passed through bogofilter in the first place and so was never registered.

The only time I can see this not being true is if a copy of an email is
saved prior to delivery via bogofilter, and the copy is then used for
corrections.  I think this is probably an abnormal usage.  Nonetheless, if
that is the case, then there could possibly be a config variable to trust or
not trust the X-Bogosity line during corrections.
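
To make the proposal concrete, here is a rough sketch in Python of the
decision I have in mind.  It is not bogofilter or bfproxy code.  The
function names and the trust_header option are made up for illustration,
and it assumes the X-Bogosity line carries a "spamicity=" field and that
the message went through bogofilter with auto-update on.

```python
import re
from email import message_from_string

# Assumed header shape: "X-Bogosity: No, tests=bogofilter, spamicity=0.000000, ..."
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def recorded_spamicity(raw_message):
    """Return the spamicity recorded in the X-Bogosity header, or None."""
    header = message_from_string(raw_message).get("X-Bogosity")
    if header is None:
        return None
    match = SPAMICITY_RE.search(header)
    return float(match.group(1)) if match else None


def was_auto_registered(spamicity, thresh_update):
    """True if the score lies outside the extremes that thresh_update skips."""
    return thresh_update < spamicity < 1.0 - thresh_update


def correction_flags(raw_message, now_spam, thresh_update=0.01, trust_header=True):
    """Pick the switches for a correction (hypothetical helper).

    trust_header stands in for the proposed config variable: when it is off,
    or when no usable X-Bogosity line is found, fall back to unregistering
    and re-registering as is done today.
    """
    register = "-s" if now_spam else "-n"
    unregister = "-N" if now_spam else "-S"
    score = recorded_spamicity(raw_message) if trust_header else None
    if score is not None and not was_auto_registered(score, thresh_update):
        # The message was never registered, so there is nothing to undo.
        return [register]
    return [unregister, register]
```

With thresh_update at 0.01, a message stamped spamicity=0.000000 that turns
out to be spam would get only -s, while one stamped 0.500000 would get
-N -s as before.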

Corrections to messages in the thresh_update range may be rare, but they do
occur.  Unregistering an email that was never registered may not be such a
horrible idea when the classification was that badly wrong.  However, it
would be inconsistent with the theory of how the system works.  Moreover,
what if larger thresh_update values turn out to be useful?  And what happens
when bogofilter is asked to decrement a token it has never seen before?
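
To illustrate that last question, here is a toy model in Python.  It is
only a sketch: the Counter stands in for a wordlist, and the unregister
function is hypothetical.  It says nothing about what bogofilter's own
wordlist code does in this case.

```python
from collections import Counter


def unregister(counts, tokens):
    """Decrement each token's count, never going below zero.

    Returns the tokens that had no count to decrement, i.e. tokens from a
    message that was apparently never registered.
    """
    unseen = []
    for token in tokens:
        if counts[token] > 0:  # a Counter returns 0 for a missing key
            counts[token] -= 1
        else:
            unseen.append(token)
    return unseen


spam_counts = Counter({"viagra": 2})
never_seen = unregister(spam_counts, ["viagra", "refinance"])
```

Here "refinance" comes back in never_seen.  The toy can only skip it; the
other tokens still get decremented for a registration that never happened,
which is the inconsistency I mean.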

I may add this logic to bfproxy when I get a chance, unless it's going to be
added directly to bogofilter.  Bfproxy already assumes that the X-Bogosity
line is legitimate in order to make corrections.  Checking the spamicity
against thresh_update first is trivial.

Tom