spam temporality

Elijah Saxon elijah at riseup.net
Sat Jul 5 23:52:20 CEST 2003


If I were a spammer, I would make the most of my bandwidth by sending spam
continually, 24 hours a day.

Our Postfix graphs confirm this: spam arrives at a constant rate all day,
while legitimate mail peaks at noon and drops very low at night. The spam
graph counts only the mail that Postfix rejects on RBL grounds (and we use
only very conservative RBLs), yet at 3 AM it looks like almost *all* mail
is spam.

So how about this for an idea:

Parse the time values out of the "Received" headers. Then bucket them by
the hour (GMT) and create a token prefixed with 'hour:'. So this:

  Sat,  5 Jul 2003 14:32:09 -0700 (PDT)

would become (14:32 at -0700 is 21:32 GMT):

  hour:21
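A minimal Python sketch of the tokenizing step, using only the standard
`email` and `datetime` modules. It assumes the timestamp is the text after
the last semicolon of each Received header (the usual layout, but not
guaranteed), and it treats a date with an unknown zone ("-0000") as GMT.
The function name `received_hour_tokens` is hypothetical, not anything in
bogofilter.

```python
from datetime import timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def received_hour_tokens(raw_message):
    """Return one 'hour:N' token per Received header, N = GMT hour (0-23).

    Hypothetical helper: assumes the date follows the last ';' in each header.
    """
    msg = message_from_string(raw_message)
    tokens = []
    for received in msg.get_all("Received", []):
        # The timestamp in a Received header normally follows the last semicolon.
        _, sep, date_part = received.rpartition(";")
        if not sep:
            continue  # no timestamp we can locate
        try:
            stamp = parsedate_to_datetime(date_part.strip())
        except (TypeError, ValueError):
            continue  # unparseable date: skip rather than guess
        if stamp.tzinfo is None:
            # "-0000" (zone unknown) gives a naive datetime; assume GMT here.
            stamp = stamp.replace(tzinfo=timezone.utc)
        tokens.append("hour:%d" % stamp.astimezone(timezone.utc).hour)
    return tokens
```

Fed a message whose only Received header ends in
`Sat,  5 Jul 2003 14:32:09 -0700 (PDT)`, this returns `["hour:21"]`. It
emits one token per relay hop; whether to keep every hop or only the first
is left open.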

OK, a few more spammy tokens probably won't make a big difference, but
they can't hurt. The assumption is that daytime hours would be strongly
associated with ham and late-night hours strongly associated with spam
(unless you are a night owl or have friends on other continents).
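To show how an hour token would pick up such an association during
training, here is a sketch of a smoothed per-token spam probability in the
style of Gary Robinson's f(w), the kind of score a Bayesian filter computes
from ham and spam counts. The function name and the smoothing values
s=1.0 and x=0.5 are illustrative assumptions, not bogofilter's code or
defaults.

```python
def hour_spamicity(spam_count, ham_count, n_spam, n_ham, s=1.0, x=0.5):
    """Robinson-style smoothed spam probability for one token, e.g. 'hour:3'.

    spam_count / ham_count: training spam / ham messages containing the token.
    n_spam / n_ham: total training spam / ham messages (both assumed > 0).
    s, x: smoothing strength and prior for rarely seen tokens (illustrative).
    """
    n = spam_count + ham_count
    if n == 0:
        return x  # token never seen in training: fall back to the prior
    spam_rate = spam_count / n_spam
    ham_rate = ham_count / n_ham
    p = spam_rate / (spam_rate + ham_rate)
    # Pull rarely seen tokens toward x; well-attested tokens keep p.
    return (s * x + n * p) / (s + n)
```

With made-up counts of 1000 spam and 1000 ham, an hour:3 token seen in 90
spam and 10 ham scores about 0.90, while hour:17 seen in 40 spam and 160
ham scores about 0.20.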

-elijah





More information about the Bogofilter mailing list