Including html-tag contents may be unnecessary

Tony L. Svanstrom tony at moon.pp.se
Mon May 12 15:44:26 CEST 2003


On Mon, 12 May 2003 the voices made David Relson write:

DR> At 08:07 AM 5/12/03, Tony L. Svanstrom wrote:

DR> JavaScript has a limited number of keywords which will come to be
DR> recognized.  Function and variable identifiers can be whatever the spammer
DR> wants.  Again, new and different identifiers will be treated just like new
DR> and different words and bogofilter will deal with them.

 Well... you could write HTML+CSS+JavaScript so that if you ignore the
JavaScript you get one text, but the CSS+JavaScript will position characters
A-Za-z0-9._ with a background set so that those characters will cover the
"real" text.
 The result would be an innocent-looking text put together using the most
common words/phrases, plus lists telling the JavaScript where to position
the characters. Those lists could themselves consist of only the most common
words, which the JavaScript turns into positions for the CSS.
 There'd be a lot of noise though, so these spams would be quite long for a
short message.

 You'd be drowning in good tokens, with only a few bad ones; and the
worst part is that it'd lower the accuracy of any Bayesian filter that is
trained on these spams as spam.
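
 To illustrate the "drowning in good tokens" effect, here is a minimal
sketch using a plain naive-Bayes combination of per-token probabilities. This
is an assumption for illustration only, not bogofilter's actual calculation,
and the token probabilities (0.99 for "bad", 0.2 for "good") are made up.

```python
from math import prod


def combined_spam_probability(token_probs):
    """Combine per-token spam probabilities with the naive-Bayes product rule.

    Illustrative only: real filters use refined variants of this idea.
    """
    spam = prod(token_probs)
    ham = prod(1.0 - p for p in token_probs)
    return spam / (spam + ham)


# Hypothetical values: three strongly spammy tokens, forty mildly hammy ones.
bad_tokens = [0.99, 0.99, 0.99]
good_tokens = [0.2] * 40

only_bad = combined_spam_probability(bad_tokens)
diluted = combined_spam_probability(bad_tokens + good_tokens)

print(f"bad tokens only:        {only_bad:.6f}")
print(f"bad + many good tokens: {diluted:.3e}")
```

 With these made-up numbers the few bad tokens alone score as near-certain
spam, while the same tokens buried in common words score as near-certain ham.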

DR> We know which headers Paul Graham thinks are important.  Which ones do you
DR> think are important?

 Different ones, depending on the route the e-mails take etc. I'd like to be
able to control that myself, maybe by setting a list of headers like this: To,
From, Received(1), Received(-5). Positive numbers count from the first server
that added the header, and negative numbers count from the last.
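
 A rough sketch of how such a header list could be interpreted, using
Python's standard email module. The select_headers function and the
(name, index) spec format are hypothetical; nothing like this exists in
bogofilter. It assumes each server prepends its Received header, so the first
server's header is the last one in the message.

```python
from email import message_from_string


def select_headers(raw_message, spec):
    """Return (name, value) pairs chosen by a list of (name, index) entries.

    index None -> every occurrence of the header
    index  N>0 -> Nth occurrence counting from the first server that added it
    index  N<0 -> Nth occurrence counting from the last server that added it
    Entries that point past the available headers are skipped.
    """
    msg = message_from_string(raw_message)
    selected = []
    for name, index in spec:
        values = msg.get_all(name, [])
        if index is None:
            selected.extend((name, value) for value in values)
            continue
        if index == 0:
            raise ValueError("header index must be non-zero")
        # Headers are prepended in transit, so reverse to get oldest first.
        oldest_first = values[::-1]
        pos = index - 1 if index > 0 else index
        if -len(oldest_first) <= pos < len(oldest_first):
            selected.append((name, oldest_first[pos]))
    return selected


# Hypothetical message that passed through three servers.
RAW = (
    "Received: from c.example by d.example\n"
    "Received: from b.example by c.example\n"
    "Received: from a.example by b.example\n"
    "From: alice@example.org\n"
    "To: bob@example.org\n"
    "Subject: hi\n"
    "\n"
    "body\n"
)

SPEC = [("To", None), ("From", None),
        ("Received", 1), ("Received", -1), ("Received", -5)]

chosen = select_headers(RAW, SPEC)
for name, value in chosen:
    print(f"{name}: {value}")
```

 Here Received(-5) is silently skipped because the message has only three
Received headers; whether that should be an error instead is a design choice.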

DR> Right now we have 1,3,5 and label them as Yes/No/Unsure.  The meanings of 2
DR> & 4 aren't given in sufficient detail.

 Sorry, I guess I didn't explain it well enough... I meant something like an
"incoming score", giving bogofilter a nudge towards, or away from, spamminess.

 I could do that today using different config files with different values for
what is to be considered spam; it would just be a lot easier if I could attach
a value to the -u switch instead, telling bogofilter how mean/nice it should be
to that particular e-mail.
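
 A minimal sketch of the "incoming score" idea: a per-message nudge is added
to the computed score before it is compared with the cutoffs. The classify
function, the nudge parameter and the cutoff values are all assumptions for
illustration; bogofilter has no such option today.

```python
def classify(spamicity, nudge=0.0, spam_cutoff=0.95, ham_cutoff=0.10):
    """Label a message Yes/No/Unsure after applying a per-message nudge.

    spamicity: score in [0, 1] from the filter (hypothetical input)
    nudge:     positive pushes towards spam, negative away from it
    The cutoff defaults are made-up values, not bogofilter's configuration.
    """
    adjusted = min(1.0, max(0.0, spamicity + nudge))
    if adjusted >= spam_cutoff:
        return "Yes"
    if adjusted <= ham_cutoff:
        return "No"
    return "Unsure"


# Same score, different treatment depending on where the mail came from.
print(classify(0.90))              # no nudge
print(classify(0.90, nudge=0.2))   # be mean to this e-mail
print(classify(0.15, nudge=-0.1))  # be nice to this e-mail
```

 The upside compared with separate config files is that the cutoffs stay in
one place and only the nudge varies per delivery route.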

DR> If you'd like to write some code to implement your idea and post a patch to
DR> the list, people can try it and see how well it works for them.

 I'm not a C programmer, nor do I currently have the time to become one, so
unless I can figure out a quick way to hide Perl code inside C code I'll just
have to keep on complaining every now and then on this list. ;-)


-- 
  .-------------------------------------------------------------------.
  | Per scientiam ad libertatem! (Through knowledge towards freedom!) |
  `-------------------------------------------------------------------´
                   << ©1998-2003 tony at svanstrom.com >>




