Just saw a new spam tactic
Nick Simicich
njs at scifi.squawk.com
Thu Jan 30 17:44:48 CET 2003
At 12:08 AM 2003-01-30 -0800, Chris Wilkes wrote:
>On Wed, Jan 29, 2003 at 11:40:28PM -0800, Max Rible wrote:
> > I just got a piece of spam that's full of bogus HTML tags-- lots
> > of </k> tags inserted in the middle of words. The tags will be
> > ignored by most HTML renderers, but will break up the text for
> > spam parsing.
>
>Could you post what version of bogofilter you're using? The latest one
>does include code to throw out HTML comments like this:
> to<!-- -->ner cart<!-- -->ridge
>and give you what you would see in a browser, mainly:
> toner cartridge
>
>I'm not sure about bogus HTML tags though. It would be nice to get some
>sort of number representing how poorly written an HTML page is.
I am beginning to wonder whether bogofilter should simply remove all tags
and then break the remaining string on white space. Nothing will let you
reconstruct mail that arranges parts of words as tables (other than
rendering and then interpreting by "eyespace"), but my mailer won't render
those anyway (on purpose). Max's point is well made: does it really matter
whether the thing we are looking at is split by a comment or by a tag that
will be ignored? Just pull out of the string to be tokenized anything that
is between < and >, rather than only what is between <!-- and -->.
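
A rough sketch of that idea in Python, for illustration only. This is not
bogofilter's actual code (bogofilter is written in C), and the names TAG_RE
and strip_and_tokenize are made up here. It assumes that anything between a
"<" and the next ">" is a tag, comments included:

```python
import re

# Assumption: every "<...>" span is markup, whether it is a real tag,
# a bogus tag such as </k>, or a comment such as <!-- -->.
TAG_RE = re.compile(r"<[^>]*>")


def strip_and_tokenize(html_text):
    """Remove every <...> span, then split what is left on white space."""
    return TAG_RE.sub("", html_text).split()


print(strip_and_tokenize("to<!-- -->ner cart</k>ridge"))
# prints ['toner', 'cartridge']
```

Two caveats with something this simple: a comment that contains ">" would be
cut short at that character, and deleting a tag such as <br> outright glues
the words on either side of it together, so a real implementation would
probably need to treat some tags as white space.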
--
SPAM: Trademark for spiced, chopped ham manufactured by Hormel.
spam: Unsolicited, Bulk E-mail, where e-mail can be interpreted generally
to mean electronic messages designed to be read by an individual, and it
can include Usenet, SMS, AIM, etc. But if it is not all three of
Unsolicited, Bulk, and E-mail, it simply is not spam. Misusing the term
plays into the hands of the spammers, since it causes confusion, and
spammers thrive on confusion. Spam is not speech, it is an action, like
theft, or vandalism. If you were not confused, would you patronize a spammer?
Nick Simicich - njs at scifi.squawk.com - http://scifi.squawk.com/njs.html
Stop by and light up the world!