korean spam

Graham Wilson bob at decoy.wox.org
Thu Oct 10 03:53:44 CEST 2002


On Wed, Oct 09, 2002 at 09:34:15PM -0400, David Relson wrote:
> FWIW, I got curious the other day about a couple of bunches of korean
> spam that I had from ???.co.kr and ???.hanmail.net.  I added a "-c"

I have been getting Chinese (gb2312) and Japanese (iso-2022-jp)
spam.

> Certainly, my wordlists (both good and spam) have thousands of words
> that I can't read at all.  Offhand, I'd guess that most of those words
> qualify as "high", though some of them likely contain 1 or 2 normal
> characters.  I don't like having my lists be filled with stuff that's
> totally junk and am wondering if bogofilter should do anything about
> this.  On the other hand, they may not have any measurable effect on
> bogofilter.

I figured it was good to have those tokens in the database because
they might be Chinese or Japanese words that would give the
message away as spam. This doesn't really seem to be happening, though.

It might be more useful if the lexer could produce tokens from
messages written in East Asian languages, tokens that could be used the
way we use the tokens in English (and other Western-language) messages.
Like I said, I don't feel that bogofilter is doing a good job at that.
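
To make the idea concrete, here is a rough sketch of one way such
tokens could be produced. It is not bogofilter's lexer and not a
proposed patch: the function name cjk_bigrams, the character ranges,
and the choice of overlapping two-character tokens are all assumptions
made for illustration. The message body is decoded using its declared
charset, and each run of CJK characters is split into overlapping
pairs, since these scripts do not put spaces between words.

```python
import re
from typing import List

# Assumed, approximate ranges: hiragana/katakana, CJK ideographs,
# and hangul syllables. A real lexer would need a fuller table.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]+")


def cjk_bigrams(raw: bytes, charset: str) -> List[str]:
    """Hypothetical helper: decode raw bytes, return CJK bigram tokens."""
    # Undecodable bytes become U+FFFD, which falls outside the ranges
    # above and so simply ends the current run.
    text = raw.decode(charset, errors="replace")
    tokens: List[str] = []
    for run in CJK_RUN.findall(text):
        if len(run) == 1:
            # A lone character is kept as its own token.
            tokens.append(run)
        else:
            # Overlapping pairs: "ABC" yields "AB" and "BC".
            tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens


if __name__ == "__main__":
    sample = "buy \u7121\u6599 now".encode("iso-2022-jp")
    print(cjk_bigrams(sample, "iso-2022-jp"))
```

Whether tokens like these would actually separate spam from good mail
is an open question; the sketch only shows that the raw bytes can be
turned into something more word-like than strings of high characters.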

--
gram
-------------- next part --------------
For summary digest subscription: bogofilter-digest-subscribe at aotto.com

