spam conference report

Matthias Andree matthias.andree at gmx.de
Mon Jan 20 00:05:12 CET 2003


Gyepi SAM <gyepi at praxis-sw.com> writes:

> One way I can think of is to cluster all proximate words into a single database value
> whose key is some root of all the words, perhaps the stem. Presuming that the list is not too
> long, we still maintain order n log n. So to look up a word:

"Stem" sounds a lot like "become aware of the language", which clearly
heads in the artificial-intelligence direction.

How would you define a stem? The purely alphanumeric part?

> 1. compute the word's stem
> 2. use the stem as a lookup into the database
> 3. get back a list of words and their counts.
> 4. walk the list, looking for an exact match
> 5. perhaps if no exact match is found, use the first word's count.
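For illustration, here is a minimal Python sketch of the five quoted steps. It assumes the stem is the lowercased, purely alphanumeric part of a token (one possible answer to the question above), and it uses an in-memory dict in place of bogofilter's real database. `stem` and `StemBucketDB` are hypothetical names, not bogofilter code.

```python
from collections import defaultdict


def stem(word: str) -> str:
    """Hypothetical stem: the lowercased, purely alphanumeric part of a token."""
    return "".join(ch for ch in word.lower() if ch.isalnum())


class StemBucketDB:
    """Assumed in-memory stand-in for the token database.

    Maps stem -> list of [word, count]; each list is one clustered
    "database value" holding all proximate words that share the stem.
    """

    def __init__(self):
        self._buckets = defaultdict(list)

    def add(self, word: str, count: int = 1) -> None:
        bucket = self._buckets[stem(word)]
        for entry in bucket:
            if entry[0] == word:  # existing exact word: bump its count
                entry[1] += count
                return
        bucket.append([word, count])  # new variant under this stem

    def lookup(self, word: str):
        # Steps 1-3: compute the stem and fetch the bucket of (word, count).
        # .get() avoids creating empty buckets in the defaultdict.
        bucket = self._buckets.get(stem(word))
        if not bucket:
            return None  # stem unknown: no data at all
        # Step 4: walk the list looking for an exact match.
        for w, c in bucket:
            if w == word:
                return c
        # Step 5: no exact match, fall back to the first word's count.
        return bucket[0][1]


db = StemBucketDB()
db.add("viagra", 10)
db.add("V-i-a-g-r-a", 3)
print(db.lookup("V-i-a-g-r-a"))  # exact match within the bucket
print(db.lookup("VIAGRA"))       # same stem, no exact match: first word's count
print(db.lookup("vaigra"))       # different stem: nothing found
```

Note that `vaigra` lands in a different bucket than `viagra`, so a deliberate misspelling inside the stem defeats the clustering entirely.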

Apart from that, this gets difficult if someone deliberately misspells a
word early on, in the part that would form the stem.

-- 
Matthias Andree




More information about the bogofilter-dev mailing list