spam conference report
Matthias Andree
matthias.andree at gmx.de
Mon Jan 20 00:05:12 CET 2003
Gyepi SAM <gyepi at praxis-sw.com> writes:
> One way I can think of is to cluster all proximate words into a single database value
> whose key is some root of all the words, perhaps the stem. Presuming that the list is not too
> long, we still maintain order n log n. So to look up a word:
"Stem" sounds very much like "become aware of the language", and that
clearly heads in the direction of artificial intelligence.
How would you define a stem? As the purely alphanumeric part of the word?
> 1. compute the word's stem
> 2. use the stem as a lookup into the database
> 3. get back a list of words and their counts.
> 4. walk the list, looking for an exact match
> 5. perhaps if no exact match is found, use the first word's count.
Other than that, this gets difficult if someone deliberately misspells a
word early.
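
To make the quoted steps concrete, here is a rough sketch in Python. It is
not bogofilter code. The names stem() and StemBuckets are made up for
illustration, a dict stands in for the database, and the stem is assumed to
be the lowercased alphanumeric part of the word, which is only one possible
answer to the question above.

```python
import re
from collections import defaultdict


def stem(word):
    """Hypothetical stem: lowercase the word, keep only ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", word.lower())


class StemBuckets:
    """Toy stand-in for the database: one value (a word list) per stem key."""

    def __init__(self):
        # stem -> list of [word, count] pairs, kept in insertion order
        self.buckets = defaultdict(list)

    def add(self, word, n=1):
        bucket = self.buckets[stem(word)]
        for entry in bucket:
            if entry[0] == word:
                entry[1] += n
                return
        bucket.append([word, n])

    def lookup(self, word):
        # Steps 1-3: compute the stem and fetch the word list stored under it.
        bucket = self.buckets.get(stem(word))
        if not bucket:
            return 0
        # Step 4: walk the list, looking for an exact match.
        for candidate, count in bucket:
            if candidate == word:
                return count
        # Step 5: no exact match, so fall back to the first word's count.
        return bucket[0][1]


db = StemBuckets()
db.add("viagra", 10)
db.add("v.i.a.g.r.a", 3)
exact = db.lookup("v.i.a.g.r.a")     # exact match found in the list
fallback = db.lookup("V-I-A-G-R-A")  # same stem, no exact match

# A variant stored first decides what the step 5 fallback returns.
poisoned = StemBuckets()
poisoned.add("F.R.E.E", 1)
poisoned.add("free", 50)
skewed = poisoned.lookup("f-r-e-e")
```

In the second database the fallback reports the count of "F.R.E.E" rather
than that of "free", simply because the odd spelling was stored first. That
is one way to read the concern about an early deliberate misspelling.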
--
Matthias Andree