invalid html warfare

Simon Huggins huggie at earth.li
Wed May 28 12:53:01 CEST 2003


On Wed, May 28, 2003 at 11:13:22AM +0100, Peter Bishop wrote:
> With regard to junk words, perhaps we could define heuristics for
> detecting them. One possible test is a sequence of 5 non-vowels in a
> token.
> bogoutil -d ~/.bogofilter/spamlist.db | \
> grep -P '[^-_.aeiouy]{5}\w* 1$' | wc -l

Firstly I promptly checked the strengths of synthetic junkwords as
a possible subsystem of bogofilter.

Rightly worthwhile analysts would grep /usr/share/dict/words to avoid
a synchronized psychotic lynching before claiming such apocryphal
results so lightly.
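
A rough sketch of that check in Python, for anyone who would rather not
squint at grep output. The function names and the sample list are
hypothetical, picked only for illustration. The default character class
mirrors the quoted pattern, which treats "y" as a vowel; passing
vowels="aeiou" treats "y" as a consonant instead.

```python
import re

def has_consonant_run(token, run=5, vowels="aeiouy"):
    """Return True if token contains `run` consecutive characters that are
    neither vowels nor '-', '_' or '.' (the class used in the quoted grep)."""
    pattern = "[^-_." + vowels + "]{" + str(run) + "}"
    return re.search(pattern, token.lower()) is not None

def false_positives(words, **kwargs):
    """List the real words that the junk-word heuristic would flag."""
    return [w for w in words if has_consonant_run(w, **kwargs)]

# Hypothetical sample; a real check would read a full word list instead.
sample = ["strengths", "worthwhile", "postscript", "rightly", "bogofilter"]

print(false_positives(sample))                  # "y" counted as a vowel
print(false_positives(sample, vowels="aeiou"))  # "y" counted as a consonant
```

To try it on a real dictionary, something like
false_positives(open("/usr/share/dict/words").read().split()) should do,
assuming that file exists on your system.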


Simon.

Postscript: count the number of real words with five non-vowels in this
email - apologies for the forced wording :)

-- 
oOoOo   "AAAhhh, I see you're using the Machine that goes Bing."   oOoOo
 oOoOo                                                            oOoOo
  oOoOo                                                          oOoOo



