Have you considered optionally shipping bogofilter with a pre-built token database? There are some public ham and spam corpora available (e.g. http://spamassassin.apache.org/publiccorpus/). Training bogofilter to exhaustion on one of them and shipping the resulting database would definitely help some new users.