histogram of wordlist.db
David Relson
relson at osagesoftware.com
Sat Jan 3 15:26:00 CET 2004
Greetings,
Have you ever wondered what a histogram of the spamicity scores of all
the tokens in your wordlist would look like? Mine looks like this:
score    count    pct  histogram
 0.00   545757  47.13  ################################################
 0.05     3099   0.27  #
 0.10     3054   0.26  #
 0.15     3128   0.27  #
 0.20     4015   0.35  #
 0.25     2112   0.18  #
 0.30     4395   0.38  #
 0.35     5326   0.46  #
 0.40     2122   0.18  #
 0.45     3178   0.27  #
 0.50     2093   0.18  #
 0.55    10681   0.92  #
 0.60     2509   0.22  #
 0.65     3163   0.27  #
 0.70     6122   0.53  #
 0.75     4926   0.43  #
 0.80     3891   0.34  #
 0.85     4324   0.37  #
 0.90     5004   0.43  #
 0.95   539119  46.56  ################################################
  tot  1158018
hapaxes: ham 359544 (31.05%), spam 376536 (32.52%)
pure:    ham 542992 (46.89%), spam 535376 (46.23%)
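The post does not show how the patch builds this table, so what follows is only a rough Python sketch under stated assumptions: scores lie in [0, 1] and are split into 20 buckets of width 0.05, each labelled by its lower bound, and the percentage is the bucket's share of all tokens. The function names bin_scores and render are hypothetical. The bar scaling (48 characters for the largest bucket, rounded up so every non-empty bucket gets at least one '#') is also an assumption, chosen because it happens to reproduce the bar lengths shown above.

```python
import math


def bin_scores(scores, bins=20):
    """Count token scores into `bins` equal-width buckets over [0, 1]."""
    counts = [0] * bins
    for s in scores:
        # floor(s * bins) selects the bucket; clamp so a score of exactly
        # 1.0 falls into the last bucket instead of overflowing.
        i = min(int(math.floor(s * bins)), bins - 1)
        counts[max(i, 0)] += 1
    return counts


def render(counts, width=48):
    """Format bucket counts as 'score count pct histogram' rows plus a total."""
    total = sum(counts)
    peak = max(counts) if counts else 0
    lines = ["score    count    pct  histogram"]
    for i, c in enumerate(counts):
        pct = 100.0 * c / total if total else 0.0
        # Assumed scaling: bar length proportional to the largest bucket,
        # using integer ceiling division so small non-zero buckets show '#'.
        bar = -(-c * width // peak) if peak else 0
        label = i / len(counts)
        lines.append(f"{label:5.2f} {c:8d}  {pct:5.2f}  {'#' * bar}")
    lines.append(f"  tot {total:8d}")
    return lines
```

Feeding the bucket counts from the table above into render() yields 48-character bars for the 0.00 and 0.95 buckets and a single '#' for every other bucket, matching the output in the post.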
The two lines at the end count the tokens that appear with counts of 0/1
or 1/0 (also known as hapaxes) and the tokens that were trained solely
from ham or solely from spam messages, i.e. that have counts of h/0 or 0/s.
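A minimal Python sketch of how these two summary lines could be tallied, assuming each token carries a (ham, spam) count pair in the same order as the h/0 and 0/s notation above. The Summary class and the summarize and pct helpers are hypothetical and not taken from the patch. Under this reading every hapax is also a pure token, which is consistent with the hapax percentages being smaller than the pure ones.

```python
from dataclasses import dataclass


@dataclass
class Summary:
    total: int = 0
    ham_hapax: int = 0
    spam_hapax: int = 0
    pure_ham: int = 0
    pure_spam: int = 0


def summarize(tokens):
    """Tally (ham_count, spam_count) pairs, one pair per token."""
    s = Summary()
    for ham, spam in tokens:
        s.total += 1
        if ham > 0 and spam == 0:
            s.pure_ham += 1          # h/0: trained only from ham
            if ham == 1:
                s.ham_hapax += 1     # 1/0: seen exactly once, in ham
        elif spam > 0 and ham == 0:
            s.pure_spam += 1         # 0/s: trained only from spam
            if spam == 1:
                s.spam_hapax += 1    # 0/1: seen exactly once, in spam
    return s


def pct(part, total):
    """Percentage formatted the way the summary lines show it."""
    return f"{100.0 * part / total:.2f}%"
```

For example, pct(359544, 1158018) gives "31.05%", the ham hapax share reported above.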
The attached patch, applied to bogofilter 0.16.0, enables the feature.
To use it, run "bogoutil -H /your/bogofilter/dir".
Enjoy!
David
--
David Relson Osage Software Systems, Inc.
relson at osagesoftware.com Ann Arbor, MI 48103
www.osagesoftware.com tel: 734.821.8800
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch.bogohist.0103
Type: application/octet-stream
Size: 7607 bytes
Desc: not available
URL: <https://www.bogofilter.org/pipermail/bogofilter-dev/attachments/20040103/8061b0da/attachment.obj>
More information about the bogofilter-dev mailing list