Sorting by bogosity, looking for false positives
Bill McClain
wmcclain at salamander.com
Wed Nov 10 17:56:43 CET 2004
The attached archive contains (1) spamsort.py and (2) example output.
I use this program to quickly scan the spamicity values of my spam
folders in an attempt to catch false positives. Since good mail
misclassified as spam is likely to have a lower spamicity than most
other spam, it will appear near the top of a list of spam messages
sorted by increasing spamicity.
The program recursively scans a directory tree, reads the contents of
all mailboxes, extracts the X-Bogosity lines added by bogofilter (where
they exist), sorts by increasing bogosity and for each message displays:
    bogosity
    the mailbox directory or file name
    the subject line
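As an illustration of the extract-and-sort step (this is not the code from the attached spamsort.py, which is not reproduced here), a minimal Python 3 sketch might look like the following. It assumes the X-Bogosity header carries a "spamicity=<number>" field, as in "X-Bogosity: Spam, tests=bogofilter, spamicity=0.998, version=..."; the names spamicity, report and SPAMICITY_RE are made up for this example.

```python
import re
from email.message import Message

# Assumption: bogofilter writes the score as "spamicity=<decimal number>"
# somewhere in the X-Bogosity header value.
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+)")


def spamicity(msg):
    """Return the spamicity found in the X-Bogosity header, or None."""
    header = msg.get("X-Bogosity")
    if header is None:
        return None
    match = SPAMICITY_RE.search(str(header))
    return float(match.group(1)) if match else None


def report(entries):
    """Build (spamicity, mailbox_name, subject) rows, lowest score first.

    entries is an iterable of (mailbox_name, message) pairs; messages
    without a usable X-Bogosity header are skipped.
    """
    rows = []
    for name, msg in entries:
        score = spamicity(msg)
        if score is not None:
            rows.append((score, name, str(msg.get("Subject", ""))))
    rows.sort(key=lambda row: row[0])
    return rows
```

Printing the rows in order puts the least spam-like messages, the likeliest false positives, at the top of the list.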
I've only tested it with "mh" directories and Unix "mbx" files, but it
should work with other formats supported by the Python library. The
legal values for the --format= option are:
    mbx
    mmdf
    mh
    maildir
    babyl
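For readers on a current Python, the format names above could plausibly be mapped onto the standard library's mailbox classes as sketched below. This is an assumption about how such a mapping might look, not a description of spamsort.py itself: the class names are those of the Python 3 mailbox module, and FORMATS and open_mailbox are hypothetical names chosen for the example.

```python
import mailbox
import os

# Hypothetical mapping from the --format= values to Python 3 mailbox classes.
# The original script may map them differently.
FORMATS = {
    "mbx": mailbox.mbox,
    "mmdf": mailbox.MMDF,
    "mh": mailbox.MH,
    "maildir": mailbox.Maildir,
    "babyl": mailbox.Babyl,
}


def open_mailbox(fmt, path):
    """Open an existing mailbox of the given format without creating it."""
    try:
        factory = FORMATS[fmt]
    except KeyError:
        raise ValueError(
            f"unknown format {fmt!r}; expected one of {sorted(FORMATS)}"
        ) from None
    # create=False makes the library raise an error instead of creating
    # a new mailbox when the path does not exist.
    return factory(os.path.expanduser(path), create=False)
```

Iterating over the returned object yields the messages one by one; as long as nothing is added, removed or flushed, the mailbox is only read.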
Access to the mailboxes is read-only, so experimentation should be
harmless. But: USE AT YOUR OWN RISK. The Python library classes seem to
silently skip non-mailbox files and directories, so having other files
in the directory tree should be ok.
Example:
    spamsort.py --format=mh ~/Mail/Spam
-Bill
--
Sattre Press In the Quarter
http://sattre-press.com/ by Robert W. Chambers
info at sattre-press.com http://sattre-press.com/itq.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spamsort.tar.gz
Type: application/x-gzip
Size: 2446 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20041110/7cdf6d92/attachment.bin>