Sorting by bogosity, looking for false positives

Bill McClain wmcclain at salamander.com
Wed Nov 10 17:56:43 CET 2004


The attached archive contains (1) spamsort.py and (2) example output.

I use this program to quickly scan the spamicity values of my spam
folders in an attempt to catch false positives. Since good mail
misclassified as spam is likely to have a lower spamicity than most
other spam, it will appear near the top of a list of spam messages
sorted by increasing spamicity.

The program recursively scans a directory tree, reads the contents of
all mailboxes, extracts the X-Bogosity lines added by bogofilter (where
they exist), sorts them by increasing bogosity, and for each message
displays:

    bogosity
    the mailbox directory or file name
    the subject line
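
For readers without the attachment, the extract-and-sort step can be
sketched roughly as follows. This is not the code from spamsort.py. It
assumes the X-Bogosity header carries a "spamicity=<number>" field and
uses the current Python mailbox module; the names spamicity,
scored_messages, report and scan_mbox are made up for illustration.

```python
import mailbox
import re

# Assumed header shape (not taken from spamsort.py):
#   X-Bogosity: Spam, tests=bogofilter, spamicity=0.999999, version=...
SPAMICITY_RE = re.compile(r"spamicity=([0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?)")


def spamicity(header_value):
    """Return the spamicity as a float, or None if absent or unparsable."""
    if header_value is None:
        return None
    match = SPAMICITY_RE.search(str(header_value))
    return float(match.group(1)) if match else None


def scored_messages(box, label):
    """Collect (spamicity, label, subject) for every message with a score."""
    rows = []
    for message in box:
        score = spamicity(message.get("X-Bogosity"))
        if score is not None:
            rows.append((score, label, str(message.get("Subject", ""))))
    return rows


def report(rows):
    """Format rows by increasing spamicity, likeliest false positives first."""
    return ["%8.6f  %s  %s" % row for row in sorted(rows, key=lambda r: r[0])]


def scan_mbox(path):
    """Hypothetical helper: score every message in one mbox file."""
    return scored_messages(mailbox.mbox(path, create=False), path)
```

Messages without an X-Bogosity header are simply left out of the
report, which matches the "where they exist" behaviour described above.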

I've only tested it with "mh" directories and Unix "mbx" files, but it
should work with other formats supported by the Python library. The
legal values for the --format= option are:

    mbx
    mmdf
    mh
    maildir
    babyl
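
As a rough illustration of how such option values could map onto the
Python library, here is a sketch against the current mailbox module. The
table and the open_mailbox function are hypothetical, not taken from
spamsort.py, and treating "mbx" as the Unix mbox format is an
assumption.

```python
import mailbox

# Hypothetical mapping from --format= values to mailbox classes.
FORMAT_CLASSES = {
    "mbx": mailbox.mbox,        # assumption: "mbx" means Unix mbox
    "mmdf": mailbox.MMDF,
    "mh": mailbox.MH,
    "maildir": mailbox.Maildir,
    "babyl": mailbox.Babyl,
}


def open_mailbox(path, fmt):
    """Open an existing mailbox of the given format without creating it."""
    try:
        cls = FORMAT_CLASSES[fmt]
    except KeyError:
        raise ValueError("unknown mailbox format: %r" % fmt)
    # create=False makes a missing path raise mailbox.NoSuchMailboxError
    # instead of silently creating an empty mailbox there.
    return cls(path, create=False)
```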

Access to the mailboxes is read-only, so experimentation should be
harmless, but USE AT YOUR OWN RISK. The Python library classes seem to
silently skip non-mailbox files and directories, so having other files
in the directory tree should be OK.

Example:

    spamsort.py --format=mh ~/Mail/Spam

-Bill
-- 
Sattre Press                                    In the Quarter
http://sattre-press.com/                 by Robert W. Chambers
info at sattre-press.com         http://sattre-press.com/itq.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: spamsort.tar.gz
Type: application/x-gzip
Size: 2446 bytes
Desc: not available
URL: <http://www.bogofilter.org/pipermail/bogofilter/attachments/20041110/7cdf6d92/attachment.bin>

