Quick Question

Hui Zhou zhouhui at wam.umd.edu
Fri Mar 25 03:26:06 CET 2005


On Thu, Mar 24, 2005 at 07:45:18PM -0500, Greg Louis wrote:
>On 20050325 (Fri) at 0021:33 -0000, Jamie Burns wrote:
>> Hi there!
>>
>> Can anybody tell me if Bogofilter can divide email into one (or more?) of multiple classifications?

Interesting. I am thinking in the same direction.
>
>> So instead of simply [Non-Spam|Spam] I could have [Family|Friends|Lists|Work]?
>
>I believe you would have to undertake extensive and difficult
>revisions, and resource consumption as well as uncertainty would
>skyrocket by comparison with the two-class case.

Essentially, one would need to presort one's email archive into multiple
categories and have bogofilter record token probabilities for each
class. Just as a thought experiment :), I don't see many obstacles to
extending bogofilter's current abilities to multi-class classification.
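The presort-and-count step can be sketched in a few lines of Python. This is a hypothetical illustration, not bogofilter's code or database format: `tokenize`, `train`, and `token_probability` are made-up names, the regex tokenizer stands in for bogofilter's real lexer, and the per-class probability is a simple add-one-smoothed document frequency, not the token-probability formula bogofilter actually uses.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def train(archive):
    """archive maps a class name (e.g. 'Family') to a list of message
    bodies that the user has already presorted into that class."""
    token_counts = {}    # class -> Counter: how many messages contain each token
    message_counts = {}  # class -> number of training messages
    for label, messages in archive.items():
        counts = Counter()
        for body in messages:
            # Count each token at most once per message.
            counts.update(set(tokenize(body)))
        token_counts[label] = counts
        message_counts[label] = len(messages)
    return token_counts, message_counts

def token_probability(token, label, token_counts, message_counts):
    # Fraction of this class's messages that contain the token, with
    # add-one smoothing so an unseen token never gets probability 0.
    return (token_counts[label][token] + 1) / (message_counts[label] + 2)
```

Each class simply gets its own table of counts, so adding a category means adding one more table rather than changing the counting logic.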

As for uncertainty, it is mostly determined by the size of the archive
and the characteristics of the mail in each class. For many classes,
I expect the uncertainty can be very small even with limited training.
For example, family email comes from a small number of people and
often mentions relatives' names, and work email uses the same few
terms again and again. As for resources, if we treated every class of
email the way we treat spam, resource use might very well skyrocket:
spam is so diverse and evolves so quickly that one has to train on
thousands of spams to bring uncertainty down to an acceptable level.
But many other classes are, in my experience, very distinctive
(typically, their senders don't even try to defeat the filter the way
spammers do), so I expect one would only need to train on a limited
archive and would need much less tweaking than we did for spam
filtering. So I won't readily accept the argument that resource use
will skyrocket until someone actually implements it and benchmarks it.
That covers the learning process.

For the filtering process, bogofilter would have to calculate the
probability of each class, so I expect the overhead to grow linearly
with the number of classes rather than skyrocket, at least for
personal use.
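To make the linear-cost argument concrete, here is a hypothetical naive Bayes-style classifier in plain Python. It is not bogofilter's own spam/ham combining method; `classify`, `token_counts`, and `message_counts` are illustrative names, with `token_counts` assumed to map each class to a `Counter` of how many training messages contained each token. For one message it loops once over the classes and, inside that, once over the message's tokens, so the work is proportional to (number of classes) × (number of tokens).

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: the set of lowercase word-like tokens.
    return set(re.findall(r"[a-z0-9']+", text.lower()))

def classify(body, token_counts, message_counts):
    """Return (best_class, scores), where scores maps each class to a
    log-probability-style score. One pass per class: linear in #classes."""
    total = sum(message_counts.values())
    tokens = tokenize(body)
    scores = {}
    for label, counts in token_counts.items():
        n = message_counts[label]
        # Log prior: how common this class is in the training archive.
        score = math.log(n / total)
        for tok in tokens:
            # Add-one-smoothed fraction of this class's messages with the token.
            score += math.log((counts[tok] + 1) / (n + 2))
        scores[label] = score
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Toy, made-up counts for two classes trained on two messages each.
    token_counts = {
        "Family": Counter({"aunt": 2, "may": 1, "dinner": 1}),
        "Work": Counter({"report": 2, "meeting": 1}),
    }
    message_counts = {"Family": 2, "Work": 2}
    print(classify("Aunt May invited us to dinner", token_counts, message_counts)[0])  # prints Family
```

Each extra class adds one more iteration of the outer loop and nothing else, which is where the linear growth comes from.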

--
Hui Zhou
_______________________________________________
Bogofilter mailing list
Bogofilter at bogofilter.org
http://www.bogofilter.org/mailman/listinfo/bogofilter


