Quick Question
Jamie Burns
fantasticjamieburns at hotmail.com
Fri Mar 25 14:40:29 CET 2005
Hi David/List,
Those were just example categories - I guess procmail could be used there
with some success.
Let me tell you what I am actually trying to do...
I am trying to classify Web Pages. What I want is a document
classifier that can A) classify Web Pages into their respective classes; B)
classify Web Pages based on their likelihood of being of "value".
It may be helpful if you consider the Google News page, as an example, as it
has a number of classes of news articles (which must be automatically
classified somehow), and it also assigns "value" to some news articles (by
exposing them on the news home page to varying degrees).
I am thus looking to find a *fast* Open Source solution to this problem. I
am thinking that a filter written in C, by a respectable development team,
should give me better performance than some of the Perl/Python based systems
around (on that note - does anyone have benchmarks for the popular
classifiers?).
I certainly think I could run a web page through multiple bogofilters to
find out if a web page fits into classes in real time. The problem is that
there is obviously a point where the number of classifications would hinder
the ability to do this in real time. Going forward I do plan to have a
rather large number of classifications (the more the merrier for my
application - which isn't actually a news aggregator btw). This is why I was
hoping to find that bogofilter would classify between multiple classes and
thus maybe make the whole process a lot faster (I am not relishing the
thought of building a cluster of machines to act as a "classifier farm"!).
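To make the speed concern concrete, here is a minimal sketch of the alternative to running one two-class filter per category: a single multinomial naive Bayes classifier that tokenizes the page once and then scores every class from in-memory counts. This is not how bogofilter works (per Greg's reply it is two-class only); the class name MultiClassNB, the tokenizer, and the add-one smoothing are all assumptions made for illustration.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase runs of letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class MultiClassNB:
    """Illustrative multi-class naive Bayes; not bogofilter's algorithm."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        self.total_words = Counter()             # class -> number of tokens seen
        self.doc_counts = Counter()              # class -> number of documents seen
        self.vocab = set()

    def train(self, label, text):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.total_words[label] += len(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        tokens = tokenize(text)  # the page is tokenized once, for all classes
        n_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for label in self.doc_counts:
            # Log prior plus add-one smoothed log likelihood of each token.
            logp = math.log(self.doc_counts[label] / n_docs)
            denom = self.total_words[label] + vocab_size
            counts = self.word_counts[label]
            for tok in tokens:
                logp += math.log((counts[tok] + 1) / denom)
            result[label] = logp
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


clf = MultiClassNB()
clf.train("sports", "football match goal team score")
clf.train("tech", "linux kernel compiler software release")
print(clf.classify("new kernel release for linux"))
```

The scoring loop still grows linearly with the number of classes, but each extra class costs only dictionary lookups rather than another process start and another wordlist database to open, which is where I would expect the per-class overhead of chained filters to go.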
I appreciate all your thoughts - and if anyone has any other ideas or knows
of some other great classifying software, do share!
Jamie.
----- Original Message -----
From: "David N Murray" <dmurray at jsbsystems.com>
To: "Greg Louis" <glouis at dynamicro.on.ca>
Cc: <bogofilter at bogofilter.org>
Sent: Friday, March 25, 2005 1:17 AM
Subject: Re: Quick Question
> Isn't this what procmail's for? You have to tell whatever software who is
> in the [Family|Friends|Lists|Work] groups. procmail already has that
> ability with its pattern matching.
>
> On Mar 24, Greg Louis scribed:
>
>> On 20050325 (Fri) at 0021:33 -0000, Jamie Burns wrote:
>> > Hi there!
>> >
>> > Can anybody tell me if Bogofilter can divide email into one (or more?)
>> > of multiple classifications?
>>
>> Not as currently written.
>>
>> > So instead of simply [Non-Spam|Spam] I could have
>> > [Family|Friends|Lists|Work]?
>>
>> I believe you would have to undertake extensive and difficult
>> revisions, and resource consumption as well as uncertainty would
>> skyrocket by comparison with the two-class case.
>>
>> Sorry to be such a pessimist..........
>> --
>> | G r e g L o u i s | gpg public key: 0x400B1AA86D9E3E64 |
>> | http://www.bgl.nu/~glouis | (on my website or any keyserver) |
>> | http://wecanstopspam.org in signatures helps fight junk email. |
>> _______________________________________________
>> Bogofilter mailing list
>> Bogofilter at bogofilter.org
>> http://www.bogofilter.org/mailman/listinfo/bogofilter
>>