Quick Question

Jamie Burns fantasticjamieburns at hotmail.com
Fri Mar 25 14:40:29 CET 2005


Hi David/List,

Those were just example categories - I guess procmail could be used there 
with some success.

Let me tell you what I am actually trying to do...

I am actually trying to classify Web Pages. What I want is a document 
classifier that can A) classify Web Pages into respective classes; B) 
classify Web Pages based on their likelihood of being of "value".

It may be helpful if you consider the Google News page, as an example, as it 
has a number of classes of news articles (which must be automatically 
classified somehow), and it also assigns "value" to some news articles (by 
exposing them on the news home page to varying degrees).

I am thus looking to find a *fast* Open Source solution to this problem. I 
am thinking that a filter written in C, by a respectable development team, 
should give me better performance than some of the Perl/Python based systems 
around (on that note - does anyone have benchmarks for the popular 
classifiers?).

I certainly think I could run a web page through multiple bogofilters to 
find out if a web page fits into classes in real time. The problem is that 
there is obviously a point where the number of classifications would hinder 
the ability to do this in real time. Going forward I do plan to have a 
rather large number of classifications (the more the merrier for my 
application - which isn't actually a news aggregator btw). This is why I was 
hoping to find that bogofilter would classify between multiple classes and 
thus maybe make the whole process a lot faster (I am not relishing the 
thought of building a cluster of machines to act as a "classifier farm"!).
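
To make the "one pass over many classes" idea concrete, here is a minimal 
Python sketch of a multinomial naive Bayes classifier that tokenizes a page 
once and scores it against every class. It is only an illustration, not 
bogofilter's algorithm or API: the class name MultiClassNB, the tokenizer 
and the sample categories are all assumptions made up for the example.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical tokenizer: lowercase alphanumeric runs only.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


class MultiClassNB:
    """Illustrative multinomial naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class -> token -> count
        self.doc_counts = Counter()              # class -> training documents
        self.vocab = set()                       # all tokens seen in training

    def train(self, label, text):
        tokens = tokenize(text)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def scores(self, text):
        """Return a log-probability score per class for one document."""
        tokens = tokenize(text)  # the page is tokenized once for all classes
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocab)
        result = {}
        for label, counts in self.word_counts.items():
            total_tokens = sum(counts.values())
            logp = math.log(self.doc_counts[label] / total_docs)  # class prior
            for tok in tokens:
                # Add-one smoothing keeps unseen tokens from zeroing the score.
                logp += math.log((counts[tok] + 1) / (total_tokens + vocab_size))
            result[label] = logp
        return result

    def classify(self, text):
        s = self.scores(text)
        return max(s, key=s.get)


# Made-up training data, purely for demonstration.
nb = MultiClassNB()
nb.train("sport", "football match goal team score")
nb.train("tech", "linux kernel compiler software release")
nb.train("finance", "stock market shares bank profit")
best = nb.classify("new kernel release for linux")
```

The cost per page here grows with (tokens x classes), but the tokenizing and 
parsing work is shared, whereas running one two-class filter per category 
repeats that work for every class.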

I appreciate all your thoughts - and if anyone has any other ideas or knows 
of some other great classifying software, do share!

Jamie.

----- Original Message ----- 
From: "David N Murray" <dmurray at jsbsystems.com>
To: "Greg Louis" <glouis at dynamicro.on.ca>
Cc: <bogofilter at bogofilter.org>
Sent: Friday, March 25, 2005 1:17 AM
Subject: Re: Quick Question


> Isn't this what procmail's for?  You have to tell whatever software who is
> in the [Family|Friends|Lists|Work] groups.  procmail already has that
> ability with its pattern matching.
>
> On Mar 24, Greg Louis scribed:
>
>> On 20050325 (Fri) at 0021:33 -0000, Jamie Burns wrote:
>> > Hi there!
>> >
>> > Can anybody tell me if Bogofilter can divide email into one (or more?) 
>> > of multiple classifications?
>>
>> Not as currently written.
>>
>> > So instead of simply [Non-Spam|Spam] I could have 
>> > [Family|Friends|Lists|Work]?
>>
>> I believe you would have to undertake extensive and difficult
>> revisions, and resource consumption as well as uncertainty would
>> skyrocket by comparison with the two-class case.
>>
>> Sorry to be such a pessimist..........
>> --
>> | G r e g  L o u i s         | gpg public key: 0x400B1AA86D9E3E64 |
>> |  http://www.bgl.nu/~glouis |   (on my website or any keyserver) |
>> |  http://wecanstopspam.org in signatures helps fight junk email. |
>> _______________________________________________
>> Bogofilter mailing list
>> Bogofilter at bogofilter.org
>> http://www.bogofilter.org/mailman/listinfo/bogofilter
>>
