Bogofilter to filter porn site

David Relson relson at osagesoftware.com
Wed Mar 26 13:59:56 CET 2003


At 05:01 AM 3/26/03, Cedric Foll wrote:

>Hi,
>
>I'm a network/system admin and I have to block visits to porn sites. For
>that I'm using squidGuard with a database of porn sites. But because the
>database is incomplete, I have to add the porn sites that have been
>visited each day.
>Each day 16,000 different sites are visited, so I've developed several
>scripts to help me find porn.
>I first do a "keyword" search on these sites (90% are false
>positives) and then I run bogofilter on the sites that matched (10%
>false positives).
>
>The results are quite good, but not as good as for spam.
>How can I improve the results with bogofilter for this specific use?
>Has anybody else tried to use bogofilter this way?
>
>Regards.

Greetings Cedric,

You've undoubtedly thought more about your problem than have I, but here're 
my thoughts.

You could skip your keyword search and let bogofilter do _its_ word search 
and spam/ham classification.  Bogofilter's job is to distinguish between 
good and bad messages.  With bogofilter we think of good and bad as being 
non-spam and spam.  There's no reason why bogofilter can't use web pages 
(rather than email) to distinguish porn from non-porn (rather than 
spam/non-spam).

Using the record of sites visited, retrieve each page (using wget, lynx, or 
a similar program) and feed it to bogofilter.  bogofilter will do its 
normal parsing, word lookup, and spamicity calculation.  For pages that 
show up as "spam" (which means "porn" in this case), you can add the site 
to the blacklist.
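
The fetch-and-classify loop described above could be sketched roughly as 
follows.  This is only an illustration, not a tested setup: the function 
names are made up for the example, Python's urllib stands in for wget or 
lynx, and the sketch assumes a bogofilter binary on the PATH that reads the 
page on standard input and reports its verdict through the exit status 
given in its man page (0 = spam, 1 = non-spam, 2 = unsure).

```python
import subprocess
import sys
import urllib.request

# Exit statuses as documented in the bogofilter man page; anything else
# (for example 3) is treated as an error.
VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def fetch_page(url, timeout=30):
    """Download the raw page bytes (stands in for wget or lynx)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def classify(page, command=("bogofilter",)):
    """Pipe the page to the classifier on stdin and map its exit status."""
    result = subprocess.run(list(command), input=page,
                            stdout=subprocess.DEVNULL)
    if result.returncode not in VERDICTS:
        raise RuntimeError(f"classifier failed: status {result.returncode}")
    return VERDICTS[result.returncode]


def porn_sites(urls, fetch=fetch_page, command=("bogofilter",)):
    """Return the URLs whose pages come back as "spam" (here: porn)."""
    flagged = []
    for url in urls:
        try:
            page = fetch(url)
        except OSError:
            continue  # site unreachable; skip it for this run
        if classify(page, command) == "spam":
            flagged.append(url)
    return flagged


if __name__ == "__main__":
    # Hypothetical usage: python classify_sites.py http://site1/ http://site2/
    for site in porn_sites(sys.argv[1:]):
        print(site)
```

The printed URLs are candidates for the squidGuard database; pages that 
come back as "unsure" are simply left alone in this sketch.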

I think you should find it pretty easy to train bogofilter to be a 
porn/non-porn classifier.

David




