Idea for improving the learning stage

mouss mlist.only at free.fr
Fri Sep 7 23:53:06 CEST 2007


David Relson wrote:
> On Fri, 7 Sep 2007 10:14:56 +0000 (UTC)
> Andrew wrote:
> 
>> On Thu, 6 Sep 2007 21:33:42 -0400,
>> David Relson <relson at osagesoftware.com> wrote:
>>
>>> The intelligence you suggest belongs in a script driving bogofilter.
>>> With claws-mail I have two actions "classify as spam" and "classify
>>> as ham".  These actions forward the messages to special addresses
>>> on my mail server and procmail spots the messages and passes them
>>> to a reclassify script.  The reclassify script looks at the
>>> forwarding address and the message's X-Bogosity line then invokes
>>> bogofilter with appropriate flags.  For example, since "X-Bogosity:
>>> Spam" and "forward as ham" indicates a "False Positive" bogofilter
>>> gets run with "-S -n".  Note that all the decision making is
>>> _outside_ of bogofilter.
>>
>> So how could an external script tell bogofilter to "ignore the
>> subject" or "ignore the body" ?
>>
>>
>> Regards,
>> Andrew
> 
> Bogofilter doesn't have such capabilities, nor does it need them.  If
> you want part of a message to be excluded, a copy of the message needs
> to be created without that part.  Tools that you should consider are
> formail, awk, and grep.  
> 
> formail is a very powerful tool for working with email messages.  Read
> its man page.
> 
> grep can be used for simple exclusion tasks.  For example, to exclude
> only the subject: 
> 
>    grep -v ^Subject: < message | bogofilter ...
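
A note on the grep one-liner quoted above: it also drops body lines that happen to start with "Subject:", and it leaves behind the continuation lines of a folded subject. A minimal sketch of a header-aware variant, assuming a POSIX awk is available; the function name strip_subject is made up for illustration and is not part of bogofilter or procmail:

```shell
# Hypothetical helper: copy a message from stdin to stdout, dropping the
# Subject header (and its folded continuation lines) from the header
# section only. Body lines are passed through untouched.
# Only the spellings "Subject:" and "subject:" are matched.
strip_subject() {
    awk '
        inbody { print; next }              # past the headers: copy as-is
        /^$/ { inbody = 1; print; next }    # first blank line ends the headers
        /^[Ss]ubject:/ { skip = 1; next }   # drop the Subject header line
        skip && /^[ \t]/ { next }           # drop its folded continuation lines
        { skip = 0; print }                 # any other header line: keep it
    '
}

# Usage, mirroring the grep example:
#   strip_subject < message | bogofilter ...
```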


[body only]
Isn't "Subject" itself a token, so that removing it makes it no longer
neutral? I mean, suppose you remove the Subject line from a thousand spam
messages; then "Subject" may become a ham sign, which it should not be.

[subject only]
And if you train only on the subject, you will miss the spammy body tokens.
It would be more interesting to "duplicate" the message and train
multiple times: once with body+subject and once with the subject alone.
However, one should then also train ham messages N times (N>=2) to avoid
skewing the filter.
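
A rough sketch of what that "duplicate and train" idea could look like as a wrapper script. The helper names (subject_only, train_spam_twice, train_ham_n) and the BOGOFILTER override variable are invented for illustration; the only bogofilter options used are -s (register as spam) and -n (register as ham). Whether the extra registrations actually improve the filter is untested here.

```shell
# Hypothetical helper: print only the Subject header (including folded
# continuation lines) of the message on stdin.
subject_only() {
    awk '
        /^$/ { exit }                       # stop at the end of the headers
        /^[Ss]ubject:/ { insub = 1; print; next }
        insub && /^[ \t]/ { print; next }   # folded continuation line
        { insub = 0 }
    '
}

# Hypothetical helper: train_spam_twice FILE
# Registers FILE as spam once in full (body+subject) and once by its
# subject alone. Set BOGOFILTER to another command for a dry run.
train_spam_twice() {
    "${BOGOFILTER:-bogofilter}" -s < "$1"
    subject_only < "$1" | "${BOGOFILTER:-bogofilter}" -s
}

# Hypothetical helper: train_ham_n FILE [N]
# Registers FILE as ham N times (default 2) so that ham gets the same
# number of registrations as spam does above.
train_ham_n() {
    ham_file=$1
    ham_count=${2:-2}
    ham_i=0
    while [ "$ham_i" -lt "$ham_count" ]; do
        "${BOGOFILTER:-bogofilter}" -n < "$ham_file"
        ham_i=$((ham_i + 1))
    done
}
```

As with the reclassify script, all of the decision making stays outside bogofilter; the script only chooses what text to feed it and how many times.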
