cannot filter virus letters

Tom Anderson tanderson at orderamidchaos.com
Thu Jan 29 23:22:51 CET 2009


If indeed it's mostly viruses, I would recommend putting a virus scanner 
such as ClamAV in front of Bogofilter in your tool chain.  That has been 
pretty effective for me.
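
A minimal Python sketch of that ordering. It assumes both tools are on PATH, that clamscan accepts "-" to read the message from stdin, and that the exit statuses behave as described in the comments. The function names are hypothetical. The same chain could equally live in a procmail recipe or an MTA hook.

```python
import subprocess

# Hypothetical glue for "virus scanner in front of Bogofilter".
# Assumed exit statuses: clamscan 0 = clean, 1 = infected, 2 = error;
# bogofilter 0 = spam, 1 = ham, 2 = unsure, 3 = error.


def clamscan_infected(message: bytes) -> bool:
    """Scan the raw message with ClamAV (assumes `clamscan -` reads stdin)."""
    result = subprocess.run(
        ["clamscan", "--no-summary", "-"], input=message, capture_output=True
    )
    if result.returncode not in (0, 1):
        raise RuntimeError("clamscan reported an error")
    return result.returncode == 1


def bogofilter_verdict(message: bytes) -> str:
    """Classify the message with Bogofilter, using only its exit status."""
    result = subprocess.run(["bogofilter"], input=message, capture_output=True)
    verdicts = {0: "spam", 1: "ham", 2: "unsure"}
    if result.returncode not in verdicts:
        raise RuntimeError("bogofilter reported an error")
    return verdicts[result.returncode]


def filter_message(message: bytes, scan=clamscan_infected,
                   classify=bogofilter_verdict) -> str:
    """Run the virus scanner first; only clean mail reaches Bogofilter."""
    if scan(message):
        return "virus"
    return classify(message)
```

In a delivery script you would call filter_message(sys.stdin.buffer.read()). The scan and classify parameters exist so the chain can be exercised without either tool installed.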

However, I also developed "spamitarium" for exactly this case: messages
with very short bodies, where the headers need extra emphasis.
Spamitarium performs a SpamAssassin-style analysis, but instead of
generating a "score", it strips out unnecessary, questionable noise and
generates extra tokens that make Bogofilter more effective.

To try it out, download it here:
http://orderamidchaos.com/bogofilter/spamitarium

I've had a lot of success eliminating the type of spam you describe
with this script in front of Bogofilter.

Tom


Dmitry wrote:
> On Saturday 24 January 2009, Tom Anderson wrote:
>> Dmitry wrote:
>>> After training with the command `bogofilter -s < virus-letter`,
>>> spamicity is still too low for it to be identified as spam. I repeated
>>> training with similar messages (different subject, different document
>>> name in the attachment), but nothing stops this kind of spam.
>>>
>>> This is the output of the command `bogofilter -vvv`:
>>>
>>> X-Bogosity: Unsure, tests=bogofilter, spamicity=0.519097, version=1.1.5
>>>                           n    pgood     pbad      fw     U
>>> "document"                2  0.021739  0.000065  0.007563 +
>>> "rcvd:lovepresent.ru"    90  0.500000  0.004387  0.008798 +
>> Looks like you've got too many friends over at lovepresent.ru!
>>
>> I prefer to do "exhaustive" training, which means training the same
>> spam over and over again until it is classified as spam. Then you can
>> be confident that nothing too similar will get through again.
> 
> Sorry, exhaustive training doesn't change anything in my case. The
> spamicity value is still below 0.52. Tuning robx/robs gives me strange
> results: some good messages become spammy afterwards. I think the
> algorithm needs to change somehow for short messages with only a few
> words in the body. Otherwise, the hammy headers always outweigh the body
> and never let the spamicity score get high enough.
> 
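
The "exhaustive" training loop can be scripted. Below is a minimal Python sketch. It assumes bogofilter is on PATH and relies on its exit status (0 = spam, 3 = error) and on the -s flag quoted above. The function names and the max_rounds cap are hypothetical.

```python
import subprocess


def bogofilter_is_spam(message: bytes) -> bool:
    """Classify once; bogofilter exits 0 for spam and 3 on error."""
    code = subprocess.run(["bogofilter"], input=message,
                          capture_output=True).returncode
    if code == 3:
        raise RuntimeError("bogofilter reported an error")
    return code == 0


def bogofilter_train_spam(message: bytes) -> None:
    """Register the message as spam (`bogofilter -s`)."""
    code = subprocess.run(["bogofilter", "-s"], input=message,
                          capture_output=True).returncode
    if code == 3:
        raise RuntimeError("bogofilter reported an error while training")


def exhaustive_train(message: bytes, is_spam=bogofilter_is_spam,
                     train=bogofilter_train_spam, max_rounds: int = 10) -> int:
    """Re-train the same spam until it classifies as spam.

    Returns the number of training rounds used. Raises if the message is
    still not spam after max_rounds, since strongly hammy tokens can keep
    it below the cutoff no matter how often it is trained.
    """
    rounds = 0
    while not is_spam(message):
        if rounds >= max_rounds:
            raise RuntimeError(f"not classified as spam after {max_rounds} rounds")
        train(message)
        rounds += 1
    return rounds
```

The cap matters because, as in Dmitry's case, repeated training may never push a message over the spam cutoff. An unbounded loop would keep inflating the spam counts of every token in it.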

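Dmitry's point about hammy headers follows from how token scores are combined. Below is a simplified Python sketch of Gary Robinson's f(w) and Fisher (inverse chi-square) combining, the approach Bogofilter is built on. It assumes robs=0.0178 and robx=0.52, values chosen because they approximately reproduce the fw values in Dmitry's output. It omits Bogofilter's min_dev filtering and other refinements. The function names are hypothetical.

```python
import math

ROBS = 0.0178  # assumed strength of the prior (reproduces the fw values shown)
ROBX = 0.52    # assumed prior f(w) for a token with no history


def robinson_fw(pgood: float, pbad: float, n: int,
                robs: float = ROBS, robx: float = ROBX) -> float:
    """Robinson's f(w): the token's spam ratio, shrunk toward robx when n is small."""
    p = pbad / (pbad + pgood)
    return (robs * robx + n * p) / (robs + n)


def chi2_q(x2: float, dof: int) -> float:
    """Upper tail of the chi-square distribution for an even number of dof."""
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_spamicity(fws: list[float]) -> float:
    """Combine f(w) values with Fisher's method as Robinson describes (simplified)."""
    n = len(fws)
    h = chi2_q(-2.0 * sum(math.log(f) for f in fws), 2 * n)
    s = chi2_q(-2.0 * sum(math.log(1.0 - f) for f in fws), 2 * n)
    return (1.0 + h - s) / 2.0
```

With the two header tokens from Dmitry's output (fw 0.007563 and 0.008798) plus two strongly spammy tokens at a made-up f(w) of 0.99, fisher_spamicity returns just under 0.5. That is the kind of standoff Dmitry describes. Note that with robs this small the prior hardly moves tokens already in the wordlist. robx mostly sets f(w) for tokens never seen before (n = 0 gives f(w) = robx).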


