cannot filter virus letters
Tom Anderson
tanderson at orderamidchaos.com
Thu Jan 29 23:22:51 CET 2009
If indeed it's mostly viruses, I would recommend putting a virus scanner
such as ClamAV in front of Bogofilter in your tool chain. That has been
pretty effective for me.
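As a rough illustration of that chaining, here is a minimal Python sketch. It is not anyone's actual setup: the function name `classify` and the injectable `run` parameter are made up for this example, and it assumes `clamscan` and `bogofilter` are on the PATH and use their documented exit codes (clamscan: 0 clean, 1 virus found, 2 error; bogofilter: 0 spam, 1 ham, 2 unsure, 3 error).

```python
import subprocess

# Map bogofilter's documented exit codes to verdicts (3 means error).
BOGOFILTER_VERDICTS = {0: "spam", 1: "ham", 2: "unsure"}


def classify(message: bytes, run=subprocess.run) -> str:
    """Scan a raw message with ClamAV first; only clean mail reaches Bogofilter.

    `run` is a hypothetical hook so the pipeline can be tried without the
    real tools installed; by default it is subprocess.run.
    """
    # "clamscan -" reads the data to scan from standard input.
    scan = run(["clamscan", "--no-summary", "-"], input=message, capture_output=True)
    if scan.returncode == 1:
        return "virus"
    if scan.returncode != 0:
        raise RuntimeError(f"clamscan failed with exit code {scan.returncode}")

    # With no options, bogofilter classifies the message given on stdin.
    verdict = run(["bogofilter"], input=message, capture_output=True)
    if verdict.returncode not in BOGOFILTER_VERDICTS:
        raise RuntimeError(f"bogofilter failed with exit code {verdict.returncode}")
    return BOGOFILTER_VERDICTS[verdict.returncode]
```

Calling `classify(open("virus-letter", "rb").read())` would then return "virus", "spam", "ham" or "unsure". In a real mail setup the same ordering would more likely live in procmail or maildrop rules than in a script like this.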
However, I also developed "spamitarium" for exactly this case: messages
with very limited bodies, which put extra emphasis on the headers.
Spamitarium basically performs a SpamAssassin-style analysis, but instead
of generating a "score", it removes unnecessary or questionable noise and
generates extra tokens, which makes Bogofilter more effective.
To try it out, download it here:
http://orderamidchaos.com/bogofilter/spamitarium
I've had a lot of success in eliminating the type of spam you describe
with this script in front of Bogofilter.
Tom
Dmitry wrote:
> On Saturday 24 January 2009, Tom Anderson wrote:
>> Dmitry wrote:
>>> After training with the command `bogofilter -s < virus-letter`, the
>>> spamicity is still too low for the message to be identified as spam.
>>> I repeated the training with similar letters (different subject,
>>> different document name in the attachment), but nothing helps to stop
>>> this kind of spam.
>>>
>>> This is the output of the command `bogofilter -vvv`:
>>>
>>> X-Bogosity: Unsure, tests=bogofilter, spamicity=0.519097, version=1.1.5
>>>                            n    pgood     pbad       fw U
>>> "document"                 2 0.021739 0.000065 0.007563 +
>>> "rcvd:lovepresent.ru"     90 0.500000 0.004387 0.008798 +
>> Looks like you've got too many friends over at lovepresent.ru!
>>
>> I prefer to do "exhaustive" training, which means to keep training the
>> same spam over and over again until it classifies as spammy. Then
>> you'll be assured not to receive one too similar again.
>
> Sorry, exhaustive training doesn't change anything in my case. The
> spamicity value is still less than 0.52. Tuning robx/robs gives me
> strange results: some good letters become spammy after that. I think the
> algorithm has to be changed somehow for small letters with only a few
> words in the message body. Otherwise, hammy headers always carry more
> weight and never let the spamicity score get high enough.
>
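To make the robx/robs discussion above more concrete, here is a small Python sketch of Robinson's per-token formula, f(w) = (s*x + n*p) / (s + n), applied to the two tokens in the quoted `bogofilter -vvv` output. It assumes robs = 0.0178 and robx = 0.52, which I believe are Bogofilter's defaults; the actual values in Dmitry's configuration are not given in the thread. The function name `fw` is only for this example.

```python
ROBS = 0.0178  # assumed default: strength s given to the prior
ROBX = 0.52    # assumed default: score x assigned to a never-seen token


def fw(n, pgood, pbad, robs=ROBS, robx=ROBX):
    """Robinson's f(w) for a token seen n times in the wordlist."""
    # p is the raw spam probability of the token from its two rates.
    p = pbad / (pgood + pbad)
    # Blend the prior (robx, weighted by robs) with the observed p.
    return (robs * robx + n * p) / (robs + n)


# (token, n, pgood, pbad) copied from the quoted -vvv output.
rows = [
    ("document", 2, 0.021739, 0.000065),
    ("rcvd:lovepresent.ru", 90, 0.500000, 0.004387),
]

for token, n, pgood, pbad in rows:
    print(f"{token:<22} fw={fw(n, pgood, pbad):.6f}")
```

Under those assumptions the results come out close to the fw column in the quoted output, with small differences from the rounding of the printed pgood and pbad values. Both tokens land near 0.008, that is, strongly hammy. With n = 90, robs has almost no influence on "rcvd:lovepresent.ru", which may be why retraining a handful of spams barely moves it.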