some questions about bogofilter 0.13.6&0.15.7

Boris 'pi' Piwinger 3.14 at logic.univie.ac.at
Tue Nov 4 12:35:27 CET 2003


Mike Lykov wrote:

> В сообщении от Вторник 04 Ноябрь 2003 14:19 Boris 'pi' Piwinger написал:
> [In a message of Tuesday, 04 November 2003 14:19, Boris 'pi' Piwinger wrote:]

Thanks for helping me freshen up my Russian; it has been
many years ;-)

>> > How can I re-create the wordlists in order to use it?
>> Build them from scratch with your mail collection.
> 
> Hmm. I think the best choice is to use already caught spam (I put it in a special
> mailbox on the server).

You should also have some ham.

> But rebuilding the database would throw away all learning (by hand or automatic) %((

That is right. And it cleans out all the wrong learning. If you
have mail collections of spam and ham (which have been corrected),
they contain all the learning.
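
To illustrate why corrected collections "contain all the learning", here is
a minimal Python sketch that rebuilds per-class token counts from two
collections. The tokenizer regex and the name build_wordlist are
hypothetical and only illustrative; this is not bogofilter's actual
tokenizer or database format.

```python
import re
from collections import Counter

# Illustrative tokenizer only: lowercase runs of at least three characters.
WORD = re.compile(r"[a-z0-9$'-]{3,}")

def build_wordlist(messages):
    """Count, for each token, how many messages in the collection contain it."""
    counts = Counter()
    for text in messages:
        # set() so a token is counted once per message (a choice made for this sketch)
        counts.update(set(WORD.findall(text.lower())))
    return counts

# Rebuilding from scratch: whatever was learned is simply a function of the
# two corrected collections, so nothing correct is lost by starting over.
spam_counts = build_wordlist(["buy cheap pills", "cheap pills now"])
ham_counts = build_wordlist(["meeting notes attached"])
```

A message filed in the wrong collection contributes wrong counts, which is
why the collections need to be corrected before rebuilding.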

>> > Or can I use the -H option? Will it be the same as above?
>> That won't be enough I believe.
> 
> On new (re-created) database i can see the same errors, what I can fixed in 
> current - with attaches, for example..

???

>> >  PARSING OPTIONS
>> > Where i must use it?
>> Either on the command line or in the config file. 
> 
> in command line with "classification" or "registration" options ? 

With everything you do.

>> It is
>> strongly recommended that you leave them alone, they might
>> go soon and the defaults will most likely work best.
> 
>  When I run 'bogofilter -Q' I do not see defaults for these "parsing options"
> %(
> They are also missing from the default config.

Another hint not to use them;-)

> So you suggest not using them?

Yes.

> I wanted to exec bogofilter -Ph -Pt ...

Why?

>> > --cqxox3fnlpmgjstp-- 1 20030526
>> > --TB36FDmn/VVEgNH/ 2 20030708
>> > --TB36FDmn/VVEgNH/-- 2 20030708
>> I guess the rationale is that those can be specific to spam
>> software.
> 
> I think those base64 sets of letters are mostly random, aren't they?

In some cases yes, in some cases not. The latter help
identifying spam.
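
For readers wondering where tokens such as --TB36FDmn/VVEgNH/ in the quoted
wordlist dump come from: they are MIME boundary delimiter lines. The
following Python sketch (standard library only; the helper name
boundary_tokens is hypothetical) lists the delimiter lines a multipart
message would contain. It illustrates the idea and makes no claim about
how bogofilter itself parses messages.

```python
from email import message_from_string

def boundary_tokens(raw_message):
    """Return the opening and closing delimiter lines of each multipart container."""
    msg = message_from_string(raw_message)
    tokens = []
    for part in msg.walk():
        if part.is_multipart():
            boundary = part.get_boundary()
            if boundary:
                tokens.append("--" + boundary)         # separates the parts
                tokens.append("--" + boundary + "--")  # closes the container
    return tokens

SAMPLE = (
    'Content-Type: multipart/mixed; boundary="TB36FDmn/VVEgNH/"\n'
    "\n"
    "--TB36FDmn/VVEgNH/\n"
    "Content-Type: text/plain\n"
    "\n"
    "hello\n"
    "--TB36FDmn/VVEgNH/--\n"
)
print(boundary_tokens(SAMPLE))
```

Mail software that always generates boundaries in the same fixed shape
would produce recurring tokens of this kind, whereas truly random
boundaries would each be seen only once.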

> What about ver 0.15.8, where 
> "	* Modified handling of mime attachments to decode rfc822 and
> 	  to discard applications and images." ? 
> What does that mean? Why discard images?

Because bogofilter cannot understand them. Of course,
they are not actually discarded, just ignored.
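
As a rough illustration of "ignored, not discarded", the Python sketch
below walks the MIME parts of a message, descends into attached
message/rfc822 parts, and collects only the text it can read, skipping
image and application parts. The function name readable_text and the set
of ignored types are assumptions based on the changelog wording; this is
not bogofilter's code.

```python
from email import message_from_string

# Assumption taken from the changelog wording: these main types are skipped.
IGNORED_MAIN_TYPES = {"image", "application"}

def _decode(payload, charset):
    """Decode bytes leniently; unknown charset names fall back to latin-1."""
    try:
        return payload.decode(charset or "ascii", errors="replace")
    except LookupError:
        return payload.decode("latin-1")

def readable_text(raw_message):
    """Return the decoded text of every part that is not an ignored attachment."""
    msg = message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers (including message/rfc822) are visited via walk()
        if part.get_content_maintype() in IGNORED_MAIN_TYPES:
            continue  # the message itself is untouched; the part is just not read
        payload = part.get_payload(decode=True)
        if payload is not None:
            chunks.append(_decode(payload, part.get_content_charset()))
    return chunks
```

With this approach a message consisting of two or three words plus a large
attachment is judged on those few words and its headers alone.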

> For the kind of email I get, it would be better to discard documents such as Microsoft
> Office files rather than images

Bogofilter won't read those either.

>> > Often I see that a letter with an attached file and a little piece of text
>> > (two or three words) is classified as spam, but an attachment can't be spam!
>> Why not? I often get spam with attachments.
> 
> See above. I have real spam with little images, but I have never seen spam 
> with big office documents ;)

I have.

> But i often see false positives on such office documents!

I don't, so maybe your training wasn't that good.

> Who needs to tokenize headers?

They contain a lot of information.

> In spam, Received:, Subject:, From: and other headers are
> mostly random - spammers try to avoid filters in the MTA, but the MTA is best for
> filtering on headers! ;)

Actually, in practice they try to look real, and that is what gets caught.

> If bogofilter is a content filter, it must rely on the content of letters, not on
> random information like headers ;)

That's by no means random.
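
To make the point about headers concrete, here is a small Python sketch
that turns selected header fields into tokens tagged with the field name,
so that a word in Subject: is counted separately from the same word in the
body. The tagging scheme ("subject:free") and the name header_tokens are
purely illustrative assumptions, not bogofilter's actual token format.

```python
import re
from email import message_from_string

WORD = re.compile(r"[A-Za-z0-9]+")  # deliberately simple illustrative tokenizer

def header_tokens(raw_message, fields=("From", "Subject", "Received")):
    """Return header words prefixed with the lower-cased field name."""
    msg = message_from_string(raw_message)
    tokens = []
    for name in fields:
        for value in msg.get_all(name, []):
            for word in WORD.findall(str(value)):
                tokens.append(f"{name.lower()}:{word.lower()}")
    return tokens

EXAMPLE = (
    "From: Big Deals <win@example.com>\n"
    "Subject: Free offer\n"
    "\n"
    "body text\n"
)
print(header_tokens(EXAMPLE))
```

Tokens like these behave like any other token in the wordlists: if they
turn up mostly in one collection, they carry information, whether or not
the sender intended them to look random.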

pi




