some questions about bogofilter 0.13.6&0.15.7

Mike Lykov combr at vesna.ru
Tue Nov 4 12:24:10 CET 2003


On Tuesday 04 November 2003 14:19, Boris 'pi' Piwinger wrote:

> > How i can re-create wordlists for using it ?
> Build them from scratch with your mail collection.

Hmm. I think the best choice is to use the spam I have already caught (I put 
it in a special mailbox on the server). 
But rebuilding the database will throw away all the learning done so far (by 
hand or automatically) %((

> > Or i can use -H option - it will be same as above ?
> That won't be enough I believe.

With a new (re-created) database I will see the same errors that I could fix 
in the current one, with attachments, for example.

> >  PARSING OPTIONS
> > Where i must use it?
> Either on the command line or in the config file. 

On the command line together with the "classification" or the "registration" 
options? 
When updating the wordlists or when classifying a message?

> It is strongly recommended that you leave them alone, they might
> go soon and the defaults will most likely work best.

When I run 'bogofilter -Q' I do not see the defaults for these "parsing 
options" %(
They are also missing from the default config.
Do you mean I should not use them?

I wanted to run bogofilter -Ph -Pt ...

> > --cqxox3fnlpmgjstp-- 1 20030526
> > --TB36FDmn/VVEgNH/ 2 20030708
> > --TB36FDmn/VVEgNH/-- 2 20030708
> I guess the rationale is that those can be specific to spam
> software.

I think those base64-looking strings are mostly random, aren't they?
What about version 0.15.8, where 
"	* Modified handling of mime attachments to decode rfc822 and
	  to discard applications and images." ? 
What does that mean? Why discard images?
For the kind of mail I get, it would be better to discard documents such as 
Microsoft Office files rather than images: some spam messages contain an image 
in which the spammer writes his information (to avoid content filtering).
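
To make the idea concrete, here is a rough Python sketch. It is not bogofilter 
code: the function names and the list of content types are assumptions made up 
for illustration, and it uses only Python's standard email package. It keeps 
text and image parts and skips office documents before anything is tokenized:

```python
from email.mime.application import MIMEApplication
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Assumption: an illustrative list of office document types, not a complete one.
SKIPPED_TYPES = {
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
}


def parts_to_tokenize(msg):
    """Return the leaf MIME parts that a filter would still look at."""
    kept = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no content of their own
        if part.get_content_type() in SKIPPED_TYPES:
            continue  # office documents are dropped
        kept.append(part)
    return kept


def build_sample():
    """Build a small test message: short text, an image, a Word document."""
    msg = MIMEMultipart()
    msg.attach(MIMEText("see attached file"))
    image = MIMEBase("image", "gif")
    image.set_payload(b"GIF89a")
    msg.attach(image)
    msg.attach(MIMEApplication(b"\xd0\xcf\x11\xe0", _subtype="msword"))
    return msg


if __name__ == "__main__":
    for part in parts_to_tokenize(build_sample()):
        print(part.get_content_type())
```

With such a rule the image in a spam message would still be seen by the 
filter, while a big Word or Excel attachment would not add tokens.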

> > Often i see that the letter with attached file and a little piece if text
> > (two-three words) classified as spam, but attach can't be spam!
> Why not? I often get spam with attachments.

See above. I get real spam with small images, but I have never seen spam 
with big office documents ;)
But I often see false positives on such office documents!

> > I think bogofilter must not classify attaches at all! (same about headers
> > .. %)
> This has been discussed. It shows that it is useful. Like
> getting all those viruses, images etc.

Right, that is why I subscribed to this mailing list: to discuss ;)

Who needs the headers tokenized? In spam, Received:, Subject:, From: and the 
others are mostly random. Spammers try to get around the filters in the MTA, 
but the MTA is the best place for filtering on headers! ;)
If bogofilter is a content filter, it must rely on the content of messages, 
not on random information like headers ;)

-- 
Mike
registered linux user #315334
jabber id: combr at jabber.ru



