some questions about bogofilter 0.13.6&0.15.7
Boris 'pi' Piwinger
3.14 at logic.univie.ac.at
Tue Nov 4 12:35:27 CET 2003
Mike Lykov wrote:
> В сообщении от Вторник 04 Ноябрь 2003 14:19 Boris 'pi' Piwinger написал:
> [On Tuesday 04 November 2003 14:19, Boris 'pi' Piwinger wrote:]
Thanks for helping me brush up my Russian; it has been
many years ;-)
>> > How can I re-create the wordlists so that I can use them?
>> Build them from scratch with your mail collection.
>
> Hmm. I think the best choice is to use the spam I have already caught (I put it in a special
> mailbox on the server).
You should also have some ham.
> But rebuilding the database will wipe out all the learning (by hand or automatic) %((
That is right. And it cleans out all the wrong learning. If you
have mail collections of spam and ham (which have been corrected), they
have all the learning in them.
>> > Or can I use the -H option? Will it be the same as above?
>> That won't be enough I believe.
>
> On a new (re-created) database I can see the same errors that I can fix in
> the current one: with attachments, for example.
???
>> > PARSING OPTIONS
>> > Where must I use them?
>> Either on the command line or in the config file.
>
> On the command line, together with the "classification" or "registration" options?
With everything you do.
>> It is
>> strongly recommended that you leave them alone, they might
>> go soon and the defaults will most likely work best.
>
> When I run 'bogofilter -Q' I do not see defaults for these "parsing options"
> %(
> They are also missing from the default config.
Another hint not to use them;-)
> So you suggest not using them?
Yes.
> I wanted to run bogofilter -Ph -Pt ...
Why?
>> > --cqxox3fnlpmgjstp-- 1 20030526
>> > --TB36FDmn/VVEgNH/ 2 20030708
>> > --TB36FDmn/VVEgNH/-- 2 20030708
>> I guess the rationale is that those can be specific to spam
>> software.
>
> I think those base64 strings of letters are mostly random, aren't they?
In some cases yes, in some cases not. The latter help
identify spam.
> What about ver 0.15.8, where
> " * Modified handling of mime attachments to decode rfc822 and
> to discard applications and images." ?
> What does that mean? Why discard images?
Because bogofilter cannot understand them. Of course,
they are not actually discarded, just ignored.
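
To illustrate the difference between ignoring and discarding, here is a
minimal Python sketch. It is not bogofilter's code: the function name
scannable_text, the SKIPPED_MAIN_TYPES set and the SAMPLE message are all
made up for illustration. It walks the MIME parts of a message and collects
only the text a word-based filter could score, skipping image and
application parts while leaving the message itself untouched.

```python
from email import message_from_string

# Hypothetical illustration only; bogofilter's real parser is written in C
# and may behave differently.
SKIPPED_MAIN_TYPES = {"image", "application"}

# A made-up message: a short text part plus an office-document attachment.
SAMPLE = "\n".join([
    "From: a@example.org",
    "Subject: report",
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="XYZ"',
    "",
    "--XYZ",
    "Content-Type: text/plain; charset=us-ascii",
    "",
    "see attached",
    "--XYZ",
    "Content-Type: application/msword",
    "Content-Transfer-Encoding: base64",
    "",
    "AAAA",
    "--XYZ--",
    "",
])


def scannable_text(raw_message):
    """Return the body text of all parts that are not images or applications."""
    msg = message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no text of their own
        if part.get_content_maintype() in SKIPPED_MAIN_TYPES:
            continue  # ignored for scoring; the message is not modified
        payload = part.get_payload(decode=True)
        if payload:
            charset = part.get_content_charset() or "ascii"
            chunks.append(payload.decode(charset, errors="replace"))
    return "\n".join(chunks)


print(scannable_text(SAMPLE))
```

With a message like SAMPLE, only the two words of the text part are left
to score, which is the situation described later in this thread.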
> For the kind of email I get, it would be better to discard documents such as Microsoft
> Office files rather than images.
Bogofilter won't read those either.
>> > I often see that a message with an attached file and a small piece of text
>> > (two or three words) is classified as spam, but an attachment can't be spam!
>> Why not? I often get spam with attachments.
>
> See above. I have real spam with little images, but I have never seen spam
> with big office documents ;)
I have.
> But I often see false positives on such office documents!
I don't, so maybe your training wasn't that good.
> Who needs to tokenize headers?
They contain a lot of information.
> In spam, Received:, Subject:, From: and the others are
> mostly random; spammers try to avoid filters in the MTA, but the MTA is best for
> filtering on headers! ;)
Actually, in practice they try to look real, and that is what gets caught.
> If bogofilter is a content filter, it must rely on the content of messages, not on
> random information like headers ;)
That's by no means random.
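
As a rough illustration of why header words are informative, the following
Python sketch tags each header word with its field name, so that a word seen
in Subject: is counted separately from the same word in the body. This is an
assumption about how such tagging could look, not bogofilter's actual
tokenizer; header_tokens and the prefix format are hypothetical.

```python
import re
from email import message_from_string


def header_tokens(raw_message):
    """Return header words prefixed with their lowercased field name."""
    msg = message_from_string(raw_message)
    tokens = []
    for name, value in msg.items():
        prefix = name.lower() + ":"
        # Hypothetical word pattern; a real tokenizer has more elaborate rules.
        for word in re.findall(r"[A-Za-z0-9.$'-]+", str(value)):
            tokens.append(prefix + word.lower())
    return tokens


print(header_tokens("Subject: Cheap pills\n\nbody text"))
```

A filter trained on such tokens can learn that a word is suspicious in a
subject line even if it is harmless in the body.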
pi