[Bogofilter]UNSURE messages end up in inbox folder

Daniel Moyne daniel.moyne at neuf.fr
Thu Aug 21 15:07:21 CEST 2008


On Tuesday 12 August 2008, Nigel Henry wrote:
> On Tuesday 12 August 2008 13:01, Daniel Moyne wrote:
> > On Saturday 09 August 2008, Nigel Henry wrote:
> > > On Saturday 09 August 2008 00:55, Daniel Moyne wrote:
> > > > I scrupulously followed Nigel's how-to with all these folders :
> > > > Spam
> > > > NonSpam
> > > > nonspamnew
> > > > spam
> > > > unsure
> > > > in KMail (kde-4)
> > > >
> > > > Apparently everything works fine, except that all messages whose
> > > > subject is prefixed with ???UNSURE??? end up in my inbox folder,
> > > > although filter 4 is supposed to move them into the unsure folder!
> > > >
> > > > Regards.
> > >
> > > I'll come back on this problem tomorrow, as it's getting late. My
> > > version of bogofilter is quite old, but don't think that that is the
> > > problem.
> > >
> > > I'm also using an old version of Kmail on Fedora Core 2 (KDE 3.2.2),
> > > but should not affect the filters.
> > >
> > > Of course I may not be able to help much. I have bogofilter working
> > > with Kmail with no problems, but perhaps someone directly working with
> > > the bogofilter project may be able to offer better advice.
> > >
> > > Nigel.
> >
> > Nigel,
> > so far there have been no other UNSURE mails, so it is difficult to say
> > whether it works or not. Nigel, still on your how-to:
> >
> > Can we say that once you have done a run on the "Spam" and "NonSpam"
> > folders there is no reason to keep their content, which can be deleted?
> > We still have to keep the folders themselves, to move into "Spam" those
> > "unsure" messages that turn out to be spam, and into "NonSpam" those
> > "unsure" messages that turn out to be non-spam.
> >
> > Once the content of the "Spam" and "NonSpam" folders has been updated we
> > can run your script on them.
> >
> > I am still wondering about the use of the "nonspamnew" folder.
> > Regards.
>
> Bonjour Daniel. Apologies for not replying. I got sidetracked with a KDE4
> problem on my Archlinux install, and forgot all about replying to you.
>
> This is only my way of using the Spam, and NonSpam folders, and I'm sure
> others may do things differently, but this is how it goes. Having created
> the Spam, and NonSpam folders, and before starting to use bogofilter, I
> started to fill the Spam folder with spam that was coming into the inbox.
> At the same time I put the same amount of genuine mail in the NonSpam
> folder. Now there are about 200 emails in both the Spam, and NonSpam
> folders.
>
> Now having downloaded, and installed bogofilter, and having created
> the .bogofilter directory in my /home/user directory, and also having
> created the bogofilter filter, and the filters for ham, spam, and
> unsure, for the first, and only time I run bogofilter -sv -B Mail/Spam/cur,
> and bogofilter -nv -B Mail/Nonspam/cur.
>
> Now when you next check the mail, bogofilter now has a bunch of spam, and
> ham to work with, and can decide, based on the spam, and ham in the
> ~/.bogofilter/wordlist.db, where to send the incoming mail. Mail that is
> obviously ham will be sent to the inbox. Mail that is obviously spam, will
> be sent to the wastebin, but personally I set up another folder, which I've
> named spamcheck, so that the spam goes there first, and I can make sure
> that no genuine messages have been wrongly identified as spam. After that
> I can empty the spamcheck contents into the wastebin.
>
> Now onto the unsure folder. Some spam can look like genuine email, or the
> spammers are trying new ways of getting past the spam filters, and if
> bogofilter isn't sure it will send it to the unsure folder. Of course the
> more you train bogofilter on new spam which turns up in the unsure folder,
> the fewer errors it makes. Some ham (nonspam) also at times can look a bit
> spammy. For example some genuine emails may have words included, that
> normally turn up in spam emails. Bogofilter again isn't sure, so sends
> these to the unsure folder.
>
> Now I said earlier on that I only run bogofilters training once on the
> Spam, and Nonspam folders, and you could after doing that just send all
> your spam, and ham in these folders to the wastebin. Then after doing that
> you could sort the unsure mails out, send the spam to the Spam folder, and
> the ham to the Nonspam folder, and run the training script on both again.
>
> Doing it this way though, if for some reason or other your wordlist.db
> should become corrupted, you have to start all over again, which is why I
> train just once on the 200 emails in the Spam, and Nonspam folders, and
> leave all the ham, and spam in these folders. As I still have all my spam,
> and ham in these 2 folders, if the wordlist.db should become corrupted, and
> I have to delete it, all I have to do is rerun the training program for
> both folders, and the wordlist.db will be recreated.
>
> This is why I also created the folders "spam", and "Nonspamnew". I use
> these for the mail that is in the unsure folder. Each day I check the
> unsure folder, and send the spam to the "spam" folder, and the ham to the
> "Nonspamnew" folder. Now I don't run the training program on these 2
> folders every day, and usually wait until there are about 100 spam mails in
> the "spam" folder, then run the following.
> bogofilter -sv -B Mail/spam/cur
> bogofilter -nv -B Mail/Nonspamnew/cur
>
> Now I want to move these spam, and ham newly trained emails to the Spam,
> and Nonspam folders, so that if the wordlist.db should become corrupted,
> all the latest ham, and spam will be available to recreate the wordlist.db.
>
> Ctrl +A will highlight all the mails in the spam folder, and right click,
> and move to Spam, will add these latest spam emails to the Spam folder.
>
> Back to your problem with the unsure filter. This is what you said, see
> below.
>
> <quote>
> Apparently everything works fine, except that all messages whose
> subject is prefixed with ???UNSURE??? end up in my inbox folder,
> although filter 4 is supposed to move them into the unsure folder!
> <end quote>
>
> I'm not sure if I understand you here. Could you show how you have this
> filter setup.
>
> The first line should show:
> X-Bogosity          contains        Unsure
>
> Depending on how many e-mails you receive each day, you should be getting
> some in the unsure box. The more you train bogofilter, the less mail should
> be in the unsure box, but this takes some time, and when spammers change
> their methods, you may well find more spam in the unsure box again.
>
Nigel,
OK, I found what was wrong in my setup:
the first filter, bogofilter, contained the action:
Pipe Through  bogofilter
rather than:
Pipe Through  bogofilter -epv

So in the end I made 2 mistakes when applying your how-to, and both are now
corrected; now everything is apparently (careful!) running fine.
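
As a rough sketch of why the options matter (my understanding, not checked
against every version): with -p bogofilter passes the message through with an
X-Bogosity header added, and -e makes it exit with code 0 for both spam and
ham, so the later filters have a header to match on. The helper below, whose
name bogosity_class is made up for illustration, prints the first word of
that header for a message read on standard input.

```shell
#!/bin/sh
# Print the classification word from the X-Bogosity header of a message
# read on stdin (for example Unsure); print nothing if the header is absent.
# bogosity_class is a hypothetical helper name, not part of bogofilter.
bogosity_class() {
    awk '
        /^$/ { exit }                        # blank line: end of the headers
        tolower($0) ~ /^x-bogosity:/ {
            sub(/^[^:]*:[ \t]*/, "")         # drop the header name
            sub(/[, \t].*$/, "")             # keep only the first word
            print
            exit
        }
    '
}
```

For instance, "bogofilter -p < some-message | bogosity_class" should print
the classification of that message once a wordlist exists (some-message is a
placeholder file name); without -p there is no header for it to find.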

See my comments on what David said at one time about checking the X-Bogosity
value; I think my answer is correct as to the absence of such a line in the
final messages after the bogofilter process.

One last point: I will try to add a cron line that runs a script each week
doing the following:
a) run your commands on both the "Spam" and "NonSpam" folders,
b) delete(*) all messages contained in these folders afterwards.

(*) I do not see the advantage of keeping the content of the "Spam" and
"NonSpam" folders: you can easily refill both folders with appropriate
messages to train bogofilter (we have plenty of ham available, and the
spammers are very good at supplying spam). Besides, most of the "NonSpam"
messages are sorted not by bogofilter but by the collection of filters that
do their job upstream, so running the script on them brings no value.
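
A minimal sketch of such a weekly script, under several assumptions: the
folders are maildirs under ~/Mail with a "cur" subdirectory as in your
commands, bogofilter exits with code 0 after a successful registration, and
the names BOGOFILTER, MAILDIR_ROOT, train_and_clear and train_weekly are
invented for this example. Messages are deleted only when training reported
success.

```shell
#!/bin/sh
# Train bogofilter on the two manually sorted folders, then empty them.
# BOGOFILTER and MAILDIR_ROOT are assumptions; override them to suit the setup.
BOGOFILTER=${BOGOFILTER:-bogofilter}
MAILDIR_ROOT=${MAILDIR_ROOT:-$HOME/Mail}

# train_and_clear FLAG DIR
#   FLAG is -s (register as spam) or -n (register as non-spam),
#   DIR is a maildir "cur" directory holding one message per file.
train_and_clear() {
    flag=$1
    dir=$2
    # nothing to do if the folder is missing
    [ -d "$dir" ] || return 0
    # skip empty folders so bogofilter is not started without input
    [ -n "$(find "$dir" -type f | head -n 1)" ] || return 0
    # delete the messages only if training exited with code 0 (assumed success)
    "$BOGOFILTER" "$flag" -B "$dir" && find "$dir" -type f -exec rm -f {} +
}

# Run both steps; stop before the second if the first one failed.
train_weekly() {
    train_and_clear -s "$MAILDIR_ROOT/Spam/cur" &&
    train_and_clear -n "$MAILDIR_ROOT/NonSpam/cur"
}
```

Adding a final line "train_weekly" makes the file runnable on its own, and a
crontab entry such as "0 3 * * 0 $HOME/bin/bogotrain.sh" (a made-up path)
would run it early every Sunday.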

Some other points regarding your last message:
- you do not have to create the ~/.bogofilter directory, as bogofilter should
create it automatically, if I am not mistaken,
- rather than create a "spamcheck" folder that is basically no different
from "spam", I would set a 30-day retention period on the "spam" folder,
because it is still (and always will be) up to the user to look through the
"spam" folder from time to time and recover any unexpected ham messages,
which should first be copied to the "Spam" folder before being transferred to
the "inbox" folder.
- by definition, what is in the "spam" folder has been correctly processed by
bogofilter, subject to the user's final check; the user should NEVER manually
move messages into this folder, only REMOVE those found to be ham; the same
philosophy applies to the "unsure" folder; consequently there is no reason
AT ALL to run a script on the "spam" folder, nor of course on "nonspam".
- sorry, but I do not see the point of creating the "N(n)onspamnew" folder,
simply because you lose the logic you set at the very beginning with the
"Spam" and "NonSpam" folders: at the beginning, and also from time to time,
spam messages manually sorted by the user are moved to the "Spam" folder,
ready for the script; in the same way, non-spam messages manually sorted by
the user are copied to the "inbox" before being moved to the "NonSpam"
folder; both folders, "Spam" and "NonSpam", CONTAIN what the user has
MANUALLY moved there, meaning that bogofilter failed and that further
training is needed, as the original setup dictates; therefore the scripts are
supposed to be run ONLY on these 2 folders.

A final point regarding the "bogofilter.cf" settings: the original file from
the bogofilter package was entirely commented out; can you send me your
settings file at my email address: Daniel Moyne <daniel.moyne at neuf.fr>

Regards.

-- 
Daniel Moyne (Nulix)---------------------------------------------------------
Distribution : Ubuntu 8.04 Hardy Heron    \\|||// Machine : x86_64
               kernel 2.6.24-19-generic   / --- \ ATI Radeon X300 Express
               KDE 3.5.9 + 4.1 (test)    (' o-o ')
----------------------------------------oOO-(_)-OOo--------------------------



