token pairs [was: Algorithm limitations]
David Relson
relson at osagesoftware.com
Wed Apr 14 02:32:58 CEST 2004
On Tue, 13 Apr 2004 14:12:21 +0200
Boris 'pi' Piwinger wrote:
> David Relson wrote:
>
> > I'm not willing to include word pairs until after the 1.0 release,
> > but am willing to let users experiment with the technique. Attached
> > is a patch from a couple of months ago, updated to work with
> > 0.17.5. Below is a sample of the output using it:
> >
> > [relson at osage src]$ echo this is a test of word pairs | bogofilter -C -H-vvv
>
> > [relson at osage src]$ echo this is a test of word pairs | bogofilter -C -H-vvv -P
>
> From that I understand that you need to pass -P to use the
> feature. Could you or someone else please briefly explain
> which pairs are chosen? Is it only adjacent tokens (in your
> example the short words are not tokens), or can a pair skip
> over a word? The example output suggests that this does not
> happen.
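The question of which pairs are formed can be made concrete. The sketch below shows only the simplest scheme, pairing each token with the one immediately after it and never skipping a word; this message does not say whether the patch does exactly this. `make_adjacent_pairs` and `PAIR_LEN` are hypothetical names, and the input assumes the short words have already been dropped, as Boris infers from the example output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PAIR_LEN 64   /* hypothetical maximum length of one pair token */

/* Hypothetical helper: writes "tokens[i] tokens[i+1]" into pairs[i] for
 * each adjacent pair. No token is skipped over. Returns the number of
 * pairs written, at most max_pairs. */
static size_t make_adjacent_pairs(const char *tokens[], size_t ntokens,
                                  char pairs[][PAIR_LEN], size_t max_pairs)
{
    size_t npairs = 0;

    assert(tokens != NULL && pairs != NULL);
    for (size_t i = 0; i + 1 < ntokens && npairs < max_pairs; i++) {
        /* snprintf truncates an over-long pair instead of overflowing. */
        snprintf(pairs[npairs], PAIR_LEN, "%s %s", tokens[i], tokens[i + 1]);
        npairs++;
    }
    return npairs;
}
```

Fed the tokens this, test, word, pairs, it yields "this test", "test word" and "word pairs". A scheme that may skip one word would add pairs such as "this word", roughly doubling the number of pair tokens per message.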
>
> Instead of -P, could you make this a config file option?
The patch below will give you "word-pairs=yes/no" ...
--- bogoconfig.c 18 Mar 2004 21:05:56 -0000 1.170
+++ bogoconfig.c 14 Apr 2004 00:31:38 -0000
@@ -108,6 +108,7 @@
  { "use-syslog", N, 0, 'l' },
  { "register-ham", N, 0, 'n' },
  { "passthrough", N, 0, 'p' },
+ { "word-pairs", R, 0, 'P' },
  { "register-spam", N, 0, 's' },
  { "update-as-classed", N, 0, 'u' },
  { "timestamp-date", N, 0, 'y' },
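Each entry in that table is a `struct option` as consumed by getopt_long(3): name, has_arg, flag, val. The sketch below assumes bogofilter's `R` macro stands for `required_argument` (and `N` for `no_argument`), which is what lets `--word-pairs=yes` carry a value into `optarg`. `parse_yes_no` and `word_pairs_from_args` are hypothetical helpers for illustration, not bogofilter functions.

```c
#include <assert.h>
#include <getopt.h>
#include <string.h>
#include <unistd.h>

/* Assumption: bogofilter's R macro expands to required_argument, which is
 * what lets "--word-pairs=yes" carry a value into optarg. */
static const struct option long_options[] = {
    { "word-pairs", required_argument, 0, 'P' },
    { 0, 0, 0, 0 }
};

/* Hypothetical helper: 1 for "yes", 0 for "no", -1 for anything else. */
static int parse_yes_no(const char *val)
{
    assert(val != NULL);
    if (strcmp(val, "yes") == 0)
        return 1;
    if (strcmp(val, "no") == 0)
        return 0;
    return -1;
}

/* Hypothetical helper: scans argv for --word-pairs=VALUE and returns the
 * parsed setting, or def if the option does not appear. */
static int word_pairs_from_args(int argc, char *argv[], int def)
{
    int result = def;
    int c;

    optind = 1;   /* start scanning at argv[1] */
    opterr = 0;   /* suppress getopt's own error messages */
    while ((c = getopt_long(argc, argv, "", long_options, NULL)) != -1) {
        if (c == 'P')   /* val field of the matching struct option */
            result = parse_yes_no(optarg);
    }
    return result;
}
```

The has_arg field matters here: with `no_argument`, getopt_long rejects `--word-pairs=yes` with an error instead of returning `'P'`. A config-file line `word-pairs=yes` would presumably be routed through the same table, which would explain why adding the entry is enough to expose the option there; that code path is not shown in this message.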