bug using passthrough option on opensuse - solved
Matthias Andree
matthias.andree at gmx.de
Tue May 21 20:42:07 CEST 2024
On 21.05.24 at 16:22, Manvendra Bhangui wrote:
> On Tue, 21 May 2024 at 11:54, Matthias Andree <matthias.andree at gmx.de> wrote:
>> Two things:
>> 1 We still need to understand where the bug is. We must not abort the passthrough output early even if the charset contains garbage, because so far we have only fixed the "convertible charset" case.
>> 2 It is a runtime requirement, too, not just build-time!
>>
> The same issue happened in a mageia8 docker container. There was no
> package glibc-locale-base or equivalent on mageia8. So I did some
> debugging and fixed the issue. Here is what is happening
>
> the function bf_iconv_open() in src/convert_unicode.c gets called as
> bf_iconv_open("UTF-8", "iso-8859-1").
> This is because DEFAULT_CHARSET is defined as iso-8859-1 in configure.ac
> iconv_open() fails with EINVAL and bf_iconv_open again calls iconv_open as
> iconv_open(to_charset, default_charset). Since default_charset is
> iso-8859-1, it again fails.
>
> configure.ac has the --with-charset=name option, but that has a
> bug. Even if you pass the option it is not getting set in
> src/config.h. I made the following change to configure.ac and now
> bogofilter works on leap15.5, leap15.6 and mageia8 docker containers
> with the following configure options
>
> ./configure --prefix=/usr --with-charset=utf-8
>
> This is the change I made
>
> diff --git a/bogofilter/configure.ac b/bogofilter/configure.ac
> index 84fcf8ad..fc6e5ece 100644
> --- a/bogofilter/configure.ac
> +++ b/bogofilter/configure.ac
> @@ -300,17 +300,15 @@ fi
> dnl Allow the user to specify a default charset
> AC_ARG_WITH(charset,
> AS_HELP_STRING([--with-charset=name],
> - [use specified charset (overrides --enable-russian) [[iso-8859-1]]]),
> - AC_DEFINE_UNQUOTED(DEFAULT_CHARSET,
> - ["$withval"],
> - [Use specified default charset instead of iso-8859-1])
> + [use specified charset instead of iso-8859-1 (overrides --enable-russian) [[iso-8859-1]]]),
> + [ DEFAULT_CHARSET=$withval ]
> )
>
> AC_SUBST(ENCODING)
> AC_SUBST(DEFAULT_CHARSET)
> AC_DEFINE_UNQUOTED(DEFAULT_CHARSET,
> ["$DEFAULT_CHARSET"],
> - [Use specified charset])
> + [Use specified charset instead of iso-8859-1])
>
> dnl Allow the user to enable memory usage debugging methods
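For illustration only, the call sequence described in the quoted report (try the requested charset, then retry with the default charset when iconv_open() fails with EINVAL) can be sketched as below. open_with_fallback() is a hypothetical helper written for this note, not bogofilter's actual bf_iconv_open(); it only assumes the standard iconv_open() behaviour of returning (iconv_t)-1 on failure.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/*
 * Hypothetical helper (not bogofilter code): open a converter from
 * `from` to `to`; if that fails with EINVAL (conversion unavailable),
 * retry once with `fallback_from` as the source charset.
 * Returns (iconv_t)-1 if both attempts fail.
 */
static iconv_t open_with_fallback(const char *to, const char *from,
                                  const char *fallback_from)
{
    assert(to != NULL && from != NULL);

    iconv_t cd = iconv_open(to, from);
    if (cd == (iconv_t)-1 && errno == EINVAL && fallback_from != NULL) {
        /* Second attempt; this fails the same way when the fallback
         * charset is not available on the running system either. */
        cd = iconv_open(to, fallback_from);
    }
    return cd;
}
```

If neither charset is known to the iconv implementation on the running system, the helper still returns (iconv_t)-1, which matches the double failure described in the report.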
Manvendra,
Thank you. Also thank you very much again for taking the time to report
and debug this and continuing to correspond with short turnaround. Very
helpful indeed.
So the whole affair is a bit dodgy (involved).
What I figured is that we are looking at a set of subtle bugs, which,
when happening in combination, break the lexing (tokenization) of the
input message and also break the passthrough mode.
The minimal reproducer for me is:
cat >mail.txt <<_EOF
From: test at example.com
Date: Mon, 20 May 2024 00:00:00 +0000
Test message
_EOF
bogofilter -C --charset-default=nonexist -I mail.txt -p # this truncates and omits the Date line and body
bogolexer -C --charset-default=nonexist -I mail.txt # this shows 0 tokens read.
That being said, I am hunting down each of the individual bugs you have
shown me or that I see in the context.
My git stash currently holds this patch which I haven't yet committed
because it will mask some other subtle bugs I need to fix first so we
get [rid of] 'em all.
Once we have those in place, bogofilter 1.2.6 it shall be. This is a
critical bug IMO.
diff --git a/bogofilter/src/iconvert.c b/bogofilter/src/iconvert.c
index 1d8d5e9f..786b962a 100644
--- a/bogofilter/src/iconvert.c
+++ b/bogofilter/src/iconvert.c
@@ -194,7 +194,8 @@ static void copy(buff_t *restrict src, buff_t *restrict dst)
void iconvert(buff_t *restrict src, buff_t *restrict dst)
{
assert(src->t.u.text != dst->t.u.text);
- if (cd == NULL)
+ BOGO_ASSERT(cd != NULL, "cd should have been initialized, and if to -1 for failure"); /* this should not happen */
+ if (cd == (iconv_t)-1)
copy(src, dst);
else
convert(cd, src, dst);
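As a minimal sketch of the idea behind the patch: iconv_open() reports failure by returning (iconv_t)-1, not NULL, so a NULL test never takes the copy path after a failed open. The function convert_or_copy() and its plain-buffer interface are assumptions made for this example; bogofilter's real code works on buff_t and has its own copy() and convert() routines.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the patched iconvert(): when no usable
 * converter exists (cd == (iconv_t)-1), copy the bytes unchanged;
 * otherwise run them through iconv(). Returns the number of bytes
 * written to dst. Error handling for iconv() is omitted for brevity.
 */
static size_t convert_or_copy(iconv_t cd, const char *src, size_t src_len,
                              char *dst, size_t dst_cap)
{
    assert(src != dst);

    if (cd == (iconv_t)-1) {
        /* Failed open: pass the input through unconverted. */
        size_t n = src_len < dst_cap ? src_len : dst_cap;
        memcpy(dst, src, n);
        return n;
    }

    char *in = (char *)src;   /* iconv() takes a non-const pointer */
    size_t in_left = src_len;
    char *out = dst;
    size_t out_left = dst_cap;

    (void)iconv(cd, &in, &in_left, &out, &out_left);
    return dst_cap - out_left;
}
```

With a descriptor obtained from a failed iconv_open() call, the function falls back to a plain copy, so the output is not truncated.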
Cheers,
Matthias