bug using passthrough option on opensuse - solved
Matthias Andree
matthias.andree at gmx.de
Tue May 21 20:44:03 CEST 2024
On 21.05.24 at 20:42, Matthias Andree via bogofilter wrote:
> On 21.05.24 at 16:22, Manvendra Bhangui wrote:
>> On Tue, 21 May 2024 at 11:54, Matthias Andree
>> <matthias.andree at gmx.de> wrote:
>>> Two things:
>>> 1. We still need to understand where the bug is: we must not abort
>>> the passthrough output early even if the charset contains garbage,
>>> because so far we have only fixed the "convertible charset" case.
>>> 2. It is a runtime requirement, too, not just build-time!
>>>
>> The same issue happened in a mageia8 docker container. There was no
>> package glibc-locale-base or equivalent on mageia8. So I did some
>> debugging and fixed the issue. Here is what is happening:
>>
>> The function bf_iconv_open() in src/convert_unicode.c gets called as
>> bf_iconv_open("UTF-8", "iso-8859-1"),
>> because DEFAULT_CHARSET is defined as iso-8859-1 in configure.ac.
>> iconv_open() fails with EINVAL, and bf_iconv_open() then retries with
>> iconv_open(to_charset, default_charset). Since default_charset is
>> iso-8859-1, it fails again.
>>
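To illustrate the failure chain described above, here is a minimal sketch in C of a "try the requested charset, then the default" open. It is not bogofilter's actual code: open_with_fallback() and its parameters are hypothetical names chosen for this example. It only relies on the documented behaviour that iconv_open() returns (iconv_t)-1 and sets errno when a conversion is unavailable. If the fallback charset is itself unavailable at runtime, both attempts fail and the caller is left without a converter.

```c
#include <errno.h>
#include <iconv.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from bogofilter): try the requested source
 * charset first, then an optional fallback charset.
 * Returns a usable descriptor, or (iconv_t)-1 if every attempt failed. */
static iconv_t open_with_fallback(const char *to, const char *from,
                                  const char *fallback)
{
    iconv_t cd = iconv_open(to, from);

    if (cd == (iconv_t)-1 && fallback != NULL) {
        /* Report why the first attempt failed before retrying. */
        fprintf(stderr, "iconv_open(%s, %s): %s; retrying with %s\n",
                to, from, strerror(errno), fallback);
        cd = iconv_open(to, fallback);
    }
    return cd; /* may still be (iconv_t)-1: the caller must check */
}
```

A caller would compare the result against (iconv_t)-1 and release a successfully opened descriptor with iconv_close().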
>> configure.ac has the --with-charset=name option, but that has a
>> bug: even if you pass the option, the value does not get set in
>> src/config.h. I made the following change to configure.ac, and now
>> bogofilter works in leap15.5, leap15.6 and mageia8 docker containers
>> with the following configure options:
>>
>> ./configure --prefix=/usr --with-charset=utf-8
>>
>> This is the change I made
>>
>> diff --git a/bogofilter/configure.ac b/bogofilter/configure.ac
>> index 84fcf8ad..fc6e5ece 100644
>> --- a/bogofilter/configure.ac
>> +++ b/bogofilter/configure.ac
>> @@ -300,17 +300,15 @@ fi
>> dnl Allow the user to specify a default charset
>> AC_ARG_WITH(charset,
>> AS_HELP_STRING([--with-charset=name],
>> - [use specified charset (overrides --enable-russian) [[iso-8859-1]]]),
>> - AC_DEFINE_UNQUOTED(DEFAULT_CHARSET,
>> - ["$withval"],
>> - [Use specified default charset instead of iso-8859-1])
>> + [use specified charset instead of iso-8859-1 (overrides --enable-russian) [[iso-8859-1]]]),
>> + [ DEFAULT_CHARSET=$withval ]
>> )
>>
>> AC_SUBST(ENCODING)
>> AC_SUBST(DEFAULT_CHARSET)
>> AC_DEFINE_UNQUOTED(DEFAULT_CHARSET,
>> ["$DEFAULT_CHARSET"],
>> - [Use specified charset])
>> + [Use specified charset instead of iso-8859-1])
>>
>> dnl Allow the user to enable memory usage debugging methods
>
>
> Manvendra,
>
>
> Thank you. Also thank you very much again for taking the time to report
> and debug this and continuing to correspond with short turnaround. Very
> helpful indeed.
>
>
> So the whole affair is a bit dodgy (involved).
>
> What I figured is that we are looking at a set of subtle bugs, which,
> when happening in combination, break the lexing (tokenization) of the
> input message and also break the passthrough mode.
>
>
> The minimal reproducer for me is:
>
> cat >mail.txt <<_EOF
>
> From: test at example.com
> Date: Mon, 20 May 2024 00:00:00 +0000
>
> Test message
>
> _EOF
Whoops, remove the outer blank lines here:
cat >mail.txt <<_EOF
From: test at example.com
Date: Mon, 20 May 2024 00:00:00 +0000

Test message
_EOF
>
> # This truncates the output and omits the Date line and body:
> bogofilter -C --charset-default=nonexist -I mail.txt -p
>
> # This shows 0 tokens read:
> bogolexer -C --charset-default=nonexist -I mail.txt
>
>
> That being said, I am hunting down each of the individual bugs you have
> shown me or that I see in the context.
>
> My git stash currently holds this patch which I haven't yet committed
> because it will mask some other subtle bugs I need to fix first so we
> get [rid of] 'em all.
>
> Once we have those in place, bogofilter 1.2.6 it shall be. This is a
> critical bug IMO.
>
>
> diff --git a/bogofilter/src/iconvert.c b/bogofilter/src/iconvert.c
> index 1d8d5e9f..786b962a 100644
> --- a/bogofilter/src/iconvert.c
> +++ b/bogofilter/src/iconvert.c
> @@ -194,7 +194,8 @@ static void copy(buff_t *restrict src, buff_t *restrict dst)
> void iconvert(buff_t *restrict src, buff_t *restrict dst)
> {
> assert(src->t.u.text != dst->t.u.text);
> - if (cd == NULL)
> + BOGO_ASSERT(cd != NULL, "cd should have been initialized, and if to -1 for failure"); /* this should not happen */
> + if (cd == (iconv_t)-1)
> copy(src, dst);
> else
> convert(cd, src, dst);
>
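The distinction the patch draws can be sketched in isolation: a zero descriptor means iconv_open() was never called (a programming error), while (iconv_t)-1 means it was called and failed, in which case the bytes are passed through unchanged. The following is only an illustration under assumptions. convert_or_copy() is a hypothetical stand-in for iconvert(), it uses plain char buffers rather than bogofilter's buff_t, and it assumes iconv_t can be compared against a zero value, as the patch itself does.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for iconvert(), using plain char buffers.
 * Returns the number of bytes written to dst, or (size_t)-1 on error. */
static size_t convert_or_copy(iconv_t cd, const char *src, size_t srclen,
                              char *dst, size_t dstlen)
{
    /* Zero descriptor: nobody called iconv_open(). This is a bug. */
    assert(cd != (iconv_t)0);

    if (cd == (iconv_t)-1) {
        /* iconv_open() failed earlier: pass the bytes through verbatim. */
        if (srclen > dstlen)
            return (size_t)-1;
        memcpy(dst, src, srclen);
        return srclen;
    }

    /* iconv() takes char **, but it does not modify the input bytes. */
    char *in = (char *)src;
    char *out = dst;
    size_t inleft = srclen;
    size_t outleft = dstlen;

    if (iconv(cd, &in, &inleft, &out, &outleft) == (size_t)-1)
        return (size_t)-1;
    return dstlen - outleft;
}
```

Testing only for a zero descriptor, as the old code did, never takes the copy branch when iconv_open() has returned (iconv_t)-1, so the failed descriptor ends up being handed to the conversion routine.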
>
> Cheers,
> Matthias