Discussion:
xerces trunk on openbsd 5.1
Simon Elbaz
2012-09-27 22:18:10 UTC
Hi,

I wanted to try Xerces on OpenBSD 5.1.

After compilation, DOMCount always returned:
unknown reason.

After reading the code, it turns out that the end of a conversion by
wcsrtombs and mbsrtowcs is detected by testing the source pointer (the
code expects it to be set to null once the terminating null character
has been converted).

The problem is that OpenBSD 5.1 does not implement this behaviour. The
source pointer is left pointing at the character following the last
converted character, which leads the xerces binary into a risky memory
access.

Below is a patch that relies instead on the values returned by the
functions (-1 in case of error, >= 0 in case of complete/incomplete
conversion) and fixes the problem.
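
To illustrate the idea of relying on the return value rather than on the source pointer, here is a minimal standalone sketch. It is not the Xerces code: wideToMultibyte is a hypothetical helper, and it assumes the input is convertible in the current locale. It sizes the output with a first call that passes a null destination, then converts, and only checks for the (size_t)-1 error value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// Hypothetical helper (not part of Xerces): converts a null-terminated wide
// string to the current locale's multibyte encoding.
// Returns false if wcsrtombs reports a character it cannot convert.
bool wideToMultibyte(const wchar_t* input, std::string& out)
{
    assert(input != nullptr);

    std::mbstate_t state;
    std::memset(&state, 0, sizeof state);      // initial conversion state

    // With a null destination, wcsrtombs only computes the number of bytes
    // needed (excluding the terminating null); the length argument is ignored.
    const wchar_t* probe = input;
    const std::size_t needed = std::wcsrtombs(nullptr, &probe, 0, &state);
    if (needed == static_cast<std::size_t>(-1))
        return false;                          // unconvertible wide character

    out.assign(needed + 1, '\0');              // room for the terminating null
    std::memset(&state, 0, sizeof state);
    const wchar_t* src = input;
    const std::size_t written =
        std::wcsrtombs(&out[0], &src, out.size(), &state);
    if (written == static_cast<std::size_t>(-1))
        return false;

    // The decision is based on the return value only; src is never inspected.
    out.resize(written);
    return true;
}
```

Because the buffer is sized up front, the sketch never needs to ask whether the conversion stopped early, which is the question the source-pointer test was answering.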

Regards,
Simon Elbaz

$ svn diff xercesc/util/Transcoders/Iconv/IconvTransService.cpp
Index: xercesc/util/Transcoders/Iconv/IconvTransService.cpp
===================================================================
--- xercesc/util/Transcoders/Iconv/IconvTransService.cpp (revision 1387785)
+++ xercesc/util/Transcoders/Iconv/IconvTransService.cpp (working copy)
@@ -429,7 +429,7 @@
srcBuffer[gTempBuffArraySize - 1] = 0;
const wchar_t *src = 0;

- while (toTranscode[srcCursor] || src)
+ while (toTranscode[srcCursor])
{
if (src == 0) // copy a piece of the source string into a local
// buffer, converted to wchar_t and NULL-terminated.
@@ -454,7 +454,7 @@
break;
}
dstCursor += len;
- if (src != 0) // conversion not finished. This *always* means there
+ if (len == (resultSize - dstCursor)) // conversion not finished. This *always* means there
// was not enough room in the destination buffer.
{
reallocString<char>(resultString, resultSize, manager,
resultString != localBuffer);
@@ -512,9 +512,9 @@
break;
}
dstCursor += len;
- if (src == 0) // conversion finished
+ if ((len >= 0) && (len < (resultSize - dstCursor))) // conversion finished
break;
- if (dstCursor >= resultSize - 1)
+ if (len == (resultSize - dstCursor))
reallocString<wchar_t>(tmpString, resultSize, manager,
tmpString != localBuffer);
}
// make a final copy, converting from wchar_t to XMLCh:
s***@e-z.net
2012-09-27 22:28:30 UTC
FYI

Be careful with the wchar_t type when validating code.

GNU implements wchar_t as 32-bit.
Windows implements wchar_t as 16-bit.
Other platforms may use 16-bit, 32-bit, mixed, or undefined widths.

The type XMLCh is a 16-bit type. The internal data storage
is UTF-16.
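
As a rough sketch of why the width matters, the following converts a wchar_t string into 16-bit code units without hard-coding sizeof(wchar_t). The name wideToUtf16 is hypothetical, char16_t stands in for XMLCh, and the code assumes that a 16-bit wchar_t already holds UTF-16 code units while a 32-bit wchar_t holds Unicode code points. That assumption does not hold on every platform.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copies a wide string into 16-bit code units.
// Assumption: 16-bit wchar_t values are UTF-16 units, 32-bit wchar_t values
// are Unicode code points.
std::u16string wideToUtf16(const std::wstring& wide)
{
    std::u16string out;
    for (std::size_t i = 0; i < wide.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(wide[i]);
        if (cp < 0x10000UL) {
            // Fits in one 16-bit unit (always the case when wchar_t is 16-bit).
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Only reachable when wchar_t is wider than 16 bits:
            // split the code point into a surrogate pair.
            assert(cp <= 0x10FFFFUL);
            const unsigned long v = cp - 0x10000UL;
            out.push_back(static_cast<char16_t>(0xD800UL + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00UL + (v & 0x3FFUL)));
        }
    }
    return out;
}
```

On a platform with a 32-bit wchar_t, one wide character above U+FFFF becomes two 16-bit units, so buffer sizes computed in wchar_t units cannot be reused as-is.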

Sincerely,
Steven J. Hathaway
Ben/RS
2012-09-28 07:49:51 UTC
I thought the internal format was UCS-2; is it actually UTF-16?

-b.
s***@e-z.net
2012-09-28 21:40:40 UTC
I am still fairly new to the current Xerces. I use it through the Xalan
project.

If it is UCS-2, then that explains the apparent ambiguity when two XMLCh
values are required to render some large Unicode code points.
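
For reference, this is how two 16-bit units combine into one code point under UTF-16; UCS-2 has no such pairing and cannot represent code points above U+FFFF. The sketch below is generic Unicode arithmetic, not Xerces code: nextCodePoint and the surrogate helpers are hypothetical names, and char16_t stands in for XMLCh.

```cpp
#include <cassert>
#include <cstddef>

inline bool isHighSurrogate(char16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
inline bool isLowSurrogate(char16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

// Hypothetical helper: decodes the code point starting at units[pos] and
// advances pos past it. An unpaired surrogate is returned unchanged.
char32_t nextCodePoint(const char16_t* units, std::size_t length,
                       std::size_t& pos)
{
    assert(pos < length);
    const char16_t first = units[pos++];
    if (isHighSurrogate(first) && pos < length && isLowSurrogate(units[pos])) {
        const char16_t second = units[pos++];
        // High surrogate carries the top 10 bits, low surrogate the bottom 10.
        return 0x10000
             + ((static_cast<char32_t>(first) - 0xD800) << 10)
             + (static_cast<char32_t>(second) - 0xDC00);
    }
    return first;
}
```

For example, the units 0xD83D 0xDE00 decode to the single code point U+1F600, which is the "two XMLCh for one character" case.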

- Steve
Alberto Massari
2012-09-28 14:27:52 UTC
Hi Simon,
it looks like libc in OpenBSD 5.1 does not follow the documentation
for wcsrtombs/mbsrtowcs:

If dst is not a null pointer, the pointer object pointed to
by src is assigned either a null pointer (if conversion
stopped due to reaching a terminating null wide-character)
or the address just past the last wide-character converted
(if any).

Instead of hacking the code to try to detect whether the conversion
actually wrote a NULL character in the converted string, I chose to
modify the 'configure' script to detect this behaviour and disable the
usage of the re-entrant functions if it doesn't match how the Xerces
code uses them.
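
A probe of the kind a configure check could compile and run might look like the sketch below. This is only an illustration and not the actual test added to the Xerces configure script: the function names are hypothetical, and it checks only wcsrtombs with a short ASCII string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <cwchar>

// Hypothetical probe: returns true when wcsrtombs sets the source pointer to
// null after converting the terminating null wide character, as the
// documentation quoted above describes.
bool wcsrtombsNullsSource()
{
    const wchar_t text[] = L"abc";
    const wchar_t* src = text;
    char dst[16];                               // large enough for "abc" + null
    std::mbstate_t state;
    std::memset(&state, 0, sizeof state);       // initial conversion state

    const std::size_t written = std::wcsrtombs(dst, &src, sizeof dst, &state);
    if (written == static_cast<std::size_t>(-1))
        return false;                           // conversion itself failed
    return src == nullptr;                      // documented end-of-string signal
}

// Example use: stop early in a debug build if libc does not behave as expected.
void requireConformingWcsrtombs()
{
    assert(wcsrtombsNullsSource());
}
```

A build system would more likely wrap the probe in a small program whose exit status reflects the result, and fall back to the non-re-entrant functions when it reports false.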

Thank you for reporting this issue,
Alberto
Simon Elbaz
2012-09-29 23:08:02 UTC
Hi Alberto,

your modification of the configure script solves my problem.
Thanks.
