0x1A Character

Jesse Pelton

2005-02-03 15:01:11 UTC

The spec is hard to read on this point. So-called "restricted"
characters are not allowed. See the discussion beginning at
http://www.stylusstudio.com/xmldev/200410/post30210.html for an
explanation.
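
For what it's worth, here is a minimal sketch of how I read the character productions: Char in XML 1.0, and Char plus RestrictedChar in XML 1.1. By that reading, 0x1A is not a Char at all in 1.0. In 1.1 it is a Char but restricted, so it cannot appear literally in a document (as far as I can tell, only a character reference such as &#x1A; would be accepted by a 1.1 parser). The names below are hypothetical and not part of the Xerces API, and the scan assumes the text is already UTF-16 in a std::u16string; converting from XMLCh is left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper names for illustration; not Xerces-C API.
enum class XmlCharClass { Invalid, Restricted, Valid };

// XML 1.0 Char production.
bool isXml10Char(std::uint32_t cp) {
    return cp == 0x9 || cp == 0xA || cp == 0xD ||
           (cp >= 0x20 && cp <= 0xD7FF) ||
           (cp >= 0xE000 && cp <= 0xFFFD) ||
           (cp >= 0x10000 && cp <= 0x10FFFF);
}

// XML 1.1 Char and RestrictedChar productions.
XmlCharClass classifyXml11(std::uint32_t cp) {
    const bool isChar = (cp >= 0x1 && cp <= 0xD7FF) ||
                        (cp >= 0xE000 && cp <= 0xFFFD) ||
                        (cp >= 0x10000 && cp <= 0x10FFFF);
    if (!isChar) return XmlCharClass::Invalid;
    const bool restricted = (cp >= 0x1 && cp <= 0x8) ||
                            (cp >= 0xB && cp <= 0xC) ||
                            (cp >= 0xE && cp <= 0x1F) ||
                            (cp >= 0x7F && cp <= 0x84) ||
                            (cp >= 0x86 && cp <= 0x9F);
    return restricted ? XmlCharClass::Restricted : XmlCharClass::Valid;
}

// Returns the index of the first UTF-16 code unit that does not belong to
// a valid XML 1.0 character, or std::u16string::npos if the text is clean.
std::size_t findFirstInvalidXml10(const std::u16string& text) {
    for (std::size_t i = 0; i < text.size(); ++i) {
        std::uint32_t cp = text[i];
        const bool highSurrogate = cp >= 0xD800 && cp <= 0xDBFF;
        if (highSurrogate && i + 1 < text.size() &&
            text[i + 1] >= 0xDC00 && text[i + 1] <= 0xDFFF) {
            // Combine a well-formed surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (text[i + 1] - 0xDC00);
            if (!isXml10Char(cp)) return i;
            ++i;  // skip the low surrogate
            continue;
        }
        // Lone surrogates fall through here and are rejected.
        if (!isXml10Char(cp)) return i;
    }
    return std::u16string::npos;
}
```

Running findFirstInvalidXml10 over the data before handing it to createTextNode would flag the 0x1A up front. Whether to strip, replace, or reject the character is a separate decision.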

________________________________

From: ***@micron.com [mailto:***@micron.com]
Sent: Thursday, February 03, 2005 9:39 AM
To: xerces-c-***@xml.apache.org
Subject: 0x1A Character

Hi All,

I have a question regarding an invalid XML character and how Xerces
handles it. I apologize, since my questions are not completely
Xerces-specific.

Recently, we found out that some of our XML text nodes contain
the 0x1A character. This causes the Xerces parser to throw an "Invalid
character (Unicode: 0x1A)" error.

Looking at the XML specs, the XML 1.0 spec does not include 0x1A
in its list of valid characters. However, the XML 1.1 spec lists it
as valid but "restricted" (I am not sure what that means).

On reading this, I changed my XML prolog version to 1.1. The
DOMCount sample that I used still throws an exception. I used Xerces
2.5.0 for my testing.

Questions I have:

1. What does the "restricted" char mean?

2. Will this be supported in the future? Is that part of 1.1
spec not supported by Xerces yet?

3. Xerces does not throw an exception when I create a DOM
document and add a text node containing that character. Is there a way
for me to check the validity of my data before adding it to the text node?

4. Any suggestions on getting around this issue?

Thanks,

Ravin