Discussion:
[jira] [Created] (XERCESC-2016) XML 1.0 5th edition support
Rob Cameron (JIRA)
2013-06-18 16:28:20 UTC
Rob Cameron created XERCESC-2016:
------------------------------------

Summary: XML 1.0 5th edition support
Key: XERCESC-2016
URL: https://issues.apache.org/jira/browse/XERCESC-2016
Project: Xerces-C++
Issue Type: Improvement
Components: Non-Validating Parser
Affects Versions: 3.1.1
Environment: All
Reporter: Rob Cameron
Fix For: 3.1.2


Xerces-C currently applies XML 1.0 4th edition rules to name characters
in XML 1.0 documents. XML 1.0 5th edition permits a broader class
of name characters, based on those permitted in XML 1.1.
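
For reference, the broader character class can be written directly on Unicode code points. The following is only a sketch transcribed from the NameStartChar and NameChar productions of the XML 1.0 Fifth Edition specification; the function names are made up for illustration and are not part of Xerces-C.

```cpp
#include <cstdint>

// NameStartChar per XML 1.0 (Fifth Edition), on Unicode code points.
// Hypothetical helper, not a Xerces-C function.
constexpr bool isNameStartChar5e(std::uint32_t c) {
    return c == 0x3A /* ':' */ || c == 0x5F /* '_' */
        || (c >= 0x41 && c <= 0x5A)       // A-Z
        || (c >= 0x61 && c <= 0x7A)       // a-z
        || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
        || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
        || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
        || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
        || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
        || (c >= 0xFDF0 && c <= 0xFFFD)
        || (c >= 0x10000 && c <= 0xEFFFF);  // supplementary planes
}

// NameChar adds '-', '.', digits, U+00B7 and two combining/connector ranges.
constexpr bool isNameChar5e(std::uint32_t c) {
    return isNameStartChar5e(c)
        || c == 0x2D /* '-' */ || c == 0x2E /* '.' */
        || (c >= 0x30 && c <= 0x39)       // 0-9
        || c == 0xB7
        || (c >= 0x300 && c <= 0x36F)
        || (c >= 0x203F && c <= 0x2040);
}
```

The last range in isNameStartChar5e is the one that cannot be expressed in a 16-bit lookup table and so needs the surrogate handling described in the change plan.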

Proposal: that Xerces-C 3.2.0 be updated to include support for XML 1.0
5th edition.

Although our main work is with icXML, we've looked at making this change
in the original Xerces-C code base so that Xerces-C and icXML stay
compatible in their support for XML 1.0 5e.

I'm not entirely sure that I've handled everything, but the following change
works in our tests. The change plan is below and an svn diff file is
attached.

Here is the change plan.
----------------------------------


(1) internal/CharTypeTables.hpp

Rename gFirstNameChars1_1 to be gFirstNameChars
Rename gNameChars1_1 to be gNameChars

(2) util/XMLChar.cpp
(2a)
Update initCharFlagTable1_1() to use the renamed gFirstNameChars and gNameChars.
Update initCharFlagTable() to use the same set-up as initCharFlagTable1_1()
to define gNameCharMask, gNCNameCharMask, and gFirstNameCharMask:
//
// Name characters are special. A name is made up of a number of
// different tables and some special case characters.
//
initOneTable(gNameChars, gNameCharMask);

//
// Name characters are special. A name is made up of a number of
// different tables and some special case characters.
//
initOneTable(gNameChars, gNCNameCharMask);
gTmpCharTable[chColon] &= ~gNCNameCharMask;

//
// Then do the first name char
//
initOneTable(gFirstNameChars, gFirstNameCharMask);

(2b) #define NEED_TO_GEN_TABLE, then
compile and run any Xerces application once to generate table.out

(2c) Replace the XMLChar1_0::fgCharCharsTable1_0 definition in XMLChar.cpp
with that from table.out.
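
The table set-up in step (2a) can be pictured with a simplified stand-in. This is only a sketch: the range-list type, the mask values, and the setRanges/buildTable helpers are hypothetical and do not reproduce the real initOneTable() signature or the Xerces-C mask constants. It shows the same three passes as the snippet above: name characters, NCName characters with the colon cleared, then first-name characters.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using CharRange = std::pair<char16_t, char16_t>;  // inclusive [first, last]

// Hypothetical flag bits; the real Xerces-C mask values are not shown here.
constexpr std::uint8_t kNameCharMask      = 0x1;
constexpr std::uint8_t kNCNameCharMask    = 0x2;
constexpr std::uint8_t kFirstNameCharMask = 0x4;

// OR `mask` into the table entry of every code unit covered by `ranges`.
void setRanges(std::vector<std::uint8_t>& table,
               const std::vector<CharRange>& ranges, std::uint8_t mask) {
    for (const auto& r : ranges)
        // 32-bit counter so a range ending at 0xFFFF cannot wrap around.
        for (std::uint32_t c = r.first; c <= r.second; ++c)
            table[c] = static_cast<std::uint8_t>(table[c] | mask);
}

// Build a 64K flag table from one shared pair of range lists.
std::vector<std::uint8_t> buildTable(const std::vector<CharRange>& firstNameChars,
                                     const std::vector<CharRange>& nameChars) {
    std::vector<std::uint8_t> table(0x10000, 0);
    setRanges(table, nameChars, kNameCharMask);
    // NCName characters are the name characters minus the colon.
    setRanges(table, nameChars, kNCNameCharMask);
    table[u':'] = static_cast<std::uint8_t>(table[u':'] & ~kNCNameCharMask);
    // Then the first-name characters.
    setRanges(table, firstNameChars, kFirstNameCharMask);
    return table;
}
```

Because both the 1.0 and 1.1 tables would be built from the same range lists, the name-character flags end up identical in both; only the other character classes differ.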

(3) XMLChar.hpp
Modify XMLChar1_0::isFirstNameChar, XMLChar1_0::isFirstNCNameChar,
XMLChar1_0::isNameChar, XMLChar1_0::isNCNameChar
to each check for and allow characters in the #x10000-#xEFFFF range

else {
    if ((toCheck >= 0xD800) && (toCheck <= 0xDB7F))
        if ((toCheck2 >= 0xDC00) && (toCheck2 <= 0xDFFF))
            return true;
}
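
The bounds in that check follow from the UTF-16 encoding: a high surrogate in 0xD800-0xDB7F combined with any low surrogate covers exactly #x10000-#xEFFFF. A small sketch of the arithmetic, with helper names chosen here for illustration (they are not Xerces-C functions):

```cpp
#include <cstdint>

// Decode a UTF-16 surrogate pair into a Unicode code point.
// Assumes `high` is in 0xD800-0xDBFF and `low` is in 0xDC00-0xDFFF.
constexpr std::uint32_t combineSurrogates(char16_t high, char16_t low) {
    return 0x10000u
         + ((static_cast<std::uint32_t>(high) - 0xD800u) << 10)
         + (static_cast<std::uint32_t>(low) - 0xDC00u);
}

// Same test as the snippet above: true when the pair encodes a
// code point in the #x10000-#xEFFFF name-character range.
constexpr bool isSupplementaryNameChar(char16_t high, char16_t low) {
    return high >= 0xD800 && high <= 0xDB7F
        && low  >= 0xDC00 && low  <= 0xDFFF;
}

// The accepted pairs span exactly #x10000 through #xEFFFF.
static_assert(combineSurrogates(0xD800, 0xDC00) == 0x10000, "lowest accepted pair");
static_assert(combineSurrogates(0xDB7F, 0xDFFF) == 0xEFFFF, "highest accepted pair");
static_assert(combineSurrogates(0xDB80, 0xDC00) == 0xF0000, "first rejected pair");
```

High surrogates 0xDB80-0xDBFF encode planes 15 and 16 (#xF0000 and above), which the name productions exclude, hence the 0xDB7F upper bound.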


(4) Modify XMLReader::getName and XMLReader::getNCName
to allow surrogate pairs in Names and NCNames
(i.e., use the version 1.1 logic for both 1.0 and 1.1).
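
To illustrate step (4), here is a rough sketch of a name scanner over UTF-16 text that accepts surrogate pairs. It is not the XMLReader code: scanName and its helpers are hypothetical, the BMP ranges are transcribed from the XML 1.0 Fifth Edition name productions, and the allowColon flag is just one assumed way to share logic between Name and NCName scanning.

```cpp
#include <cstddef>
#include <string>

// BMP portion of NameStartChar (XML 1.0 Fifth Edition).
bool isBmpNameStart(char16_t c) {
    return c == 0x3A || c == 0x5F
        || (c >= 0x41 && c <= 0x5A) || (c >= 0x61 && c <= 0x7A)
        || (c >= 0xC0 && c <= 0xD6) || (c >= 0xD8 && c <= 0xF6)
        || (c >= 0xF8 && c <= 0x2FF) || (c >= 0x370 && c <= 0x37D)
        || (c >= 0x37F && c <= 0x1FFF) || (c >= 0x200C && c <= 0x200D)
        || (c >= 0x2070 && c <= 0x218F) || (c >= 0x2C00 && c <= 0x2FEF)
        || (c >= 0x3001 && c <= 0xD7FF) || (c >= 0xF900 && c <= 0xFDCF)
        || (c >= 0xFDF0 && c <= 0xFFFD);
}

// BMP portion of NameChar.
bool isBmpNameChar(char16_t c) {
    return isBmpNameStart(c) || c == 0x2D || c == 0x2E
        || (c >= 0x30 && c <= 0x39) || c == 0xB7
        || (c >= 0x300 && c <= 0x36F) || (c >= 0x203F && c <= 0x2040);
}

// Surrogate pair encoding a code point in #x10000-#xEFFFF.
bool isNameSurrogatePair(char16_t hi, char16_t lo) {
    return hi >= 0xD800 && hi <= 0xDB7F && lo >= 0xDC00 && lo <= 0xDFFF;
}

// Returns the length, in UTF-16 code units, of the longest Name starting
// at `pos`, or 0 if none starts there. With allowColon == false the scan
// stops at ':' (NCName behaviour).
std::size_t scanName(const std::u16string& text, std::size_t pos = 0,
                     bool allowColon = true) {
    std::size_t i = pos;
    bool first = true;
    while (i < text.size()) {
        const char16_t c = text[i];
        if (!allowColon && c == 0x3A)
            break;
        if (i + 1 < text.size() && isNameSurrogatePair(c, text[i + 1]))
            i += 2;                       // supplementary-plane name character
        else if (first ? isBmpNameStart(c) : isBmpNameChar(c))
            i += 1;                       // ordinary BMP name character
        else
            break;                        // not part of the name (or lone surrogate)
        first = false;
    }
    return i - pos;
}
```

A lone surrogate never matches the BMP ranges, so it ends the name rather than being silently accepted.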


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
Rob Cameron (JIRA)
2013-06-18 16:30:20 UTC
[ https://issues.apache.org/jira/browse/XERCESC-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rob Cameron updated XERCESC-2016:
---------------------------------

Attachment: diff5e

Here is the patch to add XML 1.0 5e support.
Alberto Massari (JIRA)
2013-08-26 10:37:51 UTC
[ https://issues.apache.org/jira/browse/XERCESC-2016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alberto Massari resolved XERCESC-2016.
--------------------------------------

Resolution: Fixed
Fix Version/s: (was: 3.1.2)
3.2.0
Assignee: Alberto Massari

I have gone through the 5th edition spec and implemented a good part of it; the XML Test Suite still reports 16 failures, mainly due to URI rules.