Discussion:
Regarding Xercesc++ performance
Baxi, Rinil Rushabh
2013-06-13 09:07:04 UTC
Hi All,

I have two Xerces-C++ libraries available on my platform (2.4 and 3.1). Both are built without threads. I am trying to compare their performance by parsing XML files of different sizes (1 KB, 65 KB, 256 KB, 1 MB, 2 MB, 5 MB and 15 MB) with the sample programs. I put each sample in a script and ran it 1000 times per file to compare the parsing times.

We observed that for XML files up to 1 MB, Xerces-C++ 3.1 performs better; beyond that size, its performance starts to deteriorate. With the 15 MB file, the 3.1 sample takes almost 30% more time than the same sample with 2.4.

Please let me know whether this is the right way to measure performance. If not, how should we measure it? Also, what could explain this performance degradation?

Thanks in advance.

Best Regards,
Rinil
Huantes, Dan F (TASC)
2013-06-13 12:49:44 UTC
Nice work.

I'm curious as to whether your performance testing is DOM based, SAX based, or both.

I ask because my anecdotal experience is that files exceeding 1MB experience large performance hits due to the inherent nature of the DOM model. Under these scenarios, I have used SAX because it's several orders of magnitude faster (i.e. seconds vs minutes). We used 2.8 before but never thought to compare the difference in performance between different versions. You may be on to something. Thanks.

Dan


CONFIDENTIALITY NOTICE: This message and any attachments or files transmitted with it (collectively, the "Message") are intended only for the addressee and may contain information that is privileged, proprietary and/or prohibited from disclosure by law or contract. If you are not the intended recipient: (a) please do not read, copy or retransmit the Message; (b) permanently delete and/or destroy all electronic and hard copies of the Message; (c) notify us by return email; and (d) you are hereby notified that any dissemination, distribution or copying of the Message is strictly prohibited.
Baxi, Rinil Rushabh
2013-06-14 05:22:17 UTC
Hi Dan,

I tested with both the SAX and DOM parsers and got almost the same results.

Best Regards,
Rinil

Rob Cameron
2013-06-14 13:57:16 UTC
Hi, Rinil.

What is your goal? If you are considering choosing Xerces 2.4 vs 3.1, here are
some other things to think about.
(a) Xerces 3.1 has better support for later XML standards
(b) Xerces 3.1 has bug fixes over 2.4
(c) Xerces 3.1 has support for 64-bit architectures
(d) Any future developments and improvements will likely only be made to the
Xerces 3.1 line.

If performance is critical, you may want to consider icXML. This is
a highly accelerated version of Xerces 3.1.1 that we are building
based on the systematic incorporation of parallel bit stream technology
in the underlying engine. icXML substantially speeds up both SAX-based
and DOM-based parsing.

We will be presenting our work with icXML at Balisage 2013 in Montreal
this August.

Rob Cameron
CTO, International Characters, Inc.
Boris Kolpackov
2013-06-14 07:22:19 UTC
Hi Rinil,
Post by Baxi, Rinil Rushabh
> I have 2 Xerces-C++ libraries available on my platform (2.4 and 3.1).
> Both are built without threads. I am trying to compare performance
> of both of them. To compare performance I am using different sized
> xml files to parse using the samples (1kb, 65kb, 256Kb, 1Mb, 2Mb,
> 5Mb and 15Mb). I have put each sample in a script and run the same
> sample 1000 times to compare the parsing time.

Hm, I wouldn't do it like that. I would make the test itself perform
1000 iterations and also include a few warm up iterations. If you are
interested, CodeSynthesis XSD[1], which is based on Xerces-C++, includes
'performance' examples that show how to do this. They also show how
to configure Xerces-C++ parsers for optimal performance (things like
schema preloading, etc). The one for DOM is in examples/cxx/tree/ and
the one for SAX2 -- examples/cxx/parser/.
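
For illustration only, the in-process approach described above (warm-up iterations followed by timed iterations inside a single run) might look roughly like the sketch below. This is not the code from the CodeSynthesis XSD examples; `BenchResult` and `run_benchmark` are hypothetical names, and `work` is a placeholder for one parse of the document with a parser that was created once, outside the loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>

// Hypothetical result type: timed iteration count, total and mean time.
struct BenchResult {
    std::size_t iterations;
    double total_ms;
    double mean_ms;
};

// Runs `work` `warmup` times without timing (to fill caches, load pages,
// and so on), then `iterations` times under a monotonic clock.
// `work` is an assumption standing in for a single parse call.
inline BenchResult run_benchmark(const std::function<void()>& work,
                                 std::size_t warmup,
                                 std::size_t iterations) {
    assert(iterations > 0 && "need at least one timed iteration");

    for (std::size_t i = 0; i < warmup; ++i) {
        work();
    }

    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (std::size_t i = 0; i < iterations; ++i) {
        work();
    }
    const auto stop = clock::now();

    const double total_ms =
        std::chrono::duration<double, std::milli>(stop - start).count();
    return BenchResult{iterations, total_ms,
                       total_ms / static_cast<double>(iterations)};
}
```

Compared with launching the sample 1000 times from a script, this keeps process start-up, library loading, and parser construction out of the measured interval, so the numbers reflect parsing alone. In a real test, `work` would wrap the actual parse call on the 2.4 and 3.1 builds in turn, for example `run_benchmark(parse_once, 5, 1000)` with a user-supplied `parse_once`.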

Post by Baxi, Rinil Rushabh
> We observed that till 1Mb xml file size performance of Xerces-C++ 3.1
> is better after that it starts deteriorating.

We definitely tested the performance difference between 2 and 3-series
and I am pretty sure Xerces-C++ 3 did consistently better.

[1] http://www.codesynthesis.com/products/xsd

Boris
--
Boris Kolpackov, Code Synthesis http://codesynthesis.com/~boris/blog
Compiler-based ORM system for C++ http://codesynthesis.com/products/odb
Open-source XML data binding for C++ http://codesynthesis.com/products/xsd
XML data binding for embedded systems http://codesynthesis.com/products/xsde