November 11, 2014 · javascript

How to evaluate XPath when a DOCTYPE declaration is present in an XML document

At the end of the article, the complete source code, compiled in one place, is available.

I ran into a problem when loading and evaluating an XML document that contains a DOCTYPE declaration. Parsing fails when the document has a DOCTYPE declaration at the top.

My first idea was to remove the DOCTYPE declaration from the XML until we finish manipulating the DOM nodes.
You can preprocess your XML in the following way:

 var rexCleanDoctype = /<!DOCTYPE.*?>/gm;
 content = content.replace(rexCleanDoctype, "");
 XmlDocument.loadXML(content);

Now you can evaluate an XPath expression, and everything works like a charm.

 Node = XmlDocument.selectSingleNode("//docroot/dir");
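One reason to be wary of the regex approach is that a pattern of the form `/<!DOCTYPE.*?>/` only covers a DOCTYPE that sits on a single line and has no internal subset. The sketch below uses three made-up input strings (assumptions for illustration, not files from this article) to show where it stops working.

```javascript
// Same pattern as in the preprocessing step; the sample strings are hypothetical.
var rexCleanDoctype = /<!DOCTYPE.*?>/gm;

// Case 1: DOCTYPE on a single line, as in test.xml. The pattern removes it.
var singleLine = '<!DOCTYPE docroot SYSTEM "test.dtd" >\n<docroot/>';
var cleanedSingle = singleLine.replace(rexCleanDoctype, "");

// Case 2: DOCTYPE split across two lines. "." does not match a line break,
// so nothing is removed and the DOCTYPE stays in the document.
var multiLine = '<!DOCTYPE docroot\n  SYSTEM "test.dtd">\n<docroot/>';
var cleanedMulti = multiLine.replace(rexCleanDoctype, "");

// Case 3: DOCTYPE with an internal subset. The lazy match stops at the
// first ">", so a stray "]>" is left in front of the root element.
var withSubset = '<!DOCTYPE docroot [<!ELEMENT docroot EMPTY>]>\n<docroot/>';
var cleanedSubset = withSubset.replace(rexCleanDoctype, "");
```

Only the first case yields a document that loads cleanly; the other two still fail, just with a different parse error.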

Stripping the DOCTYPE with a regular expression is not a clean or proper solution, so let's get down to business and see what is happening inside. The first thing to remember is: "Always check for parsing errors"! They can give you a clue about what is going wrong. The following function reports whether there have been parsing errors.

 function printParseError(xmlDoc) {
     if (xmlDoc.parseError.errorCode != 0) {
         var myErr = xmlDoc.parseError;
         WScript.Echo("Xml parse error: " + myErr.reason);
     }
 }
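The `parseError` object carries more than `reason`: it also exposes `errorCode`, `line`, `linepos` and `srcText`, among others. A slightly more detailed report could look like the sketch below. `formatParseError` is a hypothetical helper, not part of the original script; it returns the message as a string so that the caller decides how to print it.

```javascript
// Hypothetical helper: builds a readable message from an MSXML parseError object.
// Returns an empty string when errorCode is 0 (no parse error).
function formatParseError(parseError) {
    if (parseError.errorCode == 0) {
        return "";
    }
    // Strip any trailing whitespace or line break from the reason text.
    var reason = String(parseError.reason).replace(/\s+$/, "");
    var message = "Xml parse error (code " + parseError.errorCode + "): " + reason;
    // Only add the position when the parser reported one.
    if (parseError.line > 0) {
        message += " [line " + parseError.line + ", column " + parseError.linepos + "]";
    }
    return message;
}
```

Under Windows Script Host it could be used as `var report = formatParseError(XmlDocument.parseError);` followed by `WScript.Echo(report)` when `report` is not empty.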

Now, when we try to load an XML file with a DOCTYPE declaration, we get the following error message:

 Xml parse error: DTD is prohibited

What to do now?

After a few more hours of reading documentation, I found out where the problem lies.

The problem occurs when the ProhibitDTD property is set to true. Microsoft changed the default value of the ProhibitDTD property to true in MSXML 6.0.

Earlier versions of MSXML (3.0, 4.0 and 5.0) had this property set to false by default. Allowing DTDs by default exposes the parser to security problems such as denial-of-service attacks. There is a great article on this subject on the MSDN Magazine site: "XML Denial of Service Attacks and Defenses".

If you want to read more about the security issues related to the ProhibitDTD property, the following two pages are good reading material: "ProhibitDTD Property" and "MSXML Security Overview".

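To load a document with a DOCTYPE declaration in MSXML 6.0, you have to set the ProhibitDTD property to false. Given the security implications mentioned earlier, do this only for XML that you trust.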

 XmlDocument.setProperty('ProhibitDTD', false);

There are two more important properties: resolveExternals and validateOnParse.

If you want to validate against the DTD, set the XmlDocument properties like this:

 XmlDocument.setProperty('ProhibitDTD', false);
 XmlDocument.resolveExternals = true;
 XmlDocument.validateOnParse = true;

Otherwise, if you just want to parse well-formed XML and skip validation, ignoring the DTD:

 XmlDocument.setProperty('ProhibitDTD', false);
 XmlDocument.resolveExternals = false;
 XmlDocument.validateOnParse = false;

There is one more case. If resolveExternals is accidentally set to false and validateOnParse to true, the parser validates without ever loading the external DTD, so the following error occurs:

 The element docroot is used but not declared in the DTD/Schema
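Since resolveExternals and validateOnParse only make sense as a pair here, one way to avoid the mismatched combination is to set all three properties in one place from a single flag. The helper below is a hypothetical sketch (`configureDtdHandling` and its `validate` parameter are not part of the original script); it relies only on `setProperty`, `resolveExternals` and `validateOnParse` as used in this article.

```javascript
// Hypothetical helper: applies one of the two property combinations.
//   validate = true  -> resolve the external DTD and validate against it
//   validate = false -> accept the DOCTYPE but skip resolution and validation
function configureDtdHandling(xmlDoc, validate) {
    var enabled = !!validate; // normalise the flag to a strict boolean
    // MSXML 6.0 rejects any DOCTYPE unless ProhibitDTD is switched off.
    xmlDoc.setProperty("ProhibitDTD", false);
    // Keep both flags in sync so that validation is never requested
    // without the external DTD being loaded.
    xmlDoc.resolveExternals = enabled;
    xmlDoc.validateOnParse = enabled;
    return xmlDoc;
}
```

A call such as `configureDtdHandling(XmlDocument, false);` would go right after the DOM object is created and before `loadXML` is called.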

Complete source code listing

 //@author Stankovic Vlada [svlada@gmail.com]
 var fs = new ActiveXObject("Scripting.FileSystemObject");
 var inputFilePath = "c:\\js\\test.xml";
 var stream = fs.openTextFile(inputFilePath, 1);
 var content = stream.readAll();
 stream.close();
 // Regex workaround, not needed once ProhibitDTD is set to false:
 // var rexCleanDoctype = /<!DOCTYPE.*?>/gm;
 // content = content.replace(rexCleanDoctype, "");
 var XmlDocument;
 try {
     XmlDocument = new ActiveXObject("msxml2.DOMDocument.6.0");
     XmlDocument.async = false;
     // Allow DTD
     XmlDocument.setProperty('ProhibitDTD', false);
     XmlDocument.resolveExternals = false;
     XmlDocument.validateOnParse = false;
 } catch(e) {
     WScript.echo("Error while creating DOM Object: " + e.description);
 }
 // Load an XML file into the DOM instance
 try {
     // To load directly from a file path, call XmlDocument.load(inputFilePath) instead
     XmlDocument.loadXML(content);
     printParseError(XmlDocument);
 } catch(e) {
     WScript.echo("Error while loading XML from String: " 
         + e.description);
 }
 var Node;
 try {
     Node = XmlDocument.selectSingleNode("//docroot/dir"); 
     printParseError(XmlDocument);
 } catch(e) {
     WScript.echo("Error resolving Node value from XPath: " 
         + e.description);
 }
 if (Node != null) {
     WScript.echo(Node.text);
 }

 function printParseError(xmlDoc) {
     if (xmlDoc.parseError.errorCode != 0) {
         var myErr = xmlDoc.parseError;
         WScript.Echo("Xml parse error: " + myErr.reason);
     }
 }

test.xml

 <?xml version="1.0" encoding="UTF-8"?>
 <!DOCTYPE docroot SYSTEM "test.dtd" >
 <docroot>
     <dir>/springapp/index.jsp</dir>
     <url>link</url>
 </docroot>

test.dtd

 <!ELEMENT docroot (dir,url)>
 <!ELEMENT dir (#PCDATA)>
 <!ELEMENT url (#PCDATA)>

Useful references:

  1. XmlDocument.validateOnParse
  2. XmlDocument.resolveExternals