I'm trying to write a JavaScript routine to crawl an XML document (of unknown elements) and parse out name/value pairs where

"<nodename>nodevalue</nodename>"

would come out as

"nodename=nodevalue".

What I have works great through the second element, then it stops traversing. I feel like I'm close, but I just can't get past this point. Any help would be appreciated.

<html>
<head>
<script>
    function main() {           
        function dom(xml) { 
            if (window.DOMParser) {
                parser=new DOMParser();
                parser.preserveWhiteSpace=false;
                doc=parser.parseFromString(xml,"text/xml");
            } else {
                doc=new ActiveXObject("Microsoft.XMLDOM");
                doc.async=false;
                doc.preserveWhiteSpace=false;
                doc.loadXML(xml); 
            } 
            return crawl(doc);
        } 
        function crawl(node) {
            if (typeof node=='object') {    
                if (node.hasChildNodes()) {
                    var x=node.childNodes;
                    for (i=0;i<x.length;i++) {
                        if (x[i].hasChildNodes()) {
                            alert(x[i].nodeName + '=' + x[i].childNodes[0].nodeValue);
                        }   
                        crawl(x[i]);
                    }
                }   
            }
        }   
        return dom(document.body.innerHTML);    
    }
</script>
</head>
<body onload="main();">
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<AccountQueryBalanceResponse xmlns="https://live.domainbox.net/">
<AccountQueryBalanceResult>
<ResultCode>100</ResultCode>
<ResultMsg>Account Balance Queried Successfully</ResultMsg>
<TxID>8ffc9a39-bf7f-477d-a780-76de9fdd49dd</TxID>
<Balance>81.84</Balance>
<CreditLimit>0.00</CreditLimit>
<FundsHeld>0.00</FundsHeld>
<AvailableBalance>81.84</AvailableBalance>
<CurrencyCode>USD</CurrencyCode>
<AccountType>Prepayment</AccountType>
</AccountQueryBalanceResult>
</AccountQueryBalanceResponse>
</soap:Body>
</soap:Envelope>
</body>
</html>

The double test for hasChildNodes(), at parent and child levels, looks a bit odd.

I would have thought you want to parse out the nodes that don't have children, in which case your loop would be as follows:

for (i=0; i<x.length; i++) {
    if (!x[i].hasChildNodes()) {
        alert(x[i].nodeName + '=' + x[i].childNodes[0].nodeValue);
    }
    crawl(x[i]);
}

Within the loop, you could put crawl(x[i]) in an else clause, but it should be OK without: any node without children will simply drop straight through the next recursion of crawl().

The only thing I'm not sure about is whether or not .hasChildNodes() detects text nodes. If it does, then the if (!x[i].hasChildNodes()) test needs to be more rigorous.

Thanks, but with that adjustment the code never reaches the alert and finds nothing.

if (!x[i].hasChildNodes()) {

If I remove the hasChildNodes test it fails with "property nodename does not exist".

The current code traverses correctly until the first sibling of the first child with a value, then loops on that node forever.
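For what it's worth, the runaway traversal is consistent with the loop counter being undeclared: `for (i=0; ...)` without `var` makes `i` a global, so each recursive crawl() clobbers its caller's counter. A minimal sketch of the clobbering (the names are my own, not from the code above):

```javascript
// One shared counter -- this is exactly what the missing `var` in
// `for (i=0; i<x.length; i++)` produces (an accidental global).
var i;
var calls = 0;

function outer() {
    for (i = 0; i < 3; i++) {
        inner();      // inner() leaves the shared i at 3
        calls++;
    }
}

function inner() {
    for (i = 0; i < 3; i++) {}  // same i as outer's loop
}

outer();
// calls is 1, not 3: inner() pushed the shared counter past the limit
```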

I finally figured this out. This self-contained function takes text/xml as input and recursively converts it into an object of arrays. I used arrays instead of simple properties to accommodate duplicate element nodes.

It will convert this text/xml...

<xml>
    <element>value</element>
    <duplicate>dup1</duplicate>
    <duplicate>dup2</duplicate>
</xml>

Into this object...

{"element":["value"],"duplicate":["dup1","dup2"]}
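The duplicate handling reduces to one small accumulate-into-arrays step. A standalone sketch of just that step (`addPair` is my own illustrative name, not part of the parser below):

```javascript
// Append a value under a name, creating the array on first sight.
function addPair(o, name, value) {
    if (!o[name]) o[name] = [];
    o[name].push(value);
    return o;
}

var o = {};
addPair(o, 'element', 'value');
addPair(o, 'duplicate', 'dup1');
addPair(o, 'duplicate', 'dup2');
// JSON.stringify(o) → {"element":["value"],"duplicate":["dup1","dup2"]}
```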

I hope someone else is able to save some time using this code if they have the same problem. I'll send $10 through PayPal to the first person who can substantially improve this solution!

function parse(xml) { 

    var o = {};

    // True for undefined, null, or whitespace-only strings.
    function chknull(str) { return str == null || str.replace(/\s/g, '') === ''; }

    function dom(xml) { 
        var doc;
        if (window.DOMParser) {
            var parser = new DOMParser();
            doc = parser.parseFromString(xml, "text/xml");
        } else {
            // Legacy IE fallback
            doc = new ActiveXObject("Microsoft.XMLDOM");
            doc.async = false;
            doc.preserveWhiteSpace = false;
            doc.loadXML(xml); 
        } 
        return doc;
    } 

    function crawl(node) {
        if (node.hasChildNodes()) {      // note the call -- the bare property is always truthy
            var c = node.firstChild;
            while (c) { 
                if (c.nodeType == 1) {   // element node
                    // An element's own nodeValue is null, so chknull also
                    // skips elements whose first child is not a text node.
                    if (c.hasChildNodes() && !chknull(c.firstChild.nodeValue)) {
                        if (o[c.nodeName]) { 
                            o[c.nodeName].push(c.firstChild.nodeValue);
                        } else {
                            o[c.nodeName] = [c.firstChild.nodeValue];
                        }   
                    }   
                    crawl(c);
                }
                c = c.nextSibling;
            }   
        }
    } 

    crawl(dom(xml));    
    return o;

}

Would you accept a jQuery solution?

Sure, why not.

OK, here's a jQuery solution in 11 lines.

function parse(xml) {
    var o = {};
    $(xml).find('*').andSelf().contents().filter(function() {
        return this.nodeType == 3 && $.trim(this.nodeValue);
    }).each(function() {
        var p = this.parentNode.nodeName;
        if (!o[p]) o[p] = [];
        o[p].push(this.nodeValue);
    });
    return o;
}

You can delete .andSelf() if there's guaranteed to be no text nodes at root level.

I haven't run a performance comparison but would expect your own code to be faster despite its greater length. No need to pay the reward but I would be interested in the results of a performance comparison if you do one.

Impressive! Here is my in no way "scientific" comparison...

I put both solutions side by side with the same XML doc used in my original post. I did notice your strategy uses nodeType==3 and then gets the parent's nodeName, instead of using nodeType==1 and getting the child's nodeValue... I think this strategy is better because I no longer have to check hasChildNodes, which reduces the line count.

I included the jQuery source directly in the solution to avoid any latency caused by loading the code remotely.

With my solution I get 1 to 2 milliseconds runtime and with your solution I get 3 to 4 milliseconds runtime. Do you get similar results? You can check here...

http://dawbin.com/benchmark.htm
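A note on methodology: single-run timings of 1 to 4 milliseconds are close to the resolution of Date.now(), so it helps to time many iterations and compare totals. A rough harness (my own sketch, not taken from the benchmark page):

```javascript
// Run fn `iterations` times and return the total elapsed milliseconds.
function bench(fn, iterations) {
    var start = Date.now();
    for (var k = 0; k < iterations; k++) fn();
    return Date.now() - start;
}

// e.g. bench(function () { parse(xml); }, 1000) for each implementation
```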

So, jQuery has a slightly larger footprint and it runs a bit longer. However, given the ongoing exponential increase in processor speeds and the exponential cost reductions in disk space and bandwidth, these are effectively irrelevant metrics (IMO) for this case.

Personally I prefer native JavaScript, but I concede that your jQuery solution is more concise and you improved the strategy by using nodeType==3. Nice job - I will absolutely pay the reward, just let me know where to send it!

Updated crawl function (native JavaScript) based on input from Airshow - thanks.

function crawl(node) {
    var c = node.firstChild;
    while (c) { 
        if (c.nodeType == 3 && c.nodeValue.trim() !== '') {  // non-empty text node
            var p = c.parentNode.nodeName;
            if (!o[p]) { o[p] = []; }
            o[p].push(c.nodeValue);
        }   
        crawl(c);
        c = c.nextSibling;
    }
} 

Dawbin, results of your benchmark are inconclusive here. Most runs I get 0ms for both sets of code, but with a random sprinkling of higher values up to 45ms. I expect that anything other than 0ms is due to other processes grabbing my processor.

I would guess that your revised crawl() will give best performance of all. If so, then we can both take some credit - co-operative effort.

Will be interesting to see if anyone comes up with better ideas. Could happen.
