Processing XML with PHP 5 and the DOM extension

Tuesday, 17 June 2008


Depending on what you use PHP for, parsing XML documents may not be something you do regularly, but it's quite useful to know how.

The following article is a good starting point for processing XML documents using both SimpleXML and the DOM extension.

http://devzone.zend.com/node/view/id/1713


The basic idea is as follows:
  • Load the document
  • Grab whatever elements you're processing, e.g. books or shares
  • Loop through the elements and extract whatever info you want from them

To load the document, you can give load() either a local file path or a URL to an XML file.


$dom = new DOMDocument('1.0', 'UTF-8');

$doc = "http://www.example.com/test.xml";

$dom->load($doc);
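
A load can fail if the file is missing or the XML is malformed, so it is worth checking the return value. The following is only a minimal sketch: it uses loadXML() with an inline string (a made-up stand-in for a real file or URL) so that it runs on its own, and it assumes you would rather collect libxml errors than have PHP print warnings.

```php
<?php
// Hypothetical inline document standing in for a real file or URL.
$xml = '<library><book><name>Red Riding Hood</name></book></library>';

$dom = new DOMDocument('1.0', 'UTF-8');

// Collect parser errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// loadXML() parses a string; load() takes a file path or URL.
// Both return false on failure.
if ($dom->loadXML($xml) === false) {
    foreach (libxml_get_errors() as $error) {
        echo trim($error->message), "\n";
    }
    libxml_clear_errors();
} else {
    // The root element of the parsed document.
    echo $dom->documentElement->nodeName, "\n";
}
```

Swapping loadXML($xml) for load($doc) gives the same check for a file or URL.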


Once the document has been loaded into the $dom object, you can grab the elements you are after.

$books = $dom->getElementsByTagName("book");

In this case we are grabbing all the book elements in the document. Our XML document may look something like this:


<library>
<book>
<name>Red Riding Hood</name>
</book>
<book>
<name>3 Little Pigs</name>
</book>
</library>


It's important to note that $books is not an array of elements but a special collection of nodes called a DOMNodeList. You cannot pull values out of it as you would from a nested array, like this:


echo $books[0]['name'];


We can still access individual elements by calling $books->item(0), but again this returns a single node object and not an array.
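
As a rough illustration of item(), here is a small sketch that reads the name of the first book. It loads the sample library from an inline string so it can run by itself; the variable names $first and $name are just for this example.

```php
<?php
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML(
    '<library>'
    . '<book><name>Red Riding Hood</name></book>'
    . '<book><name>3 Little Pigs</name></book>'
    . '</library>'
);

$books = $dom->getElementsByTagName('book');

// length holds the number of nodes in the list.
echo $books->length, "\n";

// item(0) returns the first <book> node, or null if the index is out of range.
$first = $books->item(0);
if ($first !== null) {
    // Search inside this book only for its <name> element.
    $name = $first->getElementsByTagName('name')->item(0);
    echo $name->nodeValue, "\n";
}
```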

Our next step is looping through our book elements and getting the required information we want.



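$data = array();

foreach ($books as $book) {
    // childNodes can also contain the whitespace text nodes between tags.
    foreach ($book->childNodes as $info) {
        $info_array = array();
        // XML_ELEMENT_NODE is the built-in constant for node type 1.
        if ($info->nodeType == XML_ELEMENT_NODE) {
            $info_array[$info->nodeName]['name'] = $info->nodeName;
            $info_array[$info->nodeName]['value'] = $info->nodeValue;
            $data[] = $info_array;
        }
    }
}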

We first loop through each of our books, then through each book's child nodes. If a child node is of type 1 (the XML_ELEMENT_NODE constant), meaning it is an element node, we extract the information from it. The check matters because the whitespace between tags shows up as text nodes, which we want to skip. The full list of node types can be found here: http://au2.php.net/manual/en/domxml.constants.php
We can then just grab the child node's name and value.
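
To see the whole thing run end to end, here is a self-contained sketch using the sample library as an inline string. Note that it is a variation on the loop in this post and not the same code: it assumes you want one array per book, keyed by child element name, which is often easier to work with afterwards. The names $library and $fields are made up for this example.

```php
<?php
$xml = <<<XML
<library>
  <book>
    <name>Red Riding Hood</name>
  </book>
  <book>
    <name>3 Little Pigs</name>
  </book>
</library>
XML;

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML($xml);

$library = array();
foreach ($dom->getElementsByTagName('book') as $book) {
    $fields = array();
    foreach ($book->childNodes as $info) {
        // Skip the whitespace text nodes between elements.
        if ($info->nodeType === XML_ELEMENT_NODE) {
            $fields[$info->nodeName] = $info->nodeValue;
        }
    }
    // One entry per book, e.g. array('name' => 'Red Riding Hood').
    $library[] = $fields;
}

print_r($library);
```

With this shape, the first book's name is simply $library[0]['name'].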

Hope that gives you a basic idea of how it's done.

TR