PHP and XML

Russell J.T. Dyer

XML is very useful in standardizing data that's shared between diverse systems. Most databases and programs can export and import plain text and that's all XML is. So if you've been asked to interface data from your organization's system with the system of another organization, consider using XML for the exchange. It's an agreeable medium.

In this installment of my series on PHP, I will explore a basic script (books-xml.php) that retrieves data on books from a fictitious library database stored in MySQL and generates XML tags in which to wrap the data. I'll follow this with an explanation of another script (books-xml-extract.php) that extracts data from an XML source, from the first script.

Retrieving Data from MySQL

The first task of the XML generation script is to get the data from MySQL. To do that, the script will begin like this:

<?php

   require_once('DB.php');

   $user = 'russell';
   $pw = 'dyer';
   $host = '127.0.0.1';
   $db = 'library';

   $dbh = DB::connect("mysql://$user:$pw@$host/$db");

   if(DB::iserror($dbh)) {
      die($dbh->getMessage());
   }

   $sql = "SELECT book_id, title, country, author_name
           FROM books, authors
           WHERE books.author_id = authors.author_id
           LIMIT 2";
   $q = $dbh->query($sql);

   if(DB::iserror($q)) {
      die($q->getMessage());
   }

After announcing the start of a PHP script, PHP is told to call in the PEAR package DB.php using the require_once() function. If you don't have DB.php installed on your server, you can obtain a copy from:

http://pear.php.net

The following stanza creates some variables that contain the login information for connecting to MySQL. The next couple of lines of code make the connection and test for success. After that, an SQL statement is put together, and the database is queried with it. This was a rather quick run through the code, but in last month's issue of Sys Admin magazine I provided more details on how to interface PHP with MySQL (http://www.samag.com/documents/s=9238/sam0408c/). In this article, I'm mainly focusing on PHP with XML.

Generating XML

There are many ways to generate XML code with PHP. After all, XML is just plain text. You could simply print out the tags with a series of loop statements, for instance, but that could be tedious. Fortunately, there's a much more professional method with PHP that makes use of DOM (Document Object Model). It requires version 4.3 or higher of PHP and libxml. A recent version of PHP can be obtained from:

http://www.php.net

You can obtain a current version of libxml here:

http://www.xmlsoft.org

Generating XML using DOM is fairly easy. One reason is because it's an object-oriented method. You just start off with a document root, a main object, and then create children of it. With each child you can add attributes. Objects are just piled up until the end, and then they're all dumped into a tidy XML document:

$dom = domxml_new_doc('1.0');

$root = $dom->create_element('books');
$books = $dom->append_child($root);

These lines of code follow the MySQL excerpt from earlier. The first line generates an XML header. Incidentally, the names of the variables (e.g., $dom) can be anything. The next line creates an object, the root element of the new document. All other elements after this will be enclosed in this root tag. The last line above is necessary to append the element created. One creates; the other appends. Before proceeding, let's look at the tenative results of this bit of a script:

<?xml version="1.0"?>
<books>
...
</books>

It's not much, but it's a start. The ellipses that I've manually inserted are where the data for each book will go. To work through the data from MySQL, a while statement will be deployed.

while($q->fetchInto($row)) {
   list($book_id, $book_title, $country, $author_name) = $row;

   $book = $books->append_child($dom->create_element('book'));

   $title = $book->append_child($dom->create_element('title'));
   $title_text = $dom->create_text_node("$book_title");
   $title->append_child($title_text);
   $title->set_attribute('book_id',"$book_id");

   $author = $book->append_child($dom->create_element('author'));
   $author->append_child($dom->create_text_node("$author_name"));
   $author->set_attribute('country',"$country");
}

Using the object $q that was created earlier and that contains the results of the query(), the function fetchInto() is used to extract one row of data at a time, and each row is placed into a temporary array ($row). The first line of the statement block for the while statement parses the array $row using list().

Next, we create and append the XML element <book>, which will contain the data for a book. Notice that this time the two processes are combined into one statement. Either way is fine. Now PHP is ready for the data. First, it generates an XML tag for each book's title. Second, it sets an attribute using the set_attribute() function to store the book's identification number. Notice that the variable $book_id is enclosed in double-quotes. This is necessary to interpolate the data. Single-quotes won't work.

To place the title of the book within the opening and closing tags of <title>, the function create_text_node() is used. The same set of functions is used to generate tags for the author, but with a couple of additionally steps consolidated. Once all of the rows of data have been fetched and the DOM objects created, the while statement will terminate. Using a simple echo and the dump_menu() function, the stored XML objects are dumped:

echo $dom->dump_mem(true);

$dbh->disconnect();

?>

For good form, PHP is instructed to disconnect from MySQL. This concludes the script -- without a bit of HTML. The results of the script will look something like this:

<?xml version="1.0"?>
<books>
  <book>
    <title book_id="10">If on a winter's night a traveler</title>
    <author country="Italy">Italo Calvino</author>
  </book>
  <book>
    <title book_id="12">Marcovaldo</title>
    <author country="Italy">Italo Calvino</author>
  </book>
</books>

The nice layout shown here is how DOM organized it, including the indents. If you experiment with this script, open it in a Web browser, and then view the source. Of course, you'll have to make adjustments for your database.

Extracting Data from XML

The other goal of this article is to develop a script that will extract data from an XML document. To do this easily, DOM will be enlisted again. For this script, it will be assumed that the script will be run from the command line, either manually or through cron:

#!/usr/bin/php -q

<?

... // Connect to MySQL

$dom = domxml_open_file('http://127.0.0.1/php-xml.php');

$books = $dom->document_element();

The first line provides the path and all for PHP because it's going to be run without the direct assistance of the Apache Web server. The next set of lines, substituted with ellipses, are the same as in the last script for connecting to MySQL. This is in preparation to save data extracted from the XML document to MySQL. The next line opens the XML document by way of the script that we just reviewed.

Notice that the file path includes an IP address. For testing, I'm just using the localhost address, but this could just as easily be an Internet address or a domain name. Incidentally, PHP is accessing a dynamically created XML document here. It doesn't have to be a flat text file for XML to be extracted. After making the connection to the XML document, PHP then captures the document root and all of its child elements by using the document_element() function.

With the data in hand, it's just a matter of processing each node to extract the data. There are several ways that this may be done. Because I know basically what the data looks like and the order in which it will be obtained, I've taken a more straightforward method using a foreach loop with a set of if statements to process the data differently depending on the type of XML node and on which node:

foreach ($books->child_nodes() as $book) {

   if($book->has_child_nodes()) {

      foreach($book->child_nodes() as $child_node) {

         if($child_node->node_type()  ==   XML_ELEMENT_NODE) {
            $child_node_name = $child_node->node_name();
            $data_array[] = $child_node->get_content();
         }

         if($child_node->has_attributes()) {
            if($child_node_name == 'title') {
               $attribute_name =
               $child_node->get_attribute_node('book_id');
               if($attribute_name) {
                  $data_array[] = $attribute_name->value();
               }
            }
         }

         if($child_node->has_attributes()) {
            if($child_node_name == 'author') {
               $attribute_name =
               $child_node->get_attribute_node('country');
               if($attribute_name) {
                  $data_array[] = $attribute_name->value();
               }
            }
         }
      }

The initial node of the books node is going to be a book node. The foreach loop extracts one book node at a time and places the object in the variable $book. The first statement of the foreach block tests whether the node has children, nodes under it, using the has_child_nodes() function.

After that, for each child node found, it will check to see what kind of node it's examining. There are couple ways of doing this. One is the first method shown -- PHP tests whether the node is an element. The title and author nodes are in this category, so the first if statement will process those nodes and add them to an array that PHP will use to create an SQL statement later.

Another simple method of testing a node is to see whether its name equals a certain value. For attributes, PHP is using that method. It first tests whether the child node has attributes and then determines the child node's name. That name will dictate the value for the argument of the function get_attribute_node() that PHP will use to get the particular attribute node. From there, the function value() is used to extract the value of the attribute. Once the nodes for a book have been processed, the data can be saved to MySQL like so:

      $title = mysql_escape_string($data_array[0]);

      $sql = "INSERT INTO xml_test
              VALUES('$data_array[1]', '$title',
              '$data_array[2]', '$data_array[3]')";
      $q = $dbh->query($sql);

      if(DB::iserror($q)) {
         die($q->getMessage());
      }

      unset($data_array);
   }
}
   $dbh->disconnect();

?>

Because one of the book titles has an apostrophe in the name, the first line here uses mysql_escape_string() to escape the apostrophes with a back-slash contained in the element of the $data_array array that holds the book's title. The rest of the elements are added to the SQL statement in the next line without being pre-processed.

Next, PHP sends the SQL statement to MySQL using the database handle set up at the beginning of the script and with the function query(). Provided no errors are generated, the array is dropped using unset() so that more data is not appended to the end of the array during the next loop through the foreach statement, causing the first book to be re-inserted into the database.

Conclusion

There are many ways in which XML documents can be generated with PHP, and there are probably just as many ways to extract data from an XML document. There are several XML functions in PHP and more are added each year. Getting what you want out of PHP just requires a little bit of experimentation.

Russell Dyer is a Perl programmer, a MySQL developer, and a Web designer living and working on a consulting basis in New Orleans. He is also an adjunct instructor at a technical college where he teaches Linux and other open source software. He can be reached at russell@dyerhouse.com.