PHP
and XML
Russell J.T. Dyer
XML is very useful in standardizing data that's shared between
diverse systems. Most databases and programs can export and import
plain text and that's all XML is. So if you've been asked to interface
data from your organization's system with the system of another
organization, consider using XML for the exchange. It's an agreeable
medium.
In this installment of my series on PHP, I will explore a basic
script (books-xml.php) that retrieves data on books from
a fictitious library database stored in MySQL and generates XML
tags in which to wrap the data. I'll follow this with an explanation
of another script (books-xml-extract.php) that extracts data
from an XML source, from the first script.
Retrieving Data from MySQL
The first task of the XML generation script is to get the data
from MySQL. To do that, the script will begin like this:
<?php
require_once('DB.php');
$user = 'russell';
$pw = 'dyer';
$host = '127.0.0.1';
$db = 'library';
$dbh = DB::connect("mysql://$user:$pw@$host/$db");
if(DB::iserror($dbh)) {
die($dbh->getMessage());
}
$sql = "SELECT book_id, title, country, author_name
FROM books, authors
WHERE books.author_id = authors.author_id
LIMIT 2";
$q = $dbh->query($sql);
if(DB::iserror($q)) {
die($q->getMessage());
}
After announcing the start of a PHP script, PHP is told to call in
the PEAR package DB.php using the require_once() function.
If you don't have DB.php installed on your server, you can obtain
a copy from:
http://pear.php.net
The following stanza creates some variables that contain the login
information for connecting to MySQL. The next couple of lines of code
make the connection and test for success. After that, an SQL statement
is put together, and the database is queried with it. This was a rather
quick run through the code, but in last month's issue of Sys Admin
magazine I provided more details on how to interface PHP with MySQL
(http://www.samag.com/documents/s=9238/sam0408c/). In this
article, I'm mainly focusing on PHP with XML.
Generating XML
There are many ways to generate XML code with PHP. After all,
XML is just plain text. You could simply print out the tags with
a series of loop statements, for instance, but that could be tedious.
Fortunately, there's a much more professional method with PHP that
makes use of DOM (Document Object Model). It requires version 4.3
or higher of PHP and libxml. A recent version of PHP can be obtained
from:
http://www.php.net
You can obtain a current version of libxml here:
http://www.xmlsoft.org
Generating XML using DOM is fairly easy. One reason is because it's
an object-oriented method. You just start off with a document root,
a main object, and then create children of it. With each child you
can add attributes. Objects are just piled up until the end, and then
they're all dumped into a tidy XML document:
$dom = domxml_new_doc('1.0');
$root = $dom->create_element('books');
$books = $dom->append_child($root);
These lines of code follow the MySQL excerpt from earlier. The first
line generates an XML header. Incidentally, the names of the variables
(e.g., $dom) can be anything. The next line creates an object,
the root element of the new document. All other elements after this
will be enclosed in this root tag. The last line above is necessary
to append the element created. One creates; the other appends. Before
proceeding, let's look at the tenative results of this bit of a script:
<?xml version="1.0"?>
<books>
...
</books>
It's not much, but it's a start. The ellipses that I've manually inserted
are where the data for each book will go. To work through the data
from MySQL, a while statement will be deployed.
while($q->fetchInto($row)) {
list($book_id, $book_title, $country, $author_name) = $row;
$book = $books->append_child($dom->create_element('book'));
$title = $book->append_child($dom->create_element('title'));
$title_text = $dom->create_text_node("$book_title");
$title->append_child($title_text);
$title->set_attribute('book_id',"$book_id");
$author = $book->append_child($dom->create_element('author'));
$author->append_child($dom->create_text_node("$author_name"));
$author->set_attribute('country',"$country");
}
Using the object $q that was created earlier and that contains
the results of the query(), the function fetchInto()
is used to extract one row of data at a time, and each row is placed
into a temporary array ($row). The first line of the statement
block for the while statement parses the array $row
using list().
Next, we create and append the XML element <book>,
which will contain the data for a book. Notice that this time the
two processes are combined into one statement. Either way is fine.
Now PHP is ready for the data. First, it generates an XML tag for
each book's title. Second, it sets an attribute using the set_attribute()
function to store the book's identification number. Notice that
the variable $book_id is enclosed in double-quotes. This
is necessary to interpolate the data. Single-quotes won't work.
To place the title of the book within the opening and closing
tags of <title>, the function create_text_node()
is used. The same set of functions is used to generate tags for
the author, but with a couple of additionally steps consolidated.
Once all of the rows of data have been fetched and the DOM objects
created, the while statement will terminate. Using a simple
echo and the dump_menu() function, the stored XML
objects are dumped:
echo $dom->dump_mem(true);
$dbh->disconnect();
?>
For good form, PHP is instructed to disconnect from MySQL. This concludes
the script -- without a bit of HTML. The results of the script will
look something like this:
<?xml version="1.0"?>
<books>
<book>
<title book_id="10">If on a winter's night a traveler</title>
<author country="Italy">Italo Calvino</author>
</book>
<book>
<title book_id="12">Marcovaldo</title>
<author country="Italy">Italo Calvino</author>
</book>
</books>
The nice layout shown here is how DOM organized it, including the
indents. If you experiment with this script, open it in a Web browser,
and then view the source. Of course, you'll have to make adjustments
for your database.
Extracting Data from XML
The other goal of this article is to develop a script that will
extract data from an XML document. To do this easily, DOM will be
enlisted again. For this script, it will be assumed that the script
will be run from the command line, either manually or through cron:
#!/usr/bin/php -q
<?
... // Connect to MySQL
$dom = domxml_open_file('http://127.0.0.1/php-xml.php');
$books = $dom->document_element();
The first line provides the path and all for PHP because it's going
to be run without the direct assistance of the Apache Web server.
The next set of lines, substituted with ellipses, are the same as
in the last script for connecting to MySQL. This is in preparation
to save data extracted from the XML document to MySQL. The next line
opens the XML document by way of the script that we just reviewed.
Notice that the file path includes an IP address. For testing,
I'm just using the localhost address, but this could just as easily
be an Internet address or a domain name. Incidentally, PHP is accessing
a dynamically created XML document here. It doesn't have to be a
flat text file for XML to be extracted. After making the connection
to the XML document, PHP then captures the document root and all
of its child elements by using the document_element() function.
With the data in hand, it's just a matter of processing each node
to extract the data. There are several ways that this may be done.
Because I know basically what the data looks like and the order
in which it will be obtained, I've taken a more straightforward
method using a foreach loop with a set of if statements
to process the data differently depending on the type of XML node
and on which node:
foreach ($books->child_nodes() as $book) {
if($book->has_child_nodes()) {
foreach($book->child_nodes() as $child_node) {
if($child_node->node_type() == XML_ELEMENT_NODE) {
$child_node_name = $child_node->node_name();
$data_array[] = $child_node->get_content();
}
if($child_node->has_attributes()) {
if($child_node_name == 'title') {
$attribute_name =
$child_node->get_attribute_node('book_id');
if($attribute_name) {
$data_array[] = $attribute_name->value();
}
}
}
if($child_node->has_attributes()) {
if($child_node_name == 'author') {
$attribute_name =
$child_node->get_attribute_node('country');
if($attribute_name) {
$data_array[] = $attribute_name->value();
}
}
}
}
The initial node of the books node is going to be a book
node. The foreach loop extracts one book node at a time
and places the object in the variable $book. The first statement
of the foreach block tests whether the node has children, nodes
under it, using the has_child_nodes() function.
After that, for each child node found, it will check to see what
kind of node it's examining. There are couple ways of doing this.
One is the first method shown -- PHP tests whether the node is an
element. The title and author nodes are in this category,
so the first if statement will process those nodes and add
them to an array that PHP will use to create an SQL statement later.
Another simple method of testing a node is to see whether its
name equals a certain value. For attributes, PHP is using that method.
It first tests whether the child node has attributes and then determines
the child node's name. That name will dictate the value for the
argument of the function get_attribute_node() that PHP will
use to get the particular attribute node. From there, the function
value() is used to extract the value of the attribute. Once
the nodes for a book have been processed, the data can be saved
to MySQL like so:
$title = mysql_escape_string($data_array[0]);
$sql = "INSERT INTO xml_test
VALUES('$data_array[1]', '$title',
'$data_array[2]', '$data_array[3]')";
$q = $dbh->query($sql);
if(DB::iserror($q)) {
die($q->getMessage());
}
unset($data_array);
}
}
$dbh->disconnect();
?>
Because one of the book titles has an apostrophe in the name, the
first line here uses mysql_escape_string() to escape the apostrophes
with a back-slash contained in the element of the $data_array
array that holds the book's title. The rest of the elements are added
to the SQL statement in the next line without being pre-processed.
Next, PHP sends the SQL statement to MySQL using the database
handle set up at the beginning of the script and with the function
query(). Provided no errors are generated, the array is dropped
using unset() so that more data is not appended to the end
of the array during the next loop through the foreach statement,
causing the first book to be re-inserted into the database.
Conclusion
There are many ways in which XML documents can be generated with
PHP, and there are probably just as many ways to extract data from
an XML document. There are several XML functions in PHP and more
are added each year. Getting what you want out of PHP just requires
a little bit of experimentation.
Russell Dyer is a Perl programmer, a MySQL developer, and a
Web designer living and working on a consulting basis in New Orleans.
He is also an adjunct instructor at a technical college where he
teaches Linux and other open source software. He can be reached
at russell@dyerhouse.com. |