Cover V13, i10

Article

oct2004.tar

Using PHP to Back Up MySQL

Russell J.T. Dyer

The importance of backing up data can't be stressed enough, and this also applies to backing up MySQL databases. One of the keys to making regular backups is to automate the process. You could buy a backup program that will copy the data files from a filesystem to archive media. However, for more dependable backups, MySQL would have to be shut down before a backup program were run, which could make MySQL unavailable to users for quite a while. A better alternative is to write a PHP script that will use a utility like mysqldump.

In this article, I will work through the development of a few PHP scripts used for backing up MySQL -- one that will accumulate statistics on the usage of MySQL tables to determine which ones should be backed up daily, another that will make weekly backups of all of the databases on a server, and a third script to perform daily backups based on the statistics from the first.

Table Usage Statistics

When developing a backup script, it's important to decide what you want to back up and how frequently. For the purposes of this article, I will assume that only tables in which the data is changed a few times a day are to be backed up. To determine this information, I will keep a log in MySQL of the usage of tables. To set up a statistics log, I'll create a new database and a table with the mysql client like so:

CREATE DATABASE mysql_stats;
USE mysql_stats;

CREATE TABLE table_stats
  (stat_id INT AUTO_INCREMENT PRIMARY KEY,
  stat_time DATETIME, db_name VARCHAR(50),
  table_name VARCHAR(50),
  file_size INT, file_modify DATE);
The first two lines above create the database mysql_stats and switch the client session to it. Next the table table_stats is created. In this table, I will store statistics on MySQL tables gathered by the script I'm about to review, mysql_stats.php. This script will be run by cron each hour to accumulate an adequate amount of statistics:

#!/usr/bin/php -q
<?
mysql_connect('localhost', 'root', 'password');

$databases = mysql_list_dbs();
$db_count = mysql_num_rows($databases);
$db_ignore = array('mysql','test','mysql_stat');
$today = date('Y-m-d');
$hour = date('h');
In this script, the first line signifies to the shell that what follows is a PHP script. This is necessary to execute the script from the command line instead of through Apache and the Web browser. On the third line, PHP connects to MySQL, providing the host, the user name, and the password. Incidentally, for the PHP scripts in this article, I'm using several MySQL functions that come with PHP and don't require a PEAR library to be loaded. The next line of code above uses the function mysql_list_dbs() to retrieve a list of databases in MySQL. An identifier for the results set is stored in the variable $databases. The function mysql_num_rows() counts the number of rows in $datbases so that PHP knows how many databases there are in the results set. It will have one database name per row, the value of which will be stored in $db_count. With this count, PHP can use a for statement to loop through $databases. To be able to exclude certain databases from processing, on the next line above I've set up an array containing a list of databases to ignore.

for($cnt = 0; $cnt < $db_count; $cnt++) {
  $db = mysql_db_name($databases, $cnt);
  if(in_array($db, $db_ignore)){ continue; }

  $tables = mysql_list_tables($db);
  $tb_count = mysql_num_rows($tables);
Here we have the for statement mentioned before. Using the function mysql_db_name(), the name of one database from $databases is extracted. Which one is determined by the second argument of the function. The counter $cnt, which starts with a value of 0, is incremented ($cnt++) with each pass through the for statement, not to exceed the value of $db_count. The second line of the statement block uses the in_array() function to see whether the database name is contained in the array $db_ignore. If it is, then the remainder of the for statement block is skipped and PHP makes another loop through to process the next database name. Otherwise, using the mysql_list_tables() function, PHP retrieves a list of tables from MySQL associated with the database named as the function's argument. To prepare for an embedded for statement that follows, the results set that contains the list of tables for the database is counted with mysql_num_rows():

  for($c = 0; $c < $tb_count; $c++) {
    $table = mysql_tablename($tables, $c);
    $file_name = "/var/mysql/$db/$table.MYD";

    if(file_exists($file_name)) {
     $file_size = filesize($file_name);
     $file_modify = date('Y-m-d', filemtime($file_name));
     $file_hour = date('h', filemtime($file_name));

    if($file_modify == $today & $file_hour == $hour) {
       $sql_stmnt = "INSERT INTO table_stats
                     (db_name, table_name, stat_time,
                     file_size, file_modify)
                     VALUES('$db', '$table', NOW(),
                     '$file_size', '$file_modify')";

       $results = mysql_db_query('mysql_stats', $sql_stmnt)
                or die(mysql_error());
     }
    }
  }
}

mysql_close();
?>
A different variable ($c) is used so PHP does not confuse the value of the counter of the outer for statement with the counter of the embedded one. The first line of the statement block extracts the table name from $tables and stores it temporarily in $table. The next line sets up a variable that will contain the file path to where the live MySQL tables are stored. This bit of information can be found in the my.cnf file, usually located in the /etc directory on Unix systems. It's contained in the value of datadir. The additional path information comes from the values of the variables determined by the script. MySQL stores the data for a table in a file with the same name as the table, with the extension of .MYD, in a directory with the same name as the database.

Using the function file_exists(), PHP tests whether the data file for the table exists. On the first line of this if statement, PHP gets the table's data file size. It then captures the date it was last modified on the next line. The function filemtime() returns Unix time. To reformat that value to the MySQL date format (i.e., yyyy-mm-dd), the function date() is used with the desired format given. Using the same function, we get the hour to compare to the current hour ($hour).

If the modification date is the same as today's date and hour, then PHP will query MySQL to insert the data with mysql_db_query() in which the database name is given as the first argument and the SQL statement as the second. Since this concludes the script, the connection to MySQL is closed. Incidentally, the trailing closing curly braces close the if statements and the embedded for statement in this code excerpt, as well as the for statement from the previous code excerpt.

A Complete Backup

Before going through the script for selectively backing up tables based on statistics collected by the previous script, I'll look at a PHP script that can be used to back up all of the databases in MySQL. Such a script might be set to run once a week by cron. It will back up each database to a separate tar file using mysqldump. With mysqldump, the backups are performed without shutting down MySQL. In addition to backing up the data, this script will also back up the configuration file (my.cnf) and MySQL itself.

To prepare for this script, I'll first create another table in MySQL. This one will be used to log information on each backup. I'll do this from the mysql client:

CREATE TABLE backup_log
  (log_id INT AUTO_INCREMENT PRIMARY KEY,
  backup_date DATETIME, type ENUM('database','table'),
  cycle ENUM('weekly','daily'),
  tar_file VARCHAR(50), file_path VARCHAR(50));
To generate reports on when backups were made, I've included a column for the date of the backups. To distinguish whether a backup was of a database or a table, I've added an enumerated column called type. Another numerated column indicates the cycle of the backup. Finally, the name of the tar file along with its file path will be saved. I could figure all of this out by just looking at the backup files; however, this is a tidier and more organized method. With this table created, I can now go through the script, mysql_backup_weekly.php:

#!/usr/bin/php -q

<?
mysql_connect('localhost', 'root', 'password');

$databases = mysql_list_dbs();
$db_count = mysql_num_rows($databases);

$dir = '/mnt/backup';
$date_prefix = date('Ymd');
The opening code here is very much like the beginning of the previous script. After starting PHP and connecting to MySQL, the script gets a list of databases. After this, a variable containing the path for the backup directory is given. Next, a variable that contains the numeric date without dashes or slashes is created. This will be used as a prefix for the backup file names so that PHP doesn't overwrite previous backups. Incidentally, by not giving a second argument to the date() function, the current date is used. PHP is now ready to loop through the list of databases and to export the data from each database:

for($cnt = 0; $cnt < $db_count; $cnt++) {
  $db = mysql_db_name($databases, $cnt);

  system("mysqldump -u root -ppassword -h localhost
      -x -e -A $db > $dir/$date_prefix-$db.sql");

  $sql_stmnt = "INSERT INTO backup_log
                (backup_date, type, cycle, tar_file, file_path)
                VALUES(NOW(), 'database', 'weekly',
                '$date_prefix-$db.sql', '$dir')";
  mysql_db_query('mysql_stats', $sql_stmnt);
}
mysql_close();

  system("tar cfz mysql.tar.gz /etc/my.cnf /usr/local/mysql
      /usr/lib/mysql /usr/bin/mysql /usr/sbin/mysqld");
?>
This for statement is much like the two found in the previous script. What's different is the system() function that executes what is contained within the parentheses, and the double-quotes at the system level. For details on mysqldump, see my article on MySQL backups at: http://www.unixreview.com/documents/s=8989/ur0408d/.

Next, an SQL statement is set up to insert information regarding the backup. After closing out the MySQL session, I end the script by creating another tar file containing the MySQL configuration file and all of the various libraries and binaries. Instead of doing this, though, it might be better to just keep a clean copy (from mysql.com) of the installation file that you used to install MySQL. Just be sure to back up my.cnf if you've modified it.

Selective Backups

Which statistics should be kept on databases to determine which tables to back up is debatable and should be determined by the sys admin. As a simple example, I've chosen to determine daily backups based on whether a table's data file is modified at least four times during the day. Based on this policy, I've put together an SQL statement that will check for records where the value of the column file_modify is the same as today's date and where there are more than four records. Let's look at the script, mysql_backup_daily.php, which will be run late each night during a low-traffic time for the fictitious databases:

#!/usr/bin/php -q

<?
mysql_connect('localhost', 'root', 'password');
$dir = '/mnt/backup';

$sql_stmnt = "SELECT db_name, table_name, stat_date
       FROM (
          SELECT db_name, table_name,
          DATE_FORMAT(file_modify, '%Y-%m-%d') AS stat_date,
          COUNT(table_name) AS recs
          FROM table_stats
          WHERE file_modify = CURDATE()
          GROUP BY table_name
          ) AS frequency
       WHERE recs > 4;";

$results = mysql_db_query('mysql_stats', $sql_stmnt);
The opening here is much like the previous scripts. One notable difference is that the SELECT statement involves a sub-query. Sub-queries are available as of version 4.1 of MySQL. See my article on MySQL sub-queries at:

http://www.unixreview.com/documents/s=8969/ur0407e/
The sub-query here is contained within the parentheses and creates a temporary table from which the outer query selects data. Once a results set that contains a list of databases and tables has been generated, it's merely a matter of looping through the data for processing:

while($row = mysql_fetch_object($results)) {
 $db = $row->db_name;
 $table = $row->table_name;
 system("mysqldump -u root -ppassword -h localhost
     -x -e $db $table > $dir/$db.$table.sql");

 $sql_stmnt = "INSERT INTO backup_log
        (backup_date, type, cycle, tar_file, file_path)
        VALUES(NOW(), 'table', 'daily', '$db.$table.sql',
        '$dir')";
 mysql_db_query('mysql_stats', $sql_stmnt);
}
mysql_close();
?>
Using the function mysql_fetch_object(), each row is extracted as an object. To access the value of each component, the field name of the results set is used as the identifier. With the name of the database and related table, the utility mysqldump backs up the appropriate tables. An entry is then made to the backup_log table. I'm recording that the backup was of a table and a daily one.

Conclusion

To construct a backup system like the one discussed here, you may want to consider the criteria by which you will decide whether a table will be backed up daily. If your databases are small enough and their usage is low, then you may want to perform a complete backup daily regardless of a table's file modification date. The important thing is to back up the data regularly and appropriately.

Russell Dyer is a Perl programmer, a MySQL developer, and a Web designer living and working on a consulting basis in New Orleans. He is also an adjunct instructor at a technical college where he teaches Linux and other open source software. He can be reached at russell@dyerhouse.com.