Using
PHP to Back Up MySQL
Russell J.T. Dyer
The importance of backing up data can't be stressed enough,
and this also applies to backing up MySQL databases. One of the
keys to making regular backups is to automate the process. You could
buy a backup program that will copy the data files from a filesystem
to archive media. However, for more dependable backups, MySQL would
have to be shut down before a backup program were run, which could
make MySQL unavailable to users for quite a while. A better alternative
is to write a PHP script that will use a utility like mysqldump.
In this article, I will work through the development of a few
PHP scripts used for backing up MySQL -- one that will accumulate
statistics on the usage of MySQL tables to determine which ones
should be backed up daily, another that will make weekly backups
of all of the databases on a server, and a third script to perform
daily backups based on the statistics from the first.
Table Usage Statistics
When developing a backup script, it's important to decide
what you want to back up and how frequently. For the purposes of
this article, I will assume that only tables in which the data is
changed a few times a day are to be backed up. To determine this
information, I will keep a log in MySQL of the usage of tables.
To set up a statistics log, I'll create a new database and
a table with the mysql client like so:
CREATE DATABASE mysql_stats;
USE mysql_stats;
CREATE TABLE table_stats
(stat_id INT AUTO_INCREMENT PRIMARY KEY,
stat_time DATETIME, db_name VARCHAR(50),
table_name VARCHAR(50),
file_size INT, file_modify DATE);
The first two lines above create the database mysql_stats and
switch the client session to it. Next the table table_stats
is created. In this table, I will store statistics on MySQL tables
gathered by the script I'm about to review, mysql_stats.php.
This script will be run by cron each hour to accumulate an
adequate amount of statistics:
#!/usr/bin/php -q
<?
mysql_connect('localhost', 'root', 'password');
$databases = mysql_list_dbs();
$db_count = mysql_num_rows($databases);
$db_ignore = array('mysql','test','mysql_stat');
$today = date('Y-m-d');
$hour = date('h');
In this script, the first line signifies to the shell that what follows
is a PHP script. This is necessary to execute the script from the
command line instead of through Apache and the Web browser. On the
third line, PHP connects to MySQL, providing the host, the user name,
and the password. Incidentally, for the PHP scripts in this article,
I'm using several MySQL functions that come with PHP and don't
require a PEAR library to be loaded. The next line of code above uses
the function mysql_list_dbs() to retrieve a list of databases
in MySQL. An identifier for the results set is stored in the variable
$databases. The function mysql_num_rows() counts the
number of rows in $datbases so that PHP knows how many databases
there are in the results set. It will have one database name per row,
the value of which will be stored in $db_count. With this count,
PHP can use a for statement to loop through $databases.
To be able to exclude certain databases from processing, on the next
line above I've set up an array containing a list of databases
to ignore.
for($cnt = 0; $cnt < $db_count; $cnt++) {
$db = mysql_db_name($databases, $cnt);
if(in_array($db, $db_ignore)){ continue; }
$tables = mysql_list_tables($db);
$tb_count = mysql_num_rows($tables);
Here we have the for statement mentioned before. Using the
function mysql_db_name(), the name of one database from $databases
is extracted. Which one is determined by the second argument of the
function. The counter $cnt, which starts with a value of 0,
is incremented ($cnt++) with each pass through the for
statement, not to exceed the value of $db_count. The second
line of the statement block uses the in_array() function to
see whether the database name is contained in the array $db_ignore.
If it is, then the remainder of the for statement block is
skipped and PHP makes another loop through to process the next database
name. Otherwise, using the mysql_list_tables() function, PHP
retrieves a list of tables from MySQL associated with the database
named as the function's argument. To prepare for an embedded
for statement that follows, the results set that contains the
list of tables for the database is counted with mysql_num_rows():
for($c = 0; $c < $tb_count; $c++) {
$table = mysql_tablename($tables, $c);
$file_name = "/var/mysql/$db/$table.MYD";
if(file_exists($file_name)) {
$file_size = filesize($file_name);
$file_modify = date('Y-m-d', filemtime($file_name));
$file_hour = date('h', filemtime($file_name));
if($file_modify == $today & $file_hour == $hour) {
$sql_stmnt = "INSERT INTO table_stats
(db_name, table_name, stat_time,
file_size, file_modify)
VALUES('$db', '$table', NOW(),
'$file_size', '$file_modify')";
$results = mysql_db_query('mysql_stats', $sql_stmnt)
or die(mysql_error());
}
}
}
}
mysql_close();
?>
A different variable ($c) is used so PHP does not confuse the
value of the counter of the outer for statement with the counter
of the embedded one. The first line of the statement block extracts
the table name from $tables and stores it temporarily in $table.
The next line sets up a variable that will contain the file path to
where the live MySQL tables are stored. This bit of information can
be found in the my.cnf file, usually located in the /etc
directory on Unix systems. It's contained in the value
of datadir. The additional path information comes from the
values of the variables determined by the script. MySQL stores the
data for a table in a file with the same name as the table, with the
extension of .MYD, in a directory with the same name as the
database.
Using the function file_exists(), PHP tests whether the
data file for the table exists. On the first line of this if
statement, PHP gets the table's data file size. It then captures
the date it was last modified on the next line. The function filemtime()
returns Unix time. To reformat that value to the MySQL date
format (i.e., yyyy-mm-dd), the function date() is used with
the desired format given. Using the same function, we get the hour
to compare to the current hour ($hour).
If the modification date is the same as today's date and
hour, then PHP will query MySQL to insert the data with mysql_db_query()
in which the database name is given as the first argument and the
SQL statement as the second. Since this concludes the script, the
connection to MySQL is closed. Incidentally, the trailing closing
curly braces close the if statements and the embedded for
statement in this code excerpt, as well as the for statement
from the previous code excerpt.
A Complete Backup
Before going through the script for selectively backing up tables
based on statistics collected by the previous script, I'll
look at a PHP script that can be used to back up all of the databases
in MySQL. Such a script might be set to run once a week by cron.
It will back up each database to a separate tar file using mysqldump.
With mysqldump, the backups are performed without shutting
down MySQL. In addition to backing up the data, this script will
also back up the configuration file (my.cnf) and MySQL itself.
To prepare for this script, I'll first create another table
in MySQL. This one will be used to log information on each backup.
I'll do this from the mysql client:
CREATE TABLE backup_log
(log_id INT AUTO_INCREMENT PRIMARY KEY,
backup_date DATETIME, type ENUM('database','table'),
cycle ENUM('weekly','daily'),
tar_file VARCHAR(50), file_path VARCHAR(50));
To generate reports on when backups were made, I've included
a column for the date of the backups. To distinguish whether a backup
was of a database or a table, I've added an enumerated column
called type. Another numerated column indicates the cycle of
the backup. Finally, the name of the tar file along with its file
path will be saved. I could figure all of this out by just looking
at the backup files; however, this is a tidier and more organized
method. With this table created, I can now go through the script,
mysql_backup_weekly.php:
#!/usr/bin/php -q
<?
mysql_connect('localhost', 'root', 'password');
$databases = mysql_list_dbs();
$db_count = mysql_num_rows($databases);
$dir = '/mnt/backup';
$date_prefix = date('Ymd');
The opening code here is very much like the beginning of the previous
script. After starting PHP and connecting to MySQL, the script gets
a list of databases. After this, a variable containing the path for
the backup directory is given. Next, a variable that contains the
numeric date without dashes or slashes is created. This will be used
as a prefix for the backup file names so that PHP doesn't overwrite
previous backups. Incidentally, by not giving a second argument to
the date() function, the current date is used. PHP is now ready
to loop through the list of databases and to export the data from
each database:
for($cnt = 0; $cnt < $db_count; $cnt++) {
$db = mysql_db_name($databases, $cnt);
system("mysqldump -u root -ppassword -h localhost
-x -e -A $db > $dir/$date_prefix-$db.sql");
$sql_stmnt = "INSERT INTO backup_log
(backup_date, type, cycle, tar_file, file_path)
VALUES(NOW(), 'database', 'weekly',
'$date_prefix-$db.sql', '$dir')";
mysql_db_query('mysql_stats', $sql_stmnt);
}
mysql_close();
system("tar cfz mysql.tar.gz /etc/my.cnf /usr/local/mysql
/usr/lib/mysql /usr/bin/mysql /usr/sbin/mysqld");
?>
This for statement is much like the two found in the previous
script. What's different is the system() function that
executes what is contained within the parentheses, and the double-quotes
at the system level. For details on mysqldump, see my article
on MySQL backups at: http://www.unixreview.com/documents/s=8989/ur0408d/.
Next, an SQL statement is set up to insert information regarding
the backup. After closing out the MySQL session, I end the script
by creating another tar file containing the MySQL configuration
file and all of the various libraries and binaries. Instead of doing
this, though, it might be better to just keep a clean copy (from
mysql.com) of the installation file that you used to install MySQL.
Just be sure to back up my.cnf if you've modified it.
Selective Backups
Which statistics should be kept on databases to determine which
tables to back up is debatable and should be determined by the sys
admin. As a simple example, I've chosen to determine daily
backups based on whether a table's data file is modified at
least four times during the day. Based on this policy, I've
put together an SQL statement that will check for records where
the value of the column file_modify is the same as today's
date and where there are more than four records. Let's look
at the script, mysql_backup_daily.php, which will be run
late each night during a low-traffic time for the fictitious databases:
#!/usr/bin/php -q
<?
mysql_connect('localhost', 'root', 'password');
$dir = '/mnt/backup';
$sql_stmnt = "SELECT db_name, table_name, stat_date
FROM (
SELECT db_name, table_name,
DATE_FORMAT(file_modify, '%Y-%m-%d') AS stat_date,
COUNT(table_name) AS recs
FROM table_stats
WHERE file_modify = CURDATE()
GROUP BY table_name
) AS frequency
WHERE recs > 4;";
$results = mysql_db_query('mysql_stats', $sql_stmnt);
The opening here is much like the previous scripts. One notable difference
is that the SELECT statement involves a sub-query. Sub-queries
are available as of version 4.1 of MySQL. See my article on MySQL
sub-queries at:
http://www.unixreview.com/documents/s=8969/ur0407e/
The sub-query here is contained within the parentheses and creates
a temporary table from which the outer query selects data. Once a
results set that contains a list of databases and tables has been
generated, it's merely a matter of looping through the data for
processing:
while($row = mysql_fetch_object($results)) {
$db = $row->db_name;
$table = $row->table_name;
system("mysqldump -u root -ppassword -h localhost
-x -e $db $table > $dir/$db.$table.sql");
$sql_stmnt = "INSERT INTO backup_log
(backup_date, type, cycle, tar_file, file_path)
VALUES(NOW(), 'table', 'daily', '$db.$table.sql',
'$dir')";
mysql_db_query('mysql_stats', $sql_stmnt);
}
mysql_close();
?>
Using the function mysql_fetch_object(), each row is extracted
as an object. To access the value of each component, the field name
of the results set is used as the identifier. With the name of the
database and related table, the utility mysqldump backs up
the appropriate tables. An entry is then made to the backup_log
table. I'm recording that the backup was of a table and a daily
one.
Conclusion
To construct a backup system like the one discussed here, you
may want to consider the criteria by which you will decide whether
a table will be backed up daily. If your databases are small enough
and their usage is low, then you may want to perform a complete
backup daily regardless of a table's file modification date.
The important thing is to back up the data regularly and appropriately.
Russell Dyer is a Perl programmer, a MySQL developer, and a
Web designer living and working on a consulting basis in New Orleans.
He is also an adjunct instructor at a technical college where he
teaches Linux and other open source software. He can be reached
at russell@dyerhouse.com. |