NetBackup
Performance Reports with MySQL/Perl
Jerry Uanino
Backups are always a problem. Often, even in large companies where
you would expect such an important task to be a high priority, backups
are given to a new guy, an intern, or anyone willing to pay attention.
Many excellent systems administrators began their careers changing
tapes and monitoring or restarting failed backups.
If you oversee a large installation and backups are not your primary
responsibility, it's a daunting task to determine not only
that the backups are succeeding, but also that they are occurring
consistently and efficiently. In this article, I'll describe
a technique for monitoring the performance of a Veritas NetBackup
Datacenter system. Although this technique is specific to a NetBackup
installation, the same concepts can be used to monitor any backup
system.
The main concept is to run a collector script that queries the
NetBackup catalog for backup statistics and stores that information
in a MySQL database. A second program then uses the backup data
in the MySQL database to generate a chart with Perl and the help
of GD::Graph::bars. Alternatively, you can use SQL commands to query
the MySQL database directly and produce any type of custom report.
Figure 1 shows some example output.
The performance data gathered for each backup includes the backup
ID, the duration, the client host, the schedule type (full or incremental),
the amount of data backed up, the number of files, the kilobytes
per second, the policy used, the date the backup started, and the
server that initiated the backup. This data can then be queried
to help answer such questions as:
Which system is the best performer?
mysql> select client,kbpersec from backperf order by kbpersec desc limit 1;
+------------+----------+
| client     | kbpersec |
+------------+----------+
| somehost   |    24612 |
+------------+----------+
Which systems back up the most data?
mysql> select client,kbytes from backperf order by kbytes desc limit 1;
+---------+-----------+
| client  | kbytes    |
+---------+-----------+
| ct01sts | 746335072 |
+---------+-----------+
What is the average number of files being backed up?
mysql> select avg(numfiles) from backperf;
+---------------+
| avg(numfiles) |
+---------------+
|   102661.9459 |
+---------------+
Other questions we could answer include: How long is client X taking
to back up? What is the median/average backup time? How many files
are the poor performers backing up? Which are the good performers?
These kinds of questions can be answered with queries along the
lines of the previous examples. Figure 1 charts kilobytes/sec, which
helps identify which systems might need attention and provides a
good report to check every morning.
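As one more example, the average backup time per client is a
one-liner (this assumes the duration column is named "seconds",
matching the collector's variables shown later):
mysql> select client, avg(seconds) from backperf
    -> group by client order by avg(seconds) desc limit 5;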
The Setup
Now let's get to the nitty-gritty. The first piece in this
setup is a MySQL database. This article assumes you have a MySQL
server running. Using nbperf.sql (Listing 1), you can create a MySQL
table called "backperf" to store your backup performance
data.
# mysql -u root -p
mysql> create database nbperf;
(control-d)
# cat nbperf.sql | mysql -u root -p nbperf
At this point you have a database (nbperf) with one table (backperf).
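Listing 1 is not reproduced here, but based on the fields the
collector stores, nbperf.sql amounts to something like the following
sketch (the column types and sizes are my assumptions):
CREATE TABLE backperf (
    backupid    VARCHAR(128) NOT NULL PRIMARY KEY, -- e.g., clientname_1054339491
    client      VARCHAR(64),
    schedtype   INT,          -- schedule type (full or incremental)
    seconds     INT,          -- duration of the backup
    kbytes      BIGINT,       -- amount of data backed up
    numfiles    INT,
    kbpersec    FLOAT,        -- kbytes divided by seconds
    policy      VARCHAR(64),
    backdate    DATETIME,     -- date the backup started
    backserver  VARCHAR(64),  -- server that initiated the backup
    drive_index INT           -- for co_driveindex.pl; unused in this article
);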
The next piece is to put data into the table on a regular basis.
Ideally, this would occur as a cron job sometime after the backups
are complete (early morning in most cases). Because the collector
can run as often as you like, scheduling depends on your environment.
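For example, a crontab entry along these lines (the path, schedule,
and log file are assumptions) would run the collector every morning
at 7:00:
0 7 * * * /usr/local/bin/co_backperf.pl >> /var/log/co_backperf.log 2>&1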
The collector, co_backperf.pl (Listing 2), will need to be modified
to fit your environment. Set the $database and $hostname variables
to point at the database created above and at the MySQL server that
hosts it. You will also need to modify @servers to contain the
NetBackup servers from which you want to collect data (e.g.,
"server1" and "server2"). Going through the listing, the next
important thing to note is the $cmd variable: "bpimagelist" produces
the guts of the output we want to store. Every time it runs, it lists
everything in the NetBackup catalog since the date specified with
the "-d" option, and each entry includes several important fields.
Typical output of this command looks like this:
IMAGE clientname 0 0 6 clientname_1054339491 policyname 0 *NULL* root
schedulename 0 5 1054339491 1324 1062374691 0 0 2659008 113017 1 1 0
policyname_1054339491_FULL.f *NULL* *NULL* 0 1 0 0 0 *NULL* 0 0 0 0 0 0 0
*NULL* 0 0 0
Although this looks confusing, co_backperf.pl will parse it. If you
want to get a better "user" view of this data, you can add
the "-U" flag to the command (but don't change this
in the script). For each line of output that begins with
"IMAGE", the script splits the line on whitespace and notes several
fields. In particular, we grab these:
$backupid   = $fields[5];     # e.g., clientname_1054339491
$seconds    = $fields[14];    # duration of the backup
$client     = $fields[1];
$schedtype  = $fields[11];    # schedule type (full or incremental)
$kb         = $fields[18];    # kilobytes backed up
$num_files  = $fields[19];
$kbpersec   = ($kb/$seconds); # note: assumes a non-zero duration
$policy     = $fields[6];
$backdate   = $fields[13];    # start time, in epoch seconds
$backserver = $host;
$backdate   = &humantime($backdate); # convert epoch to a readable date
After noting the important fields, co_backperf.pl will insert an entry
into the database for this backup id (field 5 is the backup id). Running
the script a second time will produce several failed inserts. This
is because entries already in the database continue to appear in the
catalog until they expire; the failed inserts, enforced by a primary
key on the backupid field, are what prevent duplicate entries.
Thus, even if the NetBackup catalog expires an entry,
its historical performance data is still kept in the MySQL database.
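To make the flow concrete, here is a condensed sketch of how such a
collector loop could look. It is not the published Listing 2; the
connection details, the -d date, and the humantime helper shown here
are assumptions layered on the fields described above:
#!/usr/bin/perl -w
# Condensed collector sketch (not the published Listing 2).
use strict;
use DBI;

my $database = "nbperf";
my $hostname = "mysqlhost";            # assumed MySQL server name
my $user     = "nbperf";               # assumed credentials
my $password = "secret";
my @servers  = ("server1", "server2"); # NetBackup servers to poll

# Convert epoch seconds to a MySQL-friendly datetime string.
sub humantime {
    my @t = localtime(shift);
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $t[5] + 1900, $t[4] + 1, @t[3, 2, 1, 0]);
}

my $dbh = DBI->connect("DBI:mysql:database=$database;host=$hostname",
                       $user, $password) or die $DBI::errstr;
my $sth = $dbh->prepare(q{
    INSERT INTO backperf (backupid, client, schedtype, seconds,
                          kbytes, numfiles, kbpersec, policy,
                          backdate, backserver)
    VALUES (?,?,?,?,?,?,?,?,?,?)
});

foreach my $host (@servers) {
    my $cmd = "rsh $host /opt/openv/netbackup/bin/admincmd/bpimagelist"
            . " -d 01/01/2003";          # assumed start date
    open(my $out, "$cmd |") or die "cannot run $cmd: $!";
    while (<$out>) {
        next unless /^IMAGE/;
        my @fields  = split /\s+/;
        my $seconds = $fields[14] || 1;  # guard against zero duration
        # A failed execute on a duplicate backupid is expected and
        # harmless; the primary key keeps the history clean.
        $sth->execute($fields[5], $fields[1], $fields[11], $seconds,
                      $fields[18], $fields[19], $fields[18] / $seconds,
                      $fields[6], humantime($fields[13]), $host);
    }
    close($out);
}
$dbh->disconnect;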
Additionally, one could write a co_driveindex.pl that would run bperror
commands to collect the drive on which a specific backup id occurred.
This is handy for checking the balancing of your drive usage for multiple
tape drive configurations. (The drive_index field is unused in this
article.)
Note that I call the primitive rsh in my script, but you can use
ssh or any other method. If you run the script locally on the backup
server, you could remove the call altogether. One significant improvement
would be to switch the command calls to use Net::SSH::Perl or a
similar module.
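For example, the rsh pipe could be replaced with something along
these lines (a sketch only; the account name is made up, and
key-based authentication is assumed):
use Net::SSH::Perl;

my $ssh = Net::SSH::Perl->new($host);
$ssh->login("backupops");   # assumed account; authenticates with SSH keys
my ($stdout, $stderr, $exit) =
    $ssh->cmd("/opt/openv/netbackup/bin/admincmd/bpimagelist -d 01/01/2003");
foreach my $line (split(/\n/, $stdout)) {
    next unless $line =~ /^IMAGE/;
    # ...parse fields as before...
}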
Next, I will cover the representation of this data, which makes
for good management meetings. mkimage_backperf.pl (Listing 3) creates
an image using GD::Graph and a few other CPAN modules. In the listing,
you'll notice that $database, $hostname, $user,
and $password must be modified, as with co_backperf.pl. Then,
we issue the following query:
SELECT backdate, client, kbpersec, policy,
       (to_days(now()) - to_days(backdate)) AS age
FROM backperf
WHERE (to_days(now()) - to_days(backdate)) < 2
$policy_hunt
ORDER BY policy DESC
Here we query the backup date, the client, the kilobytes per second,
the policy that was used, and an "age" calculated with MySQL date
functions. This produces a list of rows for backups that started
within the last two calendar days, i.e., roughly the last 24 hours
of activity. The rows appear in the chart ordered by policy name,
descending, and each row supplies a point on the chart's X and Y
axes; GD::Graph then produces the result (see Figure 1). That's it!
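For reference, the heart of the charting step looks something like
this sketch (not the published Listing 3; the dimensions, sample
data, and output file name are placeholders):
use GD::Graph::bars;

# @clients and @speeds would be filled from the SELECT above.
my @clients = qw(hostA hostB hostC);
my @speeds  = (24612, 9800, 15300);

my $graph = GD::Graph::bars->new(800, 400);
$graph->set(
    x_label           => 'Client',
    y_label           => 'KB/sec',
    title             => 'Backup performance, last 24 hours',
    y_max_value       => 30000,   # tune to your drives' top speed
    x_labels_vertical => 1,
) or die $graph->error;

my $gd = $graph->plot([ \@clients, \@speeds ]) or die $graph->error;
open(my $img, '>', 'img/backperf.png') or die "create img/ first: $!";
binmode($img);
print $img $gd->png;
close($img);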
From this point, you could modify mkimage_backperf.pl to use a
different y_max_value if your drives are slower or faster than mine.
The resulting image is written to an img/ directory, so create that
directory before running the script. Note that, if you don't
want all this fancy charting, you could simply run this on the backup
server:
/opt/openv/netbackup/bin/admincmd/bperror -all -hoursago 24 \
    | grep "wrote backup" \
    | awk '{print $1 " " $15 " " $20}' | sed 's/,//g'
This command would give you the backup jobs and speeds from the last
24 hours. You could then store or chart these results as you like.
You could also use Excel or another charting program once you have
the data in the database.
Assumptions
You'll need to install the Perl modules GD::Graph and DBI/DBD::mysql
and have a MySQL installation available for use. Also, note that
the calculated kilobytes-per-second figure does not account for time
spent waiting on tape switches; it is simply kilobytes backed up
divided by elapsed seconds. You could run bperror commands and
change your collector to use the actual tape speeds, but in my case
a library switches tapes rather rapidly and the value is close enough
to get a good picture. Besides, if a backup were slow because a
drive wasn't free, or the tape wasn't found for a while,
I'd want to know.
Alternatives
Veritas has a product called NetBackup Advanced Reporter, which
provides reporting capabilities for backup data. I did not review
this portion of NetBackup since I didn't know it existed until
after I began creating my own scripts. I think the product requires
a separate license, and users looking for a more supportable solution
might check into this product. co_backperf.pl would break if
future versions of NetBackup changed the format of the output, but
then so would a lot of other systems administration scripts that
have been written around NetBackup over the years.
Conclusion
Although Veritas NetBackup provides various command-line utilities
for acquiring backup statistics, there is almost always the need
to tie this data into another database or use open source tools
to do something with the data. MySQL acts as a good starting point
for storing the data because of its simplicity and availability.
Perl makes sense for parsing the output of the various NetBackup
commands, and GD::Graph works well to make simple graphs. The most
difficult part of this setup is getting GD::Graph to install on
a Solaris system, but the module is used widely enough that you can
find help to work through the quirks.
Jerry Uanino holds a B.S. in Computer Information Systems from
Marist College and has worked for IBM, US Internetworking, Enterasys
Networks, and most recently Factset Research Systems. He can be
reached at: juanino@yahoo.com.