NetBackup
Performance Reports with MySQL/Perl
Jerry Uanino
Backups are always a problem. Often, even in large companies where
you would expect such an important task to be a high priority, backups
are given to a new guy, an intern, or anyone willing to pay attention.
Many excellent systems administrators began their careers changing
tapes and monitoring or restarting failed backups.
If you oversee a large installation and backups are not your primary
responsibility, it's a daunting task to determine not only
that the backups are succeeding, but also that they are occurring
consistently and efficiently. In this article, I'll describe
a technique for monitoring the performance of a Veritas NetBackup
Datacenter system. Although this technique is specific to a NetBackup
installation, the same concepts can be used to monitor any backup
system.
The main concept is to run a collector script that queries the
NetBackup catalog for backup statistics and stores that information
in a MySQL database. A second program then uses the backup data
in the MySQL database to generate a chart with Perl and the help
of GD::Graph::bars. Alternatively, you can use SQL commands to query
the MySQL database directly and produce any type of custom report.
Figure 1 shows some example output.
The performance data gathered for each backup includes the backup
ID, the duration, the client host, the schedule type (full or incremental),
the amount of data backed up, the number of files, the kilobytes
per second, the policy used, the date the backup started, and the
server that initiated the backup. This data can then be queried
to help answer such questions as:
Which system is the best performer?
mysql> select client,kbpersec from backperf order by kbpersec desc limit 1;
+------------+----------+
| client     | kbpersec |
+------------+----------+
| somehost   |    24612 |
+------------+----------+
Which systems back up the most data?
mysql> select client,kbytes from backperf order by kbytes desc limit 1;
+---------+-----------+
| client  | kbytes    |
+---------+-----------+
| ct01sts | 746335072 |
+---------+-----------+
What is the average number of files being backed up?
mysql> select avg(numfiles) from backperf;
+---------------+
| avg(numfiles) |
+---------------+
|   102661.9459 |
+---------------+
Other questions we could answer include: How long is client X taking
to back up? What is the median/average backup time? How many files
are the poor performers backing up? Which are the good performers?
These kinds of questions can be answered with queries along the
lines of the previous examples. Figure 1 charts kilobytes/sec, which
helps identify which systems might need attention and provides a
good report to check every morning.
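As one more example, the average backup time per client is a
one-liner (this assumes the duration column is named "seconds",
matching the collector's variables shown later):
mysql> select client, avg(seconds) from backperf
    -> group by client order by avg(seconds) desc limit 5;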
The Setup
Now let's get to the nitty-gritty. The first piece in this
setup is a MySQL database. This article assumes you have a MySQL
server running. Using nbperf.sql (Listing 1), you can create a MySQL
table called "backperf" to store your backup performance
data.
# mysql -u root -p
mysql> create database nbperf;
(control-d)
# cat nbperf.sql | mysql -u root -p nbperf
At this point you have a database (nbperf) with one table (backperf).
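Listing 1 is not reproduced here, but based on the fields the
collector stores, nbperf.sql amounts to something like the following
sketch (the column types and sizes are my assumptions):
CREATE TABLE backperf (
    backupid    VARCHAR(128) NOT NULL PRIMARY KEY, -- e.g., clientname_1054339491
    client      VARCHAR(64),
    schedtype   INT,          -- schedule type (full or incremental)
    seconds     INT,          -- duration of the backup
    kbytes      BIGINT,       -- amount of data backed up
    numfiles    INT,
    kbpersec    FLOAT,        -- kbytes divided by seconds
    policy      VARCHAR(64),
    backdate    DATETIME,     -- date the backup started
    backserver  VARCHAR(64),  -- server that initiated the backup
    drive_index INT           -- for co_driveindex.pl; unused in this article
);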
The next piece is to put data into the table on a regular basis.
Ideally, this would occur as a cron job sometime after the backups
are complete (early morning in most cases). Because the collector
can run as often as you like, scheduling depends on your environment.
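For example, a crontab entry along these lines (the path, schedule,
and log file are assumptions) would run the collector every morning
at 7:00:
0 7 * * * /usr/local/bin/co_backperf.pl >> /var/log/co_backperf.log 2>&1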
The collector, co_backperf.pl (Listing 2), will need to be modified
to fit your environment. Set the $database and $hostname variables
to point at the database created above and at the MySQL server that
hosts it. You will also need to modify @servers to contain the
NetBackup servers from which you want to collect data (e.g.,
"server1" and "server2"). Going through the listing, the next
important thing to note is the $cmd variable: "bpimagelist" produces
the guts of the output we want to store. Every time it runs, it lists
everything in the NetBackup catalog since the date specified with
the "-d" option, and each entry includes several important fields.
Typical output of this command looks like this:
IMAGE clientname 0 0 6 clientname_1054339491 policyname 0 *NULL* root
schedulename 0 5 1054339491 1324 1062374691 0 0 2659008 113017 1 1 0
policyname_1054339491_FULL.f *NULL* *NULL* 0 1 0 0 0 *NULL* 0 0 0 0 0 0 0
*NULL* 0 0 0
Although this looks confusing, co_backperf.pl will parse it. If you
want to get a better "user" view of this data, you can add
the "-U" flag to the command (but don't change this
in the script). For each line of output that begins with
"IMAGE", the script splits the line on whitespace and notes several
fields. In particular, we grab these:
$backupid   = $fields[5];     # e.g., clientname_1054339491
$seconds    = $fields[14];    # duration of the backup
$client     = $fields[1];
$schedtype  = $fields[11];    # schedule type (full or incremental)
$kb         = $fields[18];    # kilobytes backed up
$num_files  = $fields[19];
$kbpersec   = ($kb/$seconds); # note: assumes a non-zero duration
$policy     = $fields[6];
$backdate   = $fields[13];    # start time, in epoch seconds
$backserver = $host;
$backdate   = &humantime($backdate); # convert epoch to a readable date
After noting the important fields, co_backperf.pl will insert an entry
into the database for this backup id (field 5 is the backup id). Running
the script a second time will produce several failed inserts. This
is because entries already in the database continue to appear in the
catalog until they expire; the failed inserts, enforced by a primary
key on the backupid field, are what prevent duplicate entries.
Thus, even if the NetBackup catalog expires an entry,
its historical performance data is still kept in the MySQL database.
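To make the flow concrete, here is a condensed sketch of how such a
collector loop could look. It is not the published Listing 2; the
connection details, the -d date, and the humantime helper shown here
are assumptions layered on the fields described above:
#!/usr/bin/perl -w
# Condensed collector sketch (not the published Listing 2).
use strict;
use DBI;

my $database = "nbperf";
my $hostname = "mysqlhost";            # assumed MySQL server name
my $user     = "nbperf";               # assumed credentials
my $password = "secret";
my @servers  = ("server1", "server2"); # NetBackup servers to poll

# Convert epoch seconds to a MySQL-friendly datetime string.
sub humantime {
    my @t = localtime(shift);
    return sprintf("%04d-%02d-%02d %02d:%02d:%02d",
                   $t[5] + 1900, $t[4] + 1, @t[3, 2, 1, 0]);
}

my $dbh = DBI->connect("DBI:mysql:database=$database;host=$hostname",
                       $user, $password) or die $DBI::errstr;
my $sth = $dbh->prepare(q{
    INSERT INTO backperf (backupid, client, schedtype, seconds,
                          kbytes, numfiles, kbpersec, policy,
                          backdate, backserver)
    VALUES (?,?,?,?,?,?,?,?,?,?)
});

foreach my $host (@servers) {
    my $cmd = "rsh $host /opt/openv/netbackup/bin/admincmd/bpimagelist"
            . " -d 01/01/2003";          # assumed start date
    open(my $out, "$cmd |") or die "cannot run $cmd: $!";
    while (<$out>) {
        next unless /^IMAGE/;
        my @fields  = split /\s+/;
        my $seconds = $fields[14] || 1;  # guard against zero duration
        # A failed execute on a duplicate backupid is expected and
        # harmless; the primary key keeps the history clean.
        $sth->execute($fields[5], $fields[1], $fields[11], $seconds,
                      $fields[18], $fields[19], $fields[18] / $seconds,
                      $fields[6], humantime($fields[13]), $host);
    }
    close($out);
}
$dbh->disconnect;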
Additionally, one could write a co_driveindex.pl that would run bperror
commands to collect the drive on which a specific backup id occurred.
This is handy for checking the balancing of your drive usage for multiple
tape drive configurations. (The drive_index field is unused in this
article.)
Note that I call the primitive rsh in my script, but you can use
ssh or any other method. If you run the script locally on the backup
server, you could remove the call altogether. One significant improvement
would be to switch the command calls to use Net::SSH::Perl or a
similar module.
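For example, the rsh pipe could be replaced with something along
these lines (a sketch only; the account name is made up, and
key-based authentication is assumed):
use Net::SSH::Perl;

my $ssh = Net::SSH::Perl->new($host);
$ssh->login("backupops");   # assumed account; authenticates with SSH keys
my ($stdout, $stderr, $exit) =
    $ssh->cmd("/opt/openv/netbackup/bin/admincmd/bpimagelist -d 01/01/2003");
foreach my $line (split(/\n/, $stdout)) {
    next unless $line =~ /^IMAGE/;
    # ...parse fields as before...
}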
Next, I will cover the representation of this data, which makes
for good management meetings. mkimage_backperf.pl (Listing 3) creates
an image using GD::Graph and a few other CPAN modules. In the listing,
you'll notice that $database, $hostname, $user,
and $password must be modified, as with co_backperf.pl. Then,
we issue the following query:
SELECT backdate, client, kbpersec, policy,
       (to_days(now()) - to_days(backdate)) AS age
FROM backperf
WHERE (to_days(now()) - to_days(backdate)) < 2
$policy_hunt
ORDER BY policy DESC
Here we query the backup date, the client, the kilobytes per second,
the policy that was used, and an "age" calculated with MySQL date
functions. This produces a list of rows for backups that started
within the last two calendar days, i.e., roughly the last 24 hours
of activity. The rows appear in the chart ordered by policy name,
descending, and each row supplies a point on the chart's X and Y
axes; GD::Graph then produces the result (see Figure 1). That's it!
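For reference, the heart of the charting step looks something like
this sketch (not the published Listing 3; the dimensions, sample
data, and output file name are placeholders):
use GD::Graph::bars;

# @clients and @speeds would be filled from the SELECT above.
my @clients = qw(hostA hostB hostC);
my @speeds  = (24612, 9800, 15300);

my $graph = GD::Graph::bars->new(800, 400);
$graph->set(
    x_label           => 'Client',
    y_label           => 'KB/sec',
    title             => 'Backup performance, last 24 hours',
    y_max_value       => 30000,   # tune to your drives' top speed
    x_labels_vertical => 1,
) or die $graph->error;

my $gd = $graph->plot([ \@clients, \@speeds ]) or die $graph->error;
open(my $img, '>', 'img/backperf.png') or die "create img/ first: $!";
binmode($img);
print $img $gd->png;
close($img);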
From this point, you could modify mkimage_backperf.pl to use a
different y_max_value if your drives are slower or faster than mine.
The resulting image is written to an img/ directory, so create that
directory before running the script. Note that, if you don't
want all this fancy charting, you could simply run this on the backup
server:
/opt/openv/netbackup/bin/admincmd/bperror -all -hoursago 24 \
    | grep "wrote backup" \
    | awk '{print $1 " " $15 " " $20}' | sed 's/,//g'
This command would give you the backup jobs and speeds from the last
24 hours. You could then store or chart these results as you like.
You could also use Excel or another charting program once you have
the data in the database.
Assumptions
You'll need to install the Perl modules GD::Graph and DBI/DBD::mysql
and have a MySQL installation available for use. Also, note that
the calculated kilobytes-per-second figure does not account for time
spent waiting on tape switches; it is simply kilobytes backed up
divided by elapsed seconds. You could run bperror commands and
change your collector to use the actual tape speeds, but in my case
a library switches tapes rather rapidly and the value is close enough
to get a good picture. Besides, if a backup were slow because a
drive wasn't free, or the tape wasn't found for a while,
I'd want to know.
Alternatives
Veritas has a product called NetBackup Advanced Reporter, which
provides reporting capabilities for backup data. I did not review
this portion of NetBackup since I didn't know it existed until
after I began creating my own scripts. I think the product requires
a separate license, and users looking for a more supportable solution
might check into this product. co_backperf.pl would break if
future versions of NetBackup changed the format of the output, but
then so would a lot of other systems administration scripts that
have been written around NetBackup over the years.
Conclusion
Although Veritas NetBackup provides various command-line utilities
for acquiring backup statistics, there is almost always the need
to tie this data into another database or use open source tools
to do something with the data. MySQL acts as a good starting point
for storing the data because of its simplicity and availability.
Perl makes sense for parsing the output of the various NetBackup
commands, and GD::Graph works well to make simple graphs. The most
difficult part of this setup is getting GD::Graph to install on
a Solaris system, but the module is used widely enough that you can
find help to work through the quirks.
Jerry Uanino holds a B.S. in Computer Information Systems from
Marist College and has worked for IBM, US Internetworking, Enterasys
Networks, and most recently Factset Research Systems. He can be
reached at: juanino@yahoo.com.