Extracting Symmetrix Data with Orca
Tom Kranz
The Symmetrix is EMC's flagship storage device. It's
big. It's bad. And it can sometimes be a bit opaque, meaning
that performance data can be difficult to extract. EMC produces
a range of tools -- some graphical, and some that can be called
from the command line -- but using these to generate long-term
performance statistics, in a management-friendly format, can be
a bit of a struggle.
Orca is an open source package by Blair Zajac that uses RRDtool
to plot arbitrary data from text files. Traditionally, Orca (http://www.orcaware.com/orca)
has been used in conjunction with the SymbEL Toolkit to plot Solaris
performance data. In this article, I will show how to use EMC's
command-line tools, in conjunction with Orca, to plot performance
statistics for your Symmetrix.
For purposes of this article, I'm assuming that you have
rsync, ssh, and some sort of Web server installed -- these should
form part of every sys admin's toolbox. You'll also need
SymmCLI installed on the host connected to the Symmetrix (I have
version 4.1), and Orca already installed and configured.
I highly recommend Orca, if only for the out-of-the-box Solaris performance
monitoring it gives.
About Orca
A process (called orcallator) runs on a host and logs data into
a file. Using rsync, this data file can then be copied to a central
server, where Orca is run to process the data. Orca places that
data into RRDtool databases, and then generates graphs from those
databases. Orca works out averages and percentage increases by using
the date stamps written to each line of data in the orcallator
output file. The graphs and output files are then placed in a directory
where your Web server can see them, and voilà! Your performance
data is available via a Web browser. See Listing 1.
Collecting Data from Symmetrix
The trick here is to pull performance data from the Symmetrix
and then log that in a file that Orca can understand. symmstat is
the SymmCLI tool that queries the Symmetrix and returns performance
statistics. Listing 2 (orca-symmetrix_probe.pl) is a Perl script
that calls symmstat, parses the output, and then generates a data
file that Orca can understand.
We'll collect all the performance output from symmstat --
it's all useful, and is best looked at all together rather
than focusing on one statistic (e.g., writes per second).
symmstat will provide the following data:
- I/O reads per second
- I/O writes per second (Figure 1)
- KB read per second (Figure 2)
- KB written per second
- cache reads per second
- cache writes per second (Figure 3)
- sequential reads per second
A normal I/O read is when one block of data is requested and received.
A sequential read is when we request several blocks of data that
are all next to each other. The disk head can position itself once
and then read several blocks, one after the other, which is an
efficient way to get data off the disks. Write pending (wp) tracks
(Figure 4) are tracks that the Symmetrix is holding in cache while
it waits for a drop in I/O before staging them out of
cache and down to disk.
One of the big advantages of getting Orca to graph these figures
is that we can easily look at them together, over time. For instance,
a high number of read I/Os, combined with a large number of kilobytes
being read, with high numbers of sequential reads, could be the
normal result of large table scans in your Oracle database (which
might not be a problem).
A large number of small writes that aren't hitting the cache
(large number of I/O writes, small value for KB written, and small
number of cache writes) means that your overall I/O performance
will suffer, with the Symmetrix constantly having to write
to disk. You'd need to look at what sort of I/O is taking place,
or maybe invest in some more cache.
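The reasoning in the last paragraph boils down to two ratios: the average size of each write, and the share of writes that land in cache. Here is a minimal Python sketch of that arithmetic. The function names and the thresholds are my own assumptions for illustration, not values from EMC or from the listings, so tune them against your own graphs.

```python
def write_profile(io_writes_per_sec, kb_written_per_sec, cache_writes_per_sec):
    """Return (average write size in KB, fraction of writes that hit cache)."""
    if io_writes_per_sec <= 0:
        return 0.0, 0.0
    avg_kb = kb_written_per_sec / io_writes_per_sec
    cache_fraction = cache_writes_per_sec / io_writes_per_sec
    return avg_kb, cache_fraction


def small_uncached_writes(io_writes, kb_written, cache_writes,
                          min_writes=500.0, max_avg_kb=8.0,
                          max_cache_fraction=0.5):
    """Flag the 'many small writes missing the cache' pattern.

    All three thresholds are hypothetical placeholders.
    """
    avg_kb, cache_fraction = write_profile(io_writes, kb_written, cache_writes)
    return (io_writes >= min_writes
            and avg_kb <= max_avg_kb
            and cache_fraction <= max_cache_fraction)
```

For example, 1000 writes per second moving 4000 KB with only 200 cache writes gives an average write of 4 KB and a cache fraction of 0.2, which this sketch would flag.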
Note that you can use this information to get an I/O "footprint"
of your applications and then, using the historic graphing of Orca,
get an idea of when things are OK and when they are not. You
can also use the Orca graphs for capacity planning or for gauging what
impact a new software release will have.
It's worthwhile keeping an eye on the number of write pending
tracks. Compare it to the number and size of the write I/Os occurring.
Ideally, you want the Symmetrix to be busy, but not so busy that
it gets overwhelmed.
When pulling the data from the Symmetrix, there are two main things
to remember:
1. We need the epoch date (Orca uses this when working out averages
over time).
2. If symmstat returns nothing for a device (which happens regularly),
we need to trap this and put a 0 for that value in the output file.
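To make those two points concrete, here is a minimal Python sketch of writing one line of data. It is not the Perl from Listing 2. I am assuming a whitespace-separated layout with a header row and the epoch time in the first column, and the column names are hypothetical, so match them to whatever your Orca config file expects.

```python
import time

# Hypothetical column names; the real ones must match your Orca config.
COLUMNS = ["io_reads", "io_writes", "kb_read", "kb_written",
           "cache_reads", "cache_writes", "seq_reads", "wp_tracks"]


def header_line():
    """Header row naming each column, with the timestamp first."""
    return " ".join(["timestamp"] + COLUMNS)


def data_line(sample, now=None):
    """Build one data row from a dict of parsed symmstat values.

    Any value that is missing or empty is written as 0, so every row
    has the same number of columns.
    """
    epoch = int(time.time() if now is None else now)
    values = []
    for name in COLUMNS:
        value = sample.get(name)
        if value in (None, ""):
            value = 0
        values.append(str(value))
    return " ".join([str(epoch)] + values)
```

How the values get pulled out of the symmstat output is left out here, since that depends on your Symmetrix model and SymmCLI version.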
orca-symmetrix_probe.pl (Listing 2) takes symmstat data and creates
Orca-friendly output files. It should be called from root's
crontab every 5 minutes. Once it's running, you should notice
the output file being generated and updated in /var/log/probe.
I have been using this script against a Symmetrix 3630. If you
have a different breed of Symmetrix, you might notice that symmstat
reports slightly different information. In that case, you'll
need to edit the script to suit.
Once we're gathering data, we need a mechanism to copy those
output files back to the central Orca machine for processing. Listing
3 (orca-emc_getlogs.ksh) should be run from the Orca user's
crontab every 30 minutes. Be sure its execution doesn't coincide
with a run of the orca-symmetrix_probe.pl script (Listing 2); otherwise, you'll
be copying an incomplete data file.
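One way to check that the two cron schedules never collide is to list the minutes each job starts on and look for overlap. The Python sketch below assumes the probe starts on minutes 0, 5, 10, and so on, and that it may still be running one minute later. Both the offsets and the one-minute runtime are assumptions for illustration.

```python
def start_minutes(step, offset=0):
    """Minutes past the hour at which a job with this step and offset starts."""
    return set(range(offset, 60, step))


def clashes(copy_minutes, probe_minutes, probe_runtime=1):
    """Return the copy start minutes that fall while the probe may be running.

    The probe is treated as busy during its start minute and for the
    following probe_runtime minutes (a hypothetical figure).
    """
    busy = {(m + d) % 60
            for m in probe_minutes
            for d in range(probe_runtime + 1)}
    return sorted(copy_minutes & busy)


probe = start_minutes(5)                          # 0, 5, 10, ... 55
on_the_hour = clashes(start_minutes(30), probe)   # copy job at 0 and 30
shifted = clashes(start_minutes(30, 3), probe)    # copy job at 3 and 33
```

With these assumptions, running the copy at 0 and 30 minutes past the hour clashes with the probe, while shifting it to 3 and 33 does not.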
Essentially, this script calls rsync to copy across the data file,
and then calls orca to process that data. I'm using rsync over
ssh because of the secure password-less connection that ssh offers,
and because it ensures we only copy updated data, vastly reducing
network traffic.
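The two steps can be sketched in a few lines of Python. This is not Listing 3, which is a ksh script. The remote host, the paths, and the config file name in the usage example are hypothetical, and I am assuming Orca's -o option (process the data once and exit), so check that against your installed version.

```python
import subprocess


def rsync_command(remote, local_dir):
    """rsync over ssh: -a keeps timestamps, -z compresses the transfer."""
    return ["rsync", "-az", "-e", "ssh", remote, local_dir]


def orca_command(config_path):
    """Run Orca against the given config file (-o assumed to mean 'once')."""
    return ["orca", "-o", config_path]


def fetch_and_plot(remote, local_dir, config_path, run=subprocess.run):
    """Copy the data files across, then let Orca process them.

    check=True makes a failed rsync stop the job before Orca runs.
    """
    run(rsync_command(remote, local_dir), check=True)
    run(orca_command(config_path), check=True)
```

A call might look like fetch_and_plot("orca@symmhost:/var/log/probe/", "/data/orca/symmetrix/", "/usr/local/orca/lib/symmetrix.cfg"), with every one of those names replaced by your own.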
When the script calls Orca, it passes
it the accompanying config file, which tells Orca what values to take
from the data files, and how to plot them. With this setup, you
should now be able to launch a Web browser and view the performance
graphs for your Symmetrix.
Whether the information you gather is sufficient to help you secure
an extra gigabyte of cache is another matter, but at least you know
how the Symmetrix is performing.
My thanks to Mick Sheppard for his invaluable help with the intricacies
of Perl.
Tom Kranz is a contract sys admin with more than 7 years of
experience. His main skills lie with Solaris, IRIX, and SANs. When
he's not stuck in front of his Octane, he can be found herding
his children round in a classic Rover.