Cover V12, I06

Introduction to RAID

Henry Newman

I am sure that over the years, many of you have seen a great deal written about RAID hardware, software, and a myriad of related topics. In this column, I will look at the whole topic of RAID from a slightly different perspective.

I divide RAID devices into two categories: cache-centric and storage-centric. You may see different terminology used to describe the same thing. Some people call these RAID types "enterprise" and "mid-range", for example. Whatever you call them, there are major architectural differences between these two device types.

Cache-Centric RAID

I use the term cache-centric because RAID devices in this category depend significantly on data residing in cache to ensure good performance. Cache-centric devices generally have feature sets such as:

  • Very high reliability (dual everything, virtually no downtime)
  • Large caches (e.g., 64 GB or greater)
  • Design emphasis on using RAID-1
  • Software that allows snapshots, hot upgrades, and many other features
  • If RAID-5 is supported, generally fewer devices in the supported stripe sizes (e.g., 4+1, meaning 4 data disks plus 1 parity drive, as compared to the 8+1 configurations available on mid-range products)
  • Cache is always mirrored
  • Large number of front-end connections
  • Support for many types of remote mirroring (e.g., dark fibre, IP)
  • Smaller block sizes
  • Huge amounts of storage managed in a single box (e.g., 100 TB)
  • Per-component reliability testing
  • Error monitoring for all hardware, including disk monitoring
  • Designed for I/O operations per second (IOPS), not streaming I/O
  • Far more bandwidth from cache to servers than from cache to disk (I call this front-end bandwidth and back-end bandwidth)
  • Very high cost per MB of storage compared with storage-centric RAID

Cache-centric RAID vendors include:

  • EMC Symmetrix
  • Hitachi Data Systems 99xx series
  • IBM Shark

Most of these products can be used with both UNIX servers and IBM mainframes. They are designed for situations in which reliability is the most critical issue -- customers may trade performance for reliability because reliability is more important. For these boxes to perform well, they need a high number of cache hits.
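To see why cache hits matter so much, here is a minimal sketch of the usual weighted-average model for service time. The latency figures are assumptions picked purely for illustration, not measurements of any product named above, and the function name is hypothetical.

```python
def effective_service_time_ms(hit_ratio, cache_ms, disk_ms):
    """Average time per request when a fraction hit_ratio is served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    # Hits cost cache_ms; misses must go to disk and cost disk_ms.
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


# Assumed, illustrative latencies (not taken from any vendor data sheet).
CACHE_MS = 0.5
DISK_MS = 8.0

for ratio in (0.50, 0.90, 0.99):
    avg = effective_service_time_ms(ratio, CACHE_MS, DISK_MS)
    print(f"hit ratio {ratio:.0%}: {avg:.2f} ms average")
```

Under this simple model, the average is dominated by the miss penalty until the hit ratio gets very high, which is consistent with the point that these boxes depend on data residing in cache.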

Storage-Centric RAID

I use the term storage-centric because the dependency for these devices is on the underlying storage, not on the cache. These devices have the following features:

  • Good reliability but not nearly as high as cache-centric devices
  • Smaller caches (usually 2 GB to 8 GB)
  • RAID-5 support with up to 20 disks in a RAID-5 LUN
  • Cache mirroring can often be turned off to improve performance
  • Far less software for management and maintenance
  • Far less storage in a single box compared with cache-centric RAID
  • Excellent support for streaming I/O with large blocks
  • Support for IOPS, but with a small cache most requests have to go to disk
  • More back-end bandwidth than front-end bandwidth (more channel bandwidth from cache to disk than from cache to servers)
  • Much lower cost per MB for storage than cache-centric RAID

Some examples of storage-centric RAID vendors include:

  • Ciprico
  • DataDirect Networks
  • DotHill
  • EMC CLARiiON CX line (Dell resells EMC)
  • Hitachi Data Systems 9500
  • LSI 5600 (OEM'ed by many other vendors)
  • Sun T3 and S1

There are many vendors in this market area because these products are much easier to develop than cache-centric devices, for several reasons. For example, the reliability does not need to be as great, and the vendors don't need to create mainframe interfaces. Many vendors in this market space are now looking at offering IDE drives as an option in place of SCSI drives, to provide lower cost and greater data density than SCSI drives offer. Additionally, a few vendors are developing IDE-only solutions for the low end of this market. RAID devices of this type are far more prevalent and usually have at least twice as much bandwidth to disk as to the front-end servers.

So Which Is Better?

As with many things in computing, it depends on many factors. Not long ago, I was asked to work on a project where the customer wanted to use cache-centric or "enterprise" storage to capture high-speed data streams. I knew (as did the other technical people involved) that the customer could not use a cache-centric storage box for high-speed full-duplex I/O (writing one file while reading other files at the same time). The enterprise storage vendor's benchmark team said it could be done, but we thought this was absurd because we knew this was a cache-centric device, and we were correct. The customer is using a storage-centric device. On the other hand, the same cache-centric storage box will far exceed the performance of the storage-centric device for a database where the index files are used often and will fit in cache. A few new vendors are entering the market trying to combine the best of both.
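One way to reason about the streaming case is to ask how long a write cache can absorb data that arrives faster than the back end can drain it. The sketch below is a deliberately simplified model of my own, not a description of how any particular box behaves: it ignores cache mirroring, read traffic, and destaging policy, and all of the numbers are assumptions.

```python
def seconds_until_cache_full(cache_gb, ingest_mb_s, backend_mb_s):
    """Return seconds until the cache fills, or None if it never fills.

    cache_gb     -- usable write cache, in GB (assumed value)
    ingest_mb_s  -- sustained rate arriving from the servers, in MB/s
    backend_mb_s -- sustained rate the cache can drain to disk, in MB/s
    """
    surplus = ingest_mb_s - backend_mb_s
    if surplus <= 0:
        return None  # the back end keeps up, so the cache never fills
    return cache_gb * 1024 / surplus


# Hypothetical example: 64 GB of cache, 400 MB/s in, 200 MB/s out to disk.
result = seconds_until_cache_full(64, 400, 200)
print(f"cache full after about {result:.0f} seconds")
```

Once the cache is full, the sustained rate in this model falls to the back-end rate, which is why a device with more front-end than back-end bandwidth is a questionable fit for long-running streams.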

Before you can make a recommendation of what to buy, you need to review your requirements and your operational usage. Here are just a few issues to consider:

1. What are the uptime and reliability requirements?

2. What are your backup requirements?

3. What RAID level will you be using?

4. How big are your files and how are they accessed?

Reliability

It comes down to how much downtime you can afford for the box. If you have many hosts connected, you cannot afford any downtime. Reliability is often expressed as the 9 count, referring to the number of 9s in the product's availability figure. Table 1 shows the 9s and how they equate to downtime. Knowing the reliability requirements and the number of hosts attached will help determine the type of equipment needed.
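The arithmetic behind the 9 count is easy to check yourself. The following is a small sketch that converts a number of 9s into downtime per year; it is not a reproduction of Table 1, and it assumes a 365-day year with availability measured around the clock.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_minutes_per_year(nines):
    """Expected downtime per year for an availability with this many 9s.

    Two 9s means 99%, three 9s means 99.9%, and so on.
    """
    unavailability = 10 ** -nines
    return unavailability * MINUTES_PER_YEAR


for nines in range(2, 7):
    availability_pct = 100 * (1 - 10 ** -nines)
    minutes = downtime_minutes_per_year(nines)
    print(f"{nines} nines ({availability_pct:.4f}%): "
          f"{minutes:.2f} minutes/year ({minutes / 60:.2f} hours)")
```

Each additional 9 cuts the allowed downtime by a factor of ten, so five 9s leaves a little over five minutes per year.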

Backup

Cache-centric or enterprise boxes have the ability to create a mirror, break the mirror for a backup, and then re-attach the mirror and update the box. Most storage-centric or mid-range boxes do not have this feature, although it is starting to become more common.

RAID Level

There are many different RAID levels, but I see RAID-1 and RAID-5 used most often. You will have to determine which RAID level you need to use. The RAID level depends on:

1. Cost -- RAID-1 requires far more disks than RAID-5 for the same amount of data storage.

2. How the data is used -- If you are making small random requests, RAID-1 is faster than RAID-5. If you are making large sequential requests and, especially, doing a great deal of writes, RAID-5 will be much faster.

So, if you are making small requests (especially random ones), RAID-1 will be much faster than RAID-5, assuming both are tuned. With RAID-1, each device is mirrored; with RAID-5, you create a LUN with a parity drive so that the LUN can be rebuilt if a device fails. Because every disk is mirrored, RAID-1 writes far more data than RAID-5 does.
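The cost and write-volume differences can be put into numbers. This is a rough sketch under stated assumptions: all disks are the same size, RAID-5 groups are described as N data disks plus 1 parity, and the RAID-5 write figure applies only to full-stripe writes (small writes to RAID-5 also require reading old data and parity first, which this does not model). The function names are hypothetical.

```python
import math


def raid1_disks(data_disks):
    """Disks needed for RAID-1: every data disk has a mirror."""
    return 2 * data_disks


def raid5_disks(data_disks, data_per_group):
    """Disks needed for RAID-5 built from N+1 groups (N = data_per_group)."""
    groups = math.ceil(data_disks / data_per_group)
    return data_disks + groups  # one parity disk's worth per group


def raid1_mb_written(payload_mb):
    """Data physically written for a payload on RAID-1 (two copies)."""
    return 2 * payload_mb


def raid5_full_stripe_mb_written(payload_mb, data_per_group):
    """Data physically written for full-stripe writes (data plus parity)."""
    return payload_mb * (data_per_group + 1) / data_per_group


DATA_DISKS = 8  # capacity needed, expressed in whole disks (assumed)
print("RAID-1 disks:      ", raid1_disks(DATA_DISKS))
print("RAID-5 4+1 disks:  ", raid5_disks(DATA_DISKS, 4))
print("RAID-5 8+1 disks:  ", raid5_disks(DATA_DISKS, 8))
print("RAID-1 MB written: ", raid1_mb_written(100))
print("RAID-5 8+1 written:", raid5_full_stripe_mb_written(100, 8))
```

The wider the RAID-5 group, the smaller the parity overhead, which is part of why the 8+1 configurations on mid-range products are attractive for large sequential I/O.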

File Sizes and Accesses

In today's world, the likelihood of everything fitting in the RAID cache is very low. Heck, why would you even buy a RAID if that were the case -- you could just buy an SSD (solid-state disk). The real questions to answer are: How are these files accessed? Can the data be reused? Would a large cache help, or will it not make any difference?
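A toy simulation can make the reuse question concrete. The sketch below assumes a simple least-recently-used (LRU) cache measured in blocks; real RAID controllers use their own, usually more sophisticated, caching and read-ahead policies, so treat this only as an illustration of how access pattern drives hit ratio.

```python
from collections import OrderedDict


def lru_hit_ratio(accesses, cache_blocks):
    """Fraction of accesses served from an LRU cache holding cache_blocks."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)       # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


CACHE_BLOCKS = 1_000  # assumed cache size, in blocks

streaming = range(10_000)                        # every block read once
reused = [i % 100 for i in range(10_000)]        # small, hot working set
oversized = [i % 2_000 for i in range(10_000)]   # loop larger than the cache

for name, pattern in (("streaming", streaming),
                      ("reused", reused),
                      ("oversized loop", oversized)):
    print(f"{name}: hit ratio {lru_hit_ratio(pattern, CACHE_BLOCKS):.2f}")
```

Data that is never reused gets no benefit from the cache no matter how large it is, while a working set that fits is served almost entirely from cache.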

Conclusions

The choice between cache-centric RAID devices and storage-centric RAID devices likely will be made based on budgetary constraints and not the performance of the devices. The operational environments of these two types of devices are often vastly different. Other issues to consider are:

  • How many LUNs and how much storage do you want under the control of one device?
  • What RAID levels do you want to run based on the cost per MB?
  • What RAID levels are going to provide the best performance given the applications?
  • What are the application types and request sizes?

It really comes down to "the requirements". Purchases should be made in terms of the requirements and budget so you get the best value for the available monies.

Next time, we will dig deeper into RAID and discuss how RAID and file system layout should be architected.

Henry Newman has worked in the IT industry for more than 20 years. Originally at Cray Research and now with a consulting organization, he has provided expertise in systems architecture and performance analysis to customers in government, scientific research, and industry around the world. His focus is high-performance computing, storage and networking for UNIX systems, and he previously authored a monthly column about storage for Server/Workstation Expert magazine. He may be reached at: hsn@hsnewman.com.