Storage Area Networks -- Achieving Safe Shared Data Access
George Ericson and Ken Kutzer
The rapid acceptance of external RAID arrays in the 1990s established the way we currently work with storage in a client/server environment. In the future, Storage Area Networks (SANs) will dramatically alter the way we purchase, deploy, and manage storage. SANs will allow us to combine storage devices and servers in large, any-to-any networks, where we can create virtual connections between resources as needed. The drivers for this shift toward storage centralization and consolidation are simple -- boost efficiency, reduce management costs, and improve service levels.
According to Carl E. Howe, Director of Research at Forrester Research Inc., the average company's storage-related expenditures will increase five-fold over the next five years. Containing storage costs will fuel the adoption of storage networks. However, this new architecture will present new technical challenges with respect to providing safe, secure, and fair access to data. Although these challenges create real obstacles, there are solutions available.
From Private Storage to Storage Networks Traditionally, the distributed, client/server, UNIX, and NT storage model has been direct attached, private storage. The few exceptions occur when clustering or other specialized software enables server-to-server cooperation around storage access and ownership. Given this situation, developers of storage device access software have been able to ignore the general set of problems created by shared access to storage. Both SCSI and the first generation of Fibre Channel devices followed this simple, locally connected, private storage model. (See Figure 1.)
The next generation of advanced Fibre Channel storage is being deployed in storage networks with complex topologies, including arbitrated loops and switched fabrics, that address the dynamic storage needs of companies. Bridges, switches, and hubs replace private I/O channels and connect storage devices that will be shared by multiple, non-cooperating systems. These new environments shift the storage model from locally attached, private storage, to network attached, shared or public storage -- SANs. (See Figure 2.)
SANs Bring New Challenges There are four new storage-related, operational challenges to be aware of when moving from private storage to public storage networks -- safe, secure, fair, and heterogeneous access.
Safe Access Safe access is the most critical challenge because it involves the preservation of data integrity throughout the connection of multiple, non-cooperating systems to a single SAN. Because most operating systems assume that storage is locally attached and private, there typically are no locking mechanisms or data control structures. Without those controls, the risks are severe. For example, a single write to a foreign file system could result in complete data loss. To prevent corruption, it is necessary to ensure that all systems access only their allocated storage and do not mount or mark other devices. Configuring all attached systems properly is extremely important, because a simple configuration error could allow unwanted access and result in data corruption. This becomes a greater, but solvable, challenge as storage networks become larger, more complex, and more dynamic.
Secure Access Establishing a secure environment addresses the need for restrictive access to sensitive information. Unlike file systems, storage systems generally do not support the concept of access control. Systems that are connected to a storage device are assumed to be trusted and are given free access to the stored information. Security and access control are simply not a part of traditional I/O channel protocols. Generally, the data on a storage device is part of a file system, and that file system governs access to the contained information. Because it is unlikely that all systems in a heterogeneous SAN will run the same file system, intervention is required to provide adequate security and access control. Otherwise, users on one system could gain access to sensitive information by performing block reads from storage allocated to another system.
Fair Access Ensuring fair access to stored data involves providing a reasonable and predictable level of service and response time from shared storage devices. In the traditional direct-attached storage model, all access is assumed to be from the same requestor or host. With storage consolidation in a SAN, a physical device may be sliced into many virtual devices that are accessed by different hosts. Appropriate sizing of storage systems (i.e., cache size, number of disks, and number of buses) will ensure proper operation during normal loading. However, the inevitable peak usage periods may push storage resources to their limits.
To sustain proper operations during these peak periods, physical storage devices must be enhanced to support the dynamic allocation of resources among multiple requesting hosts. Lack of this functionality could lead to resource starvation, which may cause timeouts or other unexpected side effects in the access software running on the hosts. An additional challenge is to ensure that some requestors do not dominate data access, while others get little or no access. Providing a fairness capability will help ensure a predictable quality of service, and prevent isolated system or application problems during performance brownouts.
Heterogeneous Access Fostering a heterogeneous network allows a single storage subsystem to behave optimally while being accessed by different host types. In the private, locally attached model, a storage subsystem is generally configured and tuned to provide the best performance for an individual host system, meeting only that particular system's protocol or other idiosyncratic needs. In a heterogeneous SAN environment, storage subsystems must remember multiple sets of tuning parameters, and must apply the correct ones to each request. Additionally, different hosts and applications generate widely diverse I/O access signatures, which may require different data layouts for optimal performance. Without addressing these factors, SAN-attached storage may operate well with one host type but run poorly, or not at all, with others. The full utilization of storage capabilities requires applying server-specific tuning parameters on a per-request basis, as well as supporting various data layouts from one virtual device to another.
Solutions for Safe and Secure Access Being able to ensure continued operations without data corruption or data loss is essential to the deployment of a shared storage SAN environment. There are a number of strategies that help ensure data integrity and provide secure access. These strategies involve some form of hiding storage devices from unassociated hosts on the SAN (LUN Masking, LUN Hiding, LUN Remapping, Virtual Arrays, or Storage Groups). For this discussion, we use the term LUN masking. Note that LUN stands for Logical Unit Number. A Logical Unit is a generic reference for a specific device, such as the logical disks presented by a storage subsystem.
LUN masking can be implemented at several levels in the SAN -- on the attached systems, on infrastructure devices like switches, hubs, or bridges, and lastly on the attached storage subsystems.
Safe and Secure Access on the Host When implemented on the attached systems, often called host-based LUN masking, the solution is generally implemented at the host-driver level. The host system sees all of the storage on the SAN, but then forgets about the devices that it does not own. This specialized host software must be available, and installed and configured properly for all types and revisions of operating systems that will be connected to the SAN. This class of solution works for all types of storage devices, but again there must be driver support for the devices to be connected.
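As a rough illustration of the "see everything, then forget what is not owned" behavior, the following minimal Python sketch filters a discovered device list against an ownership list. The Device class, the mask_devices function, and the port names are hypothetical and stand in for whatever a real host driver would use; this is not any vendor's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    port: str  # storage port identifier (hypothetical naming)
    lun: int   # logical unit number behind that port


def mask_devices(discovered, owned):
    """Return only the discovered devices that this host owns."""
    owned_set = set(owned)
    return [d for d in discovered if d in owned_set]


# The host discovers every device on the SAN ...
discovered = [Device("array0", 0), Device("array0", 1), Device("array1", 0)]
# ... but its local configuration says it owns only one of them.
owned = [Device("array0", 1)]

visible = mask_devices(discovered, owned)
print(visible)
```

In this sketch the filtering depends entirely on the ownership list held by each host, so the protection is only as good as that per-host configuration.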
By itself, host-based LUN masking can be risky when attaching a new host and bringing it up without pre-configurations. Any errors made by administrators open up the possibility of data loss. As a result, this solution is not recommended as a standalone remedy except in well-controlled environments overseen by a highly trained staff, and where a required restoration from tape backups is an acceptable disruption of service. Security is only as strong as the weakest host attached. Once security on any host is breached, that host can be used to read data from any disk in the SAN.
Safe and Secure Access on the Switch Infrastructure Many switch vendors have implemented a storage-hiding feature called zoning. Zoning provides a reasonably safe and secure method to control access, and has the benefit of being independent of both the host and storage type. Essentially, the switch maintains a database that recognizes which hosts may access specific parts of a storage subsystem. A subsystem makes a subset of its logical units (storage devices) available through each of its ports. Each request is verified by the switch and either passed on or rejected. This environment is relatively safe, as unrecognized hosts can be defaulted as not having access to devices. This feature makes zoning a useful adjunct to host-based LUN masking in preventing the side effects outlined above when an unconfigured system is attached to a SAN. Zoning is also relatively secure, as a breached host would not have full access to all storage on the SAN.
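The zoning database can be pictured as a table of permitted host-to-port pairs with a default of "no access." The Python sketch below is a simplified model under that assumption; the ZoningTable class and the host and port names are invented for illustration and do not correspond to any switch vendor's interface.

```python
class ZoningTable:
    """Toy model of a switch zoning database (port-level granularity)."""

    def __init__(self):
        self._allowed = {}  # host id -> set of storage ports it may reach

    def allow(self, host, storage_port):
        self._allowed.setdefault(host, set()).add(storage_port)

    def permits(self, host, storage_port):
        # Hosts that are not in the table get an empty set: default deny.
        return storage_port in self._allowed.get(host, set())


zones = ZoningTable()
zones.allow("nt1", "array0:portA")
zones.allow("sol1", "array0:portB")

print(zones.permits("nt1", "array0:portA"))      # configured pair
print(zones.permits("nt1", "array0:portB"))      # other host's port
print(zones.permits("new_host", "array0:portA")) # unconfigured host
```

Note that the table is keyed by storage port, not by logical unit, so a permitted host reaches everything the subsystem presents through that port.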
The primary drawback with zoning is that it is a port-level masking solution that provides less granularity than LUN masking. Most individual disk and tape drives support a single Logical Unit on each device. For these individual drives, there is little difference between port-level and LUN-level masking. However, tape library units and larger disk arrays support access to many Logical Units through each port. Because sharing a storage subsystem is generally done by assigning different LUNs or groups of LUNs to different hosts, zoning is too coarse and not practical in creating safe or secure access.
An additional drawback is that until switch-zone management and host-based LUN masking software are truly integrated, the configuration process still requires a highly trained staff. Finally, as switched fabrics become more complicated, the management software must become more sophisticated. The management software in effect must detect all paths through the fabric from each attached server to each attached storage device, and must ensure that each path is configured with the same level of access restrictions.
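To make the path-detection requirement concrete, here is a small Python sketch that enumerates every loop-free path from a server to a storage device in a toy fabric and flags paths that pass through no enforcing switch. The fabric layout, the node names, and the idea of a set of "enforcing" switches are assumptions made for this example only.

```python
def all_paths(fabric, start, goal, path=None):
    """Return every loop-free path from start to goal (depth-first search)."""
    path = (path or []) + [start]
    if start == goal:
        return [path]
    found = []
    for nxt in fabric.get(start, []):
        if nxt not in path:  # never revisit a node on the current path
            found.extend(all_paths(fabric, nxt, goal, path))
    return found


def unprotected_paths(paths, enforcing):
    """Return the paths that cross no switch enforcing access restrictions."""
    return [p for p in paths if not any(node in enforcing for node in p)]


# Hypothetical fabric, written as directed links from server toward storage.
fabric = {
    "server1": ["switchA", "switchB"],
    "switchA": ["array0"],
    "switchB": ["switchA", "array0"],
}

paths = all_paths(fabric, "server1", "array0")
# Suppose only switchA has been configured with zoning.
gaps = unprotected_paths(paths, enforcing={"switchA"})
print(len(paths), gaps)
```

Here one of the three paths bypasses the configured switch, which is exactly the kind of gap the management software has to find. Exhaustive enumeration like this grows quickly with fabric size, so it should be read as an illustration of the problem rather than a scalable method.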
Safe and Secure Access through SAN Partitioning Hubs, bridges, and various multi-ported storage subsystems can also be used to provide safe and secure access. Such devices are physically separated both by front-end ports that become dedicated host connections, and by back-end SANs consisting of multiple ports with individual devices attached. This strategy is referred to as SAN partitioning.
SAN partitioning achieves safe and secure access by creating virtual devices on the bridge, hub, or subsystem, and then mapping these virtual devices to the appropriate host ports. Normally, each host has a dedicated front-end port, which provides safety and security for the virtual devices. If desired, a front-end port can be shared. However, the assigned virtual devices are no longer protected as any host on that port can access them.
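The mapping of virtual devices to front-end ports, and the loss of protection when a port is shared, can be sketched in a few lines of Python. The port names (fe0, fe1), host names, and helper functions are hypothetical and serve only to illustrate the idea.

```python
def devices_for(host, port_hosts, port_devices):
    """Return every virtual device reachable by a host through its ports."""
    reachable = set()
    for port, hosts in port_hosts.items():
        if host in hosts:
            reachable |= port_devices.get(port, set())
    return reachable


def shared_ports(port_hosts):
    """Return the front-end ports that serve more than one host."""
    return sorted(p for p, hosts in port_hosts.items() if len(hosts) > 1)


# Virtual devices mapped to each front-end port (hypothetical names).
port_devices = {"fe0": {"vdisk0", "vdisk1"}, "fe1": {"vdisk2"}}
# fe0 is dedicated to one host; fe1 is shared by two hosts.
port_hosts = {"fe0": {"nt1"}, "fe1": {"sol1", "sol2"}}

print(devices_for("nt1", port_hosts, port_devices))
print(devices_for("sol1", port_hosts, port_devices))
print(shared_ports(port_hosts))
```

Both hosts on the shared port see the same virtual device, so nothing in this model keeps one of them from writing to storage intended for the other.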
Adding servers in a SAN partitioning environment requires installing additional physical ports, which may limit scalability. Because each host has a separate connection, there is no way to share a device (such as a tape library) or a specialized port (such as a remote connection) that is not connected to the back-end of the bridge, hub, or array. To remedy this, an entire SAN with separate host connections will need to be put in place for access to the shared device.
As with switches, sophisticated storage management software is required to detect all paths and to ensure that the topology is partitioned correctly. Finally, except in the case of storage subsystem implementation, SAN partitioning requires the insertion of a bridge-class device into the infrastructure, which reduces performance.
Safe and Secure Access on the Subsystem The final method of providing data protection is through subsystem-based LUN masking. With subsystem-based LUN masking, the storage subsystem itself ensures proper access control to storage devices. If there are additional disk devices connected to the SAN that do not support subsystem-based LUN masking, then one or more of the other solutions discussed may be necessary.
Subsystem-based LUN masking fulfills all legitimate access requests and ignores those from unknown requestors. This provides strong configuration safety, because an administrator must intentionally enable LUN access on a per-host basis. Secure access is also enforced, as a compromised server cannot be used to read data from unassigned LUNs. This solution is independent of both the host operating system types and the type of SAN infrastructure installed, providing a high level of flexibility and safety.
Access is assigned to a single LUN or a group of LUNs. This provides a higher level of granularity than the zoning previously discussed, and is better equipped to support the disk subsystems likely to be used in SAN environments. Compared to SAN partitioning, subsystem-based LUN masking provides greater scalability and flexibility because additional ports do not need to be added for growth. Management complexity is also minimized because all hosts can utilize a common SAN versus separate connections. Furthermore, the SAN achieved with this strategy provides an infrastructure that can be leveraged for backup, disaster recovery, or other needs.
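The following Python sketch models subsystem-based LUN masking as described here: LUNs are collected into groups, an administrator explicitly assigns a group to a host, and requests from unknown hosts or for unassigned LUNs are ignored. The MaskingArray class, its method names, and the host identifiers are all hypothetical; a real array would identify hosts and return data in its own way.

```python
class MaskingArray:
    """Toy model of a storage subsystem that enforces LUN masking itself."""

    def __init__(self):
        self._groups = {}      # group name -> set of LUNs
        self._host_group = {}  # host id -> group name

    def define_group(self, name, luns):
        self._groups[name] = set(luns)

    def assign(self, host, group):
        # Access exists only after an administrator makes this explicit call.
        if group not in self._groups:
            raise KeyError(f"unknown LUN group: {group}")
        self._host_group[host] = group

    def read(self, host, lun):
        group = self._host_group.get(host)
        if group is None or lun not in self._groups[group]:
            return None  # unknown host or unassigned LUN: request ignored
        return f"data from LUN {lun}"


array = MaskingArray()
array.define_group("nt_group", [0, 1])
array.define_group("sol_group", [2])
array.assign("nt1", "nt_group")
array.assign("sol1", "sol_group")

print(array.read("nt1", 0))       # assigned LUN
print(array.read("nt1", 2))       # LUN that belongs to another host
print(array.read("new_host", 0))  # host that was never configured
```

Because the check lives in the array, it applies regardless of which host operating system or fabric path delivered the request.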
Solutions for Fair and Heterogeneous Access In most IT environments, fair and heterogeneous access are complex challenges but generally less critical problems to be solved. Once safe and secure access has been reached, maintaining required performance levels and optimizing hardware can be addressed. The only practical place to address fair access or heterogeneous access is on the physical storage subsystem. In a fair access strategy, the subsystem takes the responsibility for satisfying requests in a fashion that ensures no hosts get locked out by the heavy usage of other hosts. This could be accomplished through a fairly simple round robin process similar to bus arbitration, or through a very sophisticated process involving user selectable input.
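A simple round robin process of the kind mentioned above might look like the following Python sketch, in which the subsystem services one pending request per host per turn. The round_robin function and the request labels are hypothetical; a production subsystem would likely weigh many more factors.

```python
from collections import deque


def round_robin(queues):
    """Service one request per host per turn; return the service order."""
    # Skip hosts with nothing queued so popleft() below never fails.
    pending = deque((host, deque(reqs)) for host, reqs in queues.items() if reqs)
    order = []
    while pending:
        host, reqs = pending.popleft()
        order.append((host, reqs.popleft()))
        if reqs:  # host still has work: send it to the back of the line
            pending.append((host, reqs))
    return order


# One busy host and one quiet host (hypothetical request labels).
queues = {"busy": ["b1", "b2", "b3", "b4"], "quiet": ["q1"]}
order = round_robin(queues)
print(order)
```

The quiet host's single request is serviced second rather than fifth, which is the lockout protection the fair access strategy is after.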
For heterogeneous access, the subsystem must maintain a set of tuning parameters associated with each host and have the ability to apply these parameters to each host request. This can be accomplished only if the storage subsystem has an understanding of which hosts have been assigned to specific logical units. A manual method of accomplishing this is to deploy multiple arrays, tune them differently, and then assign access to each array by system or I/O access signature type. This will help ensure optimization but unfortunately will increase the management burden, as storage must be viewed as multiple pools with different characteristics versus one large pool.
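Keeping a set of tuning parameters per host and applying it to each request amounts to a lookup keyed by the requesting host. The Python sketch below illustrates that idea only; the Tuning fields (read_ahead_kb, queue_depth), their values, and the host names are invented for the example and are not taken from any real subsystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tuning:
    read_ahead_kb: int  # hypothetical parameter
    queue_depth: int    # hypothetical parameter


# Fallback used when a host has no tuning profile of its own.
DEFAULT_TUNING = Tuning(read_ahead_kb=64, queue_depth=8)

# One profile per known host, maintained by the subsystem.
host_tuning = {
    "nt1": Tuning(read_ahead_kb=32, queue_depth=16),
    "sol1": Tuning(read_ahead_kb=256, queue_depth=32),
}


def tuning_for(host):
    """Select the tuning profile to apply to a request from this host."""
    return host_tuning.get(host, DEFAULT_TUNING)


print(tuning_for("nt1"))
print(tuning_for("sol1"))
print(tuning_for("other"))
```

The lookup works only because the subsystem can tell which host issued each request, which is the same knowledge it needs in order to assign logical units to hosts.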
Choosing the Right Solution The overall challenge in deploying a reliable SAN is ensuring safe and secure access while remaining sensitive to the need for fair and heterogeneous access. There are numerous strategies for building a reliable SAN, based on the intended size of the SAN and the platforms that will be introduced in the SAN environment. The achievement of a company's long-term objectives depends upon the choice of an appropriate strategy.
For instance, a company has a large number of distributed servers including NT and UNIX, Ethernet-based networks, and a mixture of applications (many of which are deemed business critical). This company may expect its storage to grow 100% per year, but may be unsure of the exact distribution among servers and applications. Given this expected growth, the primary infrastructure goal is to consolidate many of the distributed servers onto larger systems, in order to simplify the required management effort.
An initial plan would be to build a single storage pool, which can be partitioned and allocated to three NT systems and three Solaris systems. New applications will be deployed on the new servers and existing applications will be migrated over time. The number of servers connected to this storage pool is expected to double in one year, and then double again in the following year. The decision is made to deploy a SAN to support this environment with the objective of minimizing architectural changes throughout the two-year window. How can this be accomplished?
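For a sense of scale, the growth assumptions in this example can be worked through with a short Python calculation. The project function is a hypothetical helper; the inputs are the six initial servers, the yearly doubling of servers, and the 100% yearly storage growth stated above, with storage expressed relative to its starting size.

```python
def project(initial, yearly_factor, years):
    """Return the value at the start and at the end of each year."""
    return [initial * yearly_factor ** y for y in range(years + 1)]


# Three NT plus three Solaris systems, doubling each year for two years.
servers = project(6, 2, 2)
# Storage growing 100% per year, relative to the initial capacity.
storage = project(1.0, 2, 2)

print(servers)  # [6, 12, 24]
print(storage)  # [1.0, 2.0, 4.0]
```

Whatever strategy is chosen therefore has to accommodate roughly four times the servers and four times the storage by the end of the two-year window.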
Scenario 1: Host-Based Solution The company decides to utilize a host-based LUN masking solution. Each of the six servers runs specialized software, which is loaded to provide safety and security, as well as enable the administrators to slice up the storage and assign it to the six systems.
This delivers a solution that is independent of both storage and switch products at a relatively low cost. The solution is easily expandable with increasing security as the environment grows.
Data corruption or downtime can result, however, from mistakes made in the configuration. The growth of this environment requires additional people and support, making it more difficult to manage and creating a greater chance for error.
Scenario 2: Switch-Based Solution The company decides to utilize a switch-based solution. The administrators set up a number of different zones, each of which includes a server and the necessary storage for that zone. Servers in one zone are unable to see storage in another, making the zones independent of each other. This solution is independent of server and storage products. The solution provides substantial safety and security as the environment prevents the new host from gaining access to existing storage.
Consider the increased cost of this environment: because LUN-level partitioning is unavailable, each host requires dedicated storage ports or storage subsystems. The addition of host-based LUN masking allows sharing of storage units, but also introduces the configuration risks associated with that strategy.
Scenario 3: Array-Based Solution The company decides on an array-based LUN masking solution. The administrators set up groups of LUNs, which are then assigned to the desired hosts. The array responds only to those LUN requests that are authorized.
This solution is independent of servers and the SAN infrastructure. Again, safety and security can be reached as the environment prevents new hosts from gaining access to existing LUNs. Because security is independent of the SAN infrastructure, proper operation does not hinge on the availability of intelligent software that can detect and manage all data paths within the SAN. However, storage arrays that do not support array-based LUN masking will require the introduction of additional security strategies to create the level of security typically required.
Conclusion The achievement of a reliable and valuable SAN is found in the ability to address the challenges of safe, secure, fair, and heterogeneous access in any environment while continuing to boost efficiency, reduce management costs, and improve the levels of service.
About the Authors
George Ericson is a Senior Staff Engineer for CLARiiON Advanced Technology and Ken Kutzer is the CLARiiON Marketing Manager at EMC Corporation. George can be reached at gericson@clariion.com. Ken can be reached at kutzer_ker@emc.com.