Root Disk Mirroring with Linux
Jeff Layton
With hard disk drives, you can count on one thing: eventually,
they will fail. Most operating systems have the ability to "mirror"
hard disk data to two (or more) drives, which helps guard against
failure. Linux is no exception and provides a kernel-based software
RAID driver (known as the MD driver) that allows administrators
to mirror devices (among other things).
A related problem faced by systems administrators is partition
sizing. It's often a good idea to put different pieces
of the filesystem tree into separate filesystems. With traditional
hard partitioning, this can mean doing a lot of guesswork when sizing
these partitions at install time and downtime later if those guesses
were wrong. With Linux, though, you can use the Logical Volume Manager
(LVM) to slice up disk space. The LVM allows admins to resize volumes
on the fly. Volumes can even be grown while they are mounted
(as long as you use a filesystem that supports online resizing).
In this article, I will suggest a "best practices" approach to
root disk mirroring using the md and LVM drivers in Linux.
I'll cover an example machine that has two 10-GB disks on independent
controllers. The operating system used in my example is Debian Linux
with a 2.4 kernel installed on the first hard disk. The second disk
is unused. I'll describe how to convert this machine to a setup
where the disks are mirrored using the md driver and sliced into
volumes using the LVM. I'll also cover booting to the new setup
and resizing volumes.
This article focuses on the 2.4.x series of kernels and assumes
that you understand building and installing a new kernel, as well
as building and installing userspace programs. I will focus on Debian
Sarge as a distribution, but the instructions should be applicable
to any Linux distribution as long as the correct software tools
are installed.
It goes without saying that setting up disk mirroring is an inherently
dangerous activity. The possibility of making mistakes and losing
data is high, so this should not be attempted without a good backup
(or with the understanding that a mistake could mean loss of data).
Initial Setup and Overview
Note that you can mirror any two devices in your machine, but
it's always preferable to mirror devices that are on separate controllers.
Not only does this eliminate the controller as a single point of
failure, but it also helps performance.
For the initial setup, the base operating system is installed
on the first hard disk (hda) using traditional hard partitioning,
with the second disk (hdc) unused. The root filesystem is on the
first partition (hda1).
The goal is to have each disk set up with two partitions that
will be mirrored. The first partition will be a small one containing
the /boot filesystem. The second partition will be a large block
of space encompassing the bulk of the disk. This partition will
be allocated to the LVM.
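Schematically, the finished layout will look like this:
/dev/hda1 --\
             >-- /dev/md1 (RAID1) --> /boot filesystem
/dev/hdc1 --/
/dev/hda2 --\
             >-- /dev/md2 (RAID1) --> LVM physical volume --> rootvg
/dev/hdc2 --/                         (root, /usr, /var, and swap volumes)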
Userspace Programs
This rootdisk mirroring setup requires several userspace programs.
If possible, get the prepackaged versions with your operating system.
You'll need the following:
1. raidtools version 1.00.3 or greater -- With Debian, this package
is known as "raidtools2" (to distinguish it from the version that
deals with the older style on-disk RAID format). If you need to
build these tools from sources, they are available at the following
URL:
http://people.redhat.com/mingo/raidtools/
2. LVM software tools -- I will use the 1.0.x set of LVM tools in
this example. Under Debian, these are contained in the "lvm-common"
and "lvm10" packages. The sources for these tools are available at:
http://www.sistina.com/products_lvm_download.htm
3. The GRUB bootloader -- Although it is possible to use LILO for
this setup, I highly recommend making the switch to GRUB as a bootloader.
GRUB is much more flexible and forgiving than LILO. In this article,
I'll cover setting up the bootloader using GRUB. Sources are available
at the GRUB homepage at:
http://www.gnu.org/software/grub/
4. rsync -- Once the new rootdisk is set up, the data must be copied
from the old rootdisk. There are many tools that can do this, but
I use rsync. It's available in the "rsync" package in Debian. If you
need sources, they're available at:
http://samba.org/rsync/
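On Debian Sarge, all of these prerequisites can typically be installed
in one step (package names as described above; other distributions use
different names):
# apt-get install raidtools2 lvm10 lvm-common grub rsync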
Once you have installed all of the proper userspace tools, you can
begin building and installing a new kernel.
Kernel Configuration
Most vendor kernels (Debian included) build some of the needed
drivers as modules. We'll be using an initial ramdisk (initrd) image
during the boot process, so it's technically possible to build some
of these options as modules. I find it much easier, however, to
build a kernel that has the drivers needed for booting compiled
in statically.
Building and installing a kernel is outside the scope of this
article, but you should make sure you have the following options
set in your kernel configuration:
CONFIG_MD=y
CONFIG_BLK_DEV_MD=y
CONFIG_MD_RAID1=y
CONFIG_BLK_DEV_LVM=y
You'll also need initial ramdisk (initrd) support, and it's recommended
to compile in the loop block device:
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=4096
CONFIG_BLK_DEV_INITRD=y
CONFIG_BLK_DEV_LOOP=y
Additionally, you should build in support for the filesystems you
intend to use. I recommend reiserfs, because it allows you to grow
filesystems while they're mounted, which is very helpful. Also make
sure to build in ext2fs support, as the initial ramdisk image created
later will use that as a filesystem:
CONFIG_EXT2_FS=y
CONFIG_REISERFS_FS=y
Once you get the kernel and modules built, be sure to place the kernel
in the /boot directory and not in the root directory. Rename the existing
kernel to "vmlinuz.old". The new kernel should be named "vmlinuz"
(or make a symlink called that to the real kernel). The kernel modules
should be installed in the normal place (/lib/modules/<kernel version>).
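As a rough sketch, building and installing a 2.4 kernel along these
lines might look like the following (this assumes the kernel source is
unpacked in /usr/src/linux; adjust paths and the version for your system):
# cd /usr/src/linux
# make menuconfig
# make dep bzImage modules
# make modules_install
# mv /boot/vmlinuz /boot/vmlinuz.old
# cp arch/i386/boot/bzImage /boot/vmlinuz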
Then, configure GRUB as the bootloader.
GRUB Configuration
Once your new kernel is installed under /boot, it's time to set
up GRUB. Make the /boot/grub directory and copy all of the grub
drivers to it. With the Debian GRUB package, these files are installed
under /usr/lib/grub/i386-pc:
# mkdir /boot/grub
# cp /usr/lib/grub/i386-pc/* /boot/grub
Your GRUB installation might put these files in a location different
from "/usr/lib/grub/i386-pc".
Next, we'll make a GRUB configuration file. Create a file called
"/boot/grub/menu.lst" and put the following contents in it:
# Boot automatically after 20 secs.
timeout 20
# By default, boot the first entry.
default 0
# Fallback to the second entry.
fallback 1
# For booting Linux
title GNU/Linux
root (hd0,0)
kernel /boot/vmlinuz root=/dev/hda1
# For booting Linux from my old kernel
title GNU/Linux (old kernel)
root (hd0,0)
kernel /boot/vmlinuz.old root=/dev/hda1
Depending on your setup, you may need other options on the kernel
command lines. This is a very basic GRUB config file; there are many
more options. Consult the documentation for more information. As root,
run grub and install the bootloader in the boot sector on the first
disk:
# grub
grub> root (hd0,0)
Filesystem type is reiserfs, partition type 0x83
grub> setup (hd0)
Checking if "/boot/grub/stage1" exists... yes
Checking if "/boot/grub/stage2" exists... yes
Checking if "/boot/grub/reiserfs_stage1_5" exists... yes
Running "embed /boot/grub/reiserfs_stage1_5 (hd0)"... 18 sectors
are embedded.
succeeded
Running "install /boot/grub/stage1 (hd0) (hd0)1+18 p (hd0,0)
/grub/stage2 /grub/menu.lst"... succeeded
Done.
Next, reboot the box to the new kernel to ensure that it works.
Partitioning the New Disk
Presuming you have a working kernel when you reboot, you can next
lay out partitions on the second disk. Run fdisk on the unused
second disk:
# fdisk /dev/hdc
Remove any existing partitions from the disk, and create two new partitions.
Make one fairly small partition (128M-512M is usually plenty) at the
beginning of the disk that will be mounted on "/boot". The other partition
should encompass the rest of the disk and will be allocated as space
for the LVM.
Note that if you are working with two disks of different size,
you'll need to make sure that both partitions will fit on the smaller
of the disks. The example server for this article has identical
disks.
Set the partition types on both partitions to "fd" (Linux RAID
autodetect). This will tell the kernel that these partitions hold
software RAID partitions, and it will start them early in the boot
process (which is what we want).
Here's the partition table on /dev/hdc on the example machine
(from the "p" command in fdisk):
# fdisk /dev/hdc
Command (m for help): p
Disk /dev/hdc: 10.0 GB, 10005037056 bytes
16 heads, 63 sectors/track, 19386 cylinders
Units = cylinders of 1008 * 512 = 516096 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hdc1             1       249    125464+  fd  Linux raid autodetect
/dev/hdc2           250     19386   9645048   fd  Linux raid autodetect
Once you allocate the partitions and set the types, write the new
partition table to the disk with the "w" command.
The Linux kernel will not automatically pick up partition table
changes, so you'll need to either reboot the machine so it sees
the new partition table, or tell it to reread the table with the
hdparm -z command:
# hdparm -z /dev/hdc
Converting the Partitions into RAID Devices
The first step is to set up /etc/raidtab. This file serves as
the configuration file for the mkraid command. We need to set up
a stanza for each array, and declare the partitions on hda as
"failed-disk" for now. This will keep the md driver from trying
to use them at this time.
# md1 is the /boot array
raiddev /dev/md1
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    device                  /dev/hdc1
    raid-disk               0
    # this is our old disk, mark as failed for now
    device                  /dev/hda1
    failed-disk             1

# md2 is the LVM array
raiddev /dev/md2
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          0
    persistent-superblock   1
    device                  /dev/hdc2
    raid-disk               0
    # mark this device as failed for now
    device                  /dev/hda2
    failed-disk             1
Next, we can set up the RAID devices:
# mkraid /dev/md1
handling MD device /dev/md1
analyzing super-block
disk 0: /dev/hda1, failed
disk 1: /dev/hdc1, 125464kB, raid superblock at 125376kB
# mkraid /dev/md2
handling MD device /dev/md2
analyzing super-block
disk 0: /dev/hda2, failed
disk 1: /dev/hdc2, 9645048kB, raid superblock at 9644928kB
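At this point, both arrays should be running in degraded mode, because
only the hdc halves are active (the hda partitions were declared as
failed-disks). You can verify that they started by looking at the
/proc/mdstat pseudo-file:
# cat /proc/mdstat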
Setting up the LVM
To initialize the larger RAID partition as an LVM physical volume,
do the following:
# pvcreate /dev/md2
pvcreate -- physical volume "/dev/md2" successfully created
Next, create a volume group, using the physical volume as the underlying
device:
# vgcreate rootvg /dev/md2
vgcreate -- INFO: using default physical extent size 32 MB
vgcreate -- INFO: maximum logical volume size is 2 Terabyte
vgcreate -- doing automatic backup of volume group "rootvg"
vgcreate -- volume group "rootvg" successfully created and activated
Note that the LVM tools are finicky about symbolic links. Be sure
that the devices passed to them are actual block device files and
not symlinks.
Creating and Initializing Volumes
At this point, you can lay out volumes within the volume group.
How you carve up the space is largely a matter of preference. Make sure you allocate
enough space in each volume for the existing contents of these directories,
but there is no need to make any of them greatly oversized as it's
very easy to grow them later.
Here's how I set up the volumes on my machine:
# lvcreate -n swap -L 1024M rootvg
lvcreate -- doing automatic backup of "rootvg"
lvcreate -- logical volume "/dev/rootvg/swap" successfully created
# lvcreate -n root -L 512M rootvg
# lvcreate -n usr -L 2048M rootvg
# lvcreate -n var -L 2048M rootvg
This leaves roughly 3.7 GB of disk space in reserve that can be added
to any of the above volumes as needed.
The next step is to create a filesystem on each logical volume and
on the /boot partition (/dev/md1), and to initialize the swap volume.
As before, I used reiserfs for the filesystems:
# mkfs -t reiserfs /dev/rootvg/root
# mkfs -t reiserfs /dev/rootvg/usr
# mkfs -t reiserfs /dev/rootvg/var
# mkfs -t reiserfs /dev/md1
# mkswap /dev/rootvg/swap
Copying Data
Mount all the new filesystems under a directory under the root
directory:
# mkdir /newdisk
# mount /dev/rootvg/root /newdisk
# mkdir /newdisk/usr
# mkdir /newdisk/var
# mkdir /newdisk/boot
# mount /dev/rootvg/usr /newdisk/usr
# mount /dev/rootvg/var /newdisk/var
# mount /dev/md1 /newdisk/boot
Next, we use rsync to do a preliminary copy of the data from hda to
hdc. This can be done with the machine fully operational to minimize
downtime. We'll exclude the contents of /tmp and /proc from the copy,
but will copy the directories themselves. We'll skip copying the directory
of /newdisk, however:
# cd /
# rsync -avH --exclude='/tmp/**' --exclude='/proc/**' \
    --exclude='/newdisk**' * /newdisk
Create Initial RamDisk Image
We next will create an initial ramdisk image. The LVM software
includes a script for doing just this. Run:
# lvmcreate_initrd
Logical Volume Manager 1.0.7 by Heinz Mauelshagen 05/03/2003
lvmcreate_initrd -- make LVM initial ram disk /boot/initrd-lvm-2.4.24.gz
lvmcreate_initrd -- finding required shared libraries
lvmcreate_initrd -- stripping shared libraries
lvmcreate_initrd -- calculating initrd filesystem parameters
lvmcreate_initrd -- calculating loopback file size
lvmcreate_initrd -- making loopback file (3184 kB)
lvmcreate_initrd -- making ram disk filesystem (84 inodes)
lvmcreate_initrd -- mounting ram disk filesystem
lvmcreate_initrd -- creating new /etc/modules.conf
lvmcreate_initrd -- creating new modules.dep
lvmcreate_initrd -- copying initrd files to ram disk
lvmcreate_initrd -- copying shared libraries to ram disk
lvmcreate_initrd -- creating new /linuxrc
lvmcreate_initrd -- creating new /etc/fstab
lvmcreate_initrd -- ummounting ram disk
lvmcreate_initrd -- creating compressed initrd /boot/initrd-lvm-2.4.24.gz
Once your initrd image has been created, copy it to /newdisk/boot
and create a symlink to it called "initrd":
# cp /boot/initrd-lvm-2.4.24.gz /newdisk/boot
# cd /newdisk/boot
# ln -s initrd-lvm-2.4.24.gz initrd
Reconfigure GRUB
Before we set up GRUB for booting to the new filesystem, I'll
explain how booting works when you have your root filesystem on
LVM. The boot process works like this:
1. The BIOS starts and loads GRUB from the primary boot device.
2. GRUB loads the kernel from the /boot partition and passes information
to the kernel about the location of the initial ramdisk (initrd)
image.
3. The kernel boots and eventually loads the md driver. The md
driver scans all the hard disks looking for partitions of type "fd".
When it sees them, it attempts to activate them as RAID devices.
4. The kernel then mounts the initrd image as the root filesystem
and runs the script on it named /linuxrc. /linuxrc contains the
commands needed to start the LVM volume groups (a sketch of such a
script follows this list).
5. The kernel then mounts its "real" root filesystem (/dev/rootvg/root
in this example) and continues booting.
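The /linuxrc generated by lvmcreate_initrd is essentially a small
shell script that makes the volume groups available before the real
root filesystem is mounted. A minimal sketch of what it does (the
script generated on your system will include additional setup) looks
like this:
#!/bin/sh
# scan all block devices (including the now-running md arrays) for volume groups
/sbin/vgscan
# activate the volume groups so the /dev/rootvg/* device nodes are usable
/sbin/vgchange -a y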
At this point, the grub configuration file (menu.lst) should look
something like this:
# For booting Linux from our old rootdisk
title GNU/Linux
root (hd0,0)
kernel /boot/vmlinuz root=/dev/hda1
# For booting Linux from our old rootdisk (single user)
title GNU/Linux (single user mode)
root (hd0,0)
kernel /boot/vmlinuz root=/dev/hda1 single
# For booting Linux from our new LVM mirror
title GNU/Linux (LVM mirror disk)
root (hd1,0)
kernel /vmlinuz root=/dev/rootvg/root
initrd /initrd
So, when we boot from /dev/hda1, we simply load the kernel from
/boot/vmlinuz on the root filesystem (/dev/hda1).
When booting from hdc, things are a little more confusing. /dev/hdc1
(which is one half of the /boot md device) is the "root" as far
as GRUB is concerned, so we list the kernel as /vmlinuz instead
of /boot/vmlinuz like we did with the first boot option. The same
goes for the initrd image (/initrd).
Note that GRUB doesn't need to understand "md" devices. Because
the metadata for "md" devices lies at the end of the partition,
GRUB can simply read the filesystem that exists on the partition
without worrying about the fact that it's half of a mirrored pair.
Once you create this file, copy it onto both the old and new rootdisks:
# cp menu.lst /boot/grub/menu.lst
# cp menu.lst /newdisk/boot/grub/menu.lst
Install Boot Sector on /dev/hdc (Optional)
A GRUB boot sector on /dev/hdc is only useful if your BIOS is
capable of booting from that disk. If it isn't, then I highly recommend
building a GRUB boot floppy or CD; otherwise, you will not be able
to boot if /dev/hda fails.
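A GRUB boot floppy can be made by writing the stage1 and stage2 images
directly to the floppy, as described in the GRUB manual (the path below
is where the Debian package installs them; adjust it for your
installation):
# cd /usr/lib/grub/i386-pc
# dd if=stage1 of=/dev/fd0 bs=512 count=1
# dd if=stage2 of=/dev/fd0 bs=512 seek=1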
To install a boot sector on /dev/hdc, run GRUB and enter the following
commands to install a boot sector on the secondary disk:
# grub
grub> root (hd1,0)
Filesystem type is reiserfs, partition type 0xfd
grub> setup (hd1)
Checking if "/boot/grub/stage1" exists... no
Checking if "/grub/stage1" exists... yes
Checking if "/grub/stage2" exists... yes
Checking if "/grub/reiserfs_stage1_5" exists... yes
Running "embed /grub/reiserfs_stage1_5 (hd1)"... 18 sectors are
embedded.
succeeded
Running "install /grub/stage1 (hd1) (hd1)1+18 p (hd1,0)/grub/stage2
/grub/menu.lst"... succeeded
Done.
I usually install a boot sector on all the disks in the system regardless
of whether the BIOS supports booting from it. That way, at worst,
I could physically move the disk to the other controller and boot
from it.
Reboot and Do Final Copy
Reboot to /dev/hda in single-user mode (option 2 of our current
GRUB menu). When you get to a shell, mount the volumes in rootvg
and /boot:
# mount /dev/rootvg/root /newdisk
# mount /dev/rootvg/usr /newdisk/usr
# mount /dev/rootvg/var /newdisk/var
# mount /dev/md1 /newdisk/boot
Rerun the copy with rsync to transfer any files that have changed
since the original copy:
# cd /
# rsync -avH --exclude='/tmp/**' --exclude='/proc/**' \
    --exclude='/newdisk**' * /newdisk
Doing this final copy in single-user mode, with most programs not
running, ensures that all files are transferred intact.
Fix fstab
Once this step is done, edit /newdisk/etc/fstab and set it up
to mount the new LVM volumes in place of the old partitions, and
to mount the /boot partition. We would change it to the following
on the example machine:
# <file_system>      <mountpt>  <fstype>   <options>  <dump>  <pass>
/dev/rootvg/root     /          reiserfs   defaults   1       1
/dev/md1             /boot      reiserfs   defaults   1       1
/dev/rootvg/var      /var       reiserfs   defaults   1       1
/dev/rootvg/usr      /usr       reiserfs   defaults   1       1
/dev/rootvg/swap     none       swap       defaults   0       0
proc                 /proc      proc       defaults   0       0
Boot to hdc and Start Mirroring
Next, reboot the box:
# reboot
When the GRUB menu comes up after the reboot, select the third option
(the boot from /dev/hdc). The machine should boot from /dev/hdc,
mounting the root, /usr, and /var filesystems from LVM volumes and
/boot from /dev/md1. /dev/hda should be unused at this point.
When you get to a login prompt, log in as root and repartition
/dev/hda with partitions identical to those on /dev/hdc. If the disks
are of identical types (as on our example machine) and you have
"sfdisk" installed, you can do the following to copy the partition
table verbatim:
# sfdisk -d /dev/hdc | sfdisk /dev/hda
Otherwise, you'll need to partition /dev/hda by hand with fdisk
(or another program). Make sure both partitions on /dev/hda are
identical in size to, or slightly larger than, those on /dev/hdc.
Once /dev/hda has been partitioned, tell the kernel to reread the
partition table from the disk:
# hdparm -z /dev/hda
Next, attach the partitions to the metadevices and start them synchronizing:
# raidhotadd /dev/md1 /dev/hda1
# raidhotadd /dev/md2 /dev/hda2
Now you can watch the progress on the mirror synchronization via the
/proc/mdstat pseudo-file:
# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 hda1[0] hdc1[1]
125376 blocks [2/2] [UU]
md2 : active raid1 hda2[2] hdc2[1]
9644928 blocks [2/1] [_U]
[>....................] recovery = 0.1% (11696/9644928) \
finish=13.7min speed=11696K/sec
unused devices: <none>
Finalize GRUB Configuration
Next, we'll edit the GRUB menu.lst file into its final configuration.
We dispose of the stanzas relating to the non-LVM rootdisk and add
a stanza for booting to the old rootdisk as an LVM disk. Here's
the final GRUB configuration:
# Boot automatically after 15 secs.
timeout 15
# By default, boot the first entry.
default 0
# Fallback to the second entry.
fallback 1
# For booting Linux from our old rootdisk as a LVM mirror disk
title GNU/Linux (LVM primary disk)
root (hd0,0)
kernel /vmlinuz root=/dev/rootvg/root
initrd /initrd
# For booting Linux from our new LVM mirror
title GNU/Linux (LVM mirror disk)
root (hd1,0)
kernel /vmlinuz root=/dev/rootvg/root
initrd /initrd
Because /boot is now the mirrored /dev/md1, this file only needs to
be copied to /boot/grub/menu.lst. Once the /boot mirror finishes
sync'ing, it's safe to reboot the box (if necessary).
Monitoring and Maintenance
One thing that is often overlooked with disk mirroring is the
need to monitor it. Disk mirroring won't save you if both disks
fail, so with a setup like this, you must monitor the status of
the /proc/mdstat pseudo-file, looking for any disks that are marked
as having failed. If any have failed, replace them in a timely manner.
If possible, you should integrate this type of check into your
existing monitoring software, or write a script to email you if
any part of your RAID set shows up as faulty.
In this excerpt from /proc/mdstat, /dev/hdc2 is showing errors,
so the md driver has marked it faulty and taken it out of the md2
RAID set:
md2 : active raid1 hda2[0] hdc2[1](F)
9644928 blocks [2/1] [U_]
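A minimal sketch of a check that looks for this "(F)" marker and mails
an alert, suitable for running periodically from cron, might look like
the following (the mail command and the "root" recipient are
assumptions; adapt it to your environment):
#!/bin/sh
# Alert if any md component device is marked as failed -- "(F)" in /proc/mdstat
if grep -q '(F)' /proc/mdstat; then
    mail -s "RAID failure detected on `hostname`" root < /proc/mdstat
fi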
In the event of a complete disk failure (or more commonly of a
disk showing errors), you'll need to complete the following steps
to replace it:
1. Make sure that all the partitions on the physical disk are
marked as faulty. Commonly, the second partition will show errors
(simply because it is larger and has more blocks that can fail)
while the first still appears to be fine. If this is the case, mark
any partitions on the problem disk as faulty. Here, we're marking
/dev/hdc1 as a faulty partition:
# raidsetfaulty /dev/md1 /dev/hdc1
2. Shut down the machine and put in the new disk.
3. Boot to the other disk in the machine (or to your GRUB floppy
or CD if the other disk isn't bootable from the BIOS).
4. Partition the disk as before, and then use the raidhotadd
command to start sync'ing mirrors to the new disk.
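For example, if /dev/hdc were the disk that was replaced and the new
disk is the same size, the repartitioning and resynchronization could
be done like this:
# sfdisk -d /dev/hda | sfdisk /dev/hdc
# raidhotadd /dev/md1 /dev/hdc1
# raidhotadd /dev/md2 /dev/hdc2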
If you have hot-swappable disks, you can likely skip bringing
the machine down. Simply set the disk to be swapped as faulty and
replace the disk while the machine is running. Once the new disk
is in place, partition it and use the raidhotadd command
to start the mirror synchronization.
In either case, this is much easier and quicker than rebuilding
the machine from backups.
Growing Filesystems
Because we have several gigabytes of disk space in reserve, we
can now grow filesystems as needed (except for the /boot filesystem,
which isn't an LVM volume). And, because we used reiserfs for all
filesystems, we can grow them while they're mounted and in use.
If, for example, we start running out of space in /var, we can add
another gigabyte to it with the following commands:
# lvextend -L +1G /dev/rootvg/var
# resize_reiserfs /dev/rootvg/var
The first command extends the volume by 1GB, and the second grows
the filesystem to fill the new space.
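Before extending a volume, you can check how much unallocated space
remains in the volume group; vgdisplay reports the free physical
extents:
# vgdisplay rootvg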
Conclusion
I've found the process described here very useful for mirroring,
and I use it on all my production Linux machines. The ability to
grow volumes and filesystems online really makes the Linux offerings
competitive with those of proprietary versions of Unix. With the
introduction of the SATA standard for disks, I hope we can look
forward to hot-swappable disks even in low-end machines.
Although this article focuses on a relatively specific configuration,
the techniques used in it are applicable to many alternate situations
(for instance, using SCSI disks instead of IDE, or having the LVM
based on a RAID 5 or striped setup). It's also possible to set up
a similar configuration using the 2.6 kernel series with the same
md driver and LVM2 (the successor to the 2.4 kernel's LVM driver).
Jeff Layton has been working with computers since he got his
paws on a TRS-80 in 1981 at age 10. After working in the Macintosh
and PC realm for more than a decade and attaining a degree in Physics
Education, he came to his senses and made the jump to Unix administration
in 1996. He is currently employed as a Senior Unix Systems Administrator
at Bowe Bell and Howell.