Software RAID in Linux


Software RAID in Linux - overview

This article focuses on managing software RAID level 1 (RAID1) in Linux, but a similar approach can be used for other RAID levels.

Software RAID in Linux is managed with the mdadm tool.

RAID devices are named /dev/mdX, where X is the number of the RAID device, for example /dev/md0 or /dev/md1.
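
For reference, a new RAID1 array is created with mdadm's --create mode. A minimal sketch, assuming two partitions of type "Linux raid autodetect" already exist (the device names below are only examples - adapt them to your disks):

# create a two-device RAID1 array out of two existing partitions
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1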

To list all devices in the system, including RAID devices, use fdisk:

# fdisk -l

Disk /dev/md0: 1044 MB, 1044512768 bytes
2 heads, 4 sectors/track, 255008 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/sda: 80.0 GB, 80032038912 bytes
255 heads, 63 sectors/track, 9730 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1         127     1020096   fd  Linux raid autodetect
/dev/sda2             128        9730    77136097+  fd  Linux raid autodetect

Disk /dev/sdb: 80.0 GB, 80032038912 bytes
255 heads, 63 sectors/track, 9730 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         127     1020096   fd  Linux raid autodetect
/dev/sdb2             128        9730    77136097+  fd  Linux raid autodetect

Disk /dev/md1: 78.9 GB, 78987264000 bytes
2 heads, 4 sectors/track, 19284000 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

    Device Boot      Start         End      Blocks   Id  System


The warning about the lack of a "valid partition table" is normal for md devices used directly, without partitioning - here /dev/md0 holds swap.

Using the mdadm tool

Viewing RAID devices

The following shows details for the device /dev/md0: it has two RAID devices, both of which are active, working, and in sync:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Nov 30 20:42:26 2005
     Raid Level : raid1
     Array Size : 1020032 (996.29 MiB 1044.51 MB)
    Device Size : 1020032 (996.29 MiB 1044.51 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Dec  1 13:04:19 2005
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : 131294eb:84dbaed1:e44abf9b:340c65a3
         Events : 0.65

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
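
To get a one-line summary of every array in the system (handy as a starting point for /etc/mdadm.conf), you can also use the --scan option. The output below is illustrative - the exact fields depend on the mdadm version:

# mdadm --detail --scan
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=131294eb:84dbaed1:e44abf9b:340c65a3
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=c36b5402:58ba0631:b2266f01:15bb8173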

Simulating hardware failure

The following shows details for the device /dev/md1: it has two RAID devices, but only one of them is active and working; the other is marked as removed. This was caused by a simulated hardware failure, which consisted of:

  • booting with only the first disk to see if RAID is configured properly,
  • booting with only the second disk to see if RAID is configured properly,
  • booting the server again with both disks.
# mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90.01
  Creation Time : Wed Nov 30 20:42:26 2005
     Raid Level : raid1
     Array Size : 77136000 (73.56 GiB 78.99 GB)
    Device Size : 77136000 (73.56 GiB 78.99 GB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Thu Dec  1 14:25:12 2005
          State : clean, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           UUID : c36b5402:58ba0631:b2266f01:15bb8173
         Events : 0.19308

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8        2        1      active sync   /dev/sda2

One RAID device is marked as "removed" because it is no longer in sync with (is "older" than) the other ("newer") device.
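
To bring the removed device back into the array after such a boot test, re-add its partition - a sketch, assuming /dev/sdb2 is the member that went missing, as in the output above (the rebuild then starts automatically, as shown in the next section):

# re-add the missing second-disk partition to the degraded md1 array
mdadm /dev/md1 -a /dev/sdb2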


Recovering from a simulated hardware failure

This part is easy: just mark the device as faulty, remove it from the array, and then add it again - it will start to reconstruct.

Setting the device as faulty:

# mdadm /dev/md0 -f /dev/sda1
mdadm: set /dev/sda1 faulty in /dev/md0


Remove the device from the array:

# mdadm /dev/md0 -r /dev/sda1
mdadm: hot removed /dev/sda1


Add the device to the array:

# mdadm /dev/md0 -a /dev/sda1
mdadm: hot added /dev/sda1


Check what the device is doing:

# mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90.01
  Creation Time : Wed Nov 30 20:42:26 2005
     Raid Level : raid1
     Array Size : 1020032 (996.29 MiB 1044.51 MB)
    Device Size : 1020032 (996.29 MiB 1044.51 MB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Thu Dec  1 15:10:29 2005
          State : clean, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 38% complete

           UUID : 131294eb:84dbaed1:e44abf9b:340c65a3
         Events : 0.68

    Number   Major   Minor   RaidDevice State
       0       0        0        -      removed
       1       8       17        1      active sync   /dev/sdb1

       2       8        1        0      spare rebuilding   /dev/sda1

As we can see, it's being rebuilt - after that process is finished, both devices (/dev/sda1 and /dev/sdb1) will be marked as "active sync".

You can see in /proc/mdstat how long this process will take and at what speed the reconstruction is progressing:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sda1[2] sdb1[1]
      1020032 blocks [2/1] [_U]
      [==>..................]  recovery = 13.0% (133760/1020032) finish=0.2min speed=66880K/sec
unused devices: <none>
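
To follow the rebuild continuously instead of re-running the command, you can wrap it in watch (assuming the watch utility is installed, as it is on most distributions):

# refresh the RAID status every two seconds; stop with Ctrl+C
watch cat /proc/mdstat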


Recovering from a real hardware failure

This process is similar to recovering from a "simulated failure".

To recover from a real hardware failure:

  • make sure that partitions on a new device are the same as on the old one:
    • create them with fdisk (fdisk -l will tell you what partitions you have on a good disk; remember to set the same start/end blocks, and to set the partition's system id to "Linux raid autodetect") - or copy the partition table with sfdisk, as shown in the sketch after this list
    • consult /etc/mdadm.conf file, which describes which partitions are used for md devices
  • add a new device to the array:
# mdadm /dev/md0 -a /dev/sda1
mdadm: hot added /dev/sda1
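
If the replacement disk is at least as large as the old one, a convenient way to handle the partitioning step is to copy the partition table from the healthy disk with sfdisk. A sketch, assuming /dev/sdb is the surviving disk and /dev/sda the new, empty one - double-check the device names, as this overwrites the target disk's partition table:

# dump the partition table of the good disk and write it to the new disk
sfdisk -d /dev/sdb | sfdisk /dev/sda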

Then, you can consult mdadm --detail /dev/md0 and/or /proc/mdstat to see how long the reconstruction will take.

Make sure you run lilo when the reconstruction is complete - see below.

RAID boot CD-ROM

It's always a good idea to have a CD-ROM from which you can boot your system (in case lilo was removed, etc.).

It can be created with the mkbootdisk tool:

# mkbootdisk --iso --device /root/raid-boot.iso `uname -r`

Then, just burn the created ISO.
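
As an example of burning it from the command line (a sketch - the wodim tool and the /dev/cdrw device name are assumptions, adjust them for your burner, or use any graphical burning program):

# burn the rescue ISO to a blank CD
wodim -v dev=/dev/cdrw /root/raid-boot.iso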


If everything fails

If everything fails - the system doesn't boot from any of the disks, nor from the CD-ROM - you can still read the files on RAID devices (at least on RAID1 devices): boot any Linux live distribution, and you should see the files on the normal /dev/sdX partitions; you can then copy them to a remote system, for example with scp.

You can manually assemble a RAID device using the commands below:

modprobe raid1
modprobe dm-mod
mdadm --assemble --verbose /dev/md1  /dev/sda2 /dev/sdb2
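
Once the array is assembled, you can mount it read-only and copy the data off over the network. A sketch - the mount point and the remote host below are hypothetical:

# mount the assembled array read-only and copy important data to another machine
mount -o ro /dev/md1 /mnt
scp -r /mnt/etc /mnt/home user@rescue-host:/backup/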

Installing lilo

You have to install lilo on all devices if you replaced the disks:

# lilo
Added linux *
Added failsafe
The boot record of  /dev/md1  has been updated.
The Master boot record of  /dev/sdb  has been updated.
The Master boot record of  /dev/sda  has been updated.


If lilo gives you the following error:

# lilo
Fatal: Trying to map files from unnamed device 0x0000 (NFS/RAID mirror down ?)

This can mean one of two things:

  • The RAID array is being rebuilt - check it with cat /proc/mdstat, and try again when it's finished.
  • The first device in the RAID array doesn't exist, for example when running a degraded array with only one device. If you stop the array and reassemble it so that the active device comes first, lilo should start working again - see the sketch below.
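
A minimal sketch of the second case, with example device names (note that you cannot stop an array that is currently in use, such as a mounted root filesystem - in that case this has to be done from a rescue environment):

# stop the degraded array and reassemble it with the surviving, active device listed first
mdadm --stop /dev/md1
mdadm --assemble --run /dev/md1 /dev/sda2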


Example lilo.conf for RAID

default="linux"
boot=/dev/md1
map=/boot/map
keytable=/boot/us.klt
raid-extra-boot=mbr
menu-scheme=wb:bw:wb:bw
prompt
nowarn
timeout=30
message=/boot/message
image=/boot/vmlinuz
        label="linux"
        root=/dev/md1
        initrd=/boot/initrd.img
        append=" resume=/dev/md0"
        vga=791
image=/boot/vmlinuz
        label="failsafe"
        root=/dev/md1
        initrd=/boot/initrd.img
        append=" failsafe resume=/dev/md0"