Software RAID in Linux - overview

This article focuses on managing software RAID level 1 (RAID1) in Linux, but a similar approach can be used for other RAID levels.

Software RAID in Linux is managed with the mdadm tool.

RAID devices are named /dev/mdX, where X is the number of the RAID device, for example /dev/md0 or /dev/md1.

To list all devices in the system, including RAID devices, use fdisk -l:

Warnings that an md device (for example, the one used for swap) doesn't contain a "valid partition table" are normal.
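fdisk -l also lists all the ordinary partitions. To see only the md arrays and their state, the kernel's /proc/mdstat can be read directly. A minimal sketch; the function name list_md_arrays is hypothetical, and it takes an optional file argument so it can be tried on a saved copy of mdstat:

```sh
#!/bin/sh
# Hypothetical helper: print each md array and its state from an
# mdstat-format file (the live /proc/mdstat by default).
list_md_arrays() {
    # Array lines look like "md0 : active raid1 sdb1[1] sda1[0]".
    awk '$1 ~ /^md[0-9]+$/ && $2 == ":" { print "/dev/" $1, $3 }' "${1:-/proc/mdstat}"
}
```

On a typical two-array system it prints lines such as "/dev/md0 active".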


Using mdadm tool

Viewing RAID devices

Running mdadm --detail /dev/md0 shows details for the device /dev/md0: it has two RAID devices, both active and working, and they are in sync:
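The same check can be scripted, for example for a periodic cron job. A minimal sketch, assuming the "Raid Devices" and "Active Devices" lines that mdadm --detail prints; md_is_degraded is a hypothetical name:

```sh
#!/bin/sh
# Hypothetical helper: read `mdadm --detail` output on stdin and succeed
# (exit status 0) when fewer devices are active than the array expects.
md_is_degraded() {
    awk -F' : ' '
        $1 ~ /Raid Devices$/   { want = $2 + 0 }
        $1 ~ /Active Devices$/ { have = $2 + 0 }
        END { exit !(have < want) }'
}
```

Usage: mdadm --detail /dev/md1 | md_is_degraded && echo "/dev/md1 is degraded"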

Simulating hardware failure

Running mdadm --detail /dev/md1 shows details for the device /dev/md1: it has two RAID devices, but only one of them is active and working. The other one is marked as removed, which was caused by a simulated hardware failure:

  • booting with only the first disk to see if RAID is configured properly,
  • booting with only the second disk to see if RAID is configured properly,
  • booting the server again with both disks.

One RAID device is marked as "removed" because it is out of sync with the other one: it is "older", while the other device is "newer".
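The "older" member can be recognized by its event counter: mdadm --examine prints an Events line for each member partition, and the member with the lower count is the stale one that is left out when the array is assembled. A minimal sketch; md_events is a hypothetical name, and the value format depends on the superblock version (e.g. "0.1234" with 0.90 metadata):

```sh
#!/bin/sh
# Hypothetical helper: print the Events counter from `mdadm --examine`
# output read on stdin.
md_events() {
    awk -F' : ' '$1 ~ /^ *Events$/ { print $2; exit }'
}
```

Usage, with example partition names: for p in /dev/sda2 /dev/sdb2; do echo "$p $(mdadm --examine "$p" | md_events)"; done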


Recovering from a simulated hardware failure

This part is easy: just mark the device as faulty, remove it from the array, and then add it again - it will start to reconstruct.
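These three steps can be sketched as a small shell function. It only prints the mdadm commands (manage mode with --fail, --remove and --add), so they can be reviewed before they are run; readd_plan is a hypothetical name, and /dev/md1 and /dev/sdb1 below are example devices:

```sh
#!/bin/sh
# Hypothetical helper: print the commands that fail, remove and re-add a
# member partition ($2) of an array ($1). Nothing is executed here.
readd_plan() {
    md=$1
    part=$2
    echo "mdadm --manage $md --fail $part"
    echo "mdadm --manage $md --remove $part"
    echo "mdadm --manage $md --add $part"
}
```

Once the printed commands look right, readd_plan /dev/md1 /dev/sdb1 | sh (as root) runs them.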

Set the device as faulty with mdadm's --fail option:

Remove the device from the array with --remove:


Add the device to the array again with --add:


Check what the device is doing with mdadm --detail:

As we can see, the device is being rebuilt. Once the rebuild has finished, both devices (/dev/sda1 and /dev/sdb1) will be marked as "active sync".

You can check in /proc/mdstat how long the process will take and how fast the reconstruction is progressing:
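To pull only the progress figures out of /proc/mdstat (for example, to log them from a script), the recovery line can be parsed. A minimal sketch, assuming the usual "recovery = N% (...) finish=... speed=..." layout of that line; md_progress is a hypothetical name:

```sh
#!/bin/sh
# Hypothetical helper: for each array that is recovering or resyncing,
# print "<array> <percent done> <estimated finish> <speed>" taken from an
# mdstat-format file (the live /proc/mdstat by default).
md_progress() {
    awk '
        $1 ~ /^md[0-9]+$/ { md = $1 }    # remember the current array
        /(recovery|resync) =/ {
            pct = ""; fin = ""; spd = ""
            for (i = 1; i <= NF; i++) {
                if ($i == "=" && pct == "") pct = $(i + 1)
                if ($i ~ /^finish=/) fin = substr($i, 8)
                if ($i ~ /^speed=/)  spd = substr($i, 7)
            }
            print md, pct, fin, spd
        }' "${1:-/proc/mdstat}"
}
```

For an array rebuilding at 12.6%, it prints a line such as "md1 12.6% 2.3min 52915K/sec".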

Recovering from a real hardware failure

This process is similar to recovering from a "simulated failure". To recover from a real hardware failure:

  • make sure that the partitions on the new device are the same as on the old one:
  • create them with fdisk (fdisk -l shows the partitions on the good disk; remember to use the same start/end blocks and to set each partition's system id to "Linux raid autodetect")
  • consult the /etc/mdadm.conf file, which describes which partitions are used for which md devices
  • add the new partitions to the arrays with mdadm's --add option
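The steps above can be sketched as follows. Instead of recreating the partitions by hand with fdisk, this sketch copies the partition table from the good disk with sfdisk's dump/restore (sfdisk -d), a swapped-in technique (older sfdisk versions handle only MBR partition tables). It assumes the replacement disk gets the same name as the failed one and that /etc/mdadm.conf lists the members with devices=; replace_disk_plan is a hypothetical name, and it only prints commands:

```sh
#!/bin/sh
# Hypothetical helper: $1 = healthy disk, $2 = new disk, $3 = mdadm.conf
# (default /etc/mdadm.conf). Prints commands only; nothing is executed.
replace_disk_plan() {
    good=$1
    new=$2
    conf=${3:-/etc/mdadm.conf}
    # Copy the partition table (including partition types) from the healthy disk.
    echo "sfdisk -d $good | sfdisk $new"
    # For each "ARRAY /dev/mdX ... devices=a,b" line, re-add the member on the new disk.
    awk -v new="$new" '
        $1 == "ARRAY" {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /^devices=/) continue
                n = split(substr($i, 9), members, ",")
                for (j = 1; j <= n; j++)
                    if (index(members[j], new) == 1)
                        print "mdadm --manage " $2 " --add " members[j]
            }
        }' "$conf"
}
```

After reviewing the output, replace_disk_plan /dev/sda /dev/sdb | sh (as root) runs it.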

Then, you can consult mdadm --detail /dev/md0 and/or /proc/mdstat to see how long the reconstruction will take.

Make sure you run lilo when the reconstruction is complete - see below.

RAID boot CD-ROM

It's a good idea to have a CD-ROM from which you can always boot your system (for example, if lilo was removed or damaged).

It can be created with the mkbootdisk tool:
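A minimal sketch, assuming a Red Hat style system where mkbootdisk is installed: with --iso, mkbootdisk writes an ISO image to the file given by --device instead of to a floppy. The output path and the function name bootiso_plan are hypothetical, and the function only prints the command:

```sh
#!/bin/sh
# Hypothetical helper: $1 = output ISO file, $2 = kernel version
# (defaults to the running kernel). Prints the command only.
bootiso_plan() {
    echo "mkbootdisk --iso --device $1 ${2:-$(uname -r)}"
}
```

Usage: bootiso_plan /root/bootdisk.iso | sh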

Then burn the resulting ISO image to a CD.

If everything fails

If everything fails and the system boots neither from any of the disks nor from the CD-ROM, remember that you can easily "see" the files on RAID devices (at least on RAID1 devices): boot the system from any live Linux distribution, and the files will be visible on the normal /dev/sdX partitions, because each RAID1 member holds a full copy of the filesystem and the 0.90-format RAID superblock sits at the end of the partition. You can then copy the files to a remote system, for example with scp.

You can also assemble a RAID device manually with the following commands:
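A minimal sketch of such commands, assuming the array is /dev/md0 and its members are passed as arguments: --run lets mdadm start the array even when only one member is present, and mounting read-only avoids changing the data during a rescue. The function name assemble_plan and the mount point are hypothetical; the commands are printed, not run:

```sh
#!/bin/sh
# Hypothetical helper: $1 = array, $2 = mount point, rest = member partitions.
assemble_plan() {
    md=$1
    mnt=$2
    shift 2
    echo "mdadm --assemble --run $md $*"
    echo "mount -o ro $md $mnt"
}
```

Usage: assemble_plan /dev/md0 /mnt /dev/sda1 /dev/sdb1 | sh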

Installing lilo

If you replaced the disks, you have to install lilo on all of them:
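A minimal sketch, assuming the two disks are /dev/sda and /dev/sdb and that your lilo.conf does not already write to every disk on its own (for example via raid-extra-boot): lilo's -b option names the device the boot loader is installed to, so one run per disk puts it on each of them. lilo_plan is a hypothetical name; it prints the commands:

```sh
#!/bin/sh
# Hypothetical helper: print one `lilo -b <disk>` per disk given as argument,
# so the boot loader ends up in the boot sector of every disk.
lilo_plan() {
    for disk in "$@"; do
        echo "lilo -b $disk"
    done
}
```

Usage: lilo_plan /dev/sda /dev/sdb | sh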


If lilo gives you the following error:

This may mean two things:

  • RAID is being rebuilt - check it with cat /proc/mdstat, and try again when it's finished.
  • The first device in the RAID array doesn't exist, for example because a degraded array was built with only one device. If you stop the array and reassemble it so that the active device is first, lilo should work again.
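For the first case, a script can wait until /proc/mdstat no longer reports any rebuild before running lilo. A minimal sketch; md_busy is a hypothetical name, and it also treats a pending resync ("resync=DELAYED") as busy:

```sh
#!/bin/sh
# Hypothetical helper: succeed while any array in the mdstat file (default
# /proc/mdstat) is recovering, resyncing or reshaping, or has a pending resync.
md_busy() {
    grep -Eq '(recovery|resync|reshape) *=' "${1:-/proc/mdstat}"
}
```

Usage: while md_busy; do sleep 60; done; lilo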

Example lilo.conf for RAID