Ubuntu Linux Software RAID – Replacing a Failing Drive



Some time back I set up my home server running Ubuntu Linux (6.06, Dapper Drake LTS). I used two pairs of drives, each pair mirrored with software RAID: two IDE drives for the main system and two SATA drives for audio/video storage, CD images and other large files shared on the local network. Well… I noticed the hard drive light was on solid, and sure enough one of the two SATA drives had failed. (I didn’t get my status email because I’d changed my network setup and hadn’t updated my local mail configuration…) Anyway… replacing it was a pain in the neck only because of the physical access to the box. Everything else worked as it should.

I had made a file with the contents of the partition table when I first set things up, which helped…
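A file like that can be made with sfdisk’s dump option (-d) while the drive is still healthy. Here is a minimal sketch; the function name and the output file name are made up for illustration, and it needs root to read the disk.

```shell
#!/bin/sh
# Sketch: save a drive's partition table to a text file that sfdisk can
# read back in later. save_partition_table is a made-up helper name.
save_partition_table() {
    disk="$1"      # e.g. /dev/sda
    outfile="$2"   # e.g. partitiontable.sda
    if [ -z "$disk" ] || [ -z "$outfile" ]; then
        echo "usage: save_partition_table DISK OUTFILE" >&2
        return 2
    fi
    # sfdisk -d prints the table in a format suitable as sfdisk input
    sfdisk -d "$disk" > "$outfile"
}

# Example (run as root):
#   save_partition_table /dev/sda partitiontable.sda
```

Keep the resulting file somewhere other than the array it describes, so it is still around when a drive dies.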

So from another PC (with the new drive attached as a USB disk) I ran…

sudo sfdisk /dev/sdb < partitiontable.sda

(sda was the drive in the old setup that had failed; the new drive showed up as sdb on the other PC.) Then I logged in to the server, marked the failed drive’s partitions as failed and removed them from the arrays.

sudo su
mdadm /dev/md2 --fail /dev/sda6 --remove /dev/sda6
mdadm /dev/md1 --fail /dev/sda5 --remove /dev/sda5
mdadm /dev/md0 --fail /dev/sda1 --remove /dev/sda1

(Those options start with a double hyphen. If any of them show up on this page as a single long dash, that’s WordPress being funny and converting them.)
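Since the three commands follow the same pattern, they could also be generated from a list of array:partition pairs. The sketch below only prints the commands so they can be checked before anything is touched; print_remove_cmds is a made-up name, and the pairs are the ones from this box.

```shell
#!/bin/sh
# Sketch: print the fail/remove command for each array:partition pair.
# Nothing is executed here; the commands are only echoed for review.
print_remove_cmds() {
    for pair in "$@"; do
        array="${pair%%:*}"    # text before the colon, e.g. /dev/md2
        part="${pair#*:}"      # text after the colon, e.g. /dev/sda6
        echo "mdadm $array --fail $part --remove $part"
    done
}

print_remove_cmds /dev/md2:/dev/sda6 /dev/md1:/dev/sda5 /dev/md0:/dev/sda1
```

Once the printed lines look right, they can be run by hand as root.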

I shut down the machine in question and carefully pulled it out where I could work on it. (Those front loading hard drive trays would be REALLY nice for my home setup.) I figured out which drive was sda according to the system board, disconnected it and checked the BIOS to make sure. Sure enough I had pulled the correct one.

I shut down again and removed the old drive entirely, replacing it with the new drive and hooking up the cables. Then I powered back up to check the BIOS again. All was good. I powered things back down to put the cover on the case and move it back into its cubby hole.

Everything booted up just fine. Running “cat /proc/mdstat” showed just one drive in each of the arrays md0, md1 and md2. So, I just did the following…

mdadm /dev/md2 --add /dev/sda6
mdadm /dev/md1 --add /dev/sda5
mdadm /dev/md0 --add /dev/sda1

and checked “cat /proc/mdstat” again to find that they were in the process of syncing and all looked healthy. The syncing will take a while (these are 400GB drives), especially if the content changes during the process, but once that’s done it should be a perfectly healthy RAID array again.
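For anyone who would rather not eyeball the raw output, the state of each array can be pulled out of /proc/mdstat with a little awk. This is only a rough sketch: md_summary is a made-up name, and it assumes the usual mdstat layout, where each array’s status line ends in something like [UU] or [_U] and a rebuilding array has a “recovery = N%” line under it.

```shell
#!/bin/sh
# Sketch: one line per md array saying whether it is healthy, degraded,
# or rebuilding. md_summary is a made-up helper name; it reads
# /proc/mdstat unless given another file (handy for trying it out).
md_summary() {
    awk '
        function emit() { if (name != "") print name ": " state }
        # Array header, e.g. "md2 : active raid1 sda6[2] sdb6[1]"
        /^md[0-9]+ :/ { emit(); name = $1; state = "unknown" }
        # Status line, e.g. "292969280 blocks [2/1] [_U]"; an underscore
        # in the last bracket marks a missing member.
        name != "" && / blocks / {
            state = ($NF ~ /_/) ? "degraded" : "healthy"
        }
        # Progress line, e.g. "[=>...]  recovery =  8.5% (...) finish=..."
        name != "" && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) state = "rebuilding " $i
        }
        END { emit() }
    ' "${1:-/proc/mdstat}"
}

# Example:
#   md_summary
```

To keep an eye on a long sync, “watch -n 5 cat /proc/mdstat” does the job too.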

While I was at it I ordered a pair of drives to start rebuilding my desktop system with a raid array for the primary system file structure and the home partitions. I like the redundancy that simple software raid gives. (I’ve talked before about even adding a third drive in temporarily as a backup drive. USB may be a bit slow for doing that frequently, but as a snapshot it’s not a bad approach from time to time.)
