[mdlug] "Device busy" errors and RAID setup issues

Fri Nov 20 20:06:47 EST 2009

Jeff Hanson wrote:
> On Fri, Nov 20, 2009 at 7:41 PM, David McMillan <skyefire at skyefire.org> wrote:
>>        Well, I did that.  Successfully created /dev/md0.  /proc/mdstat showed
>> /dev/md0 active and containing all of the hard drives except the one I'd
>> left as "missing" in the initial setup.  I even put a label on it and
>> created a filesystem on it.
>>        Then I rebooted the server.  And now /dev/md0 is gone.  /proc/mdstat
>> just gives me this:
>> sudo mdadm /dev/md0 --add /dev/sde1
>> mdadm: error opening /dev/md0: No such file or directory
>> david at Archive:~$ cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> md_d0 : inactive sdf1[3](S)
>>       976759936 blocks
>>
>>        /dev/sdf1 was the last drive I added to /dev/md0.  /dev/sde1 was the
>> one I couldn't add due to the "device busy" error.  Before the reboot,
>> /dev/sdb, c, d, and f all showed in /proc/mdstat.  What the heck?  Why
>> would a RAID array that was fully built and appeared to be fine before
>> the reboot just vanish after?  And why does sdf1, and only sdf1, still
>> show up in /proc/mdstat?
> 
> Check the boot log.  It may have failed during boot and is not active.

	Well, it just gets stranger:  here's every entry in /var/log that 
mentions md0, for the time of the reboot:


Nov 20 19:43:47 Archive kernel: [ 1513.065948] raid10: raid set md0 active with 4 out of 4 devices
Nov 20 19:43:47 Archive kernel: [ 1513.067213]  md0: p1
Nov 20 19:43:47 Archive kernel: [ 1513.108103] md: resync of RAID array md0
Nov 20 19:43:47 Archive kernel: [ 1513.038785] md: md0: raid array is not clean -- starting background reconstruction
Nov 20 19:43:47 Archive mdadm[3041]: NewArray event detected on md device /dev/md0
Nov 20 19:43:47 Archive kernel: [ 1513.065948] raid10: raid set md0 active with 4 out of 4 devices
Nov 20 19:43:47 Archive kernel: [ 1513.067213]  md0: p1
Nov 20 19:43:47 Archive kernel: [ 1513.108103] md: resync of RAID array md0
Nov 20 19:43:47 Archive mdadm[3041]: RebuildStarted event detected on md device /dev/md0
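
A minimal sketch of how a search like the one above might be run, assuming the logs are plain-text files directly under /var/log. The function name md_log_entries is hypothetical. The same kernel message may be written to more than one log file, which could account for repeated lines in output like this.

```shell
# Hypothetical helper: print every line that mentions an array name
# in the plain-text log files of a directory.
#   $1 = array name (defaults to md0)
#   $2 = log directory (defaults to /var/log)
md_log_entries() {
    array=${1:-md0}
    logdir=${2:-/var/log}
    # -h drops the file-name prefix; errors from subdirectories and
    # unreadable files are discarded.
    grep -h -- "$array" "$logdir"/* 2>/dev/null
}
```

Calling md_log_entries md0 (with sudo if the logs are not world-readable) prints the matching lines; piping through sort -u would collapse exact repeats.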

	That's a whole lot of action in quick succession.  I'm no expert, but 
it looks like the system starts a resync, generates a fault (no idea 
why, I hadn't even put any files on it yet) and begins background 
reconstruction.
	*Then* there's a NewArray event, which doesn't make any sense since 
that array was a good 24 hours old at this point.  Unless it's some sort 
of flag that an array is new and hasn't been used yet?
	Then, finally, there's a RebuildStarted event, but as I understand it, 
that should have shown up in /proc/mdstat.
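
Since the question turns on what /proc/mdstat actually reports, here is a rough sketch for pulling out just the array names and their states. The function name md_states is hypothetical, and the parsing assumes the "name : state ..." layout seen in the mdstat output quoted earlier.

```shell
# Hypothetical helper: print "<array> <state>" for each md array
# listed in an mdstat-format file.
#   $1 = file to read (defaults to /proc/mdstat)
md_states() {
    # Array lines look like "md_d0 : inactive sdf1[3](S)"; the
    # "Personalities" line and the block-count lines are skipped
    # because their first field does not start with "md".
    awk '$1 ~ /^md/ && $2 == ":" { print $1, $3 }' "${1:-/proc/mdstat}"
}
```

Run against the mdstat output quoted earlier, this would print "md_d0 inactive", which makes it easy to see that the kernel knows about an array called md_d0 rather than md0.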

> At this point I would suspect hardware faults, BIOS faults, kernel bug,
> or a faulty drive (not necessarily in that order).  Try building a
> smaller array with just sde and another drive using the same ports.
> Check their SMART status using smartctl just in case.

	A quick smartctl --all run against all five RAID-wannabe disks comes 
up with no errors.
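
For anyone repeating the check, a loop along these lines is one way to do it. This is only a sketch: the function name smart_health and the SMARTCTL override variable are assumptions made here for illustration, and it uses smartctl -H (the overall health verdict) rather than the full --all report.

```shell
# Hypothetical helper: run a SMART overall-health check on each disk given.
# SMARTCTL may be set to another command (an assumption of this sketch,
# useful for a dry run); it defaults to smartctl.
smart_health() {
    for dev in "$@"; do
        echo "== $dev"
        # smartctl exits non-zero when it finds or hits a problem.
        ${SMARTCTL:-smartctl} -H "$dev" || echo "smartctl reported a problem on $dev (exit $?)"
    done
}
```

Run as root, something like smart_health /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf would cover the five disks named in this thread.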