[mdlug] "Device busy" errors and RAID setup issues
David McMillan
skyefire at skyefire.org
Fri Nov 20 20:06:47 EST 2009
Jeff Hanson wrote:
> On Fri, Nov 20, 2009 at 7:41 PM, David McMillan <skyefire at skyefire.org> wrote:
>> Well, I did that. Successfully created /dev/md0. /proc/mdstat showed
>> /dev/md0 active and containing all of the hard drives except the one I'd
>> left as "missing" in the initial setup. I even put a label on it and
>> created a filesystem on it.
>> Then I rebooted the server. And now /dev/md0 is gone. /proc/mdstat
>> just gives me this:
>> sudo mdadm /dev/md0 --add /dev/sde1
>> mdadm: error opening /dev/md0: No such file or directory
>> david@Archive:~$ cat /proc/mdstat
>> Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5]
>> [raid4] [raid10]
>> md_d0 : inactive sdf1[3](S)
>> 976759936 blocks
>>
>> /dev/sdf1 was the last drive I added to /dev/md0. /dev/sde1 was the
>> one I couldn't add due to the "device busy" error. Before the reboot,
>> /dev/sdb, c, d, and f all showed in /proc/mdstat. What the heck? Why
>> would a RAID array that was fully built and appeared to be fine before
>> the reboot just vanish after? And why does sdf1, and only sdf1, still
>> show up in /proc/mdstat?
>
> Check the boot log. It may have failed during boot and not be active.
Well, it just gets stranger: here's every entry in /var/log that
mentions md0, for the time of the reboot:
Nov 20 19:43:47 Archive kernel: [ 1513.065948] raid10: raid set md0
active with 4 out of 4 devices
Nov 20 19:43:47 Archive kernel: [ 1513.067213] md0: p1
Nov 20 19:43:47 Archive kernel: [ 1513.108103] md: resync of RAID array md0
Nov 20 19:43:47 Archive kernel: [ 1513.038785] md: md0: raid array is
not clean -- starting background reconstruction
Nov 20 19:43:47 Archive mdadm[3041]: NewArray event detected on md
device /dev/md0
Nov 20 19:43:47 Archive kernel: [ 1513.065948] raid10: raid set md0
active with 4 out of 4 devices
Nov 20 19:43:47 Archive kernel: [ 1513.067213] md0: p1
Nov 20 19:43:47 Archive kernel: [ 1513.108103] md: resync of RAID array md0
Nov 20 19:43:47 Archive mdadm[3041]: RebuildStarted event detected on md
device /dev/md0
That's a whole lot of action in quick succession. I'm no expert, but
it looks like the system starts a resync, generates a fault (no idea
why, I hadn't even put any files on it yet) and begins background
reconstruction.
*Then* there's a NewArray event, which doesn't make any sense, since
that array was a good 24 hours old at this point. Unless it's some sort
of flag that an array is new and hasn't been used yet?
Then, finally, there's a RebuildStarted event, but as I understand it,
that should have shown up in /proc/mdstat.
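One way to untangle the order of events: syslog stamps every line above with 19:43:47, but the kernel lines also carry a bracketed timestamp (seconds since boot), and those differ. Sorting on that field may give a truer sequence than the order the lines were pasted in. A minimal Python sketch, assuming the wrapped lines are rejoined first; the names LOG_LINES, KERNEL_STAMP, and kernel_events are made up for illustration, and the mdadm[3041] lines are skipped because they have no kernel timestamp.

```python
import re

# Kernel lines copied from the log excerpt above, with the mail client's
# line wrapping undone. They are listed in the order they were pasted.
LOG_LINES = [
    "Nov 20 19:43:47 Archive kernel: [ 1513.065948] raid10: raid set md0 active with 4 out of 4 devices",
    "Nov 20 19:43:47 Archive kernel: [ 1513.067213] md0: p1",
    "Nov 20 19:43:47 Archive kernel: [ 1513.108103] md: resync of RAID array md0",
    "Nov 20 19:43:47 Archive kernel: [ 1513.038785] md: md0: raid array is not clean -- starting background reconstruction",
]

# Matches "kernel: [ 1513.065948] message" and captures the timestamp and message.
KERNEL_STAMP = re.compile(r"kernel: \[\s*(\d+\.\d+)\]\s*(.*)")


def kernel_events(lines):
    """Return (seconds_since_boot, message) tuples sorted by kernel timestamp."""
    events = []
    for line in lines:
        match = KERNEL_STAMP.search(line)
        if match:  # lines without a kernel timestamp are ignored
            events.append((float(match.group(1)), match.group(2)))
    return sorted(events)


ORDERED = kernel_events(LOG_LINES)

if __name__ == "__main__":
    for seconds, message in ORDERED:
        print(f"{seconds:12.6f}  {message}")
```

Sorted this way, the "not clean" message (1513.038785) comes before "active with 4 out of 4 devices" (1513.065948) and the resync line (1513.108103), which is not the order they appear in the paste.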
> At this point I would suspect hardware faults, BIOS faults, kernel bug,
> or a faulty drive (not necessarily in that order). Try building a
> smaller array with just sde and another drive using the same ports.
> Check their SMART status using smartctl just in case.
A quick smartctl --all run against all five RAID-wannabe disks comes up
with no errors.
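For anyone comparing notes on the /proc/mdstat output quoted at the top, here is a small Python sketch that pulls out each array's name, state, and member flags, so "which arrays exist and which are inactive" can be checked at a glance. It only handles the simple line shape seen in this thread; MDSTAT_TEXT, ARRAY_LINE, MEMBER, and parse_mdstat are illustrative names, not part of mdadm or any library.

```python
import re

# The /proc/mdstat contents quoted earlier in this message, unwrapped.
MDSTAT_TEXT = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md_d0 : inactive sdf1[3](S)
      976759936 blocks
"""

# "md_d0 : inactive sdf1[3](S)" -> array name, state, rest of the line.
ARRAY_LINE = re.compile(r"^(md\S*) : (\S+) (.*)$")
# "sdf1[3](S)" -> device name, slot number, optional flags such as (S) or (F).
MEMBER = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")


def parse_mdstat(text):
    """Map each array name to its state and a list of member devices."""
    arrays = {}
    for line in text.splitlines():
        match = ARRAY_LINE.match(line)
        if not match:
            continue  # skips the Personalities line and indented detail lines
        name, state, rest = match.groups()
        members = [
            {"device": dev, "slot": int(slot), "flags": re.findall(r"\(([A-Z])\)", flags)}
            for dev, slot, flags in MEMBER.findall(rest)
        ]
        arrays[name] = {"state": state, "members": members}
    return arrays


ARRAYS = parse_mdstat(MDSTAT_TEXT)

if __name__ == "__main__":
    for name, info in ARRAYS.items():
        print(name, info["state"], info["members"])
```

On the output above it reports a single array, md_d0, in state inactive, with sdf1 in slot 3 carrying the (S) flag that mdstat uses for spares. There is no md0 entry at all, which is consistent with the "No such file or directory" error from mdadm.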