[mdlug] ext3 - fixing bad block in RAID 1 on root file system.

Mon Dec 22 00:41:18 EST 2008

Stan Green <Stan at mcomputersolutions.com>, on Sun Dec 21, 2008 [09:40:06 PM] said:
> I have an ext3 file system, which is the root FS, on a RAID 1 array where one 
> drive failed.  No problem, I replace it and the array starts to rebuild. It 
> gets to 99% done and finds a bad block on the source drive. So, it starts 
> over going into an endless rebuild loop. 
> 
> I booted to Knoppix and ran e2fsck -cc (Should this have been -c ?) on the 
> partition,  /dev/hda1, (not the RAID array), but that did not work. The 
> rebuild still found the bad block.
> 
> So, I need to somehow mark the block bad (I have the number) so that the RAID 
> rebuild will complete.
> 
> The system seems to run fine with the bad block, it just will not rebuild the 
> RAID array. The distro is Fedora 6.
> 
> Any ideas here?
> 

	Hi;

	Well, the raid rebuild is *probably* operating at the block level,
not the filesystem level, so informing the filesystem layer about
bad blocks won't help there. (i.e. the raid system knows about groups
of disk sectors, not anything about the filesystem that lies on top.)
	If this data were real important, I would get a disk image
saved, do a backup of the filesystem to another disk, and *then*
reconstruct the raid. (easiest would just be to rsync to a fresh
disk, then reconstruct the raid)
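
	As a rough sketch of the "get a disk image saved" step, assuming
plain dd is used for it (a purpose-built tool such as GNU ddrescue is
generally a better fit for a failing disk): conv=noerror,sync tells dd
to keep going after a read error and to pad the unreadable block with
zeros, so everything after it stays at the right offset in the image.
The file names are placeholders, and the demo copies a scratch file
rather than a real device.

```shell
#!/bin/sh
# Demo on scratch files. For a real disk, src would be the device
# (e.g. /dev/hdx) and img a file on a different, healthy disk.
src=$(mktemp)
img=$(mktemp)

# Stand-in "disk": 16 sectors of 512 bytes of arbitrary data.
dd if=/dev/urandom of="$src" bs=512 count=16 2>/dev/null

# noerror: keep going after a read error instead of stopping.
# sync:    pad each short or failed read to a full block with zeros,
#          so later data keeps its offset in the image.
# A small bs means one bad sector costs only 512 bytes of the image.
dd if="$src" of="$img" bs=512 conv=noerror,sync 2>/dev/null

# With no read errors the image is byte-for-byte identical.
cmp "$src" "$img" && echo "image matches source"
```
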
	I'm not familiar enough with the raid stuff to know if you can
just exclude that group of sectors, but one thing you can do is try
to get the drive to remap the bad sector. You can try writing to the
sector, and you can have the drive itself run self diagnostics and
perhaps repair itself.

	smartctl -t long /dev/hdx  <- have the disk test itself
	smartctl -a /dev/hdx | more   <- see what it thinks

	Reading and writing the bad sector/block can be done with dd,
e.g. if you knew sector 99999 was bad, you could do:

#> dd if=/dev/hdx of=/dev/null skip=99999 count=1 bs=512

(this would read that sector, and an error would confirm it is the
right one. dd counts blocks from 0, so skip=99999 is right if the
reported sector number is 0-based; use 99998 if it is 1-based.)

#> dd if=/dev/zero of=/dev/hdx seek=99999 count=1 bs=512

(this would try to write to the sector; note seek=, which positions
the output, where the read used skip=, which positions the input.
WARNING: don't write to your disk unless you know what you are doing.)
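
	One way to rehearse the read and the write before going anywhere
near a real disk is to try them on a scratch file. The sketch below is
only that: the image file and the nonzero_bytes helper are made up for
the demo. It shows that seek= picks the sector that gets written and
that the neighbouring sectors are left alone.

```shell
#!/bin/sh
# Rehearsal on a scratch file standing in for the disk;
# nothing here touches a real device.
img=$(mktemp)

# Fill 8 "sectors" of 512 bytes with 0xFF so a zeroed sector stands out.
dd if=/dev/zero bs=512 count=8 2>/dev/null | LC_ALL=C tr '\000' '\377' > "$img"

# Count the non-zero bytes in sector $2 of file $1 (hypothetical helper).
nonzero_bytes() {
    dd if="$1" bs=512 skip="$2" count=1 2>/dev/null \
        | LC_ALL=C tr -d '\000' | wc -c | tr -d ' '
}

# Overwrite sector 3 only. seek= positions the OUTPUT.
# conv=notrunc is needed because the target is a regular file, which dd
# would otherwise truncate after the written block; a block device is
# not truncated.
dd if=/dev/zero of="$img" bs=512 seek=3 count=1 conv=notrunc 2>/dev/null

nonzero_bytes "$img" 2   # prints 512 (untouched)
nonzero_bytes "$img" 3   # prints 0   (zeroed)
nonzero_bytes "$img" 4   # prints 512 (untouched)
```

Remove the scratch file with rm -f "$img" when done.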

	I shouldn't really even describe using dd.... if what you know
is the *block* that is bad (rather than the 512-byte sector), you need
to set bs=BLOCKSIZE so that skip= and seek= count in blocks.
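
	To make the block-versus-sector arithmetic concrete, here is a
small sketch. It assumes the bad block number is a filesystem block
counted from the start of the partition, and that the block size is a
multiple of 512. The function names and the block number 12345 are
made up for illustration; the real block size is reported as "Block
size" by tune2fs -l or dumpe2fs -h.

```shell
#!/bin/sh
# First 512-byte sector covered by a filesystem block (hypothetical helper).
# $1 = block number, $2 = block size in bytes
block_to_sector() {
    echo $(( $1 * ($2 / 512) ))
}

# Print (not run) the dd read test for one filesystem block.
# $1 = partition device, $2 = block number, $3 = block size in bytes
dd_read_cmd() {
    echo "dd if=$1 of=/dev/null bs=$3 skip=$2 count=1"
}

block_to_sector 12345 4096         # prints 98760
dd_read_cmd /dev/hda1 12345 4096
# prints: dd if=/dev/hda1 of=/dev/null bs=4096 skip=12345 count=1
```

Both numbers are relative to the partition; to address the same spot
on the whole disk you would have to add the partition's start sector.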

	[ The -cc option you mention does a read/write test, but it
will probably fail to do the write after the read fails....]


> FYI: Yes I am replacing the source drive as soon as I get the RAID array 
> working again.
> 
	Seriously, just back up the filesystem to a fresh disk and
make a new mirror... Conventional wisdom is that bad sectors on a
modern (less than 10 years old) disk are usually an indicator of
impending death.
	
> Thanks,
> Stan