[mdlug] Major Malfunction - Fileserver Up - Problem Solved

Robert Adkins II radkins at impelind.com
Mon Dec 13 14:35:48 EST 2010


I need to get into the habit of writing out the issues AND what I have done
to this list more often and much sooner.
 
Putting myself into the position of explaining the issue ended up helping me
see what the damnable problem was from the get go.
 
The external backup Hard Drive wasn't connected to the server. I commented out
that drive's entry in fstab and... it booted fine.
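
For anyone who hits the same thing, here is a minimal sketch of a check that
would flag this kind of entry before a reboot. It assumes the fstab entries
name their devices by absolute path (such as /dev/disk/by-id/...). The function
name and its "exists" parameter are made up for illustration and are not part
of any existing tool.

```python
import os


def missing_fstab_devices(fstab_text, exists=os.path.exists):
    """Return device paths named in fstab text that are not present.

    `exists` is injectable so the check can be tried without real devices.
    """
    missing = []
    for line in fstab_text.splitlines():
        line = line.strip()
        # Skip blank lines and entries that are already commented out.
        if not line or line.startswith("#"):
            continue
        device = line.split()[0]
        # Only absolute paths can be checked this way; UUID=/LABEL= specs
        # and pseudo filesystems such as "proc" are deliberately skipped.
        if device.startswith("/") and not exists(device):
            missing.append(device)
    return missing
```

Feeding it the contents of /etc/fstab returns the entries whose device node is
absent, which are the ones worth commenting out or reconnecting before the
next boot.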
 
Looking at the logs, it does look like there was a series of issues across
the filesystems from the crude shutdown foisted upon the server, and the
datafile RAID required a manual fsck to correct a number of those.
 
Thanks for reading.
 
Rob
 


  _____  

From: Robert Adkins II [mailto:radkins at impelind.com] 
Sent: Monday, December 13, 2010 2:01 PM
To: 'MDLUG's Main discussion list'
Subject: Major Malfunction - Fileserver down


I received a call this morning that "Everything is down".
 
Turns out that originally the Internet wasn't working, and so the logical
solution was to go into the server closet with a proverbial hammer and beat the
living shit out of all of the equipment. (/end rant)
 
We have a router problem; the router needs to be power cycled. Guess what
single piece of equipment wasn't power cycled?
 
Instead, the Fileserver and the Email/Proxy Server were power cycled.
 
The Email/Proxy server came up, no problem.
 
The Fileserver (openSUSE 11.1) is being incredibly stubborn. It stated the
following: 
 
error on stat() /dev/disk/by-id/"The actual identifier of the disk,
including make and serial number"
 
Then it stated that essentially every single partition on that disk is
inaccessible.
 
I ran fsck and it mentioned the superblock being hosed. So, I got ahold of
TestDisk and ran that, which was able to determine all of the Superblock
locations and the block size. I ran fsck to replace the Superblock, which it
seemed to do just fine, and then it wanted to repair a ton of inodes. I reran
fsck to double check things and got a "clean" on each drive. Rebooted, same
error message as before.
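
As a rough cross-check of the Superblock locations a recovery tool reports,
the backup positions can be computed by hand. This is only a sketch and it
rests on assumptions the message above does not confirm: an ext2/ext3/ext4
filesystem created with the sparse_super feature and the default of
8 * block_size blocks per group. The function names are made up for
illustration.

```python
def backup_superblock_groups(group_count):
    """Block groups that hold a superblock backup under sparse_super:
    group 1 plus every power of 3, 5 and 7 below group_count."""
    groups = {1} if group_count > 1 else set()
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)


def backup_superblock_blocks(total_blocks, block_size):
    """Block numbers of the backup superblocks (assumed default layout)."""
    # Assumption: one bitmap block per group, so 8 * block_size blocks each.
    blocks_per_group = 8 * block_size
    # With 1 KiB blocks the first data block is block 1, otherwise block 0.
    first_data_block = 1 if block_size == 1024 else 0
    # Ceiling division to count the block groups.
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)
    return [first_data_block + g * blocks_per_group
            for g in backup_superblock_groups(group_count)]
```

Under those assumptions, any block number it returns is a candidate to hand to
e2fsck with -b (and the block size with -B) if the primary superblock is
unreadable.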
 
I am beginning to think that this is referring to something specific with
the RAID1 setup, which is only used for the data drives; the OS is by
itself. The BIOS has the RAID1 set configured, and the OS installation saw
the entire RAID as a single disk composed of both. It installed fine and has
had no issues in operation.
 
I commented out the partitions in fstab, just to see if I could get the OS
to fully boot, ignoring the data drives, for the time being. Same error
message came up.
 
So, there is something going on with the RAID setup and how the OS views
the RAID when booting. Any pointers/suggestions?
 
    Thanks,
    Rob



