[mdlug] stale NFS file handles - Achilles' heel of Linux?

Sat Jun 14 17:55:55 EDT 2008

Dean Durant wrote:
> Hello, can anyone answer this question: stale NFS file handles,
> is there an equivalent problem in every OS or just those that rely on
> NFS?

It affects every network storage system that I know of, unless
you give the client app so much information about the disk that
you violate all sorts of rules about containerization concepts.

>
> Is there a better way to share files over a network?

It all boils down to the same thing -- either you can access
the remote resource, or you can't.

>
> I had a situation recently where there was a 38-hour job running
> on a cluster. We needed to add a SCSI disk to the file server
> (separate from the cluster). The users were screaming about
> space for their results. We had to reboot the file server to
> add the disk.

This is why machines being used for that sort of purpose should
all have hot-swap disk bays.

> All the NFS mounts were dead and somehow the big
> job got killed. This turned out to be very bad and the CAE
> manager who hates Linux blamed it all on Linux. (The CAE
> workstations, the file server, and the cluster all run Linux.)
> I didn't know what to say. The NFS stale file handle issue
> seems to be a sticky one.

It's the same in commercial versions of Unix (HP-UX, IRIX, AIX,
and even Solaris, and Sun invented NFS), so I don't know what the
guy is complaining about.  It's a known problem which impacts
EVERY method of remote storage.  Simply put, even on a
SAN or NAS, once the chassis with the disks is powered up
again, the NFS mount is going to come back, but the old, open
file handles are left pointing at the previous (stale) state
in the kernel, which is no longer in use.
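
If the job's own code can be changed, one way to make it less fragile
is to catch the "stale file handle" error (errno ESTALE) and reopen
the file by path, which asks the server for a fresh handle.  Here is
a minimal sketch in Python, not a definitive fix: the function name,
the retry count, the delay, and the "opener" parameter (there only so
the logic can be exercised without a real NFS mount) are all my own
assumptions, and it only helps if the mount itself has recovered.

```python
import errno
import time


def read_with_estale_retry(path, retries=3, delay=1.0, opener=open):
    """Read a whole file, reopening it by path if the handle is stale.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    for attempt in range(retries + 1):
        try:
            # Opening by path again requests a fresh handle from the server.
            with opener(path, "rb") as fh:
                return fh.read()
        except OSError as exc:
            # Only ESTALE is retried; any other error is a real failure.
            if exc.errno != errno.ESTALE or attempt == retries:
                raise
            # Give the server or the mount a moment to come back.
            time.sleep(delay)
```

A job that holds one handle open for 38 hours gets no benefit from
this; it has to be written to reopen its files when the error shows up.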

> 
> Just curious to hear what people have to say.      Thanks, Dean

Your boss needs to get a clue.