Handling file system failure

iHub handles file system failure on stateless and stateful file systems. This overview uses Network File System (NFS) as an example of a stateless network file system and Common Internet File System (CIFS) as an example of a stateful network file system.

iHub handles some file system failures by retrying file I/O. Retrying works when the file system failure is transparent to iHub, as it can be on an NFS-based network storage system. Retrying is insufficient in a configuration where the failure is not transparent to iHub, such as on a Windows-based CIFS file system.
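The retry approach can be illustrated with a minimal Python sketch. This is not iHub's implementation; the function name read_with_retry, the attempt count, and the delay are hypothetical values chosen for illustration. The sketch repeats a read when the operating system reports an I/O error, and gives up after a fixed number of attempts.

```python
import time


def read_with_retry(path, attempts=3, delay=0.5, opener=open, sleep=time.sleep):
    """Read a whole file, retrying when the read fails with OSError.

    All names and defaults here are illustrative assumptions, not iHub settings.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            with opener(path, "rb") as handle:
                return handle.read()
        except OSError as error:
            # The storage may be temporarily unreachable; remember the error
            # and wait before the next attempt.
            last_error = error
            if attempt < attempts:
                sleep(delay)
    # Every attempt failed, so report the most recent error to the caller.
    raise last_error
```

A retry loop of this kind only helps when a later attempt can succeed without any other action, which is the transparent case described above. It does not restore lost connection state.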

On a stateless file system such as an NFS-based file system, iHub can handle a network storage system failure. The machine detects that the connection to a file system is lost and attempts to reconnect. When the file system recovers, the machine re-establishes a connection to the file system. If the connection to the file system does not time out during the failure, iHub does not detect the failure.

On a stateful file system such as a Microsoft Windows-based CIFS network file system, a machine using the file system tracks file system connection states, including open files and locks. If the file system connection breaks, the machine loses the connection state information, and the CIFS client machine must manually re-establish file system connections. iHub can re-establish file system connections on a stateful network file system.
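The difference from a plain retry can be sketched as follows. This is a hypothetical Python illustration, not iHub code: the class ReopeningReader and its methods are invented for the example. It assumes that a broken connection surfaces as an OSError on the open handle, and it rebuilds the lost state (the open file and the read position) before repeating the read.

```python
class ReopeningReader:
    """Sequential reader that reopens its file after a broken connection.

    Hypothetical sketch: the client keeps its own copy of the state
    (path and offset) so that it can rebuild the handle after a failure.
    """

    def __init__(self, path, opener=open):
        self._path = path
        self._opener = opener
        self._offset = 0  # bytes successfully read so far
        self._handle = opener(path, "rb")

    def read(self, size):
        try:
            data = self._handle.read(size)
        except OSError:
            # The old handle is unusable; rebuild it and retry once.
            self._reconnect()
            data = self._handle.read(size)
        self._offset += len(data)
        return data

    def _reconnect(self):
        try:
            self._handle.close()
        except OSError:
            pass  # the stale handle may fail to close; ignore it
        self._handle = self._opener(self._path, "rb")
        self._handle.seek(self._offset)  # restore the read position

    def close(self):
        self._handle.close()
```

The sketch restores only an open file and a read position. Locks, which the text above also lists as connection state, would need to be reacquired in the same way.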

iHub treats a failure of any of the following file I/O operations as a file system failure:

Reading the configuration lock file

Reading the Encyclopedia volume lock file

Reading or writing to an Encyclopedia volume

Failure to read the configuration lock file affects the cluster nodes. The other two I/O failures affect the Encyclopedia volume.
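The relationship between the failed operation and what it affects can be written out as a small lookup. The following Python sketch only restates the rule above; the enum and function names are hypothetical and do not correspond to any iHub API.

```python
from enum import Enum


class FailedOperation(Enum):
    # The three file I/O operations whose failure counts as a file system failure.
    READ_CONFIG_LOCK = "reading the configuration lock file"
    READ_VOLUME_LOCK = "reading the Encyclopedia volume lock file"
    VOLUME_IO = "reading or writing to an Encyclopedia volume"


class Scope(Enum):
    CLUSTER_NODES = "cluster nodes"
    ENCYCLOPEDIA_VOLUME = "Encyclopedia volume"


def affected_scope(operation):
    """Return what a failed operation affects, following the rule in the text."""
    if operation is FailedOperation.READ_CONFIG_LOCK:
        return Scope.CLUSTER_NODES
    # The other two failures affect the Encyclopedia volume.
    return Scope.ENCYCLOPEDIA_VOLUME
```

For example, affected_scope(FailedOperation.VOLUME_IO) returns Scope.ENCYCLOPEDIA_VOLUME.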