Linux – How dangerous is NFS async when RAID BBU and UPS are present

linux · nfs · ubuntu

I have an NFSv3 server and around 15 clients. I am looking for the pros and cons of enabling async on the server side. I have read about it, but it is still a bit unclear to me. I know it can lead to data corruption if the server crashes in the middle of a write operation. However, I have also read that the client keeps a cache of that same operation and can recover it if needed. My questions are:

  • What exactly would happen if my server crashes (e.g. would it lose data that was pending to be written, would it corrupt the underlying filesystem, etc.)?
  • What would happen if both the server and the client crash at the same time (e.g. a power failure that the UPS fails to handle)?
  • What if the server crashes but I have a RAID controller with a BBU? Would the server recover safely?
  • Is there any way to detect such corruption (something similar to fsck, maybe)?
  • What if the server is shut down gracefully by the UPS? Would there still be a chance of data corruption?
  • What do you guys use – sync or async?

All machines run Ubuntu 10.04.

I tried to find a similar question here, to no avail. I have read the NFS Home Page and took a quick look at the book Managing NFS and NIS, 2nd Edition.

Best Answer

What the NFSv3 spec says is, basically, that for the following two NFS data operations:

  • WRITE operation with the stable bit set
  • COMMIT

the server is allowed to return success to the client only after the data has hit stable storage. This is what the Linux NFS server implements with the default "sync" export option. With "async", the server can cheat and return success even though data is not on stable storage.
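For reference, a minimal /etc/exports sketch contrasting the two options; the paths and client subnet here are invented purely for illustration:

```
# sync (default): the server acknowledges WRITE/COMMIT only after data is on stable storage
/srv/data     192.168.1.0/24(rw,sync,no_subtree_check)

# async: the server may acknowledge before data reaches stable storage (faster, riskier)
/srv/scratch  192.168.1.0/24(rw,async,no_subtree_check)
```

After editing /etc/exports, `exportfs -ra` re-exports the shares with the new options.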

That is, the potential corruption issue with async is basically something along the following lines:

  1. Server returns success for a WRITE or COMMIT operation
  2. Client sees the success, and at some point deletes the pages from its own cache (why waste space keeping them around since they are already on server storage, it thinks)
  3. Server crashes, thus losing the data which was not committed to stable storage
  4. Client reconnects to the server, but since there is no log of which data was written and which was not, it cannot know exactly which data was lost.

Now, that last point is the serious one: there is no way to know which data was lost or corrupted and which wasn't.

OTOH, if the client crashes, then any dirty data in the client cache that hasn't been flushed will be lost, but the client programmer can work around it: only after fsync() or close() returns success can the programmer assume the data is on stable storage.
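A minimal sketch of that client-side pattern in plain POSIX C (the file path is made up, and error handling is trimmed to the calls that matter here):

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
    const char buf[] = "important record\n";

    /* Hypothetical file on an NFS mount. */
    int fd = open("/mnt/nfs/data.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return EXIT_FAILURE; }

    if (write(fd, buf, sizeof(buf) - 1) != (ssize_t)(sizeof(buf) - 1)) {
        perror("write");   /* the data may still be only in the client cache */
        return EXIT_FAILURE;
    }

    /* Only after fsync() (or close()) returns success may the program
     * assume the data has reached stable storage on the server. */
    if (fsync(fd) != 0) { perror("fsync"); return EXIT_FAILURE; }
    if (close(fd) != 0) { perror("close"); return EXIT_FAILURE; }

    return EXIT_SUCCESS;
}
```

Note that this guarantee only holds as the spec intends when the export uses "sync"; with "async" the server may acknowledge the flush before the data is actually on disk.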