I have several xen guest OSs that get their root file system from NFS. I changed /etc/network/interfaces on some of them (on the nfs server) and then rebooted them. Now I get lots of 'Stale NFS handles' when booting them up. I've rebooted the guest OSs a few times and I'm getting the same problem. How do I fix this?
NFS – Stale NFS handle
nfs, virtualization, xen
Related Solutions
Short version:
Check that you have /etc/udev/rules.d/xen-backend.rules (the file may or may not be prefixed by a number). If not, check whether you have /etc/udev/xen-backend.rules and create a symlink from that to /etc/udev/rules.d/xen-backend.rules.
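As a sketch of that check-and-symlink step, the following demonstrates the logic in a scratch directory standing in for /etc/udev (the simulated misplaced rules file is part of the demo; on a real system, use the actual /etc/udev paths and run as root):

```shell
# Demo stand-in for /etc/udev; replace with the real directory on a live system.
udev_dir=$(mktemp -d)
mkdir -p "$udev_dir/rules.d"
touch "$udev_dir/xen-backend.rules"   # simulate the rules file landing one level up

# The installed rules file may carry a numeric prefix (e.g. 40-xen-backend.rules),
# so match any name ending in xen-backend.rules.
if ls "$udev_dir"/rules.d/*xen-backend.rules >/dev/null 2>&1; then
    echo "xen-backend rules already installed"
elif [ -f "$udev_dir/xen-backend.rules" ]; then
    # Link the misplaced file into rules.d so udev actually loads it.
    ln -s "$udev_dir/xen-backend.rules" "$udev_dir/rules.d/xen-backend.rules"
    echo "symlink created"
else
    echo "xen-backend.rules not found; reinstall the Xen udev rules" >&2
fi
```

After creating the symlink on a real system, reload the rules (udevadm control --reload) or reboot the dom0 so the backend devices get set up when guests start.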
Long version:
I've seen this with a Gentoo 3.3 dom0, not CentOS. But I suspect the fix will be the same or similar.
The Xen build scripts call the command udevinfo -V to determine the version of udev installed on the machine. The udevinfo utility was deprecated a while ago in favour of udevadm, and in more recent releases of udev the old utility has been removed altogether.
The build scripts use the udev version acquired this way to decide which installation steps to perform. If they can't find or match the udev version, they won't install the required udev rule. That is what's happening here: udevinfo is no longer present.
Now, it's probably a given that you don't want to downgrade udev. That leaves two solutions.
You can check whether your package distributor has fixed the issue. For instance, it is fixed in Xen 4.4 on Gentoo according to this bug.
Alternatively, you can work around it temporarily by fooling the build into thinking udevinfo is still present and behaving the way it expects. We can do this by wrapping the new udevadm command in a small proxy script:
# echo -e '#!/bin/bash\n/sbin/udevadm info $1' > /usr/bin/udevinfo
# chmod +x /usr/bin/udevinfo
*** Install Xen ***
# rm /usr/bin/udevinfo
This will get it working again. But you will still need to fix the issue in the long run.
I don't suppose this is for kernel modules or portage trees, is it? That's what I've seen this mechanism used for...
So, sure enough, it's easy to have all of your guests attach a filesystem image file as a read-only block device. It's also very straightforward to have that mounted somewhere in the guest (/etc/fstab and all that jazz). Ownerships you'll presumably take care of in the block device anyway (assuming you're using a filesystem type that stores that metadata -- but if you're using, say, VFAT, ownership is only a mount option away anyway).
The trick is handling updates. Once you've got your "block device" mounted in any guest, nothing can be allowed to update it. It just won't work, because nobody knows that someone else is updating the contents, so everything falls apart. Instead, you need to create a copy of the file with the filesystem image, make whatever changes are needed, and then trigger some sort of update action to make the guests unmount the old "filesystem", then the dom0 can detach the old file and attach the new one, before the guest remounts the filesystem.
In the cases I've used this, we actually had some code in the domU config files (since they're just Python anyway) to find the newest of these block devices and attach that, then the usual boot-time mounts did the right thing. So, for us, the "update process" was "reboot the guest". Whether that works for you, though, is a question I can't answer because I don't know what you're trying to use this for.
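The "find the newest image and attach that" logic described above might look like the following sketch (the image directory and naming scheme here are hypothetical, and the demo creates its own files with staggered timestamps):

```shell
# Demo image directory; on a real setup this would be wherever the
# read-only filesystem images are published.
img_dir=$(mktemp -d)

# Simulate an older image and a newer one produced by an update,
# with explicit modification times so the ordering is deterministic.
touch -t 202001010000 "$img_dir/data-001.img"
touch -t 202101010000 "$img_dir/data-002.img"

# ls -t sorts by modification time, newest first; pick the top entry.
newest=$(ls -t "$img_dir"/data-*.img | head -n 1)
echo "would attach: $newest"
```

In a real Xen setup this selection would live in the domU config (which is Python), feeding the chosen image path into the guest's disk line, so that each reboot picks up the latest published image.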
Alternately, just have a second NFS server that is only used for supplying these files to your domUs. It's probably easier than all this block device frufru (we had some pretty specific requirements that made it the least-worst option, but I don't expect they apply in your case -- in fact, I know they don't apply in your case, because you've already got an NFS server).
Best Answer
Did you reboot the NFS server? Did you do some sort of bulk move, rename or deletion of files or directories on the server? Are the clients changing files that other clients are trying to access?
The normal source of a "stale NFS file handle" is files being removed on the server. Especially if a directory is removed. The usual fix is unmounting and remounting the volume, or rebooting the client. With some NFS server implementations, rebooting the server can cause this error, too.
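On an affected client, the unmount-and-remount fix might look like this (the mount point and server export below are placeholders; substitute your own, and run as root on the client):

```shell
# Placeholder mount point and export; substitute the real ones.
mnt=/mnt/nfsroot
if mountpoint -q "$mnt"; then
    # Force-unmount the stale volume; fall back to a lazy unmount if
    # processes are still holding it open.
    umount -f "$mnt" || umount -l "$mnt"
    mount -t nfs nfsserver:/export/root "$mnt"
else
    echo "$mnt is not currently mounted"
fi
```

Note that this won't help when the stale handle is on the root filesystem itself, as in the question: an NFS root can't be unmounted from within the guest, so rebooting the guest (or fixing the export on the server) is the practical option there.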
It sounds like something other than the usual causes is going on here, and more detail might be needed.