Process has opened inode that’s not on any filesystem

inode process solaris

So I'm trying to find out if the stderr of a process has been redirected to somewhere unusual (it's a Java process and I want a thread dump, but it's launched through a nest of startup scripts).

I find my process with pgrep, and use pfiles to see what's there:

4366:   /foo/bar/platform/solaris2/jre_1.5.0/bin/java -Xmx2048m -Xms10
Current rlimit: 65536 file descriptors
 0: S_IFCHR mode:0666 dev:302,0 ino:6815752 uid:0 gid:3 rdev:13,2
    O_RDONLY|O_LARGEFILE
    /devices/pseudo/mm@0:null
 1: S_IFREG mode:0640 dev:85,56 ino:26471 uid:0 gid:0 size:10485812
    O_WRONLY|O_LARGEFILE
 2: S_IFREG mode:0640 dev:85,56 ino:26471 uid:0 gid:0 size:10485812
    O_WRONLY|O_LARGEFILE
 3: S_IFCHR mode:0666 dev:302,0 ino:6815772 uid:0 gid:3 rdev:13,12

So I can see that stdout and stderr (file descriptors 1 and 2) are pointing to the same place; I think they are redirected to the same file in the startup scripts so this tallies.
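One way to double-check that two descriptors really share a file is to compare the device and the inode together, not just the inode. A minimal Python sketch, assuming that stat() on a /proc/<pid>/fd/<n> entry reports the underlying file; same_underlying_file is a name made up for illustration:

```python
import os

def same_underlying_file(path_a, path_b):
    """Return True if both paths resolve to the same (device, inode) pair."""
    a = os.stat(path_a)
    b = os.stat(path_b)
    # An inode number alone is ambiguous; the device number disambiguates it.
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

For the process above this would be called as same_underlying_file("/proc/4366/fd/1", "/proc/4366/fd/2").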

But when I look for a file with inode number 26471, I see this:

# find / -inum 26471
/usr/share/man/man3mlib/mlib_MatrixScale_S16_U8_Sat.3mlib
/proc/4366/fd/1
/proc/4366/fd/2
/proc/4366/fd/83

The first hit is (I'm certain) a file on a different filesystem. The three entries in /proc are fds my process has open.
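Inode numbers are only unique within a single filesystem, which would explain the unrelated man page turning up. A rough Python sketch of a search that matches on the device and inode pair instead; find_by_dev_ino is a hypothetical helper, not an existing tool:

```python
import os

def find_by_dev_ino(root, dev, ino):
    """Yield paths under root whose (st_dev, st_ino) pair matches."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if st.st_dev == dev and st.st_ino == ino:
                yield path
```

The dev and ino values could be taken from os.stat("/proc/4366/fd/1"). Note that st_dev is a single combined number; os.major() and os.minor() split it into a major,minor pair, which I assume corresponds to the dev:85,56 that pfiles prints.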

Looking in /proc/4366, I can't see any more info than I get from pfiles.

# ls -li 0 1 2 3
   6815752 c---------   1 root     sys       13,  2 Jan 20 14:10 0
     26471 --w-------   0 root     root     10485812 Jan 20 13:42 1
     26471 --w-------   0 root     root     10485812 Jan 20 13:42 2
   6815772 c---------   1 root     sys       13, 12 Jun  7  2009 3
# file 0 1 2 3
0:              character special (13/2)
1:              ascii text
2:              ascii text
3:              character special (13/12)

(I can tail one of these fds and work out which file it is from that. I'm asking because I clearly don't understand the relationship between the fds and the inodes in enough depth.)

So my process is writing to something (on some device, with inode 26471) and the data is then getting into a file with a different inode number. Can anyone give me an idea of what this something might be (or even let me know if my reasoning so far is totally broken)?

Best Answer

AFAIK, find only walks the filesystem's directories. If the file was deleted but still exists because a process has it open (a common trick on Unix), find won't see it.

I haven't tried this on Solaris, but there is a known technique for such 'deleted but open' files: use lsof to identify them, then recover the contents with cat /proc/<procid>/fd/<fdid> > /tmp/xxxx
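The same recovery can be sketched in Python. This assumes a /proc that lets you open /proc/<pid>/fd/<n> for reading (true on Linux, and I believe on Solaris, but I haven't verified it there); snapshot_open_fd is a made-up name:

```python
import shutil

def snapshot_open_fd(pid, fd, dest):
    """Copy the current contents behind an open descriptor to dest."""
    src = "/proc/{}/fd/{}".format(pid, fd)
    # Opening the /proc entry gives a fresh handle on the same inode,
    # even when no directory entry points at it any more.
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

This only captures what has been written so far; anything the process writes afterwards still goes to the original inode.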

Edit:

It seems you've already identified that this is the case, but are still wondering how it is possible. Here's a short explanation:

On POSIX filesystems, a file is identified by its inode, and directories are little more than a "path => inode" mapping. You can have more than one path 'pointing' to the same inode (each one is called a hard link), and the inode keeps a count of how many links it has. The rm command simply calls unlink() on the path, which decrements the link count and deletes the file itself only if nothing else still refers to the inode.

But a path in the directory tree isn't the only possible reference to an inode: an open fd in a running process also counts, and a 'deleted' file won't really be removed until both the link count and the number of open references drop to 0.
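To watch the link count change, here is a minimal Python sketch (the file names are invented, and it assumes an ordinary local filesystem; NFS, for one, handles unlinking an open file differently). It records st_nlink after each step and then writes to the file once no path refers to it:

```python
import os
import tempfile

def link_count_walkthrough():
    """Return (link counts after each step, final size in bytes)."""
    counts = []
    workdir = tempfile.mkdtemp()
    first = os.path.join(workdir, "app.log")
    second = os.path.join(workdir, "app.log.alias")

    fd = os.open(first, os.O_WRONLY | os.O_CREAT, 0o640)
    counts.append(os.fstat(fd).st_nlink)  # one directory entry
    os.link(first, second)                # add a second hard link
    counts.append(os.fstat(fd).st_nlink)
    os.unlink(first)                      # what rm does
    counts.append(os.fstat(fd).st_nlink)
    os.unlink(second)                     # no paths left, fd still open
    counts.append(os.fstat(fd).st_nlink)

    os.write(fd, b"still writable\n")     # the inode is alive and usable
    size = os.fstat(fd).st_size
    os.close(fd)                          # only now is the storage freed
    os.rmdir(workdir)
    return counts, size
```

A link count of zero on a file that is still open is the same state that the ls -li output for fds 1 and 2 in the question shows.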

As I mentioned in passing above, it's a common trick: if you have a temporary file that you don't need to keep after your process finishes, just open it and immediately 'delete' it. The open handle will keep working reliably, and when your process exits (whether normally, killed, or by crashing), the system closes the handle and cleanly deletes the temporary file.
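A sketch of that trick in Python, assuming a POSIX system; open_scratch_file is a name I picked for illustration:

```python
import os
import tempfile

def open_scratch_file(directory=None):
    """Create a temp file, unlink it at once, and return (file object, old path)."""
    fd, path = tempfile.mkstemp(dir=directory)
    os.unlink(path)  # the name goes away; the open descriptor keeps the inode
    return os.fdopen(fd, "w+b"), path
```

The returned path no longer exists and is handed back only so a caller can confirm that. As far as I know, Python's own tempfile.TemporaryFile uses the same approach on POSIX systems, so in real code that is the simpler choice.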

A logfile isn't a likely candidate for such a 'hidden autodeleting' file; but it's not hard to do accidentally.

Since your deleted logfile is still live and collecting data, simply copying the content wouldn't help much. So try creating a new hard link to the /proc/<procid>/fd/<fdid> file, something like ln /proc/4366/fd/1 /tmp/xxxx. Note there's no -s flag, so ln should create a new hard link with the same inode as the original, not a symbolic link (which is little more than a pointer to an existing path, and not what you want).

Edit:

The ln /proc/... /tmp/... command can't work because /proc and /tmp are on different filesystems, and a hard link cannot cross a filesystem boundary. Unfortunately, I don't know of any way to create a pathname for an existing inode. Ideally the link() syscall would take an inode number and a path, but it takes source and destination paths.
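For illustration, a cross-filesystem link attempt normally fails with EXDEV, and about the best a script can do is fall back to a copy. A hedged Python sketch (link_or_copy is a hypothetical helper, and I am assuming EXDEV is the error that ln hit here):

```python
import errno
import os
import shutil

def link_or_copy(src, dst):
    """Try a hard link; on a cross-device error fall back to copying."""
    try:
        os.link(src, dst)  # like link(2), takes two paths, not an inode number
        return "linked"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # some other failure; do not hide it
    shutil.copyfile(src, dst)  # a snapshot only, not a new name for the inode
    return "copied"
```

The copy is a separate file with its own inode, so later writes by the process will not show up in it.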