Linux – Why is find -type d performing fstat on every file in a folder?

bash, find, linux, shell

I'm running find . -type d on a rather large directory tree. I am only interested in the directories within this tree, but when I ran strace against the process to check that it was doing what I expected, I noticed that a huge number of operations are being wasted running fstat against files within the tree.

newfstatat(AT_FDCWD, "file1", {st_mode=S_IFREG|0600, st_size=7690, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "file2", {st_mode=S_IFREG|0600, st_size=7696, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "file3", {st_mode=S_IFREG|0600, st_size=7687, ...}, AT_SYMLINK_NOFOLLOW) = 0
newfstatat(AT_FDCWD, "file4", {st_mode=S_IFREG|0600, st_size=10455, ...}, AT_SYMLINK_NOFOLLOW) = 0

Is find not aware that an inode is pointing to a directory until it performs an fstat? If that's the case, then this is going to take a long time. Some of these directories likely have millions of items within them, but I really only care about directories.

Ultimately I would like a report of the dirsize and path of each of the directories in my file tree. What's the fastest/most efficient way for me to do that?

Best Answer

Yes, it looks like find really is calling stat (newfstatat in your trace) to determine the type of each file. This is mildly surprising, given that struct dirent has carried the file type in its d_type field since kernel 2.6.4.

Not all filesystems support the extended dirent behaviour, though, so either your filesystem is one that doesn't, or find isn't using it. Without knowing your filesystem type, we can't tell which.
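To illustrate what relying on the dirent type information looks like, here is a minimal Python sketch that lists only the directories in a tree. It is an illustration, not how find is implemented. The function name iter_dirs is made up for this example. It relies on os.scandir, whose entries can answer is_dir(follow_symlinks=False) from the d_type field without an extra stat call when the filesystem fills that field in. When d_type comes back as unknown, Python falls back to a stat call per entry, so the saving depends on your filesystem.

```python
import os


def iter_dirs(root):
    """Yield the path of every directory under root, including root itself.

    Hypothetical helper for illustration; symlinks are not followed.
    """
    stack = [root]
    while stack:
        current = stack.pop()
        yield current
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    # Uses the d_type value returned with the directory
                    # entry when the filesystem provides it; otherwise
                    # Python has to stat the entry to learn its type.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            # Directory could not be read (permissions, removed mid-walk);
            # it has already been reported, so just skip its contents.
            continue
```

Calling it as `for path in iter_dirs("."): print(path)` prints one directory per line. Running that under strace should show whether stat calls on regular files still appear on your filesystem.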
