Python – Watching file changes/additions/removal, but with an eye on partial transfer

file-systems, files, python

I would like to monitor the filesystem in Python so that my application is notified of new file additions, removals, or changes. Once a file is detected, the application starts extracting the contained data through various plugins. The problem is that I am dealing with big files, and when the user starts copying a file from outside into the watched directory, it will be detected but will appear corrupted because the copy is still in progress. Checking the file size between invocations is potentially a good strategy, but it ignores the fact that other generators of the file (such as wget) might pause for long stretches during which the file is not growing and yet is not complete. I don't have control over the format of the files I am downloading either, so I can't check for an end-of-file marker, because it might not be there.
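To be concrete, the size check I mean looks roughly like this (the poll interval and the stability threshold are arbitrary placeholder values):

import os
import time

def wait_until_size_stable(path, interval=1.0, stable_checks=3):
    # Treat the file as complete once its size has stayed the same for
    # `stable_checks` consecutive polls; this is exactly the heuristic
    # that fails when a downloader pauses mid-transfer.
    last_size, stable = -1, 0
    while stable < stable_checks:
        size = os.path.getsize(path)
        if size == last_size:
            stable += 1
        else:
            last_size, stable = size, 0
        time.sleep(interval)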

Is there a cross-platform way (or different, platform-specific ways for Linux and Windows) to solve this problem? Can I somehow check whether a file is currently open? How is this problem solved in other software?

Best Answer

Since you know which file has changed but not which process is changing it, this is hard but solvable. You can use psutil to handle this.

With psutil you can iterate over all running processes and ask each one for its open files. Sketched in Python, with target_path standing for the path of the file your watcher detected:

import psutil

# Scan all running processes for one that has the detected file open
pid = None
for process in psutil.process_iter():
    try:
        if any(f.path == target_path for f in process.open_files()):
            pid = process.pid
            break
    except (psutil.AccessDenied, psutil.NoSuchProcess):
        pass
if pid is not None:
    psutil.Process(pid).wait()  # wait for the writing process to finish
# do your work with the file

This is an easy but not very efficient way to handle your problem.
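One caveat: if the writing process is long-lived (a shell or a file manager doing the copy, for example), waiting for it to exit may block forever. A variant along the same lines, sketched here with a simple polling interval, waits only until no process reports the file among its open files any more (note that psutil may need sufficient privileges to inspect other users' processes):

import time
import psutil

def wait_until_closed(path, poll_interval=1.0):
    # Poll all processes until none of them has `path` open any more.
    while True:
        open_somewhere = False
        for process in psutil.process_iter():
            try:
                if any(f.path == path for f in process.open_files()):
                    open_somewhere = True
                    break
            except (psutil.AccessDenied, psutil.NoSuchProcess):
                continue
        if not open_somewhere:
            return
        time.sleep(poll_interval)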