Python – After writing to a file, why does os.path.getsize still return the previous size?

I am trying to split up a large XML file into smaller chunks. I write to the output file and then check its size to see if it has passed a threshold, but I don't think the getsize() method is working as expected.

What would be a good way to get the size of a file that is still being written to?

I've done something like this:

import os

f1 = open('VSERVICE.xml', 'r')
f2 = open('split.xml', 'w')

for line in f1:
  # Stop copying at the first closing tag
  if line == '</Service>\n':
    break
  else:
    f2.write(line)
    # Check the size of the output file after every write
    size = os.path.getsize('split.xml')
    print('size = ' + str(size))

Running this prints 0 as the file size for about 80 iterations and then 4176. Does Python store the output in a buffer before actually writing it out?

Best Answer

File size is different from file position. For example,

os.path.getsize('sample.txt')

This returns the size of the file as it currently exists on disk, in bytes.

But

f = open('sample.txt')
print(f.readline())
f.tell()

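Here f.tell() returns the current position of the file object, i.e. where the next read or write will take place. Since it is aware of the buffering, it should be accurate as long as you are simply appending to the output file. And yes, Python buffers writes: os.path.getsize() only sees data that has already been flushed out of the buffer, which is why it reports 0 until the buffer fills up.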
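As a rough illustration of the difference, here is a minimal sketch, assuming default buffering and a throwaway temporary file (the path and the variable names are made up for the example, they are not from the question). It compares what os.path.getsize() reports before and after an explicit flush() with what tell() reports on the output file.

```python
import os
import tempfile

# Hypothetical scratch location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'split.xml')

# newline='' keeps '\n' as a single byte on every platform.
with open(path, 'w', encoding='ascii', newline='') as out:
    out.write('<Service>\n')

    # Size as seen by the operating system; the line just written
    # is most likely still sitting in Python's buffer at this point.
    size_before_flush = os.path.getsize(path)

    # Position of the next write, counting buffered data as well.
    position = out.tell()

    # Hand the buffered data to the operating system, then measure again.
    out.flush()
    size_after_flush = os.path.getsize(path)

print('getsize before flush:', size_before_flush)
print('tell():', position)
print('getsize after flush:', size_after_flush)
```

For the splitting loop in the question, either approach should work: call f2.flush() before os.path.getsize('split.xml'), or drop getsize() and compare f2.tell() against the threshold. The second avoids forcing a flush and a stat call on every line.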
