I'm downloading an entire directory from a web server. It works OK, but I can't figure out how to get the file size before downloading, so that I can check whether the file was updated on the server. Can this be done the same way as when downloading a file from an FTP server?
import urllib
import re

url = "http://www.someurl.com"

# Download the page locally
f = urllib.urlopen(url)
html = f.read()
f.close()
f = open("temp.htm", "w")
f.write(html)
f.close()

# List only the .TXT / .ZIP files
fnames = re.findall(r'^.*<a href="(\w+\.(?:txt|zip))".*$', html, re.MULTILINE)

for fname in fnames:
    print fname, "..."
    f = urllib.urlopen(url + "/" + fname)
    #### Here I want to check the filesize to download or not ####
    data = f.read()
    f.close()
    f = open(fname, "w")
    f.write(data)
    f.close()
@Jon: thanks for your quick answer. It works, but the file size on the web server is slightly smaller than the size of the downloaded file.
Examples:
Local Size    Server Size
2.223.533     2.115.516
664.603       662.121
Does it have anything to do with the CR/LF conversion?
Best Answer
I have reproduced what you are seeing:
Outputs this:
What am I doing wrong here? Is os.stat().st_size not returning the correct size?
Edit: OK, I figured out what the problem was:
this outputs:
Make sure you are opening both files for binary read/write.
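To make the binary-mode point concrete, here is a minimal sketch in Python 3 (the question's code is Python 2, so this is a port, not the original poster's code). The helper names `remote_size`, `needs_download`, and `download` are hypothetical, and the sketch assumes the server answers HEAD requests and sends a `Content-Length` header, which is not guaranteed. The offline part at the bottom uses `newline="\r\n"` to emulate what Windows text mode does on write, so the size difference shows up on any platform.

```python
import os
import tempfile
import urllib.request


def remote_size(url):
    """Return the Content-Length reported for url, or None if the header is absent."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        length = response.headers.get("Content-Length")
    return int(length) if length is not None else None


def needs_download(local_path, server_size):
    """True if the local copy is missing or its size differs from server_size."""
    if server_size is None or not os.path.exists(local_path):
        return True
    return os.path.getsize(local_path) != server_size


def download(url, local_path):
    """Copy the response body to disk byte for byte (note the "wb" mode)."""
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        out.write(response.read())


# Offline demonstration of the size mismatch. newline="\r\n" emulates the
# "\n" -> "\r\n" translation that Windows text mode applies when writing.
payload = b"line one\nline two\nline three\n"
workdir = tempfile.mkdtemp()
text_path = os.path.join(workdir, "text_mode.txt")
binary_path = os.path.join(workdir, "binary_mode.txt")

with open(text_path, "w", newline="\r\n") as f:
    f.write(payload.decode("ascii"))
with open(binary_path, "wb") as f:
    f.write(payload)

text_size = os.path.getsize(text_path)
binary_size = os.path.getsize(binary_path)
print(len(payload), binary_size, text_size)
```

The text-mode copy grows by one byte per line feed, which is consistent with the local files in the question being larger than the sizes the server reported. Note that matching sizes only suggest the file is unchanged; a file edited without changing its length would be missed by this check.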