One more, using urlretrieve:
import urllib
urllib.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")
(For Python 3+ use import urllib.request and urllib.request.urlretrieve.)
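For Python 3, the same one-liner becomes:

import urllib.request
urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")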
Yet another one, with a "progressbar":
import urllib2

url = "http://download.thinkbroadband.com/10MB.zip"
file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()
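The snippet above is Python 2. A rough Python 3 port might look like this (a sketch, assuming the same test URL and that the server sends a Content-Length header):

import urllib.request

url = "http://download.thinkbroadband.com/10MB.zip"
file_name = url.split('/')[-1]
u = urllib.request.urlopen(url)
# Content-Length may be missing for chunked responses.
file_size = int(u.getheader("Content-Length"))
print("Downloading: %s Bytes: %s" % (file_name, file_size))

file_size_dl = 0
block_sz = 8192
with open(file_name, 'wb') as f:
    while True:
        buffer = u.read(block_sz)
        if not buffer:
            break
        file_size_dl += len(buffer)
        f.write(buffer)
        status = "%10d [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
        print(status, end='\r')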
There's a very simple answer to this: Profile the performance of your web server to see what the performance penalty is for your particular situation. There are several tools out there to compare the performance of an HTTP vs HTTPS server (JMeter and Visual Studio come to mind) and they are quite easy to use.
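For a quick first impression before setting up a full load-testing tool, even a simple timing loop can be informative (a minimal Python sketch; the example.com URLs are placeholders for your own server):

import time
import urllib.request

def average_latency(url, n=50):
    # Fetch the URL n times and return the mean wall-clock latency.
    start = time.time()
    for _ in range(n):
        urllib.request.urlopen(url).read()
    return (time.time() - start) / n

# Placeholder URLs -- point both at the same page on your server.
print("HTTP:  %.4fs" % average_latency("http://www.example.com/"))
print("HTTPS: %.4fs" % average_latency("https://www.example.com/"))

Note this only measures end-to-end latency from a single client; a tool like JMeter will also show server-side CPU cost and behavior under concurrency.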
No one can give you a meaningful answer without some information about the nature of your web site, hardware, software, and network configuration.
As others have said, there will be some level of overhead due to encryption, but it is highly dependent on:
- Hardware
- Server software
- Ratio of dynamic vs static content
- Client distance to server
- Typical session length
- Caching behavior of clients
- Etc. (my personal favorite)
In my experience, servers that are heavy on dynamic content tend to be impacted less by HTTPS because the time spent encrypting (the SSL overhead) is insignificant compared to content generation time.
Servers that are heavy on serving a fairly small set of static pages that can easily be cached in memory suffer from a much higher overhead (in one case, throughput was halved on an "intranet").
Edit: One point that has been brought up by several others is that SSL handshaking is the major cost of HTTPS. That is correct, which is why "typical session length" and "caching behavior of clients" are important.
Many very short sessions mean that handshaking time will overwhelm any other performance factor. Longer sessions incur the handshaking cost at the start of the session, but subsequent requests will have relatively low overhead.
Client caching can happen at several levels, anywhere from a large-scale proxy server down to the individual browser cache. Generally, HTTPS content will not be cached in a shared cache (though a few proxy servers can exploit man-in-the-middle-style behavior to achieve this). Many browsers cache HTTPS content for the current session, and often across sessions. The effect of not caching, or caching less, is that clients retrieve the same content more frequently. This results in more requests and more bandwidth to serve the same number of users.
Best Answer
Your worst case scenario isn't as bad as you think.
You are already parsing the RSS feed, so you already have the image URLs. Say you have an image URL like http://otherdomain.com/someimage.jpg. You rewrite this URL as https://mydomain.com/imageserver?url=http://otherdomain.com/someimage.jpg&hash=abcdeafad. This way, the browser always makes requests over https, so you get rid of the problems. The next part is to create a proxy page or servlet that reads the url and hash parameters, verifies the hash, downloads the image from the original URL, and streams it back to the browser.
This solution has some advantages. You don't have to download the image at the time of creating the html. You don't have to store the images locally. Also, you are stateless; the url contains all the information necessary to serve the image.
Finally, the hash parameter is for security; you only want your servlet to serve images for urls you have constructed. So, when you create the url, compute
md5(image_url + secret_key)
and append it as the hash parameter. Before you serve the request, recompute the hash and compare it to what was passed to you. Since the secret_key is known only to you, nobody else can construct valid urls. If you are developing in Java, the servlet is just a few lines of code; you should be able to port the logic below to any other back-end technology.
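The original answer's Java servlet is not reproduced here. As a rough sketch of the same logic in Python (Flask, the SECRET_KEY name, and the route are illustrative choices, not the author's code):

import hashlib
import urllib.request
from flask import Flask, Response, abort, request

app = Flask(__name__)
SECRET_KEY = "change-me"  # known only to the server

def make_hash(image_url):
    # Same construction used when generating the rewritten URLs:
    # md5(image_url + secret_key)
    return hashlib.md5((image_url + SECRET_KEY).encode()).hexdigest()

@app.route("/imageserver")
def imageserver():
    image_url = request.args.get("url", "")
    supplied_hash = request.args.get("hash", "")
    # Only serve URLs we constructed ourselves.
    if make_hash(image_url) != supplied_hash:
        abort(403)
    upstream = urllib.request.urlopen(image_url)
    return Response(upstream.read(),
                    content_type=upstream.headers.get("Content-Type", "image/jpeg"))

When generating the page, you would call make_hash(image_url) and append the result as the hash parameter of the rewritten URL.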