Python – List all files in an online directory with Python

downloadpythonurllib2

Hello i was just wondering i'm trying to create a python application that downloads files from the internet but at the moment it only downloads one file with the name i know… is there any way that i can get a list of files in an online directory and downloaded them? ill show you my code for downloading one file at a time, just so you know a bit about what i wan't to do.

import urllib2

url = "http://cdn.primarygames.com/taxi.swf"

file_name = url.split('/')[-1]
u = urllib2.urlopen(url)
f = open(file_name, 'wb')
meta = u.info()
file_size = int(meta.getheaders("Content-Length")[0])
print "Downloading: %s Bytes: %s" % (file_name, file_size)

file_size_dl = 0
block_sz = 8192
while True:
    buffer = u.read(block_sz)
    if not buffer:
        break

    file_size_dl += len(buffer)
    f.write(buffer)
    status = r"%10d  [%3.2f%%]" % (file_size_dl, file_size_dl * 100. / file_size)
    status = status + chr(8)*(len(status)+1)
    print status,

f.close()

So what is does is it downloads taxi.swf from this website but what i want it to do is to download all .swf's from that directory "/" to the computer?

Is it possible and thank you so much in advanced. -Terrii-

Best Answer

Since you're trying to download a bunch of things at once, start by looking for a site index or a webpage that neatly lists everything you want to download. The mobile version of the website is usually lighter than the desktop and is easier to scrape.

This website has exactly what you're looking for: All Games.

Now, it's really quite simple to do. Just, extract all of the game page links. I use BeautifulSoup and requests to do this:

import requests
from bs4 import BeautifulSoup

games_url = 'http://www.primarygames.com/mobile/category/all/'

def get_all_games():
    soup = BeautifulSoup(requests.get(games_url).text)

    for a in soup.find('div', {'class': 'catlist'}).find_all('a'):
        yield 'http://www.primarygames.com' + a['href']

def download_game(url):
    # You have to do this stuff. I'm lazy and won't do it.

if __name__ == '__main__':
    for game in get_all_games():
        download_game(url)

The rest is up to you. download_game() downloads a game given the game's URL, so you have to figure out the location of the <object> tag in the DOM.