Python directory searching and organizing by dict

Tags: directory, os.walk, path, python, subdirectory

Hey all, I have only recently started getting into the file and os parts of Python. I am trying to search a directory and find all of its subdirectories. If a directory contains no folders, all of its files should be added to a list, and everything should be organized in a dict.

So, for instance, a tree could look like this:

  • Starting Path
    • Dir 1
      • Subdir 1
      • Subdir 2
      • Subdir 3
        • subsubdir
          • file.jpg
          • folder1
            • file1.jpg
            • file2.jpg
          • folder2
            • file3.jpg
            • file4.jpg

Even if subsubdir has a file in it, it should be skipped because it has folders in it.

I can normally do this if I know how many levels of directories I will be looking through, using os.listdir and os.path.isdir. However, if I want this to be dynamic, it has to cope with any number of folders and subfolders. I have tried using os.walk, and it finds all the files easily. The only trouble I am having is creating the dicts from the path names that contain files. I need the folder names organized by dict, all the way up to the starting path.

So in the end, using the example above, the dict should look like this with the files in it:

dict['dir1']['subdir3']['subsubdir']['folder1'] = ['file1.jpg', 'file2.jpg']

dict['dir1']['subdir3']['subsubdir']['folder2'] = ['file3.jpg', 'file4.jpg']
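
For illustration only: with a plain dict, the two assignments above raise KeyError unless every intermediate dict already exists. One way to get that shape, sketched here with a recursive collections.defaultdict, is shown below. The names `nested` and `tree` are made up for this example (and `tree` is used instead of `dict` to avoid shadowing the built-in).

```python
from collections import defaultdict

def nested():
    """Return a dict whose missing keys are created as further nested dicts."""
    return defaultdict(nested)

tree = nested()
# Intermediate levels ('dir1', 'subdir3', 'subsubdir') are created on first access.
tree['dir1']['subdir3']['subsubdir']['folder1'] = ['file1.jpg', 'file2.jpg']
tree['dir1']['subdir3']['subsubdir']['folder2'] = ['file3.jpg', 'file4.jpg']

print(sorted(tree['dir1']['subdir3']['subsubdir']))  # ['folder1', 'folder2']
```

The catch with this idiom is that merely reading a missing key also creates it, so it suits building the structure more than querying it.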

Would appreciate any help on this or better ideas on organizing the information. Thanks.

Best Answer

Maybe you want something like:

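import os

def explore(starting_path):
  # Drop any trailing separator so the prefix slice below stays aligned.
  starting_path = starting_path.rstrip(os.sep)
  alld = {'': {}}

  for dirpath, dirnames, filenames in os.walk(starting_path):
    d = alld
    # Path relative to the start: '' for the root, e.g. '/b/z' for nested dirs.
    dirpath = dirpath[len(starting_path):]
    # Descend through the nested dicts to the entry for this directory.
    for subd in dirpath.split(os.sep):
      based = d
      d = d[subd]
    if dirnames:
      # Has subfolders: ignore its files and make a slot for each subfolder.
      for dn in dirnames:
        d[dn] = {}
    else:
      # No subfolders: replace this directory's dict with its list of files.
      based[subd] = filenames
  return alld['']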

For example, given a /tmp/a such that:

$ ls -FR /tmp/a
b/  c/  d/

/tmp/a/b:
z/

/tmp/a/b/z:

/tmp/a/c:
za  zu

/tmp/a/d:

print(explore('/tmp/a')) emits something like: {'c': ['za', 'zu'], 'b': {'z': []}, 'd': []} (the key order may vary).
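
To see why the approach works, it may help to look at what os.walk yields for that same layout. The sketch below rebuilds the /tmp/a example in a temporary directory (so the exact location is an assumption, not the original /tmp/a) and records each (dirpath, dirnames, filenames) triple. Parents are always visited before their children, which is what lets the slots for subfolders be created before they are filled in.

```python
import os
import tempfile

visited = []
with tempfile.TemporaryDirectory() as root:
    # Same layout as the example: b/z (empty), c with two files, d (empty).
    os.makedirs(os.path.join(root, 'b', 'z'))
    os.makedirs(os.path.join(root, 'c'))
    os.makedirs(os.path.join(root, 'd'))
    for name in ('za', 'zu'):
        open(os.path.join(root, 'c', name), 'w').close()

    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes the visiting order deterministic
        rel = os.path.relpath(dirpath, root)
        visited.append((rel, list(dirnames), sorted(filenames)))

for entry in visited:
    print(entry)
```

A directory with an empty dirnames list is one that has no subfolders, which is exactly the case where the file list gets stored.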

If this isn't exactly what you're after, maybe you can show us specifically what the differences are supposed to be? I suspect they can probably be easily fixed, if need be.
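
One possible adjustment, as a rough sketch: if empty folders should not show up at all, and only folders without subfolders that actually contain files should be kept, something along these lines might do. The function name `explore_leaves` is made up for this example, and the "skip empty folders" rule is an assumption about what is wanted rather than something stated in the question.

```python
import os
import tempfile

def explore_leaves(starting_path):
    """Nested dict of folder names; keeps only subfolder-free folders that hold files."""
    tree = {}
    for dirpath, dirnames, filenames in os.walk(starting_path):
        if dirnames or not filenames:
            continue  # has subfolders, or is empty: skip it
        rel = os.path.relpath(dirpath, starting_path)
        if rel == os.curdir:
            continue  # the starting folder itself has no folder name to key on
        parts = rel.split(os.sep)
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})  # create intermediate dicts as needed
        node[parts[-1]] = sorted(filenames)
    return tree

# Rebuild part of the question's example tree in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    base = os.path.join(root, 'dir1', 'subdir3', 'subsubdir')
    for folder, names in [('folder1', ['file1.jpg', 'file2.jpg']),
                          ('folder2', ['file3.jpg', 'file4.jpg'])]:
        os.makedirs(os.path.join(base, folder))
        for name in names:
            open(os.path.join(base, folder, name), 'w').close()
    open(os.path.join(base, 'file.jpg'), 'w').close()   # skipped: subsubdir has folders
    os.makedirs(os.path.join(root, 'dir1', 'subdir1'))  # skipped: empty folder
    result = explore_leaves(root)

print(result)
```

Using os.path.relpath instead of slicing the string also sidesteps any trouble with trailing separators on the starting path.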