Python – How to differentiate between HDF5 datasets and groups with h5py

h5pyhdf5python

I use the Python package h5py (version 2.5.0) to access my hdf5 files.

I want to traverse the content of a file and do something with every dataset.

Using the visit method:

import h5py

def print_it(name):
    dset = f[name]
    print(dset)
    print(type(dset))


with h5py.File('test.hdf5', 'r') as f:
    f.visit(print_it)

for a test file I obtain:

<HDF5 group "/x" (1 members)>
<class 'h5py._hl.group.Group'>
<HDF5 dataset "y": shape (100, 100, 100), type "<f8">
<class 'h5py._hl.dataset.Dataset'>

which tells me that there is a dataset and a group in the file. However there is no obvious way except for using type() to differentiate between the datasets and the groups. The h5py documentation unfortunately does not say anything about this topic. They always assume that you know beforehand what are the groups and what are the datasets, for example because they created the datasets themselves.

I would like to have something like:

f = h5py.File(..)
for key in f.keys():
    x = f[key]
    print(x.is_group(), x.is_dataset()) # does not exist

How can I differentiate between groups and datasets when reading an unknown hdf5 file in Python with h5py? How can I get a list of all datasets, of all groups, of all links?

Best Answer

Unfortunately, there is no builtin way in the h5py api to check this, but you can simply check the type of the item with is_dataset = isinstance(item, h5py.Dataset).

To list all the content of the file (except the file's attributes though) you can use Group.visititems with a callable which takes the name and instance of a item.