Calculating total file size by extension in shell

disk-space-utilization filesystems shell

We have a set of directories containing Lucene indexes. Each index is a mix of different file types (differentiated by extension), e.g.:

0/index/_2z6.frq
0/index/_2z6.fnm
..
1/index/_1sq.frq
1/index/_1sq.fnm
..

(there are about 10 different extensions)

We'd like to get a total size per file extension, e.g.:

.frq     21234
.fnm     34757
..

I've tried various combinations of du/awk/xargs, but I'm finding it tricky to do exactly this.

Best Answer

For any given extension you can use

find /path -name '*.frq' -exec ls -l {} \; | awk '{ Total += $5} END { print Total }'

to get the total file size for that type.
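The same idea can be wrapped in a small function. This is only a sketch: the name total_for_ext and its argument order are made up for illustration. It uses `-exec ... {} +` so that ls is started once per batch of files rather than once per file, and `ls -ln` so that owner or group names containing spaces cannot shift the size out of column 5.

```shell
#!/bin/bash

# Hypothetical helper: total_for_ext EXT [DIR]
# Prints the combined size in bytes of all regular files under DIR
# (default: current directory) whose names end in ".EXT".
total_for_ext() {
    # ls -ln prints numeric uid/gid, so the size is always field 5.
    # printf "%.0f" keeps large totals from being shown in exponent notation.
    find "${2:-.}" -type f -name "*.$1" -exec ls -ln {} + |
        awk '{ total += $5 } END { printf "%.0f\n", total }'
}
```

For example, `total_for_ext frq /path` should report the same figure as the one-liner above.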

After some more thought, here is a script that handles every extension it finds:

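#!/bin/bash

# Collect the distinct extensions (a dot followed by letters/digits) of all regular files.
ftypes=$(find . -type f | sed -n 's/.*\(\.[a-zA-Z0-9][a-zA-Z0-9]*\)$/\1/p' | sort -u)

for ft in $ftypes
do
    echo -n "$ft "
    # Sum column 5 (size in bytes) of the ls -l output for every regular file with this extension.
    # printf "%.0f" keeps large totals from being printed in exponent notation.
    find . -type f -name "*${ft}" -exec ls -l {} \; | awk '{total += $5} END {printf "%.0f\n", total}'
done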

This outputs the total size in bytes for each file extension found.
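If the tree is large, walking it once per extension and starting ls for every file gets slow. A possible single-pass variant is sketched below. It assumes GNU find (for -printf, which BSD/macOS find does not have), and the function name size_by_ext is made up for illustration. Unlike the script above, it takes everything after the last dot as the extension (not only letters and digits), and file names containing tabs or newlines would confuse it.

```shell
#!/bin/bash

# Hypothetical helper: size_by_ext [DIR]
# Prints one "extension<TAB>total bytes" line per extension found
# under DIR (default: current directory). Assumes GNU find.
size_by_ext() {
    # %s = size in bytes, %f = file name without leading directories.
    find "${1:-.}" -type f -name '*.*' -printf '%s\t%f\n' |
        awk -F'\t' '
            {
                # The extension is whatever follows the last dot in the name.
                n = split($2, parts, ".")
                total["." parts[n]] += $1
            }
            END {
                for (ext in total) printf "%s\t%.0f\n", ext, total[ext]
            }' |
        sort
}
```

Calling `size_by_ext /path` then gives the per-extension totals in one go, sorted by extension.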