Linux – Get the total file size from a file containing a list of files


I have a file containing a list of files, and I would like to know their total size. Is there a command to do so?

My OS is a very basic Linux (Qnap TS-410).

EDIT:

A few lines from the file:

/share/archive/Bailey Test/BD006/0.tga
/share/archive/Bailey/BD007/1 version 1.tga
/share/archive/Bailey 2/BD007/example.tga

Best Answer

I believe something like this would work in busybox:

du `cat filelist.txt` | awk '{i+=$1} END {print i}'

I don't have the same environment as you, but if you encounter issues with spaces in filenames, something like this would work too (IFS= and -r keep leading whitespace and backslashes in the names intact):

while IFS= read -r file; do
  du "$file"
done < filelist.txt | awk '{i+=$1} END {print i}'

Edit 1:
@stew is right in his post below: du shows disk usage, not the exact file size. Busybox du has an -a flag, so try du -a "$file" and compare the output with the sizes shown by ls -l. Note that in GNU du, -a only adds files to the listing; the exact size there comes from --apparent-size (or -b for bytes).
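Since the edit above is about exact file size rather than disk usage, here is a minimal sketch that sums byte counts with wc -c instead of du. The function name total_bytes is made up for illustration, and the sketch assumes the list holds one path per line and that wc and awk are available in your busybox build.

```sh
# total_bytes LISTFILE
# Hypothetical helper: prints the sum of the exact byte sizes of the
# regular files named in LISTFILE (one path per line, spaces allowed).
total_bytes() {
  while IFS= read -r file; do
    if [ -f "$file" ]; then
      # wc -c reading from stdin prints only the byte count
      wc -c < "$file"
    else
      echo "skipping (not a regular file): $file" >&2
    fi
  done < "$1" | awk '{ sum += $1 } END { printf "%.0f\n", sum }'
}
```

Usage would be something like total_bytes filelist.txt. Some wc implementations read each file in full, so this can be slow on large archives; where stat -c %s is available it should give the same number without reading the data.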

Related Solutions

Linux – Efficiently Retrieve Files from Tar or Cpio Archives

tar (and cpio, afio, pax and similar programs) use stream-oriented formats: they are intended to be streamed directly to a tape or piped into another process. In theory it would be possible to add an index at the end of the file/stream, but I don't know of any version that does (it would be a useful enhancement, though).

It won't help with your existing tar or cpio archives, but there is another tool, dar ("disk archive"), that does create archive files containing such an index and can give you fast direct access to individual files within the archive.

If dar isn't included with your Unix/Linux distribution, you can find it at:

http://dar.linux.free.fr/

Linux – How to display certain lines from a text file in Linux

sed -n '10000000,10000020p' filename

You might be able to speed that up a little like this:

sed -n '10000000,10000020p; 10000021q' filename

In those commands, the option -n causes sed to "suppress automatic printing of pattern space". The p command "print[s] the current pattern space" and the q command "Immediately quit[s] the sed script without processing any more input..." The quotes are from the sed man page.
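The print-then-quit pattern above can be wrapped in a small function. This is only a sketch: the name print_lines and its argument order are assumptions, not part of the original answer.

```sh
# print_lines START END FILE
# Hypothetical helper: prints lines START..END of FILE and tells sed to
# quit on the line after END so the rest of the file is never read.
print_lines() {
  start=$1
  end=$2
  file=$3
  sed -n "${start},${end}p; $((end + 1))q" "$file"
}
```

For the example in this answer the call would be print_lines 10000000 10000020 filename.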

By the way, your command

tail -n 10000000 filename | head -n 10

starts at the ten millionth line from the end of the file, while your "middle" command would seem to start at the ten millionth line from the beginning, which would be equivalent to:

head -n 10000010 filename | tail -n 10

The problem is that for unsorted files with variable length lines any process is going to have to go through the file counting newlines. There's no way to shortcut that.

If, however, the file is sorted (a log file with timestamps, for example) or has fixed length lines, then you can seek into the file based on a byte position. In the log file example, you could do a binary search for a range of times as my Python script here* does. In the case of the fixed record length file, it's really easy. You just seek linelength * linecount characters into the file.

* I keep meaning to post yet another update to that script. Maybe I'll get around to it one of these days.
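For the fixed record length case, the seek can be sketched with dd, which skips whole blocks instead of counting newlines. The helper name read_record is made up for illustration, and it assumes every line, including its trailing newline, is exactly RECLEN bytes long.

```sh
# read_record FILE RECLEN N
# Hypothetical helper: prints record number N (1-based) from FILE, where
# every record is exactly RECLEN bytes including its newline.
read_record() {
  file=$1
  reclen=$2
  n=$3
  # bs sets the block size to one record, skip jumps over N-1 records,
  # count=1 copies a single record; dd's statistics go to stderr.
  dd if="$file" bs="$reclen" skip=$((n - 1)) count=1 2>/dev/null
}
```

With 80-byte records, read_record data.txt 80 10000000 would fetch the ten millionth line directly (data.txt is a placeholder name).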
