The size limit (16 MB or whatever) does not require your archives to be as close to it as possible.
Assuming that you are allowed to create archives of smaller size, here is the "first iteration" solution, dead simple but meeting your requirements: just zip every file into a separate archive.
myFile1 -> archive1.zip
myFile2 -> archive2.zip
- etc.
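A minimal sketch of the one-archive-per-file approach, assuming Java's java.util.zip; the class and method names are illustrative:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// One input file -> one zip archive. Nothing clever: no size checks needed,
// since a single file's compressed size can only be smaller than the original
// (modulo zip overhead on incompressible data).
public class OnePerArchive {
    static void zipSingle(Path file, Path zip) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zip))) {
            zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
            Files.copy(file, zos); // stream the file's bytes into the entry
            zos.closeEntry();
        }
    }
}
```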
Now, if you want it a bit less dumb, use the sum of the current archive size (Deflater.getBytesWritten()) and the next file's uncompressed size to decide whether it's time to start a new archive.
myFile1 -> archive1.zip
- size of archive1.zip plus myFile2 within limit -> add myFile2 to archive1
- size of archive1.zip plus myFile3 exceeds limit -> add myFile3 to a new zip, archive2
- etc.

Yes, there is a chance that myFile3 in compressed form would still have fit into archive1, but why bother?
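A sketch of that switching logic with java.util.zip. Here the next file's uncompressed size is used as a pessimistic upper bound on what the entry will add; all file and method names are illustrative, and the input directory is assumed to be non-empty:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class SplitZipper {

    // Pessimistic check: assume the next file will not compress at all.
    static boolean needsNewArchive(long bytesWritten, long nextFileSize, long limit) {
        return bytesWritten + nextFileSize > limit;
    }

    // Packs every file in inputDir into archive1.zip, archive2.zip, ... in
    // outputDir, rolling over when the limit would be exceeded.
    // Returns the number of archives created.
    static int pack(Path inputDir, Path outputDir, long limit) throws IOException {
        int archiveIndex = 1;
        long written = 0;
        ZipOutputStream zos = open(outputDir, archiveIndex);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(inputDir)) {
            for (Path file : files) {
                long size = Files.size(file);
                // only roll over once the current archive holds something
                if (written > 0 && needsNewArchive(written, size, limit)) {
                    zos.close();
                    zos = open(outputDir, ++archiveIndex);
                    written = 0;
                }
                zos.putNextEntry(new ZipEntry(file.getFileName().toString()));
                Files.copy(file, zos);
                zos.closeEntry();
                written += size; // upper bound; Deflater.getBytesWritten() is tighter
            }
        } finally {
            zos.close();
        }
        return archiveIndex;
    }

    static ZipOutputStream open(Path dir, int index) throws IOException {
        return new ZipOutputStream(
                Files.newOutputStream(dir.resolve("archive" + index + ".zip")));
    }
}
```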
I know this is ages old, but in case someone runs into it: IMHO the way to go about it is this:
1) Read the original file (e.g. original.txt) using $contents = file_get_contents('original.txt').
2) Make your changes/edits.
3) Write the result to a temp file with file_put_contents('original.txt.tmp', $contents).
4) Then move the temp file over the original, replacing it: rename('original.txt.tmp', 'original.txt').
Advantages: While the new content is being processed and written, the original file is not locked, so others can still read the old content. At least on Linux/Unix boxes, rename() is an atomic operation, so any interruption during the write leaves the original file untouched; only once the temp file has been fully written to disk is it moved into place. More interesting reading on this is in the comments to http://php.net/manual/en/function.rename.php
Edit to address comments (too long for a comment):
https://stackoverflow.com/questions/7054844/is-rename-atomic has further references to what you might need to do if you are operating across filesystems.
On the shared lock for reading, I am not sure why that would be needed, since in this implementation there is no writing to the file directly. PHP's flock() (which would be used to get the lock) is a little unreliable and can be ignored by other processes. That's why I am suggesting using rename() instead.
The temp file should ideally be named uniquely per process doing the renaming, to make sure no two processes clobber each other's temp file. This of course does not prevent two people editing the same file at the same time, but at least the file will be left intact (last edit wins).
Steps 3) and 4) would then become this:
$tempFile = uniqid(microtime(true), true); // make sure we have a unique name
file_put_contents($tempFile, $contents);   // write temp file ($contents holds the edited content)
rename($tempFile, 'original.txt');         // ideally on the same filesystem
Best Answer
The only thing I see that is inelegant about your implementation is the intertwining of the file-listing work, the zip-file opening/closing work, and the counting work. (Do you have another issue with it?)
Solution 1: Use Java's try-with-resources block to automate the file closing work thanks to ZipFile being AutoCloseable:
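A minimal sketch of this idiom, assuming the task is counting the entries in a zip (the method and class names are illustrative):

```java
import java.io.File;
import java.io.IOException;
import java.util.zip.ZipFile;

// ZipFile implements AutoCloseable, so try-with-resources guarantees the
// archive is closed even if an exception is thrown while reading it.
public class ZipCount {
    static int countEntries(File zip) throws IOException {
        try (ZipFile zf = new ZipFile(zip)) {
            return zf.size(); // number of entries; zf is closed automatically
        }
    }
}
```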
Solution 2: In Groovy, separate out the file listing work (and add filename filtering):
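The original snippet here was Groovy; a plain-Java analogue of separating out the listing-with-filter step might look like this sketch (names are illustrative):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// The file-listing work (with a filename filter) lives in its own method,
// separate from any zip opening/closing or counting logic.
public class ZipLister {
    static List<Path> listZips(Path dir) throws IOException {
        List<Path> result = new ArrayList<>();
        // glob filter: only *.zip entries are returned
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.zip")) {
            for (Path p : stream) {
                result.add(p);
            }
        }
        return result;
    }
}
```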
Solution 3: Also separate out the ZipFile open/close work using Groovy's "with" idiom, adding a zipFileWith(closure) method to the File class:
Solution 4: Add an eachZipFile(closure) method to the File class: