Java Class Resolution – Handling Multiple JAR Files

java, maven

Recently I found that one of my Maven projects has 100+ JAR dependencies. AFAIK a ZIP archive doesn't have an index at all, so a lookup would have to scan the whole archive to determine whether it contains a specific path.

But I found that Java resolves class names against that many JARs rather fast. Why?

Best Answer

The ZIP format (which JAR is an extension of) consists of a series of compressed entries followed by an index section (the "central directory") at the end of the file. The index records the full name of each file (well, full relative to the root of the ZIP) together with other metadata, such as where its compressed data is located, so finding out what a ZIP contains is actually a very fast operation. Since a class name maps to a single .class file path in a trivial way (com.example.Foo becomes com/example/Foo.class), checking whether a JAR contains it is itself very fast, even before considering any caching.
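The lookup described above can be sketched in Java roughly as follows. This is only an illustration, not how the JVM's class loaders are actually implemented: the class JarLookupSketch, its helper methods, and the com.example names are made up for the example. Only JarFile, JarEntry and JarOutputStream (from java.util.jar) are real APIs. The sketch builds a throwaway JAR, then asks whether it contains a class by name without reading any entry's data.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class JarLookupSketch {

    // Maps a class name to the entry path it would have inside a JAR.
    // (Hypothetical helper, named for this example only.)
    static String toEntryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Returns true if the JAR lists an entry for the given class.
    // getEntry looks the name up; it does not decompress any entry data.
    static boolean containsClass(Path jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(toEntryName(className)) != null;
        }
    }

    // Builds a temporary JAR holding a single placeholder entry.
    // The bytes are not a valid class file; only the entry name matters here.
    static Path createDemoJar() throws IOException {
        Path jar = Files.createTempFile("demo", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jarOut = new JarOutputStream(out)) {
            jarOut.putNextEntry(new JarEntry("com/example/Foo.class"));
            jarOut.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
            jarOut.closeEntry();
        }
        return jar;
    }

    public static void main(String[] args) throws IOException {
        Path jar = createDemoJar();
        try {
            System.out.println(containsClass(jar, "com.example.Foo")); // prints true
            System.out.println(containsClass(jar, "com.example.Bar")); // prints false
        } finally {
            Files.deleteIfExists(jar);
        }
    }
}
```

A real class loader would keep each JAR open rather than reopening it per lookup as containsClass does here; the sketch reopens the file only to stay short.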

This layout stems from the ZIP format's original use as a multi-disk compressed archive format; when extracting, you had to insert the last disk of the set first (so that the index could be read) before starting to deal with the compressed data from the beginning of the first disk. Of course, if you ran out of disks before you finished writing the archive, you were completely SOL…