Java FileInputStream – differences based on how the File object is referenced: classloader/filesystem

Tags: classloader, excel, fileinputstream, java

I'm using Apache POI to extract some data from an Excel file.
I need an InputStream to instantiate the POI HSSFWorkbook class:

    HSSFWorkbook wb = new HSSFWorkbook(inputStreamX);

The behavior differs depending on how I construct the InputStream:

    InputStream inputStream = new FileInputStream(new File("/home/xxx/workspace/myproject/test/resources/importTest.xls"));        
    InputStream inputStream2 = new FileInputStream(getClass().getResource("/importTest.xls").getFile());
    InputStream inputStream3 = new ClassPathResource("importTest.xls").getInputStream();

If I construct the POI object with inputStream, it works fine.
But inputStream2 and inputStream3 both throw this exception:

    java.io.IOException: Invalid header signature; read -2300849302551019537, expected -2226271756974174256
        at org.apache.poi.poifs.storage.HeaderBlockReader.<init>(HeaderBlockReader.java:100)
        at org.apache.poi.poifs.filesystem.POIFSFileSystem.<init>(POIFSFileSystem.java:84)

It seems that the header of the binary file is different, so the library can't recognize it as an Excel file. I can't understand why.
The only difference I see is that inputStream2 and inputStream3 use the classloader to locate the file (ClassPathResource is a Spring class).

I'd like to keep the file path independent of the local filesystem, so I would prefer something like inputStream2 or inputStream3.

Do you have any idea on why this is happening?

Thank you

Update:
I tried writing inputStream and inputStream2 to disk.
The Excel file produced from inputStream is OK. The one produced from inputStream2 contains the real content wrapped in some strange characters.

It seems that Maven corrupts the Excel file in some way during the build.
So it's the copy I retrieve through the classloader (under /home/xxx/workspace/myproject/target/test-classes/importTest.xls) that is not OK.
Any idea?
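
One way to confirm that the classpath copy is the damaged one, without going through POI, is to look at its first eight bytes. Every OLE2 compound document (the container format behind .xls) starts with the bytes D0 CF 11 E0 A1 B1 1A E1. The following is only a rough sketch: the class and method names are made up for illustration, and it assumes importTest.xls sits at the root of the test classpath as in the snippets above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class Ole2HeaderCheck {

    // First 8 bytes of every OLE2 compound document (the .xls container format).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Hypothetical helper: true if the stream begins with the OLE2 magic bytes.
    static boolean startsWithOle2Magic(InputStream in) throws IOException {
        byte[] header = new byte[OLE2_MAGIC.length];
        int read = 0;
        while (read < header.length) {
            int n = in.read(header, read, header.length - read);
            if (n < 0) {
                return false; // stream ended before 8 bytes were available
            }
            read += n;
        }
        return Arrays.equals(header, OLE2_MAGIC);
    }

    public static void main(String[] args) throws IOException {
        // Reads the copy that the build placed on the classpath, not the source file.
        try (InputStream in = Ole2HeaderCheck.class.getResourceAsStream("/importTest.xls")) {
            if (in == null) {
                System.out.println("importTest.xls is not on the classpath");
                return;
            }
            System.out.println("OLE2 header intact: " + startsWithOle2Magic(in));
        }
    }
}
```

If this reports false for the classpath copy while the file under test/resources opens fine, the damage happened when the file was copied during the build, not when it was read.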

Best Answer

The problem seems to be Maven's resource filtering option.
Suppose the pom looks like this:

    <testResource>
        <directory>${basedir}/src/test/resources</directory>
        <includes>
            <include>**/*.xml</include>
            <include>**/*.properties</include>
            <include>**/*.sql</include>
            <include>**/*.xls</include>
        </includes>
        <filtering>true</filtering>
    </testResource>

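With filtering set to true, Maven processes every included file as text while copying it, and that corrupts binary files such as .xls. Keeping *.xls out of the filtered resources (for example, by copying them through a separate testResource with filtering set to false) avoids the corruption.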
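
The two numbers in the exception message are consistent with this. POI reads the first eight bytes of the file as a little-endian long, so writing both values back out as little-endian bytes shows what was actually in each file. The expected value is the OLE2 magic number. The value that was read starts with EF BF BD twice, which is the UTF-8 encoding of U+FFFD (the replacement character), followed by the original 11 E0. That is what one would expect if the leading D0 CF bytes had been decoded as text and written back as UTF-8. A small sketch that reproduces the decoding (the class and method names are made up for illustration):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SignatureBytes {

    // Hypothetical helper: renders a long as the 8 bytes it would occupy
    // in a file when stored in little-endian order.
    static String littleEndianHex(long value) {
        byte[] bytes = ByteBuffer.allocate(Long.BYTES)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putLong(value)
                .array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "expected" value from the exception message
        System.out.println(littleEndianHex(-2226271756974174256L)); // D0 CF 11 E0 A1 B1 1A E1
        // "read" value from the exception message
        System.out.println(littleEndianHex(-2300849302551019537L)); // EF BF BD EF BF BD 11 E0
    }
}
```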