How to lazily read big Excel 2007 files with Apache POI

apache-poi lazy-evaluation memory

I'd like to read a big Excel 2007 file with Apache POI. The quick start guide states that a File should be used to conserve memory:

When opening a workbook, either a .xls HSSFWorkbook, or a .xlsx
XSSFWorkbook, the Workbook can be loaded from either a File or an
InputStream. Using a File object allows for lower memory consumption,
while an InputStream requires more memory as it has to buffer the
whole file.

I therefore wrote approximately this:

OPCPackage opcPackage = OPCPackage.open(file);
XSSFWorkbook workbook = new XSSFWorkbook(opcPackage);
XSSFSheet sheet = workbook.getSheetAt(0);
Iterator<Row> rows = sheet.rowIterator();
if (rows.hasNext()) {
    Row row = rows.next();
    System.out.println(row.getCell(1).getStringCellValue());
}

However, that results in a java.lang.OutOfMemoryError: Java heap space for a sheet with more than approximately 10000 rows.

I was hoping that iterating would load rows lazily, only as they are read, like a stream.

How can I get around memory issues with large Excel files? Can I read lazily with Apache POI?

Best Answer

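POI offers an event model API (for .xlsx files, the "XSSF and SAX" event API) that processes a sheet as a stream of XML events instead of building the whole workbook in memory, which is what XSSFWorkbook does regardless of whether it is opened from a File or an InputStream. More details can be found in the POI documentation pages about the event model and the other streaming options.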
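To make the idea concrete, here is a minimal sketch of event-style reading that uses only the JDK's StAX parser, so it is not the POI API itself. It assumes you already have the raw sheet XML as an InputStream; in POI, XSSFReader (package org.apache.poi.xssf.eventusermodel) should be able to supply one via getSheetsData(), but check the documentation for your POI version. The class name StreamingRowsSketch and the method streamRows are hypothetical names made up for this illustration. Only one row is held in memory at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of POI: streams rows out of SpreadsheetML sheet XML.
class StreamingRowsSketch {

    // Walks the XML once and hands each row's raw <v> values to rowHandler
    // as soon as the closing </row> tag is seen.
    static void streamRows(InputStream sheetXml, Consumer<List<String>> rowHandler) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(sheetXml);
            List<String> row = null;      // values of the row currently being read
            StringBuilder value = null;   // non-null only while inside a <v> element
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT: {
                        String name = reader.getLocalName();
                        if ("row".equals(name)) {
                            row = new ArrayList<>();
                        } else if ("v".equals(name)) {
                            value = new StringBuilder();
                        }
                        break;
                    }
                    case XMLStreamConstants.CHARACTERS: {
                        if (value != null) {
                            value.append(reader.getText());
                        }
                        break;
                    }
                    case XMLStreamConstants.END_ELEMENT: {
                        String name = reader.getLocalName();
                        if ("v".equals(name)) {
                            if (row != null && value != null) {
                                row.add(value.toString());
                            }
                            value = null;
                        } else if ("row".equals(name) && row != null) {
                            rowHandler.accept(row);
                            row = null;   // drop the reference so the row can be collected
                        }
                        break;
                    }
                    default:
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse sheet XML", e);
        }
    }

    public static void main(String[] args) {
        // Tiny in-memory stand-in for xl/worksheets/sheet1.xml.
        String xml = "<worksheet xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"><sheetData>"
                + "<row r=\"1\"><c r=\"A1\"><v>1</v></c><c r=\"B1\"><v>2.5</v></c></row>"
                + "<row r=\"2\"><c r=\"A2\"><v>3</v></c></row>"
                + "</sheetData></worksheet>";
        streamRows(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                row -> System.out.println(row));
    }
}
```

Note the limits of this sketch: for cells of type t="s" the <v> value is an index into the shared strings table rather than the text itself, inline strings and empty cells are not handled, and values are not mapped back to column positions. POI's event API exists largely to take care of those details for you.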
