Tuesday, August 14, 2012

JXL parsing problems with apache POI generated xls


The problem

We are using apache poi 3.8 to generate excels. Our excel generation is based on templates where we duplicate rows and columns as necessary to fit the data that is fed into them. In this particular usecase we add 23 rows and one column. Our counterparty uses jxl 2.5.5 (released 2005-05-05) to parse the incoming xls files generated by our poi setup. The added rows do not pose a problem but the added column does. The problem is that the values in the added column are not parsed and instead jxl outputs warnings:
Warning: Cell D1 exceeds defined cell boundaries in Dimension record (3x42)
Warning: Cell D2 exceeds defined cell boundaries in Dimension record (3x42)
Warning: Cell D3 exceeds defined cell boundaries in Dimension record (3x42)
Warning: Cell D4 exceeds defined cell boundaries in Dimension record (3x42)
Warning: Cell D5 exceeds defined cell boundaries in Dimension record (3x42)

Analysis

If you google for the warning you quickly arrive at the jxl.read.biff.SheetReader class as the culprit, more specifically the "addCell()" method. As you can see it will generate warnings for out of bound cells:
private void addCell(Cell cell) {
      // Sometimes multiple cells (eg. MULBLANK) can exceed the
      // column/row boundaries. Ignore these
      if (cell.getRow() < numRows && cell.getColumn() < numCols) {
            if (cells[cell.getRow()][cell.getColumn()] != null) {
                  StringBuffer sb = new StringBuffer();
                  CellReferenceHelper.getCellReference(cell.getColumn(), cell.getRow(), sb);
                  logger.warn("Cell " + sb.toString() + " already contains data");
            }
            cells[cell.getRow()][cell.getColumn()] = cell;
      }
      else {
            logger.warn("Cell " +
                  CellReferenceHelper.getCellReference
                        (cell.getColumn(), cell.getRow()) +
                  " exceeds defined cell boundaries in Dimension record " +
                  "(" + numCols + "x" + numRows + ")");
      }
}

Reproducing the error

We set up a small testcase using jxl to parse the excel we sent the counterparty. At the core is this read() method:
public void read(File file) throws IOException, BiffException {
      Workbook w = Workbook.getWorkbook(file);
      
      // Get the second sheet
      Sheet sheet = w.getSheet(1);
      System.out.println(sheet);
      for (int i = 0; i < sheet.getRows(); i++) {
            for (int j = 0; j < sheet.getColumns(); j++) {
                  Cell cell = sheet.getCell(j, i);
                  System.out.print("\t" + cell.getContents());
            }
            System.out.println();
      }
}
I included the latest version of jxl available in the maven repository which is 2.6.12 (released 2009-12-26) at the time of writing. The warning did not occur and the additional column was parsed without a problem. I downgraded the jxl version to the oldest available in the repository: 2.5.7 (released 2005-07-30) and ran it again. This generated the warnings our counterparty was experiencing and resulted in a missing column.
Edit: once we received confirmation of the jxl version being used by our counterparty, we ran the test again with 2.5.5 which yielded the same results as 2.5.7

Solutions

Solution 1: Upgrading jxl version

The reason the newer version of jxl parses the excel correctly is because they updated the addCell() method to:
private void addCell(Cell cell) {
      // Sometimes multiple cells (eg. MULBLANK) can exceed the
      // column/row boundaries. Ignore these
      if (cell.getRow() < numRows && cell.getColumn() < numCols) {
            if (cells[cell.getRow()][cell.getColumn()] != null) {
                  StringBuffer sb = new StringBuffer();
                  CellReferenceHelper.getCellReference(cell.getColumn(), cell.getRow(), sb);
                  logger.warn("Cell " + sb.toString() + " already contains data");
            }
            cells[cell.getRow()][cell.getColumn()] = cell;
      }
      else {
            outOfBoundsCells.add(cell);
            /*
            logger.warn("Cell " +
                  CellReferenceHelper.getCellReference
                        (cell.getColumn(), cell.getRow()) +
                  " exceeds defined cell boundaries in Dimension record " +
                  "(" + numCols + "x" + numRows + ")");
            */
      }
}
As you can see the excel generated by apache poi is still incorrect but jxl has learned to deal with it gracefully.

Solution 2: Fixing apache poi

Apache poi updates the dimensions of a sheet every time you add a row or a column. However there is a slight difference between how these additions are handled:
Row addition (org.apache.poi.hssf.model.InternalSheet.addRow():698):
if (row.getRowNumber() >= d.getLastRow()) {
      d.setLastRow(row.getRowNumber() + 1);
}
Column addition (org.apache.poi.hssf.model.InternalSheet.addValueRecord():633):
if (col.getColumn() > d.getLastCol()) {
      d.setLastCol(( short ) (col.getColumn() + 1));
}
We updated line 633 to:
if (col.getColumn() >= d.getLastCol()) {
And generated the excel again. This new excel was parsed correctly by both versions of jxl.
Note that to submit the bug to apache poi, it is best to have a simple code sample able to reproduce the problem. The following method will generate an excel that has the wrong dimensions:
public static void main(String...args) throws IOException {
      Workbook workbook = new HSSFWorkbook();
      Sheet sheet = workbook.createSheet("test");
      Row row = sheet.createRow(0);
      Cell cell = row.createCell(0);
      cell.setCellValue("cell1");
      cell = row.createCell(1);
      cell.setCellValue("cell2");
      OutputStream output = new FileOutputStream(new File("c:/generated.xls"));
      try {
            workbook.write(output);
      }
      finally {
            output.close();
      }
}
This bug and the suggested fix have been logged in the apache bug tracker.

Author: Alexander Verbruggen
Author: Alexander Verbruggen

No comments:

Post a Comment