I am using the accepted solution here to convert an excel sheet into a datatable. This works fine if I have "perfect" data but if I have a blank cell in the middle of my data it seems to put the wrong data in each column.
I think this is because in the below code:
row.Descendants<Cell>().Count()
is number of populated cells (not all columns) AND:
GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
seems to find the next populated cell (not necessarily what is in that index) so if the first column is empty and i call ElementAt(0), it returns the value in the second column.
Here is the full parsing code.
DataRow tempRow = dt.NewRow();
for (int i = 0; i < row.Descendants<Cell>().Count(); i++)
{
tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));
if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)
{
Console.Write(tempRow[i].ToString());
}
}
Best Answer
This makes sense since Excel will not store a value for a cell that is null. If you open your file using the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level you will see that only the cells that have data are going to be in that file.
Your options are to insert blank data in the range of cells you are going to traverse or programmatically figure out a cell was skipped and adjust your index appropriately.
I made an example excel document with a string in cell reference A1 and C1. I then opened up the excel document in the Open XML Productivity Tool and here is the XML that was stored:
Here you will see that the data corresponds to the first row and that only two cells worth of data are saved for that row. The data saved corresponds to A1 and C1 and that no cells with null values are saved.
To get the functionality that you need, you can traverse over the Cells as you are doing above, but you will need to check what the value the Cell is referencing and determine if any Cells have been skipped. to do that you will need two utility functions to get the Column Name from the cell reference and to then translate that column name into a zero based index:
Then you can iterate over the Cells and check to see what the cell reference is compared to the columnIndex. If it is less than then you add blank data to your tempRow, otherwise just read in the value contained in the cell. (Note: I did not test the code below, but the general idea should help):