C# – How to have Open XML spreadsheet “uncollapse” cells in a spreadsheet

cexcelopenxmlopenxml-sdk

I'm working with xslx Excel file on the server side in C#. In a spreadsheet, say there are 15 columns (cells) total. In the rows of cells, some values are missing. So the first row is my header will properly have the 15 cells. But my data rows, some cells might have empty values, so Open XML has a "jagged" set of cells values. Row 1 will have the full 15 cells, Row 2 might have 13 cells since two of the values are empty. What! How do I map this data properly? It basically shifts everything to the left and my cell values are wrong. What am I missing? It seems like they are being "collapsed" in Open XML terminology.

WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;
                IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();
                string relationshipId = sheets.First().Id.Value;
                WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);
                Worksheet workSheet = worksheetPart.Worksheet;
                SheetData sheetData = workSheet.GetFirstChild<SheetData>();
                IEnumerable<Row> rows = sheetData.Descendants<Row>();

CLARIFICATION: Here's another way to ask this question. What if I want to take the contents on an Excel file and put it into a DataTable. I want all the columns of data to line up correctly. How could I accomplish this?

This question is asked better than I here: reading Excel Open XML is ignoring blank cells

Best Answer

As I understand it you are iterating over the Cells in a row and assuming that the first cell you read is in column A, the second in column B and so on?

I suggest that you (parse?) regex the column position / reference from the

DocumentFormat.OpenXml.Spreadsheet.Cell currentcell
currentcell.CellReference

CellReference gives you the cell reference in "A1" notation. Extract the Column part ("A,B,CC,etc.)

You would have to do this for each cell in a row. Then if a cell for a column is missing just fill in a placeholder value. Null or DbNull maybe?

I dont know if there is another way to figure out to what column a cell belongs.