A good design pattern for generating an Excel (xlsx) file in code

design-patternsexcelxmlxslt

See my Update at the bottom for more.


I occasionally have projects where I have to output some data as an Excel file (xlsx format). The process is usually:

  1. User clicks some buttons in my application

  2. My code runs a DB query and processes the results somehow

  3. My code generates an *.xlsx file using either the Excel com interop libraries or some third-party library (e.g., Aspose.Cells)

I can easily find code examples for how to do this online, but I am looking for a more robust way to do this. I'd like my code to follow some design principles to ensure that my code is maintainable and easily understandable.


Here is what my initial attempt to generate an xlsx file looked like:

var wb = new Workbook();
var ws = wb.Worksheets[0];
ws.Cells[0, 0].Value = "Header";
ws.Cells[1, 0].Value = "Row 1";
ws.Cells[2, 0].Value = "Row 2";
ws.Cells[3, 0].Value = "Row 3";
wb.Save(path);

Pros: Not much. It works, so that's good.

Cons:

  • Cell references are hardcoded, so I have magic numbers littered throughout my code.
  • It's difficult to add or remove columns and rows without updating many cell references.
  • I need to learn some third-party library. Some libraries are used like other libraries, but there can still be problems. I had a problem where the com interop libraries use 1-based cell referencing whereas Aspose.Cells uses 0-based cell referencing.

Here is one solution that addresses some of the cons I listed above. I wanted to treat a table of data as its own object that can be moved around and changed without digging into cell manipulation and disturbing other cell references. Here is some pseudocode:

var headers = new Block(new string[] { "Col 1", "Col 2", "Col 3" });
var body = new Block(new string[,]
    {
        { "Row 1", "Row 1", "Row 1" },
        { "Row 2", "Row 2", "Row 2" },
        { "Row 3", "Row 3", "Row 3" }
    });

body.PutBelow(headers);

As a part of this solution, I'll have some BlockEngine object that takes a container of Blocks and performs the cell manipulations required to output the data as an *.xlsx file. A Block object can have formatting attached to it.

Pros:

  • This removes most of the magic numbers that my initial code had.
  • This hides a lot of cell manipulation code, although cell manipulation is still required in the BlockEngine object that I mentioned.
  • It's much easier to add and remove rows without effecting other parts of the spreadsheet.

Cons:

  • It's still difficult to add or remove columns. If I wanted to swap the position of columns two and three, I'd have to directly swap the cell contents. In this case that'd be eight edits, and thus eight opportunities to make a mistake.
    • If I have any formatting in place for those two columns, I have to update that as well.
  • This solution doesn't support horizontal block placement; I can only place one block below another. Sure I could have tableRight.PutToRightOf(tableLeft), but that would cause problems if tableRight and tableLeft had different numbers of rows. To place tables, the engine would have to be aware of every other table. This seems unnecessarily complicated to me.
  • I still need to learn third-party code, though through a layer of abstraction via Block objects and a BlockEngine the code will be less tightly-coupled to the third-party library than my initial attempt. If I wanted to support lots of different formatting options in a loosely-coupled way, I'd probably have to write a lot of code; my BlockEngine would be a huge mess.

Here is a solution that takes a different route. Here's the process:

  1. I take my report data and generate an xml file in some format that I choose.

  2. I then use an xsl transformation to convert the xml file to an Excel 2003 XML Spreadsheet file.

  3. From there I simply convert the xml Spreadsheet to an xlsx file using a third party library.

I found this page that describes a similar process and includes code examples.

Pros:

  • This solution requires almost no cell manipulation. You instead use xsl/xpath to do your manipulations. In order to swap two columns in a table, you move the whole columns in the xsl file unlike my other solutions which would require cell-swapping.
  • While you still need a third-party library that can convert an Excel 2003 XML Spreadsheet to an xlsx file, that's about all you'll need the library for. The amount of code you need to write that would call into the third-party library is tiny.
  • I think this solution is the easiest to understand and requires the least amount of code.
    • The code that creates the data in my own xml format will be simple.
    • The xsl file will be complicated only because the Excel 2003 XML Spreadsheet is complicated. However it's easy to check the output of the xsl file: just open the output in Excel and check for error messages.
    • It's easy to generate sample Excel 2003 XML Spreadsheet files: just create a spreadsheet that looks like your desired xlsx file, and then save it as an Excel 2003 XML Spreadsheet.

Cons:

  • Excel 2003 XML Spreadsheets don't support certain features. You can't autofit column widths for instance. You can't include images in headers or footers. If you're going to export the resultant xlsx file to pdf, you can't set pdf bookmarks. (I hacked together a fix for this using cell comments.). You have to do this using your third-party library.
  • Requires a library that supports Excel 2003 XML Spreadsheets.
  • Uses an 11-year old MS Office file format.

Note: I realize that xlsx files are actually zip files containing xml files, but the xml formatting seems too complicated for my purposes.


Finally, I've looked into solutions involving SSRS, but it seems too bloated for my purposes.


Back to my initial question, what is a good design-pattern for generating Excel files in code?. I can think of a few solutions, but none seem to stick out as ideal. Each has drawbacks.


Update: So I tried both my BlockEngine solution and my XML Spreadsheet solution for generating similar XLSX files. Here are my opinions of them:

  • The BlockEngine solution:

    • This simply requires too much code considering the alternatives.
    • I found it too easy to overwrite one block with another if I had an offset wrong.
    • I originally stated that formatting could be attached at the block level. I found this to be not much better than doing the formatting separately from the block content. I can't think of a good way to combine the content and the formatting. Nor can I find a good way to keep them separate. It's just a mess.
  • The XML Spreadsheet solution:

    • I'm going with this solution for now.
    • It bears repeating that this solution requires much less code. I'm effectively replacing the BlockEngine with Excel itself. I still do need a hack for features like bookmarks and page breaks.
    • The XML Spreadsheet format is finicky, but it's easy to make a small change and compare the results to an existing file in your favorite Diff program. And once you figure out some idiosyncrasy, you can put it in place and forget about it from there.
    • I'm still concerned that this solution relies on an older Excel file format.
    • The XSLT file I created is easy to work with. Dealing with formatting is much simpler here than it is with the BlockEngine solution.

Best Answer

If you really want something that works well for you, then I suggest you get used to the idea of "unnecessarily complex"... that's the nature of dealing with Microsoft Office file formats.

I (sort of) like your idea of "blocks"... I would make sub-classed block objects, like Table, with Columns and Rows independent of the notion of cells. Then use your block engine to convert these to XSLS files.

I've used the OpenXML SDK successfully in the past, but don't try to read the documentation and start from scratch. Instead, create an exact copy in Excel of what you want, save it, and inspect it using the provided Document Reflector tool. It will give you the C# code you need to create the document, which you can then learn from and modify.

Related Topic