Convert style-laden HTML tables to PDF in .NET 1.1

itextsharp .net pdf

I have colleagues working on a .NET 1.1 project, where they obtain XML files from an external party and programmatically instruct iTextSharp to generate PDF content based on the XML data.

The tricky part is that the XML contains segments of arbitrary HTML: markup that users copied and pasted from their Office applications. It still looks fine in a web browser, but when it is fed to iTextSharp's HTMLWorker object to be parsed and converted into PDF objects, the formatting and alignment end up all over the place in the generated PDF document. For example:

<span id="mceBoundaryType" class="portrait"></span>
<table border="0" cellspacing="0" cellpadding="0" width="636" class="MsoNormalTable"
    style="margin: auto auto auto 4.65pt; width: 477pt; border-collapse: collapse">
    <tbody>
        <tr style="height: 15.75pt">
            <td width="468" valign="bottom" style="padding-right: 5.4pt; padding-left: 5.4pt;
                padding-bottom: 0in; width: 351pt; padding-top: 0in; height: 15.75pt; background-color: transparent;
                border: #ece9d8">
                <p style="margin: 0in 0in 0pt" class="MsoNormal">
                    <font face="Times New Roman">&nbsp;</font></p>
            </td>
            <td colspan="3" width="168" valign="bottom" style="padding-right: 5.4pt; padding-left: 5.4pt;
                padding-bottom: 0in; width: 1.75in; padding-top: 0in; height: 15.75pt; background-color: transparent;
                border: #ece9d8">
                <p style="margin: 0in 0in 0pt; text-align: center" class="MsoNormal" align="center">
                    <u><font face="Times New Roman">Group</font></u></p>
            </td>
        </tr>

The tags are full of style attributes, and iTextSharp does not support CSS or interpret that attribute. What alternatives have other iTextSharp users tried to work around this, and are there other feasible HTML-to-PDF components?
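
To make concrete what the style attributes in the excerpt above carry, here is a minimal sketch in Python. Python is used only for brevity; this is not .NET 1.1 code and not part of iTextSharp, and the helper names `parse_inline_style` and `length_to_points` are hypothetical. It assumes the standard CSS ratios of 72pt and 96px per inch and handles only the pt, in and px units that appear in the excerpt.

```python
import re

# Points per unit for the absolute CSS length units seen in the excerpt
# (assumption: 1in = 72pt = 96px, so 1px = 0.75pt).
POINTS_PER_UNIT = {"pt": 1.0, "in": 72.0, "px": 0.75}


def parse_inline_style(style):
    """Split a style attribute value into a {property: value} dict."""
    declarations = {}
    for chunk in style.split(";"):
        if ":" not in chunk:
            continue  # skip empty or malformed pieces
        name, value = chunk.split(":", 1)
        # Normalise case and collapse whitespace (the excerpt wraps lines).
        declarations[name.strip().lower()] = " ".join(value.split())
    return declarations


def length_to_points(value):
    """Convert one CSS length such as '1.75in' to points, else None."""
    match = re.match(r"^(-?\d*\.?\d+)(pt|in|px)$", value.strip().lower())
    if match is None:
        return None  # not a plain pt/in/px length (e.g. a colour)
    number, unit = match.groups()
    return float(number) * POINTS_PER_UNIT[unit]


# Style value of the second <td> in the excerpt, line breaks included.
td_style = """padding-right: 5.4pt; padding-left: 5.4pt;
    padding-bottom: 0in; width: 1.75in; padding-top: 0in; height: 15.75pt;
    background-color: transparent; border: #ece9d8"""

declarations = parse_inline_style(td_style)
print(length_to_points(declarations["width"]))  # 126.0
print(length_to_points("168px"))                # 126.0 (the width attribute)
```

Under those assumptions the cell's `width="168"` attribute and its CSS `width: 1.75in` describe the same 126pt, whereas the 5.4pt cell padding and the 15.75pt row height appear only inside `style`, so a parser that skips that attribute has no other source for them.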

Best Answer

I have found that .NET 2.0-based components such as ExpertPDF and ABCpdf do a fairly good job of interpreting the CSS styles and aligning the tables properly in the PDF. For now I am suggesting that my colleagues set up a separate .NET 2.0 web service that can use such a component. The ASP.NET 1.1 web application would tell this service to scrape a generated web page that is essentially the report in HTML view, and convert that page to PDF.

UPDATE:

I am marking this as the answer because it is the approach that was recommended to the application team.