Convert style-laden HTML tables to PDF in .NET 1.1

itextsharp .net pdf

I have colleagues working on a .NET 1.1 project, where they obtain XML files from an external party and programmatically instruct iTextSharp to generate PDF content based on the XML data.

The tricky part is that the XML contains segments of arbitrary HTML. This is HTML that users copied and pasted from their Office applications. It still looks fine in a web browser, but when it is fed into iTextSharp's HTMLWorker object to be parsed and converted into PDF objects, the formatting and alignment end up all over the place in the generated PDF document. For example:

<span id="mceBoundaryType" class="portrait"></span>
<table border="0" cellspacing="0" cellpadding="0" width="636" class="MsoNormalTable"
    style="margin: auto auto auto 4.65pt; width: 477pt; border-collapse: collapse">
    <tbody>
        <tr style="height: 15.75pt">
            <td width="468" valign="bottom" style="padding-right: 5.4pt; padding-left: 5.4pt;
                padding-bottom: 0in; width: 351pt; padding-top: 0in; height: 15.75pt; background-color: transparent;
                border: #ece9d8">
                <p style="margin: 0in 0in 0pt" class="MsoNormal">
                    <font face="Times New Roman">&nbsp;</font></p>
            </td>
            <td colspan="3" width="168" valign="bottom" style="padding-right: 5.4pt; padding-left: 5.4pt;
                padding-bottom: 0in; width: 1.75in; padding-top: 0in; height: 15.75pt; background-color: transparent;
                border: #ece9d8">
                <p style="margin: 0in 0in 0pt; text-align: center" class="MsoNormal" align="center">
                    <u><font face="Times New Roman">Group</font></u></p>
            </td>
        </tr>

The tags are full of style attributes, and iTextSharp's HTMLWorker does not support CSS, so it does not interpret them. What alternatives have other iTextSharp users tried to work around this, and are there other feasible HTML-to-PDF components?

Best Answer

I have found that .NET 2.0-based components such as ExpertPDF and ABCpdf do a fairly good job of interpreting the CSS styles and aligning the tables properly in the PDF. Right now I am suggesting that my colleagues use a separate .NET 2.0 web service that hosts such a component. The ASP.NET 1.1 web application would tell the service to scrape a generated web page that is essentially the report in HTML view, and the service would convert that page to PDF.
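A minimal sketch of how the .NET 1.1 side might call such a service, using only framework classes and C# 1.x syntax. The class name, the service address and the "url" query parameter are all hypothetical; the real contract depends on how the .NET 2.0 service is written.

```csharp
using System;
using System.IO;
using System.Net;
using System.Web;

// Hypothetical client for the ASP.NET 1.1 application. The service address
// and the "url" query parameter are assumptions made for illustration only.
public sealed class PdfServiceClient
{
    private PdfServiceClient() { }

    // Builds the request URL, passing the report page address as an encoded query value.
    public static string BuildRequestUrl(string serviceUrl, string reportPageUrl)
    {
        return serviceUrl + "?url=" + HttpUtility.UrlEncode(reportPageUrl);
    }

    // Asks the conversion service to render the report page and returns the response bytes.
    public static byte[] FetchPdf(string serviceUrl, string reportPageUrl)
    {
        WebRequest request = WebRequest.Create(BuildRequestUrl(serviceUrl, reportPageUrl));
        using (WebResponse response = request.GetResponse())
        using (Stream body = response.GetResponseStream())
        using (MemoryStream buffer = new MemoryStream())
        {
            byte[] chunk = new byte[8192];
            int read;
            // Copy the response stream into memory one chunk at a time.
            while ((read = body.Read(chunk, 0, chunk.Length)) > 0)
            {
                buffer.Write(chunk, 0, read);
            }
            return buffer.ToArray();
        }
    }
}
```

The 1.1 application could then write the returned bytes to the HTTP response with a content type of application/pdf. Keeping the client this thin means the CSS-aware component only has to exist on the .NET 2.0 side.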

UPDATE:

This is the answer as it is the recommended approach provided to the application team.

Related Solutions

Html – Recommended way to embed PDF in HTML

This is quick, easy, to the point and doesn't require any third-party script:

<embed src="http://example.com/the.pdf" width="500" height="375" 
 type="application/pdf">

UPDATE (2/3/2021)

Adobe now offers its own PDF Embed API.

https://www.adobe.io/apis/documentcloud/dcsdk/pdf-embed.html

UPDATE (1/2018):

The Chrome browser on Android no longer supports PDF embeds. You can get around this by using the Google Drive PDF viewer:

<embed src="https://drive.google.com/viewerng/viewer?embedded=true&url=http://example.com/the.pdf" width="500" height="375">

C# – Convert HTML to PDF in .NET

EDIT: New suggestion: HTML Renderer for PDF using PdfSharp

(After trying wkhtmltopdf, I suggest avoiding it.)

HtmlRenderer.PdfSharp is written entirely in managed C#, is easy to use, is thread safe and, most importantly, is FREE (New BSD License).

Usage

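Install the HtmlRenderer.PdfSharp NuGet package.

Use the example method:

// Requires "using System.IO;" and the HtmlRenderer.PdfSharp NuGet package.
public static byte[] PdfSharpConvert(string html)
{
    using (MemoryStream ms = new MemoryStream())
    {
        // Render the HTML onto A4 pages and get a PdfSharp document back.
        var pdf = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.A4);
        // Serialize the document into the in-memory stream.
        pdf.Save(ms);
        // Return the PDF file content as a byte array.
        return ms.ToArray();
    }
}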

A very good alternative is a free version of iTextSharp

Up to version 4.1.6, iTextSharp was licensed under the LGPL. Versions up to 4.1.6 (and possibly forks of them) are available as packages and can be used freely. Of course, you can also use the later, paid 5+ versions.

I tried to integrate wkhtmltopdf-based solutions into my project and ran into a bunch of hurdles.

I personally would avoid wkhtmltopdf-based solutions in hosted enterprise applications, for the following reasons.

First of all, wkhtmltopdf is implemented in C++, not C#, and you will run into various problems embedding it in your C# code, especially when switching between 32-bit and 64-bit builds of your project. I had to try several workarounds, including conditional project building, just to avoid "invalid format exceptions" on different machines.
If you manage your own virtual machine, it's fine. But if your project runs in a constrained environment such as Azure (where, according to the TuesPechkin author, it is actually impossible) or Elastic Beanstalk, it's a nightmare to configure that environment just to get wkhtmltopdf working.
wkhtmltopdf creates files on your server, so you have to manage user permissions and grant "write" access to the location where wkhtmltopdf runs.
wkhtmltopdf runs as a standalone application, so it is not managed by your IIS application pool. You either have to host it as a service on another machine, or you will see processing spikes and high memory consumption on your production server.
It uses temp files to generate the PDF, which is a big performance problem in environments with really slow disk I/O, such as AWS EC2.
It produces the much-hated "Unable to load DLL 'wkhtmltox.dll'" error, which many users have reported.
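One workaround for the 32-bit/64-bit problem is to ship both native builds and choose between them at runtime based on the bitness of the running process. Here is a minimal sketch of that idea. The helper class and the "x86"/"x64" folder layout are my own assumptions for illustration, not something wkhtmltopdf or its wrappers prescribe, and actually loading the chosen library is left to whichever wrapper you use.

```csharp
using System;
using System.IO;

// Hypothetical helper: the "x86"/"x64" folder layout is an assumption,
// not something wkhtmltopdf or its .NET wrappers require.
public static class NativeLibraryLocator
{
    // Returns the subfolder matching the bitness of the current process.
    public static string GetNativeDirectory(string baseDirectory)
    {
        // IntPtr.Size is 8 in a 64-bit process and 4 in a 32-bit process.
        string arch = IntPtr.Size == 8 ? "x64" : "x86";
        return Path.Combine(baseDirectory, arch);
    }

    // Full path of the native library build the current process should use.
    public static string GetLibraryPath(string baseDirectory)
    {
        return Path.Combine(GetNativeDirectory(baseDirectory), "wkhtmltox.dll");
    }
}
```

For example, NativeLibraryLocator.GetLibraryPath(AppDomain.CurrentDomain.BaseDirectory) picks the build that matches the process, which avoids maintaining separate project configurations per platform. It does nothing for the other points in the list.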

--- PRE Edit Section ---

For anyone who wants to generate PDF from HTML in simpler applications or environments, I am leaving my old post here as a suggestion.

TuesPechkin

https://www.nuget.org/packages/TuesPechkin/

or, especially for MVC web applications (though I think you can use it in any .NET application)

Rotativa

https://www.nuget.org/packages/Rotativa/

They both use the wkhtmltopdf binary to convert HTML to PDF. It renders pages with the WebKit engine, so it can also parse CSS style sheets.

They provide easy-to-use, seamless integration with C#.

Rotativa can also generate PDFs directly from any Razor view.

Additionally, for real-world web applications, they also take care of concerns such as thread safety.
