C# – Convert HTML to PDF in .NET

c# html itextsharp pdf

I want to generate a PDF by passing HTML content to a function. I used iTextSharp for this, but it performs poorly when it encounters tables, and the layout gets messy.

Is there a better way?

Best Answer

EDIT: New suggestion: HTML Renderer for PDF using PdfSharp

(After trying wkhtmltopdf, I suggest avoiding it.)

HtmlRenderer.PdfSharp is fully managed C# code, easy to use, thread safe and, most importantly, free (New BSD License).

Usage

Install the HtmlRenderer.PdfSharp NuGet package.

Use the example method below.

// Requires `using System.IO;` and the HtmlRenderer.PdfSharp NuGet package.
public static byte[] PdfSharpConvert(string html)
{
    using (MemoryStream ms = new MemoryStream())
    {
        // Render the HTML onto A4 pages, then write the document into the in-memory stream.
        var pdf = TheArtOfDev.HtmlRenderer.PdfSharp.PdfGenerator.GeneratePdf(html, PdfSharp.PageSize.A4);
        pdf.Save(ms);
        return ms.ToArray();
    }
}
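
Whichever converter produces the bytes, it can help to do a cheap sanity check before writing them to disk. The following is a minimal sketch, not part of HtmlRenderer.PdfSharp: the class name `PdfOutput` and its methods are hypothetical helpers, and the only fact relied on is that a PDF file starts with the ASCII marker `%PDF-`.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper (not part of any library mentioned above).
public static class PdfOutput
{
    // PDF files begin with the ASCII marker "%PDF-".
    private static readonly byte[] PdfMagic = Encoding.ASCII.GetBytes("%PDF-");

    /// <summary>Returns true if the buffer starts with the PDF header marker.</summary>
    public static bool LooksLikePdf(byte[] bytes)
    {
        if (bytes == null || bytes.Length < PdfMagic.Length) return false;
        for (int i = 0; i < PdfMagic.Length; i++)
        {
            if (bytes[i] != PdfMagic[i]) return false;
        }
        return true;
    }

    /// <summary>Writes the bytes to disk, refusing output that is obviously not a PDF.</summary>
    public static void SaveChecked(byte[] pdfBytes, string path)
    {
        if (!LooksLikePdf(pdfBytes))
            throw new InvalidDataException("Converter output does not start with a PDF header.");
        File.WriteAllBytes(path, pdfBytes);
    }
}
```

With the method above, a call could look like `PdfOutput.SaveChecked(PdfSharpConvert(html), "report.pdf");` (the file name is only an example).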

A very good alternative is a free version of iTextSharp

Up to version 4.1.6, iTextSharp was licensed under the LGPL, and versions up to 4.1.6 (and possibly forks of them) are available as packages and can be used freely. Of course, you can also use the later 5+ versions, which are paid.

I tried to integrate wkhtmltopdf solutions into my project and ran into a bunch of hurdles.

I personally would avoid wkhtmltopdf-based solutions in hosted enterprise applications for the following reasons.

First of all, wkhtmltopdf is implemented in C++, not C#, and you will run into various problems embedding it in your C# code, especially when switching between 32-bit and 64-bit builds of your project. I had to try several workarounds, including conditional project builds, just to avoid "invalid format exceptions" on different machines.
If you manage your own virtual machine, it's OK. But if your project runs in a constrained environment such as Azure (where it is actually impossible, as the TuesPechkin author mentions) or Elastic Beanstalk, it's a nightmare to configure that environment just to get wkhtmltopdf to work.
wkhtmltopdf creates files on your server, so you have to manage user permissions and grant "write" access to the location where wkhtmltopdf runs.
wkhtmltopdf runs as a standalone application, so it is not managed by your IIS application pool. You either have to host it as a service on another machine or accept processing spikes and memory consumption on your production server.
It uses temp files to generate the PDF, which is a big performance problem in environments such as AWS EC2, where disk I/O is really slow.
Many users report the much-hated "Unable to load DLL 'wkhtmltox.dll'" error.
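
To illustrate the 32-bit/64-bit point above: one common workaround is to ship both native builds and pick one at run time based on the bitness of the current process. The sketch below only shows the selection logic. The folder layout (`x86`/`x64` subfolders under a base directory) and the class name `NativeLibraryLocator` are assumptions made for illustration, not something wkhtmltopdf or its wrappers prescribe.

```csharp
using System;
using System.IO;

// Hypothetical helper: assumes <baseDir>/x64/wkhtmltox.dll and <baseDir>/x86/wkhtmltox.dll exist.
public static class NativeLibraryLocator
{
    /// <summary>Builds the path to the native DLL that matches the given process bitness.</summary>
    public static string ResolveNativePath(string baseDir, bool is64BitProcess, string fileName = "wkhtmltox.dll")
    {
        string archFolder = is64BitProcess ? "x64" : "x86";
        return Path.Combine(baseDir, archFolder, fileName);
    }

    /// <summary>Same as above, using the bitness of the running process.</summary>
    public static string ResolveForCurrentProcess(string baseDir)
    {
        return ResolveNativePath(baseDir, Environment.Is64BitProcess);
    }
}
```

Loading a DLL whose bitness does not match the process is what typically produces the format errors described above, so logging the resolved path at startup can shorten the diagnosis considerably.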

--- PRE Edit Section ---

For anyone who wants to generate PDFs from HTML in simpler applications or environments, I leave my old post as a suggestion.

TuesPechkin

https://www.nuget.org/packages/TuesPechkin/

or, especially for MVC web applications (though I think you can use it in any .NET application):

Rotativa

https://www.nuget.org/packages/Rotativa/

Both use the wkhtmltopdf binary to convert HTML to PDF. It renders pages with the WebKit engine, so it can also parse CSS style sheets.

They provide easy-to-use, seamless integration with C#.

Rotativa can also generate PDFs directly from any Razor view.

Additionally, for real-world web applications, they also take care of thread safety and similar concerns.

Related Solutions

C# – Deep cloning objects

Whereas one approach is to implement the ICloneable interface (described here, so I won't regurgitate), here's a nice deep clone object copier I found on The Code Project a while ago and incorporated it into our code. As mentioned elsewhere, it requires your objects to be serializable.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

/// <summary>
/// Reference Article http://www.codeproject.com/KB/tips/SerializedObjectCloner.aspx
/// Provides a method for performing a deep copy of an object.
/// Binary Serialization is used to perform the copy.
/// </summary>
public static class ObjectCopier
{
    /// <summary>
    /// Perform a deep copy of the object via serialization.
    /// </summary>
    /// <typeparam name="T">The type of object being copied.</typeparam>
    /// <param name="source">The object instance to copy.</param>
    /// <returns>A deep copy of the object.</returns>
    public static T Clone<T>(T source)
    {
        if (!typeof(T).IsSerializable)
        {
            throw new ArgumentException("The type must be serializable.", nameof(source));
        }

        // Don't serialize a null object, simply return the default for that object
        if (ReferenceEquals(source, null)) return default;

        using Stream stream = new MemoryStream();
        IFormatter formatter = new BinaryFormatter();
        formatter.Serialize(stream, source);
        stream.Seek(0, SeekOrigin.Begin);
        return (T)formatter.Deserialize(stream);
    }
}

The idea is that it serializes your object and then deserializes it into a fresh object. The benefit is that you don't have to concern yourself with cloning everything when an object gets too complex.

If you prefer to use the extension methods introduced in C# 3.0, change the method to have the following signature:

public static T Clone<T>(this T source)
{
   // ...
}

Now the method call simply becomes objectBeingCloned.Clone();.

EDIT (January 10 2015): Thought I'd revisit this to mention that I recently started using (Newtonsoft) Json to do this. It should be lighter, and it avoids the overhead of [Serializable] tags. (NB: @atconway has pointed out in the comments that private members are not cloned using the JSON method.)

/// <summary>
/// Perform a deep Copy of the object, using Json as a serialization method. NOTE: Private members are not cloned using this method.
/// </summary>
/// <typeparam name="T">The type of object being copied.</typeparam>
/// <param name="source">The object instance to copy.</param>
/// <returns>The copied object.</returns>
public static T CloneJson<T>(this T source)
{
    // Requires `using Newtonsoft.Json;` and, as an extension method, must live in a static class.
    // Don't serialize a null object, simply return the default for that object
    if (ReferenceEquals(source, null)) return default;

    // initialize inner objects individually
    // for example in default constructor some list property initialized with some values,
    // but in 'source' these items are cleaned -
    // without ObjectCreationHandling.Replace default constructor values will be added to result
    var deserializeSettings = new JsonSerializerSettings {ObjectCreationHandling = ObjectCreationHandling.Replace};

    return JsonConvert.DeserializeObject<T>(JsonConvert.SerializeObject(source), deserializeSettings);
}
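
The same serialize-then-deserialize idea also works with the `System.Text.Json` serializer that ships with .NET Core 3.0 and later, which may be worth considering because `BinaryFormatter` is marked obsolete in recent .NET versions. This is a swapped-in technique, not the author's method: the names `JsonCloner`, `CloneViaSystemTextJson`, `Person` and `CloneDemo` are made up for the sketch. By default only public properties are copied, so the caveat about private members applies here too.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical sample type used only to demonstrate the clone.
public class Person
{
    public string Name { get; set; }
    public List<string> Tags { get; set; } = new List<string>();
}

public static class JsonCloner
{
    /// <summary>
    /// Deep-copies an object by serializing it to JSON and deserializing the result.
    /// Only members that System.Text.Json serializes (public properties by default) are copied.
    /// </summary>
    public static T CloneViaSystemTextJson<T>(T source)
    {
        // Don't serialize a null object, simply return the default for that type.
        if (source == null) return default;

        string json = JsonSerializer.Serialize(source);
        return JsonSerializer.Deserialize<T>(json);
    }
}

public static class CloneDemo
{
    /// <summary>Returns true if the copy is independent of the original.</summary>
    public static bool Run()
    {
        var original = new Person { Name = "Ada", Tags = { "a", "b" } };
        Person copy = JsonCloner.CloneViaSystemTextJson(original);

        // Mutating the copy must not affect the original if the clone is deep.
        copy.Tags.Add("c");
        return original.Tags.Count == 2 && copy.Tags.Count == 3 && copy.Name == "Ada";
    }
}
```

Because the copy is rebuilt from JSON text, the nested list in the copy is a new object, so changes to it leave the original untouched.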

Html – Make a div fill the height of the remaining screen space

2015 update: the flexbox approach

There are two other answers briefly mentioning flexbox; however, that was more than two years ago, and they don't provide any examples. The specification for flexbox has definitely settled now.

Note: Though CSS Flexible Boxes Layout specification is at the Candidate Recommendation stage, not all browsers have implemented it. WebKit implementation must be prefixed with -webkit-; Internet Explorer implements an old version of the spec, prefixed with -ms-; Opera 12.10 implements the latest version of the spec, unprefixed. See the compatibility table on each property for an up-to-date compatibility status.

(taken from https://developer.mozilla.org/en-US/docs/Web/Guide/CSS/Flexible_boxes)

All major browsers and IE11+ support Flexbox. For IE 10 or older, you can use the FlexieJS shim.

To check current support you can also see here: http://caniuse.com/#feat=flexbox

Working example

With flexbox you can easily switch any of your rows or columns between fixed dimensions, content-sized dimensions and remaining-space dimensions. In my example I have set the header to snap to its content (as per the OP's question), added a footer to show how to add a fixed-height region, and then set the content area to fill up the remaining space.

html,
body {
  height: 100%;
  margin: 0;
}

.box {
  display: flex;
  flex-flow: column;
  height: 100%;
}

.box .row {
  border: 1px dotted grey;
}

.box .row.header {
  flex: 0 1 auto;
  /* The above is shorthand for:
  flex-grow: 0;
  flex-shrink: 1;
  flex-basis: auto;
  */
}

.box .row.content {
  flex: 1 1 auto;
}

.box .row.footer {
  flex: 0 1 40px;
}

<!-- Obviously, you could use HTML5 tags like `header`, `footer` and `section` -->

<div class="box">
  <div class="row header">
    <p><b>header</b>
      <br />
      <br />(sized to content)</p>
  </div>
  <div class="row content">
    <p>
      <b>content</b>
      (fills remaining space)
    </p>
  </div>
  <div class="row footer">
    <p><b>footer</b> (fixed height)</p>
  </div>
</div>

In the CSS above, the flex property is shorthand for the flex-grow, flex-shrink, and flex-basis properties, which together establish the flexibility of the flex items. Mozilla has a good introduction to the flexible boxes model.