C# – iTextSharp HTMLWorker ParseHTML Tablestyle and PDFStamper

asp.net, c#, itextsharp

Hi, I have successfully used an HTMLWorker to convert a GridView using ASP.NET / C#.

(1) I have applied some limited styling to the resulting table, but I cannot see how to apply table styles such as grid lines, or other formatting such as a larger width for a particular column.
(2) I would actually like to put this text onto a pre-existing template which contains a logo etc. I've used PdfStamper for this before, but I cannot see how to use PdfStamper and HTMLWorker together. HTMLWorker needs a Document, which implements IDocListener, but that doesn't seem compatible with using a PdfStamper. I guess what I am looking for is a way to create a PdfStamper, write the title etc., then add the parsed HTML from the grid. The other problem is that the parsed content doesn't interact with the other content on the page. For instance, below I add a title chunk to the page. Rather than starting below it, the parsed HTML is written over the top of it. How do I position the parsed HTML content relative to the rest of what is on the PDF document?

Thanks in advance
Rob

Here's the code I have already:

            Document pdfDoc = new Document(PageSize.A4, 10f, 10f, 30f, 0f);

            HTMLWorker htmlWorker = new HTMLWorker(pdfDoc);

            StyleSheet styles = new StyleSheet();
            styles.LoadTagStyle("th", "size", "12px");
            styles.LoadTagStyle("th", "face", "helvetica");
            styles.LoadTagStyle("span", "size", "10px");
            styles.LoadTagStyle("span", "face", "helvetica");                
            styles.LoadTagStyle("td", "size", "10px");
            styles.LoadTagStyle("td", "face", "helvetica");     

            htmlWorker.SetStyleSheet(styles);

            PdfWriter.GetInstance(pdfDoc, HttpContext.Current.Response.OutputStream);

            pdfDoc.Open();

            //Title - but this gets obscured by the data; the data doesn't move down below it
            Font font = new Font(Font.FontFamily.HELVETICA, 14, Font.BOLD);
            Chunk chunk = new Chunk(title, font);                
            pdfDoc.Add(chunk);


            //Body
            htmlWorker.Parse(sr);

Best Answer

Let me first give you a couple of links to look over when you get a chance:

These answers go deeper into what's going on and I recommend reading them when you get a chance. Specifically the second one will show you why you need to use pt instead of px.

To answer your first question let me show you a different way to use the HTMLWorker class. This class has a static method on it called ParseToList that will convert HTML to a List<IElement>. The objects in that list are all iTextSharp specific versions of your HTML. Normally you would do a foreach on those and just add them to a document but you can modify them before adding which is what you want to do. Below is code that takes a static string and does that:

string file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");

using (FileStream fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None))
{
    using (Document doc = new Document(PageSize.LETTER))
    {
        using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
        {
            doc.Open();
            //Our HTML
            string html = "<table><tr><th>First Name</th><th>Last Name</th></tr><tr><td>Chris</td><td>Haas</td></tr></table>";
            //ParseToList requires a TextReader instead of just a string, so wrap the string in a StringReader
            using (StringReader sr = new StringReader(html))
            {
                //Create a style sheet
                StyleSheet styles = new StyleSheet();
                //...styles omitted for brevity

                //Convert our HTML to iTextSharp elements
                List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, styles);
                //Loop through each element (in this case there's actually just one PdfPTable)
                foreach (IElement el in elements)
                {
                    //If the element is a PdfPTable
                    if (el is PdfPTable)
                    {
                        //Cast it
                        PdfPTable tt = (PdfPTable)el;
                        //Change the widths, these are relative width by the way
                        tt.SetWidths(new float[] { 75, 25 });
                    }
                    //Add the element to the document
                    doc.Add(el);
                }
            }
            doc.Close();
        }
    }
}

Hopefully you can see that once you get access to the raw PdfPTable you can tweak it as necessary.
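For the grid lines asked about in the first question, the same post-processing idea can be applied to the cells of the parsed table. The following is only a minimal sketch, assuming iTextSharp 5.x: TableTweaks and AddGridLines are hypothetical names (they are not part of iTextSharp), and the padding and width values are arbitrary choices for illustration.

```csharp
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper class, not part of iTextSharp
public static class TableTweaks
{
    // Draws a box border around every cell of an already-built table
    public static void AddGridLines(PdfPTable table, float lineWidth)
    {
        //Let the table span the full width between the margins (assumed preference)
        table.WidthPercentage = 100f;

        foreach (PdfPRow row in table.Rows)
        {
            foreach (PdfPCell cell in row.GetCells())
            {
                //Slots covered by a colspan/rowspan can be null
                if (cell == null) continue;

                cell.Border = iTextSharp.text.Rectangle.BOX;
                cell.BorderWidth = lineWidth;
                cell.Padding = 3f;
            }
        }
    }
}
```

It could be called from inside the if (el is PdfPTable) branch above, for example TableTweaks.AddGridLines(tt, 0.5f); right after SetWidths.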

To answer your second question, if you want to use the normal Paragraph and Chunk objects with a PdfStamper then you need to use a PdfContentByte object. You can get this from your stamper in one of two ways: either by asking for one that sits "above" existing content, stamper.GetOverContent(int), or one that sits "below" existing content, stamper.GetUnderContent(int). Both versions take a single parameter saying which page to work with. Once you have a PdfContentByte you can create a ColumnText object bound to it and use this object's AddElement() method to add your normal elements. Before doing this (and this answers your third question, about content overlapping), you'll want to create at least one "column". When I do this I generally create one that essentially covers the entire page. (This part might sound weird, but we're essentially making a single-row, single-column table cell to add our objects to.)

Below is a full working C# 2010 WinForms app targeting iTextSharp 5.1.1.0 that shows off everything above. First it creates a generic PDF on the desktop. Then it creates a second document based off of the first, adds a paragraph and then some HTML. See the comments in the code for any questions.

using System;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;
using iTextSharp.text;
using iTextSharp.text.html.simpleparser;
using iTextSharp.text.pdf;
using System.IO;


namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }

        private void Form1_Load(object sender, EventArgs e)
        {
            //The two files that we are creating
            string file1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File1.pdf");
            string file2 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "File2.pdf");

            //Create a base file to write on top of
            using (FileStream fs = new FileStream(file1, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document(PageSize.LETTER))
                {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
                    {
                        doc.Open();
                        doc.Add(new Paragraph("Hello world"));
                        doc.Close();
                    }
                }
            }

            //Bind a reader to our first document
            PdfReader reader = new PdfReader(file1);

            //Create our second document
            using (FileStream fs = new FileStream(file2, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (PdfStamper stamper = new PdfStamper(reader, fs))
                {
                    StyleSheet styles = new StyleSheet();
                    //...styles omitted for brevity

                    //Our HTML
                    string html = "<table><tr><th>First Name</th><th>Last Name</th></tr><tr><td>Chris</td><td>Haas</td></tr></table>";
                    //ParseToList requires a TextReader instead of just a string, so wrap the string in a StringReader
                    using (StringReader sr = new StringReader(html))
                    {
                        //Get our raw PdfContentByte object letting us draw "above" existing content
                        PdfContentByte cb = stamper.GetOverContent(1);
                        //Create a new ColumnText object bound to the above PdfContentByte object
                        ColumnText ct = new ColumnText(cb);
                        //Get the dimensions of the first page of our source document
                        iTextSharp.text.Rectangle page1size = reader.GetPageSize(1);
                        //Create a single column object spanning the entire page
                        ct.SetSimpleColumn(0, 0, page1size.Width, page1size.Height);

                        ct.AddElement(new Paragraph("Hello world!"));

                        //Convert our HTML to iTextSharp elements
                        List<IElement> elements = iTextSharp.text.html.simpleparser.HTMLWorker.ParseToList(sr, styles);
                        //Loop through each element (in this case there's actually just one PdfPTable)
                        foreach (IElement el in elements)
                        {
                            //If the element is a PdfPTable
                            if (el is PdfPTable)
                            {
                                //Cast it
                                PdfPTable tt = (PdfPTable)el;
                                //Change the widths, these are relative width by the way
                                tt.SetWidths(new float[] { 75, 25 });
                            }
                            //Add the element to the ColumnText
                            ct.AddElement(el);
                        }
                        //IMPORTANT, this actually commits our object to the PDF
                        ct.Go();
                    }
                }
            }

            this.Close();
        }
    }
}
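The column in the example above covers the whole page. When the template has a logo at the top, or the grid is longer than one page, the column rectangle and the return value of Go() matter. What follows is a hedged sketch rather than a definitive implementation: StamperLayout, FlowBelowHeader, headerHeight and margin are hypothetical names, and it assumes a single-page template and that every element can fit on an empty page (an element that never fits would keep the loop running).

```csharp
using System.Collections.Generic;
using iTextSharp.text;
using iTextSharp.text.pdf;

// Hypothetical helper class, not part of iTextSharp
public static class StamperLayout
{
    // Flows elements onto page 1 below a reserved header band,
    // inserting blank pages after it whenever the column overflows
    public static void FlowBelowHeader(PdfStamper stamper, PdfReader reader,
        IEnumerable<IElement> elements, float headerHeight, float margin)
    {
        iTextSharp.text.Rectangle pageSize = reader.GetPageSize(1);

        ColumnText ct = new ColumnText(stamper.GetOverContent(1));
        //Stop the column short of the top so the logo/header stays visible
        ct.SetSimpleColumn(margin, margin, pageSize.Width - margin, pageSize.Height - headerHeight);

        foreach (IElement el in elements)
        {
            ct.AddElement(el);
        }

        int page = 1;
        int status = ct.Go();
        //Go() reports whether content is left over after filling the column
        while (ColumnText.HasMoreText(status))
        {
            page++;
            //The inserted page is blank; it does not inherit the template's artwork
            stamper.InsertPage(page, pageSize);
            ct.Canvas = stamper.GetOverContent(page);
            ct.SetSimpleColumn(margin, margin, pageSize.Width - margin, pageSize.Height - margin);
            status = ct.Go();
        }
    }
}
```

Because a single ColumnText lays out its elements in order, a title Paragraph added before the parsed HTML elements is placed above them instead of being overwritten.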

Related Solutions

C# – the difference between String and string in C#

This is the style that Microsoft tends to use in their examples.

It appears that the guidance in this area may have changed, as StyleCop now enforces the use of the C#-specific aliases.

C# – the difference between const and readonly in C#

Apart from the apparent differences:

A const must be given its value at the point of declaration, whereas a readonly value can be computed dynamically but must be assigned before the constructor exits. After that it is frozen.
consts are implicitly static. You use ClassName.ConstantName notation to access them.
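The two points above can be sketched in a few lines. This is only an illustration: Circle, Pi, Radius and Demo are made-up names, not taken from the original answer.

```csharp
using System;

public class Circle
{
    //Compile-time constant: value fixed at declaration, implicitly static
    public const double Pi = 3.14159;

    //Assigned once per instance, no later than the end of the constructor
    public readonly double Radius;

    public Circle(double radius)
    {
        Radius = radius;
    }

    public double Area()
    {
        return Pi * Radius * Radius;
    }
}

public static class Demo
{
    public static void Main()
    {
        Circle c = new Circle(2.0);
        Console.WriteLine(Circle.Pi); //const is accessed through the type name
        Console.WriteLine(c.Radius);  //readonly instance field is accessed through the object
        //c.Radius = 3.0;             //would not compile: readonly fields are frozen after construction
    }
}
```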

There is a subtle difference. Consider a class defined in AssemblyA.

public class Const_V_Readonly
{
  public const int I_CONST_VALUE = 2;
  public readonly int I_RO_VALUE;
  public Const_V_Readonly()
  {
     I_RO_VALUE = 3;
  }
}

AssemblyB references AssemblyA and uses these values in code. When this is compiled:

In the case of the const value, it is like a find-replace. The value 2 is 'baked into' AssemblyB's IL. This means that if tomorrow I update I_CONST_VALUE to 20, AssemblyB would still have 2 until I recompile it.
In the case of the readonly value, it is like a ref to a memory location. The value is not baked into AssemblyB's IL. This means that if the memory location is updated, AssemblyB gets the new value without recompilation. So if I_RO_VALUE is updated to 30, you only need to build AssemblyA; its clients do not need to be recompiled.

So if you are confident that the value of the constant won't change, use a const.

public const int CM_IN_A_METER = 100;

But if you have a constant that may change (e.g. w.r.t. precision), or when in doubt, use a readonly.

public readonly float PI = 3.14f;

Update: Aku needs to get a mention because he pointed this out first. Also I need to plug where I learned this: Effective C# - Bill Wagner
