C# – Removing PDF invisible objects with iTextSharp

c# itextsharp pdf

Is it possible to use iTextSharp to remove objects that are not visible (or at least not being displayed) from a PDF document?

More details:

1) My source is a PDF page containing images and text (maybe some vector drawings) and embedded fonts.

2) There's an interface for defining multiple 'crop boxes'.

3) I must generate a new PDF that contains only what is inside the crop boxes. Everything else must be removed from the resulting document. (I can tolerate content that is half inside and half outside a crop box, although that is not ideal, and the part outside should not be visible anyway.)

My solution so far:

I have successfully developed a solution that creates one temporary document per crop box, each containing the content of that crop box (using writer.GetImportedPage and contentByte.AddTemplate on a page that is exactly the size of the crop box). Then I create the final document and repeat the process, using the AddTemplate method to position each "cropped page" on the final page.

This solution has 2 big disadvantages:

1) The size of the document is [original size] * [number of crop boxes], since the entire page is there, stamped many times (invisible, but still present).
2) The invisible text can still be reached by selecting all (CTRL+A) in Reader and pasting it elsewhere.

So, I think I need to iterate through the PDF objects, detect whether each one is visible, and delete those that are not. At the time of writing, I am trying to use pdfReader.GetPdfObject.
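The "detect if it is visible" step can be sketched independently of the PDF library. The sketch below assumes a bounding box is already known for each piece of content, in PDF user-space coordinates (origin at the bottom-left). The Box, Visibility and CropFilter names are hypothetical helpers made up for illustration; they are not part of iTextSharp, and obtaining the boxes from the page content is not shown.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical classification of a piece of content relative to the crop boxes.
public enum Visibility { Outside, Partial, Inside }

// Hypothetical helper: an axis-aligned box in PDF user space (origin bottom-left).
public readonly struct Box
{
    public Box(float left, float bottom, float right, float top)
    {
        Left = left; Bottom = bottom; Right = right; Top = top;
    }

    public float Left { get; }
    public float Bottom { get; }
    public float Right { get; }
    public float Top { get; }

    // True when 'other' lies entirely within this box.
    public bool Contains(Box other) =>
        other.Left >= Left && other.Bottom >= Bottom &&
        other.Right <= Right && other.Top <= Top;

    // True when the two boxes share some area.
    public bool Intersects(Box other) =>
        other.Left < Right && other.Right > Left &&
        other.Bottom < Top && other.Top > Bottom;
}

public static class CropFilter
{
    // Inside: fully within at least one crop box.
    // Partial: overlaps a crop box but is not fully within any single one.
    // Outside: touches no crop box, so it is a candidate for removal.
    public static Visibility Classify(Box content, IEnumerable<Box> cropBoxes)
    {
        var result = Visibility.Outside;
        foreach (var crop in cropBoxes)
        {
            if (crop.Contains(content)) return Visibility.Inside;
            if (crop.Intersects(content)) result = Visibility.Partial;
        }
        return result;
    }
}
```

With this split, content classified as Outside would be dropped, and Partial content could be kept or dropped depending on how strict the result must be. Content that straddles two adjacent crop boxes is reported as Partial by this sketch.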

Thanks for the help.

Best Answer

If the PDF you are working with is a fixed, predefined template and the object you want to remove is a form field, you can remove it by name by calling RemoveField.

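using System.IO;
using iTextSharp.text.pdf;

PdfReader pdfReader = new PdfReader("../Template_Path.pdf");
PdfStamper pdfStamperToPopulate = new PdfStamper(pdfReader, new FileStream(outputPath, FileMode.Create)); // outputPath: destination file path, defined elsewhere
AcroFields pdfFormFields = pdfStamperToPopulate.AcroFields;
pdfFormFields.RemoveField("fieldNameToBeRemoved");
pdfStamperToPopulate.Close(); // writes the modified document to outputPath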

Related Solutions

C# – Deep cloning objects

Whereas one approach is to implement the ICloneable interface (described here, so I won't regurgitate), here's a nice deep clone object copier I found on The Code Project a while ago and incorporated into our code. As mentioned elsewhere, it requires your objects to be serializable.

using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

/// <summary>
/// Reference Article http://www.codeproject.com/KB/tips/SerializedObjectCloner.aspx
/// Provides a method for performing a deep copy of an object.
/// Binary Serialization is used to perform the copy.
/// </summary>
public static class ObjectCopier
{
    /// <summary>
    /// Perform a deep copy of the object via serialization.
    /// </summary>
    /// <typeparam name="T">The type of object being copied.</typeparam>
    /// <param name="source">The object instance to copy.</param>
    /// <returns>A deep copy of the object.</returns>
    public static T Clone<T>(T source)
    {
        if (!typeof(T).IsSerializable)
        {
            throw new ArgumentException("The type must be serializable.", nameof(source));
        }

        // Don't serialize a null object, simply return the default for that object
        if (ReferenceEquals(source, null)) return default;

        using Stream stream = new MemoryStream();
        IFormatter formatter = new BinaryFormatter();
        formatter.Serialize(stream, source);
        stream.Seek(0, SeekOrigin.Begin);
        return (T)formatter.Deserialize(stream);
    }
}

The idea is that it serializes your object and then deserializes it into a fresh object. The benefit is that you don't have to concern yourself about cloning everything when an object gets too complex.

In case you prefer to use the extension methods introduced in C# 3.0, change the method to have the following signature:

public static T Clone<T>(this T source)
{
   // ...
}

Now the method call simply becomes objectBeingCloned.Clone();.

EDIT (January 10 2015) Thought I'd revisit this to mention that I recently started using (Newtonsoft) Json to do this. It should be lighter, and it avoids the overhead of [Serializable] tags. (NB @atconway has pointed out in the comments that private members are not cloned using the JSON method.)

using Newtonsoft.Json;

/// <summary>
/// Perform a deep copy of the object, using JSON as the serialization method. NOTE: Private members are not cloned using this method.
/// </summary>
/// <typeparam name="T">The type of object being copied.</typeparam>
/// <param name="source">The object instance to copy.</param>
/// <returns>The copied object.</returns>
public static T CloneJson<T>(this T source)
{
    // Don't serialize a null object, simply return the default for that object
    if (ReferenceEquals(source, null)) return default;

    // Replace inner objects instead of reusing the ones a default constructor creates.
    // Example: a list property is initialized with some values in the default constructor,
    // but those items were cleared in 'source'.
    // Without ObjectCreationHandling.Replace, the default constructor values would be added to the result.
    var deserializeSettings = new JsonSerializerSettings {ObjectCreationHandling = ObjectCreationHandling.Replace};

    return JsonConvert.DeserializeObject<T>(JsonConvert.SerializeObject(source), deserializeSettings);
}
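The serialize-then-deserialize idea can also be tried without a third-party package. The following is a minimal sketch that swaps in System.Text.Json (an assumption: .NET Core 3.0 or later) in place of Newtonsoft; it is not the code from the answer above. The Order class and the CloneViaJson name are hypothetical, made up for illustration, and as with the Newtonsoft version only public members are copied.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical type used only to demonstrate the copy.
public class Order
{
    public string Customer { get; set; } = "";
    public List<string> Items { get; set; } = new List<string>();
}

public static class JsonCloneSketch
{
    // Round-trips the object through a JSON string to get an independent copy.
    // Only public properties with getters and setters survive the round trip.
    public static T CloneViaJson<T>(T source)
    {
        // Don't serialize a null object, simply return the default for that type
        if (source is null) return default;

        string json = JsonSerializer.Serialize(source);
        return JsonSerializer.Deserialize<T>(json);
    }
}
```

A quick way to check that the copy is deep is to clone an Order, add an item to the copy's Items list, and confirm that the original list is unchanged.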

Html – Recommended way to embed PDF in HTML

This is quick, easy, to the point and doesn't require any third-party script:

<embed src="http://example.com/the.pdf" width="500" height="375" 
 type="application/pdf">

UPDATE (2/3/2021)

Adobe now offers its own PDF Embed API.

https://www.adobe.io/apis/documentcloud/dcsdk/pdf-embed.html

UPDATE (1/2018):

The Chrome browser on Android no longer supports PDF embeds. You can get around this by using the Google Drive PDF viewer:

<embed src="https://drive.google.com/viewerng/viewer?embedded=true&url=http://example.com/the.pdf" width="500" height="375">
