C# – iTextSharp – Crop PDF File (C#)

c#, itextsharp, pdf

I want to crop a PDF file using iTextSharp to the rectangle (0, 1000, 600, 115). The cropping itself works: when you open the resulting *.pdf file, you see only the cropped content. But if you parse that PDF, the text and other data from the non-visible part of the document are still there, and I can't accept that. How can I remove that data completely?

Here is my code sample:

    static void cropiTxtSharp()
    {
        string file = "C:\\testpdf.pdf";
        string oldchar = "testpdf.pdf";
        string repChar = "test.pdf";
        PdfReader reader = new PdfReader(file);
        PdfDictionary pageDict;
        PdfRectangle rect = new PdfRectangle(0, 1000, 600, 115);
        pageDict = reader.GetPageN(1);
        pageDict.Put(PdfName.CROPBOX, rect);
        PdfStamper stamper = new PdfStamper(reader, new FileStream(file.Replace(oldchar, repChar), FileMode.Create, FileAccess.Write));
        stamper.Close();
        reader.Close();
    }

EDIT:
Here is code that works. It took me a few hours, but I finally got it 😛

First, add the following using directives to your project:

    using System.Collections.Generic; // List<T>
    using System.IO;                  // FileStream, FileMode, FileAccess
    using iTextSharp.text.pdf;
    using iTextSharp.text;
    using iTextSharp.xtra.iTextSharp.text.pdf.pdfcleanup;

Then you can use my code:

    static void textsharpie()
    {
        string file = "C:\\testpdf.pdf";
        string oldchar = "testpdf.pdf";
        string repChar = "test.pdf";
        PdfReader reader = new PdfReader(file);
        PdfStamper stamper = new PdfStamper(reader, new FileStream(file.Replace(oldchar, repChar), FileMode.Create, FileAccess.Write));
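        // Each location names a page and a rectangle whose content is removed;
        // the cleaned area is then painted over with the given color.
        List<PdfCleanUpLocation> cleanUpLocations = new List<PdfCleanUpLocation>();
        cleanUpLocations.Add(new PdfCleanUpLocation(1, new iTextSharp.text.Rectangle(0f, 0f, 600f, 115f), iTextSharp.text.BaseColor.WHITE));
        PdfCleanUpProcessor cleaner = new PdfCleanUpProcessor(cleanUpLocations, stamper);
        // Rewrites the page content so the data inside the rectangle is actually gone.
        cleaner.CleanUp();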
        stamper.Close();
        reader.Close();
    }

Unfortunately, I can't use that code in a commercial application without paying for a license, so I had to look for a different library…

Best Answer

What you're doing is setting the CropBox of the page, which does absolutely nothing to the content of the document; it only changes which region of the page a viewer displays. This is by design and has been that way since Acrobat 1.0.

What you want to do is called redaction (or in your case, exclusive redaction, since you want to remove everything outside the bounds of a rectangle). It is decidedly non-trivial to do correctly, mostly because of content that partially overlaps the bounds you want to redact to (images, text, and paths).
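
To make the partial-overlap problem a bit more concrete, here is a minimal sketch of the first decision an exclusive redaction has to make for every object on the page. It is not iTextSharp code: `Box`, `Overlap`, and `RedactionSketch.Classify` are hypothetical names made up for this illustration, and it assumes you already have an axis-aligned bounding box for each image, text run, or path, which a real tool has to compute from the content stream.

```csharp
using System;

// Hypothetical types for illustration only; none of them come from iTextSharp.
public enum Overlap { Inside, Outside, Partial }

// Axis-aligned box in PDF user space (origin at the lower left, y grows upward).
public readonly struct Box
{
    public readonly float Llx, Lly, Urx, Ury;

    public Box(float x1, float y1, float x2, float y2)
    {
        // Normalize so that (Llx, Lly) is always the lower-left corner.
        Llx = Math.Min(x1, x2);
        Lly = Math.Min(y1, y2);
        Urx = Math.Max(x1, x2);
        Ury = Math.Max(y1, y2);
    }
}

public static class RedactionSketch
{
    // Classifies an object's bounding box against the region that should survive.
    public static Overlap Classify(Box keep, Box item)
    {
        bool disjoint = item.Urx <= keep.Llx || item.Llx >= keep.Urx
                     || item.Ury <= keep.Lly || item.Lly >= keep.Ury;
        if (disjoint) return Overlap.Outside;   // can be dropped entirely

        bool contained = item.Llx >= keep.Llx && item.Urx <= keep.Urx
                      && item.Lly >= keep.Lly && item.Ury <= keep.Ury;
        return contained ? Overlap.Inside       // can be kept untouched
                         : Overlap.Partial;     // has to be split, clipped, or re-encoded
    }
}
```

With the rectangle from the question, `RedactionSketch.Classify(new Box(0, 1000, 600, 115), someItemBox)` tells you which objects are trivially safe to keep or drop. The `Partial` case is where the real work is: an image has to be re-encoded with the hidden pixels removed, a text run has to be split glyph by glyph, and a path has to be clipped geometrically. That is why a redaction library is the usual answer rather than hand-written code.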
