This question is for referencing and comparing. The solution is the accepted answer below.
Many hours have I searched for a fast and easy, but mostly accurate, way to get the number of pages in a PDF document. Since I work for a graphic printing and reproduction company that works a lot with PDFs, the number of pages in a document must be precisely known before they are processed. PDF documents come from many different clients, so they aren't generated with the same application and/or don't use the same compression method.
Here are some of the answers I found insufficient or simply NOT working:
Using Imagick (a PHP extension)
Imagick requires a lot of installation, apache needs to restart, and when I finally had it working, it took amazingly long to process (2-3 minutes per document) and it always returned 1
page in every document (haven't seen a working copy of Imagick so far), so I threw it away. That was with both the getNumberImages()
and identifyImage()
methods.
Using FPDI (a PHP library)
FPDI is easy to use and install (just extract files and call a PHP script), BUT many of the compression techniques are not supported by FPDI. It then returns an error:
FPDF error: This document (test_1.pdf) probably uses a compression technique which is not supported by the free parser shipped with FPDI.
Opening a stream and search with a regular expression:
This opens the PDF file in a stream and searches for some kind of string, containing the pagecount or something similar.
$f = "test1.pdf";
$stream = fopen($f, "r");
$content = fread ($stream, filesize($f));
if(!$stream || !$content)
return 0;
$count = 0;
// Regular Expressions found by Googling (all linked to SO answers):
$regex = "/\/Count\s+(\d+)/";
$regex2 = "/\/Page\W*(\d+)/";
$regex3 = "/\/N\s+(\d+)/";
if(preg_match_all($regex, $content, $matches))
$count = max($matches);
return $count;
/\/Count\s+(\d+)/
(looks for/Count <number>
) doesn't work because only a few documents have the parameter/Count
inside, so most of the time it doesn't return anything. Source./\/Page\W*(\d+)/
(looks for/Page<number>
) doesn't get the number of pages, mostly contains some other data. Source./\/N\s+(\d+)/
(looks for/N <number>
) doesn't work either, as the documents can contain multiple values of/N
; most, if not all, not containing the pagecount. Source.
So, what does work reliable and accurate?
Best Answer
A simple command line executable called: pdfinfo.
It is downloadable for Linux and Windows. You download a compressed file containing several little PDF-related programs. Extract it somewhere.
One of those files is pdfinfo (or pdfinfo.exe for Windows). An example of data returned by running it on a PDF document:
I haven't seen a PDF document where it returned a false pagecount (yet). It is also really fast, even with big documents of 200+ MB the response time is a just a few seconds or less.
There is an easy way of extracting the pagecount from the output, here in PHP:
Of course this command line tool can be used in other languages that can parse output from an external program, but I use it in PHP.
I know its not pure PHP, but external programs are way better in PDF handling (as seen in the question).
I hope this can help people, because I have spent a whole lot of time trying to find the solution to this and I have seen a lot of questions about PDF pagecount in which I didn't find the answer I was looking for. That's why I made this question and answered it myself.