I'm using iTextSharp to read the text from a PDF file. However, sometimes I cannot extract any text because the PDF file contains only images. I download the same PDF files every day, and I want to see whether a PDF has been modified. If the text and modification date cannot be obtained, is an MD5 checksum the most reliable way to tell if the file has changed?
If it is, some code samples would be appreciated, because I don't have much experience with cryptography.
Best Answer
It's very simple using System.Security.Cryptography.MD5:
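A minimal sketch, assuming you just want the raw hash bytes for a file on disk (the `FileHasher` class and `ComputeMd5` method names are only for illustration):

```csharp
using System.IO;
using System.Security.Cryptography;

static class FileHasher
{
    // Returns the raw 16-byte MD5 digest of the file at the given path.
    // FileHasher and ComputeMd5 are illustrative names, not framework APIs.
    public static byte[] ComputeMd5(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            // ComputeHash reads the stream to the end and returns the digest.
            return md5.ComputeHash(stream);
        }
    }
}
```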
(I believe that actually the MD5 implementation used doesn't need to be disposed, but I'd probably still do so anyway.)
How you compare the results afterwards is up to you; you can convert the byte array to base64, for example, or compare the bytes directly. (Just be aware that arrays don't override `Equals`. Using base64 is simpler to get right, but slightly less efficient if you're really only interested in comparing the hashes.)

If you need to represent the hash as a string, you could convert it to hex using `BitConverter`:
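Something along these lines should work. `Md5Hex`, `CalculateMd5Hex` and `SameHash` are illustrative names, not part of the framework, and the `SameHash` helper is only there to show a content comparison for the raw bytes:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class Md5Hex
{
    // Returns the MD5 digest of the file as a 32-character lowercase hex string.
    public static string CalculateMd5Hex(string filename)
    {
        using (var md5 = MD5.Create())
        using (var stream = File.OpenRead(filename))
        {
            byte[] hash = md5.ComputeHash(stream);
            // BitConverter.ToString produces "AB-CD-..."; strip the dashes
            // and lowercase the result to get a conventional hex digest.
            return BitConverter.ToString(hash).Replace("-", "").ToLowerInvariant();
        }
    }

    // Arrays don't override Equals, so compare the contents element by element.
    public static bool SameHash(byte[] a, byte[] b)
    {
        return a.SequenceEqual(b);
    }
}
```

With the hex form, checking whether a download has changed is then just a string comparison, e.g. `Md5Hex.CalculateMd5Hex(newPath) != Md5Hex.CalculateMd5Hex(oldPath)`, where `newPath` and `oldPath` are whatever file paths you keep for the two copies.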