Christian Vigh - 2017-05-09 15:13:41 - In reply to message 1 from Rudolfo Toscano
PDF files can have two types of passwords:
- The User password, which you are prompted for when you try to open the document.
- The Owner password, which you are prompted for when you try to modify something in the original PDF file. It can also come with flags that prevent you from printing, or from copying/pasting information.
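For illustration only: the printing and copying restrictions mentioned above are stored as bit flags in the /P entry of the PDF encryption dictionary. The Python sketch below decodes them. The bit positions follow the PDF specification, while the function name and flag labels are hypothetical and are not part of PdfToText (which is written in PHP).

```python
from __future__ import annotations

# Bit positions (1 = least significant bit) of the user access permissions
# stored in /P. The names on the left are illustrative labels only.
PERMISSION_BITS = {
    "print": 3,
    "modify": 4,
    "copy_extract": 5,
    "annotate": 6,
    "fill_forms": 9,              # revision 3 and later
    "extract_accessibility": 10,  # revision 3 and later
    "assemble": 11,               # revision 3 and later
    "print_high_quality": 12,     # revision 3 and later
}


def decode_permissions(p: int) -> dict[str, bool]:
    """Return which operations the /P value allows.

    /P is stored as a signed 32-bit integer, so it is reinterpreted as
    unsigned before individual bits are tested.
    """
    unsigned = p & 0xFFFFFFFF
    return {name: bool((unsigned >> (bit - 1)) & 1)
            for name, bit in PERMISSION_BITS.items()}


if __name__ == "__main__":
    # -44 is 0xFFFFFFD4: printing and copying allowed, modifying not allowed.
    print(decode_permissions(-44))
```

Note that these flags only tell a well-behaved reader what to allow; they do not by themselves prevent decryption.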
Unfortunately, in both cases, the text contents are encrypted. Of course, there is an algorithm to decrypt them (especially when there is only an Owner password, which theoretically should not prevent text extraction). However, I have not finished implementing it and, to tell the truth, I'm still scratching my head over it.
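A minimal sketch of why an Owner-password-only file can be decrypted at all: such files have an empty User password, and the file encryption key is derived from that (padded) password plus values stored in the file itself (the /O and /P entries and the first element of the trailer's /ID array). The Python below follows the key-derivation steps of the PDF standard security handler for revisions 2 to 4, as described in the PDF specification. It is not PdfToText's implementation, the function name is hypothetical, it does not check the result against the /U entry, and AES-256 (revisions 5 and 6) uses a different scheme.

```python
from __future__ import annotations

import hashlib
import struct

# 32-byte padding string defined by the PDF standard security handler.
PADDING = bytes([
    0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41,
    0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
    0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80,
    0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
])


def compute_file_key(user_password: bytes, o_entry: bytes, p_entry: int,
                     id0: bytes, revision: int, length_bits: int = 40,
                     encrypt_metadata: bool = True) -> bytes:
    """Derive the file encryption key for revisions 2 to 4 (hypothetical helper)."""
    # Revision 2 always uses a 40-bit (5-byte) key; later ones use /Length.
    n = 5 if revision == 2 else length_bits // 8

    # Pad (or truncate) the password to exactly 32 bytes.
    padded = (user_password + PADDING)[:32]

    md5 = hashlib.md5()
    md5.update(padded)
    md5.update(o_entry[:32])                              # /O entry
    md5.update(struct.pack("<I", p_entry & 0xFFFFFFFF))  # /P, low byte first
    md5.update(id0)                                       # first /ID element
    if revision >= 4 and not encrypt_metadata:
        md5.update(b"\xff\xff\xff\xff")
    digest = md5.digest()

    # Revision 3 and later re-hash the first n bytes 50 more times.
    if revision >= 3:
        for _ in range(50):
            digest = hashlib.md5(digest[:n]).digest()
    return digest[:n]
```

With `user_password=b""`, the padded password is simply the padding string, which is why a reader can open an Owner-password-only file without asking for any password.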
But I'm confident that a future version of PdfToText will soon be able to handle such situations: this is one of my top 3 BIG priorities...
Rudolfo Toscano - 2017-05-10 09:45:56 - In reply to message 1 from Rudolfo Toscano
Thank you for your immediate response. As you are the specialist, please let me ask an additional question:
Why does your converter produce blank output, while all PDF readers are able to open such protected documents?
Thank you for your explanation.
Christian Vigh - 2017-05-10 10:08:46 - In reply to message 3 from Rudolfo Toscano
Because I have not yet implemented the decryption algorithm. Even if I take inspiration from Unix tools such as xpdf or poppler, or even TCPDF in PHP, I think I'll need at least one week for that, not counting the various encryption algorithms and revisions that are spread across the PDF world!
It's not a problem of complexity: when only an Owner password is set, you don't even have to know it to decrypt the information (Adobe did not use complex cryptographic algorithms; it encrypts information by applying a sequence of transformations that should discourage amateurs from decrypting the contents).
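The "sequence of transformations" can be sketched as follows, assuming the file key has already been derived and that the document uses RC4 rather than AES: each string or stream gets its own key, made by hashing the file key together with its object and generation numbers, and is then decrypted with RC4. The function names are hypothetical; this is a Python illustration of the PDF standard security handler, not PdfToText code, and it omits the extra "sAlT" step used for AES.

```python
import hashlib


def object_key(file_key: bytes, obj_num: int, gen_num: int) -> bytes:
    """Per-object key: MD5(file key + 3 low bytes of obj + 2 low bytes of gen)."""
    data = (file_key
            + (obj_num & 0xFFFFFF).to_bytes(3, "little")
            + (gen_num & 0xFFFF).to_bytes(2, "little"))
    # Key length is the file key length plus 5, capped at 16 bytes.
    return hashlib.md5(data).digest()[:min(len(file_key) + 5, 16)]


def rc4(key: bytes, data: bytes) -> bytes:
    """Plain RC4; encryption and decryption are the same operation."""
    # Key-scheduling algorithm: permute the 256-byte state with the key.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) % 256
        s[i], s[j] = s[j], s[i]
    # Pseudo-random generation: XOR each data byte with the keystream.
    out = bytearray()
    i = j = 0
    for byte in data:
        i = (i + 1) % 256
        j = (j + s[i]) % 256
        s[i], s[j] = s[j], s[i]
        out.append(byte ^ s[(s[i] + s[j]) % 256])
    return bytes(out)


def decrypt_string(file_key: bytes, obj_num: int, gen_num: int,
                   data: bytes) -> bytes:
    """Decrypt a string or raw stream belonging to object (obj_num, gen_num)."""
    return rc4(object_key(file_key, obj_num, gen_num), data)
```

Because RC4 is symmetric, the same function encrypts and decrypts. A decrypted stream still has to go through its filters (for example FlateDecode) before any text can be extracted from it.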
So my real problem is finding at least 7 consecutive days in my free time to work on it.
But I know that being able to handle PDF files with an Owner password is a must-have! So I hope I will have a solution soon...