This example shows you how to capture text areas and table lines/columns from a PDF document.
The directory includes the following files :
- sample-report.pdf : the sample PDF file used in this example.
- sample-report.doc : the original Microsoft Word document that was used to generate sample-report.pdf
- sample-report.xml : the Capture definitions file that specifies what is to be captured (in XML format)
- example.php : the PHP script that takes as input sample-report.pdf and sample-report.xml to extract only the information you want
- sample-report.txt : the output of a previous run of the PdfToText class against file sample-report.pdf, with the PDFOPT\_DEBUG\_SHOW\_COORDINATES option. It gives every block of text found in the input document, with its (x,y) coordinates and width/height. This information is really useful when you have to design a Capture definitions file because it requires such information.
This example may not be the best for you, because in the current version (1.6.0), all the columns in file sample-report.pdf are interpreted as a single column. This issue will be fixed in a future release, probably 1.6.1