Sample script to extract text from PDF files using Perl?

I tried several CPAN modules to extract text from a PDF, but the result was only junk. Can anyone please help me extract the text from a PDF?

2006-08-31 23:53:56 · 6 answers · asked by jenishjeni 2 in Computers & Internet ➔ Programming & Design

6 answers

jenishjeni Try this link

http://www.google.co.uk/search?hl=en&q=sample+script+to+extract+text+from+pdf+files+using+PERL&btnG=Search&meta=

2006-08-31 23:57:15 · answer #1 · answered by Joe_Young 6 · 0⤊ 0⤋

Perl Pdf To Text

2016-12-12 11:40:25 · answer #2 · answered by emmit 4 · 0⤊ 0⤋

If the PDF does not have text embedded in it and is instead a scanned image of words (as Dr.House says), then you'll need an Optical Character Recognition (OCR) program to try to translate the pixels that form letters into text characters.

But if it's a text-based PDF instead, like this one: http://www.norvig.com/InternetSearching.pdf
then you'll know because you can highlight the text (one line at a time; it won't just draw a square box) using the "Select Tool" in the free Acrobat Reader. You can copy and paste the text that way. If you want to copy all the text in the document, go to Acrobat's menu and select "View > Page Layout > Continuous", then go to "Edit > Select All" and finally "Edit > Copy". You can then paste the text into Notepad or some other text editor.

PDFs can contain text, images and even movies and sound. They support different kinds of compression methods (which may be why the Perl module you tried is having problems); see the two sources below:

2006-09-01 18:18:49 · answer #3 · answered by George3 4 · 0⤊ 0⤋

I had this same problem, and ended up using an external program to get the best results. The program I used was pdftotext available at http://www.foolabs.com/xpdf/download.html

Call this program externally as a system call, and then just parse the results in your Perl program. I found this to be much simpler than messing with the many poor Perl PDF decoding modules.
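
A minimal sketch of that approach, assuming pdftotext (from xpdf or poppler) is installed and on your PATH. The sub names (pdftotext_command, extract_pdf_text, split_pages) and the option keys are made up for illustration; the only pdftotext options used are -enc, -layout, -f and -l, plus "-" as the output file so the text goes to standard output.

```perl
use strict;
use warnings;

# Build the pdftotext argument list. Passing "-" as the output file
# asks pdftotext to write to standard output instead of a .txt file.
sub pdftotext_command {
    my ($pdf_path, %opt) = @_;
    my @cmd = ('pdftotext', '-enc', 'UTF-8');
    push @cmd, '-layout' if $opt{layout};                          # keep the physical layout
    push @cmd, '-f', $opt{first_page} if defined $opt{first_page}; # first page to convert
    push @cmd, '-l', $opt{last_page}  if defined $opt{last_page};  # last page to convert
    push @cmd, $pdf_path, '-';
    return @cmd;
}

# Run pdftotext and return everything it printed as one string.
sub extract_pdf_text {
    my ($pdf_path, %opt) = @_;
    my @cmd = pdftotext_command($pdf_path, %opt);

    # List-form pipe open bypasses the shell, so file names with
    # spaces or quotes need no escaping.
    open(my $fh, '-|', @cmd) or die "Cannot run pdftotext: $!";
    binmode($fh, ':encoding(UTF-8)');
    my $text = do { local $/; <$fh> };    # slurp the whole output
    close($fh)
        or die "pdftotext failed (exit status " . ($? >> 8) . ")";
    return $text // '';
}

# pdftotext ends each page with a form feed by default,
# so splitting on \f gives one string per page.
sub split_pages {
    my ($text) = @_;
    return split /\f/, $text;
}
```

You would call it as something like `my @pages = split_pages(extract_pdf_text('report.pdf', layout => 1));` (report.pdf is a placeholder file name) and then parse each page with ordinary regexes. If the PDF is a scanned image, the output will be empty or nearly so, and you are back to needing OCR.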

2006-09-02 17:15:17 · answer #4 · answered by milliner 2 · 0⤊ 0⤋

You CAN'T extract text from a PDF file! There IS NO text in it! A PDF file is a picture of text! Let me explain: PDF files are created by scanning a document. A scanner takes a picture or image of a document. People scan documents to make printable images because it is easier to do this than it is to retype them in a text editor! The file extension speaks for itself! PDF: Photo Deluxe File

2006-09-01 00:02:52 · answer #5 · answered by Anonymous · 0⤊ 1⤋

The easiest way is to get hold of Acrobat Pro. It allows you to copy and paste from a PDF.

Or get one of the free PDF editors.

2006-08-31 23:58:34 · answer #6 · answered by Eric F 3 · 0⤊ 0⤋

Yes, it's very possible.

2016-03-17 06:01:20 · answer #7 · answered by Anonymous · 0⤊ 0⤋