Net Q & A

Question of the Month: January 2001

How can I extract text from a PDF file for import into my word processor or other use?

Answer

First, let's make sure everyone understands what a PDF file is. PDF stands for "Portable Document Format." It is a file format developed by Adobe Systems. The idea of the format is to enable electronic distribution of documents so that they can be viewed even by people who don't own the software in which the document originated. For example, people without MS Word can use their PDF viewer to read an MS Word file that has been converted to PDF format.

Here is an example of a PDF file. It is a conversion of an MS PowerPoint slide show about e-mail security issues that I gave at an AALL convention.

Adobe distributes free viewer software, "Acrobat Reader," from its web site, in hopes of stimulating demand for its PDF file conversion software ("Acrobat").

There are several ways to extract text from a PDF file:
Two problems can prevent these methods from working:
If you run into either of these problems, your best alternative may be to print out a paper copy, scan it, and run it through OCR (Optical Character Recognition) software. Pam Gaines has a good article about PDF conversion at her web site, a previous MVP Site of the Month winner.
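For readers comfortable with a little scripting, extraction can also be automated. The sketch below is only an illustration and is not necessarily one of the methods this column has in mind. It assumes the command-line tool `pdftotext` (distributed with Xpdf and Poppler) is installed; the helper names `build_command` and `extract_text` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_path, txt_path, keep_layout=False):
    """Return the argument list for a pdftotext call (hypothetical helper)."""
    cmd = ["pdftotext"]
    if keep_layout:
        # -layout asks pdftotext to keep the physical page layout
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_text(pdf_path):
    """Run pdftotext on pdf_path; return the text, or None if the tool is missing."""
    if shutil.which("pdftotext") is None:
        return None  # assumption: pdftotext must be on the PATH
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(build_command(pdf_path, txt_path), check=True)
    # The output encoding depends on the pdftotext build, so decode leniently.
    return txt_path.read_text(encoding="utf-8", errors="replace")
```

Calling `extract_text("report.pdf")` would write `report.txt` next to the PDF and return its contents, ready to paste into a word processor. An empty result usually means the file holds scanned page images rather than text, in which case OCR remains the fallback.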
This page last revised: December 30, 2000.