Net Q & A

Question of the Month: January 2001

How can I extract text from a PDF file for import into my word processor or other use?

Answer

First, let's make sure everyone understands what a PDF file is. PDF stands for "Portable Document Format." It is a file format developed by Adobe Systems. The idea is to enable electronic distribution of documents so that they can be viewed even by people who don't own the software in which the document originated. For example, people without MS Word can use a PDF viewer to read an MS Word file that has been converted to PDF format.

Here is an example of a PDF file. It is a conversion of an MS PowerPoint slide show about e-mail security issues that I gave at an AALL convention.

Adobe distributes free viewer software, "Acrobat Reader," from its web site, in hopes of stimulating demand for its PDF file conversion software ("Acrobat").

There are several ways to extract text from a PDF file:

- Copy the text you want and paste it into a destination file.
- For heavy duty use, an Acrobat plug-in called Gemini works well at exporting information from PDF files.
- The Access section of the Adobe web site provides explanations of several other methods, including a form for converting a PDF file accessible via the Internet, and sending files by e-mail for conversion.
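
For readers comfortable with a little scripting, here is a rough sketch of what text extraction involves under the hood. It is not how Acrobat, Gemini or the Adobe conversion service work; it is only an illustration using Python's standard library. The function name extract_text_crudely is my own invention. The sketch assumes the page content sits in ordinary (optionally Flate-compressed) streams and that the text is drawn with the simple "Tj" operator. It ignores other text operators, font encodings and secured files, so it will miss text in many real documents.

```python
import re
import zlib
from typing import List

# Bytes between the "stream" and "endstream" keywords of a PDF object.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A literal string followed by the Tj (show text) operator, e.g. "(Hello) Tj".
TJ_RE = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def extract_text_crudely(pdf_bytes: bytes) -> List[str]:
    """Return the string literals shown with Tj, in file order (heuristic)."""
    found = []
    for match in STREAM_RE.finditer(pdf_bytes):
        body = match.group(1)
        try:
            # Most content streams are Flate (zlib) compressed.
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # Not zlib data; scan the raw bytes instead.
        for literal in TJ_RE.findall(body):
            # Undo the backslash escapes for parentheses and backslashes.
            text = re.sub(rb"\\([()\\])", rb"\1", literal)
            found.append(text.decode("latin-1"))
    return found
```

To try it, read the file in binary mode, for example open("brief.pdf", "rb").read() (the file name is hypothetical), and pass the result to the function. If it returns an empty list, the file may be one of the problem cases described next.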

Two problems can prevent these methods from working:

- The original file may have been a scanned document that was not run through OCR. To your computer, it appears to be just a picture, not a document containing recognizable text.
- The creator of the file may have secured the contents so that text and graphics cannot be selected. This is an Acrobat security option. You can select File | Document Info | General and look at the "Producer" entry to learn about the document's creator.
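
If you would rather check for these two problems before spending time on a file, the following sketch shows one way to guess at them by looking at the raw bytes. It is a heuristic of my own, not an Adobe tool: the function name diagnose_pdf and the dictionary keys are made up for illustration. It assumes that a secured file carries an /Encrypt entry and that a scanned page shows up as image objects with no font objects. Newer PDF files can tuck these markers inside compressed objects, in which case the guess will be wrong.

```python
import re


def diagnose_pdf(pdf_bytes: bytes) -> dict:
    """Guess whether a PDF is secured or image-only (heuristic, may be wrong)."""
    # Secured files reference an encryption dictionary via an /Encrypt entry.
    has_encrypt = re.search(rb"/Encrypt\b", pdf_bytes) is not None
    # Pages with real text need at least one font object.
    has_fonts = re.search(rb"/Font\b", pdf_bytes) is not None
    # Scanned pages are stored as image objects.
    has_images = re.search(rb"/Subtype\s*/Image\b", pdf_bytes) is not None
    return {
        "security_settings": has_encrypt,
        "probably_scanned": has_images and not has_fonts,
    }
```

A result with "probably_scanned" set to True suggests OCR is needed; "security_settings" set to True suggests the creator applied an Acrobat security option.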

If you run into either of these problems, your best alternative may be to print out a paper copy, scan it and run it through OCR (Optical Character Recognition) software.

Pam Gaines has a good article about PDF conversion at her web site, a previous MVP Site of the Month winner.

Jerry Lawson

Send us your question. We'll select the best each month and answer it here. On request, questions will be edited to conceal the questioner's identity.

 


This page last revised: December 30, 2000.

 


Internet Tools for Lawyers
http://www.netlawtools.com


© 1996-2005 by Netlawtools, Inc. All rights reserved.