Help: OCR (Optical Character Recognition)

OCR (Optical Character Recognition) is the process of converting a bitmap image of text (like a scanned document) into text that can be selected, copied and searched by PDFpenPro and other text editing software. Once the text has been recognized by OCR, it is placed on an invisible layer above the image of text that you can see. When you copy text, the text is copied from this invisible OCR layer. OCR technology will not produce a perfect rendering of the bitmapped text. You will need to proofread and edit the text that results from OCR.

Using OCR in PDFpenPro

  1. Open a scanned PDF in PDFpenPro.
  2. An alert box opens with the message:
    "This document appears to be scanned. Would you like to perform optical character recognition (OCR) on it? OCR will allow you to select the text."
    You have three options:
    • Cancel:
      No OCR will be performed.
    • OCR Page:
      OCR will be performed on the current page.
    • OCR Document:
      If your document has multiple pages, OCR will be performed on all of the pages.

    Pick which languages are recognized by OCR in Preferences > OCR.

While PDFpenPro is performing the OCR, a progress bar will appear. The operation can take a few seconds or much longer, depending on the size and contents of the scanned document.

To perform OCR manually, choose Edit > OCR Page. PDFpenPro starts the OCR operation and the progress bar appears.

Selecting, Copying and Correcting OCR Text

Once OCR is finished, the document’s text can be edited like any other text. To change the visible text, use the Correct Text tool; see Working with Text for details.

Searching OCR Text

The text generated by the OCR operation can be searched like any other text. See Searching Within A PDF.

Tips to Improve the OCR Results of Your Document:

  • The quality of the original document affects the quality of the OCR performance. Crisp, clean originals with clear text will produce much better results than crumpled, faded photocopies.
  • Place your original document on the scanner as straight as possible. If you have a scanned page that is not straight, you can "deskew", or straighten, the image in PDFpenPro by choosing Edit > Deskew and Adjust Image.
  • Increase the contrast of your scanned document so that the background is as white as possible. You can adjust the contrast of the image by choosing Edit > Deskew and Adjust Image.

Forcing OCR

PDFpenPro looks at the document and, if it sees one image the size of a page, assumes that the document is a scan and automatically offers to perform OCR. In some cases, PDFpenPro may not recognize a scanned document. When that happens, OCR Page under the Edit menu is grayed out and cannot be selected. To force OCR:

  1. Hold down the Command and Option keys together.
  2. Choose Edit > OCR Page from the menu.
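
The detection rule described in this section (one image the size of a page means the document is probably a scan) can be illustrated with a short Python sketch. This is only a conceptual example, not PDFpenPro's actual code: the Rect type, the looks_scanned function and the 2% size tolerance are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Rect:
    """Width and height of a page or an image, in points (hypothetical type)."""
    width: float
    height: float


def looks_scanned(page: Rect, images: list[Rect], tolerance: float = 0.02) -> bool:
    """Return True when the page holds exactly one image about the size of the page.

    The tolerance (a fraction of the page size) is an assumed value for this sketch.
    """
    if len(images) != 1:
        return False
    image = images[0]
    return (
        abs(image.width - page.width) <= tolerance * page.width
        and abs(image.height - page.height) <= tolerance * page.height
    )
```

A rule like this would miss a scan that is stored as several image strips or as an image noticeably smaller than the page, which is the kind of document where forcing OCR is useful.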

Viewing the OCR Text Layer

Once text has been recognized by the OCR process, it is placed on an invisible layer above the image of text that you can see. When you copy text, the text is copied from this invisible OCR text layer.

Text from the OCR text layer is a close, but not perfect, rendering of the bitmapped text. You will need to proofread and edit the text that results from OCR. When you copy and paste the OCR text, you may notice some inaccuracies which you can correct at that time.
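
As a rough mental model of the layer described above, the Python sketch below treats the OCR layer as a list of recognized words, each with a position on the page, and returns the words that fall inside a selection rectangle. It is a simplified illustration only and does not reflect how PDFpenPro stores the layer: OcrWord, copy_text and the coordinate convention are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    """One recognized word and its bounding box on the page (hypothetical type)."""
    text: str
    x: float
    y: float
    width: float
    height: float


def copy_text(layer: list[OcrWord], sel_x: float, sel_y: float,
              sel_w: float, sel_h: float) -> str:
    """Join the words whose boxes lie fully inside the selection, in layer order."""
    selected = [
        word.text
        for word in layer
        if word.x >= sel_x
        and word.y >= sel_y
        and word.x + word.width <= sel_x + sel_w
        and word.y + word.height <= sel_y + sel_h
    ]
    return " ".join(selected)


# The visible page image is never consulted: copied text comes from the layer only.
layer = [
    OcrWord("Hello", 10, 10, 40, 12),
    OcrWord("World", 60, 10, 40, 12),
    OcrWord("Footer", 10, 700, 50, 12),
]
copied = copy_text(layer, 0, 0, 200, 30)
```

Because the copied text comes from the recognized words and not from the image, any recognition error in the layer appears in the pasted text even though the page looks correct on screen.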

View the OCR text layer:

  1. From the View menu choose OCR layer. A layer of text will appear over your document, showing the normally invisible OCR text.

Remove the OCR Layer

To completely remove the OCR layer from a document:

  1. Open the Edit menu and choose Clear OCR Layer… (Cmd+Opt+O).

At this point, you can use the document as is or redo OCR. To redo OCR on the document, force OCR as described in Forcing OCR.

Editing the OCR Text Layer (PDFpenPro Only)

Make corrections to the OCR text layer.

  1. From the View menu choose to view OCR info. A layer of text will appear over your document, showing the normally invisible OCR text.
  2. Select some text and a popup window will appear with options for editing the text one word or line at a time.

Changes to the OCR text layer are not the same as changes made using the Correct Text tool, because changes to the OCR text layer do not alter the visible text of the document.

As with the Correct Text tool, editing the OCR text layer is intended for correcting typos and small errors, not for reformatting an entire document. For layout changes and major edits, export the document to Word format and make the changes in a word processor.


© 2003-2017 SmileOnMyMac, LLC dba Smile. All rights reserved.
PDFpen and PDFpenPro are registered trademarks of Smile. The Smile logo is a trademark of Smile.