What Is OCR? How to Extract Text from a Scanned PDF

You receive a PDF and try to select some text — but instead of highlighting words, you drag a rectangle over what looks like an image. That's a scanned PDF, and the text you see is pixels, not data. OCR (Optical Character Recognition) is the technology that reads those pixels and converts them into actual text you can search, copy, and edit.

What Is OCR?

Optical Character Recognition is a technology that analyzes images containing text and converts the visual patterns of letters and numbers into machine-readable text data. OCR has existed for decades, but modern engines powered by machine learning are dramatically more accurate than earlier versions. They handle multiple languages, complex layouts, and low-quality scans far better, and in many cases they can read handwriting as well.

In the context of PDFs, OCR processes each page image and adds an invisible text layer behind the visual content. The page still looks exactly the same, but now the text is real — searchable by your PDF reader, copyable to the clipboard, and usable for conversion to Word or other formats.
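To make the "invisible text layer" concrete: a PDF page is drawn by a list of content-stream operators, and the PDF specification defines text rendering mode 3 (`3 Tr`), which places text without painting it. The sketch below builds the kind of content stream an OCR tool can write. It assumes the scan is already registered on the page as an image resource named `/Im0` and that a font resource `/F1` exists; both resource names and both function names are hypothetical, and only ASCII text is handled.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def page_content(words: list[tuple[str, float, float]], font_size: float = 12) -> str:
    """Draw the scanned image, then overlay each recognized word as invisible text.

    `words` holds (text, x, y) positions in PDF points, as an OCR engine
    might report them. /Im0 and /F1 are hypothetical resource names.
    """
    lines = [
        "q 612 0 0 792 0 0 cm /Im0 Do Q",  # paint the scan across a US Letter page
        "BT",                             # begin a text object
        f"/F1 {font_size} Tf",            # select the (hypothetical) font resource
        "3 Tr",                           # rendering mode 3: neither fill nor stroke
    ]
    for text, x, y in words:
        # Set an absolute text position for each word, then show the string.
        lines.append(f"1 0 0 1 {x} {y} Tm ({escape_pdf_string(text)}) Tj")
    lines.append("ET")                    # end the text object
    return "\n".join(lines)


if __name__ == "__main__":
    print(page_content([("Invoice", 72, 700), ("(copy)", 140, 700)]))
```

Only the image's `Do` operator paints pixels, which is why the page looks unchanged; search and selection use the positioned text. Real OCR tools also size each word to match the scan, which this sketch omits.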

How to Tell If Your PDF Needs OCR

The quickest test: open the PDF in your browser or any PDF reader and try to click and drag to select some text. If you can highlight individual words, the PDF already contains text data — OCR has either been run already or the document was created digitally. If clicking just places your cursor without selecting anything, or if dragging draws a rectangle over the page like selecting part of an image, it's a scanned PDF that needs OCR.

Another sign: Ctrl+F (Find) returns no results even though text is clearly visible on the page.
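The same check can be roughly approximated in code. The following is our own stdlib-only heuristic, not how any particular tool decides. It scans each non-image stream for the text-showing operators `Tj` and `TJ` and decompresses Flate-compressed streams with `zlib` first. It misses streams that use other filters, and binary data such as embedded fonts can occasionally produce a false match, so treat the result as a hint. A proper PDF library is the reliable option. The function and pattern names are hypothetical.

```python
import re
import zlib

# Group 1: text between an "obj" keyword and "stream" (it may span several
# earlier objects, but always ends with this stream's dictionary).
# Group 2: the raw stream body.
STREAM_RE = re.compile(rb"obj(.*?)stream\r?\n(.*?)\r?\n?endstream", re.DOTALL)
IMAGE_RE = re.compile(rb"/Subtype\s*/Image")
# The two most common text-showing operators in PDF content streams.
TEXT_OP_RE = re.compile(rb"\bT[jJ]\b")


def looks_like_it_has_text(pdf_bytes: bytes) -> bool:
    """Rough heuristic: True if any non-image stream contains a Tj/TJ operator."""
    for match in STREAM_RE.finditer(pdf_bytes):
        header, body = match.group(1), match.group(2)
        if IMAGE_RE.search(header):
            continue  # skip scanned page images: their bytes are pixels, not operators
        try:
            body = zlib.decompress(body)  # many content streams use FlateDecode
        except zlib.error:
            pass  # uncompressed or another filter: scan the raw bytes as-is
        if TEXT_OP_RE.search(body):
            return True
    return False


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], "rb") as f:
        print("has text" if looks_like_it_has_text(f.read()) else "probably needs OCR")
```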

How to Run OCR at PDFToolShack

Open the PDF OCR tool
Upload your scanned PDF
Select the language of the document text — accuracy improves significantly when the correct language is specified
Click Run OCR — the tool processes each page and adds a text layer
Download the OCR'd PDF — it looks identical but now has searchable, copyable text
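PDFToolShack doesn't say which engine it runs, so the following is only a comparable open-source route, not the tool's implementation. This minimal Python sketch calls the Tesseract command-line program on a single page image. It assumes Tesseract and the needed language data are installed and that the scanned page has already been saved as an image. Tesseract's `-l` option takes language codes, joined with `+` for multilingual pages, and the trailing `pdf` argument asks for a searchable PDF instead of plain text. The `TESSERACT_CODES` table and both function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping from user-facing language names to Tesseract codes
# (only a few shown; each needs its language data installed).
TESSERACT_CODES = {"English": "eng", "German": "deu", "French": "fra", "Spanish": "spa"}


def build_ocr_command(image: Path, output_base: Path, languages: list[str]) -> list[str]:
    """Build a tesseract call that writes <output_base>.pdf with a text layer."""
    codes = "+".join(TESSERACT_CODES[name] for name in languages)
    # The trailing "pdf" selects Tesseract's searchable-PDF output.
    return ["tesseract", str(image), str(output_base), "-l", codes, "pdf"]


def run_ocr(image: Path, output_base: Path, languages: list[str]) -> Path:
    """Run tesseract and return the path of the PDF it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    subprocess.run(build_ocr_command(image, output_base, languages), check=True)
    return Path(str(output_base) + ".pdf")
```

For example, `run_ocr(Path("page-001.png"), Path("page-001"), ["English", "German"])` would write `page-001.pdf`. Splitting a scanned PDF into page images and merging the per-page PDFs back together are separate steps this sketch leaves out.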

What Affects OCR Accuracy?

OCR accuracy depends heavily on the quality of the source scan. Key factors:

Scan resolution — 300 DPI is the minimum recommended for reliable OCR; 200 DPI often works but may produce errors on small text
Image clarity — blurry, skewed, or low-contrast scans produce more errors
Font type — standard serif and sans-serif fonts are recognized most reliably; unusual display fonts or heavy stylization may cause errors
Language selection — specifying the correct language tells the recognition engine which character set, dictionary, and language model to use
Page condition — coffee stains, handwritten notes, fold marks, and torn edges all reduce accuracy
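Resolution is the easiest of these factors to check before running OCR. A page image that fills the page has an effective DPI equal to its pixel width divided by the page width in inches. PDF page sizes are given in points (72 per inch), so a US Letter page is 612 points, or 8.5 inches, wide. A minimal sketch follows. It assumes the image spans the full page width. The 300 and 200 DPI thresholds come from the list above, while the function names and the "rescan" cutoff below 200 DPI are our own assumptions.

```python
POINTS_PER_INCH = 72  # PDF page sizes are measured in points: 1 pt = 1/72 inch


def effective_dpi(image_width_px: int, page_width_pt: float) -> float:
    """Pixels per inch of a page image stretched across the full page width."""
    page_width_in = page_width_pt / POINTS_PER_INCH
    return image_width_px / page_width_in


def resolution_advice(dpi: float) -> str:
    """Map a DPI value to guidance (the below-200 cutoff is an assumption)."""
    if dpi >= 300:
        return "ok"
    if dpi >= 200:
        return "may misread small text"
    return "rescan recommended"


if __name__ == "__main__":
    dpi = effective_dpi(2550, 612)  # a 2550 px wide scan on a US Letter page
    print(f"{dpi:.0f} DPI: {resolution_advice(dpi)}")
```

For example, a 2550-pixel-wide scan on a Letter page works out to 2550 / 8.5 = 300 DPI, while a 1700-pixel-wide scan gives 200 DPI, which is usable but riskier for small text.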

OCR vs. Extract Text — What's the Difference?

These are two different tools for two different problems:

Tool	Use When	What It Does
PDF OCR	PDF is scanned — text is an image	Reads image pixels, creates a text layer in the PDF
Extract Text	PDF already has text data — you just want it out	Pulls the existing text layer out as a plain text file

If you can already select text in your PDF, use Extract Text. If you can't select anything, use OCR first, then Extract Text if you need the raw text output.
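In a scripting workflow, the same rule can be applied automatically: try extracting text first, and fall back to OCR only when nothing comes out. This minimal sketch assumes Poppler's `pdftotext` command-line tool is installed (it is not part of PDFToolShack). `extract_text` and `needs_ocr` are hypothetical helper names. Passing `-` as the output file makes `pdftotext` print to standard output, and it ends each page with a form-feed character, which Python's `strip()` treats as whitespace.

```python
import subprocess


def extract_text(pdf_path: str) -> str:
    """Return the PDF's existing text layer via Poppler's pdftotext."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" = write to stdout
        capture_output=True,
        encoding="utf-8",
        check=True,
    )
    return result.stdout


def needs_ocr(extracted: str) -> bool:
    """Only whitespace or page breaks came out: likely an image-only scan."""
    return extracted.strip() == ""
```

Usage: `if needs_ocr(extract_text("scan.pdf")):` run OCR first, then extract again.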

After OCR: What You Can Do With the Text

Convert to Word

After OCR, convert the PDF to a fully editable Word document.

Extract the Text

Pull the OCR'd text out as a plain text file for processing or indexing.

Compress the Result

The text layer OCR adds is small. Most of a scanned PDF's size comes from the page images, so compress it to shrink the file.

Key Takeaways

OCR converts scanned image pages into PDFs with searchable, copyable text
If you can't select text in a PDF, it's most likely a scanned document that needs OCR
Scan resolution (300 DPI minimum) and selecting the correct language both affect accuracy
OCR adds an invisible text layer — the page looks exactly the same after processing
Use Extract Text if the PDF already has text data; use OCR only for scanned documents
After OCR, you can convert to Word, extract the text, or compress the file

Make your scanned PDF searchable — free.

Run OCR in your browser. Your file never leaves your device.
