OCR (Optical Character Recognition)
Definition
AI technology that extracts text from images, PDFs and scanned documents.
Why It Matters
Most of the world's text is locked inside images and PDFs: scanned contracts, photographed receipts, screenshots, handwritten notes. OCR unlocks that content so it's searchable, editable, or ready to feed into another AI tool. Without a dedicated OCR step, the alternative is vision-LLM inference, which typically costs more and tends to be less accurate on clean printed text.
Key Points
- Tesseract 4 (2018) introduced an LSTM-based recognition engine, which Tesseract 5 (2021) builds on. Accuracy on clean, well-lit printed text: ~99%. On degraded or skewed scans: 85–95%.
- PaddleOCR is a newer open-source engine with stronger layout detection and better CJK character accuracy than Tesseract.
- Modern vision-LLMs (GPT-4V, Qwen-VL) perform implicit OCR: they read text directly from images without calling a separate OCR step.
- Document layout analysis (multi-column, tables, footnotes, mixed images) remains the hard part; raw character recognition is largely solved.
- Output formats: plain text, hOCR (with bounding-box coordinates), PDF with text overlay, JSON with per-word confidence scores.
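To make the hOCR format concrete, here is a minimal sketch that pulls each word's text, bounding box and confidence out of an hOCR string using only the standard library. The sample markup is hand-written in the shape Tesseract emits (`ocrx_word` spans with `bbox` and `x_wconf` in the `title` attribute); the function names and the threshold of 80 are assumptions for illustration, and the parser assumes word spans contain no nested spans.

```python
import re
from html.parser import HTMLParser

# Properties stored in an hOCR title attribute, e.g. "bbox 36 92 96 116; x_wconf 96".
_BBOX = re.compile(r"bbox (\d+) (\d+) (\d+) (\d+)")
_CONF = re.compile(r"x_wconf (\d+(?:\.\d+)?)")


class HocrWordParser(HTMLParser):
    """Collects text, bounding box and confidence for each ocrx_word span."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._current = None  # word being read, or None when outside a word span

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and "ocrx_word" in (attrs.get("class") or "").split():
            title = attrs.get("title") or ""
            bbox = _BBOX.search(title)
            conf = _CONF.search(title)
            self._current = {
                "text": "",
                "bbox": tuple(int(v) for v in bbox.groups()) if bbox else None,
                "conf": float(conf.group(1)) if conf else None,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        # Simplification: assumes no spans are nested inside a word span.
        if tag == "span" and self._current is not None:
            self.words.append(self._current)
            self._current = None


def parse_hocr_words(hocr):
    """Return a list of {"text", "bbox", "conf"} dicts, one per word."""
    parser = HocrWordParser()
    parser.feed(hocr)
    parser.close()
    return parser.words


def low_confidence(words, threshold=80.0):
    """Words whose confidence falls below the (arbitrary) threshold."""
    return [w["text"] for w in words if w["conf"] is not None and w["conf"] < threshold]


# Hand-written sample, not real engine output.
sample = (
    "<div class='ocr_page' title='bbox 0 0 640 480'>"
    "<span class='ocr_line' title='bbox 36 92 210 116'>"
    "<span class='ocrx_word' title='bbox 36 92 96 116; x_wconf 96'>Total</span> "
    "<span class='ocrx_word' title='bbox 110 92 210 116; x_wconf 61'>$4l.50</span>"
    "</span></div>"
)
words = parse_hocr_words(sample)
suspect = low_confidence(words)
```

Per-word confidence is what makes review practical: instead of proofreading a whole page, a reviewer can jump straight to the words in `suspect`.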
Example
OCR a 50-page scanned PDF into plain text in seconds, then paste the text into a chat to summarise, translate, or query it. Engines like Tesseract handle 100+ languages; modern AI OCR adds layout preservation and handwriting recognition.
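The workflow above could be sketched roughly as follows, assuming the third-party packages `pdf2image` (which needs Poppler) and `pytesseract` (which needs the Tesseract binary) are installed. The function names, the 300 dpi setting and the form-feed page separator are choices made for this illustration, not requirements of either library.

```python
def join_pages(page_texts):
    """Join per-page OCR text with form feeds so page boundaries survive."""
    return "\f".join(text.strip() for text in page_texts)


def ocr_scanned_pdf(pdf_path, lang="eng", dpi=300):
    """Rasterise each PDF page and run Tesseract on it.

    The third-party imports live inside the function so that join_pages
    can be used (and tested) without them installed.
    """
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path, dpi=dpi)  # one PIL image per page
    return join_pages(pytesseract.image_to_string(page, lang=lang) for page in pages)


# Usage (hypothetical file name):
# text = ocr_scanned_pdf("scan.pdf")
```

The returned string can then be pasted into a chat or split on `"\f"` to process one page at a time.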
Common Misconception
PDF text extraction is not the same as OCR. PDFs with an embedded text layer (searchable PDFs) should be extracted with a library such as pdfminer or pdfplumber. Running OCR on top of a text PDF adds noise and loses formatting. Use OCR only when the PDF consists of genuinely scanned images.
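One way to apply this rule is to try the text layer first and send only the pages that come back empty to OCR. The sketch below assumes `pdfplumber` is installed; the function names and the 20-character cut-off are illustrative assumptions, and a real pipeline may need a smarter test (for example, pages that carry only a header in the text layer).

```python
def pages_needing_ocr(page_texts, min_chars=20):
    """Indexes of pages whose extracted text is too short to be a real text layer."""
    return [
        i for i, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def extract_text_layer(pdf_path):
    """Read the embedded text layer page by page (pdfplumber imported lazily)."""
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        # Treat a missing result as an empty string so every page yields a str.
        return [page.extract_text() or "" for page in pdf.pages]


# Usage (hypothetical file name):
# texts = extract_text_layer("contract.pdf")
# scanned = pages_needing_ocr(texts)   # only these pages go to the OCR engine
```

Checking per page rather than per file matters for mixed documents, such as a digital contract with a scanned signature page appended.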
Related Terms
- Computer Vision: AI that can understand and analyze images and video content.
- Multimodal AI: AI models that can process multiple types of input (text, images, audio, video).
- NLP (Natural Language Processing): The field of AI focused on understanding and generating human language.
OCR (Optical Character Recognition) on Rewind.ai
Rewind.ai's OCR tool runs Tesseract for printed text and falls back to a vision-LLM for harder cases (skewed photos, mixed handwriting). The output goes straight into the chat or any downstream text tool.
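A generic fast-engine-first, fallback-second pattern can be sketched as below. This is not Rewind.ai's actual implementation, which is not described here: the function names, the thresholds and the two engine callables are all hypothetical. The only engine-specific detail assumed is that Tesseract's tabular output reports a confidence of -1 for rows that are not words.

```python
from statistics import mean


def needs_fallback(confidences, min_mean=80.0, max_low_share=0.25, low=60.0):
    """Decide whether a page's OCR result looks unreliable (thresholds are arbitrary)."""
    scores = [c for c in confidences if c >= 0]  # drop -1 entries for non-word rows
    if not scores:
        return True  # nothing recognised at all
    low_share = sum(c < low for c in scores) / len(scores)
    return mean(scores) < min_mean or low_share > max_low_share


def recognise(image, fast_ocr, fallback_ocr):
    """Try the fast engine first; use the fallback only when confidence is poor.

    fast_ocr(image) -> (text, confidences) and fallback_ocr(image) -> text are
    placeholders supplied by the caller, e.g. a Tesseract wrapper and a
    vision-LLM client.
    """
    text, confidences = fast_ocr(image)
    if needs_fallback(confidences):
        return fallback_ocr(image), "fallback"
    return text, "fast"
```

Routing on confidence keeps the cheaper engine on the common case and reserves the more expensive model for pages where the first pass visibly struggled.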
Quick Facts
| Term | OCR (Optical Character Recognition) |
| Related | Computer Vision, Multimodal AI, NLP (Natural Language Processing) |