PaddleOCR

Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.

70,888 stars


Forks: 9,836

Open issues: 251

Watchers: 520

Size: 1,750.7 MB

Language: Python

License: Apache License 2.0

Topics: ocr, chineseocr, pdf2markdown, pp-ocr, pp-structure, document-parsing, document-translation, kie, ai4science, pdf-extractor-rag, pdf-parser, rag, paddleocr-vl

Created: May 8, 2020

Updated: Feb 18, 2026

Last push: Feb 16, 2026