apache/tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Stars: 3,780
Forks: 932
Open issues: 58
Watchers: 3,780
Size: 353.9 MB
Language: Java
License: Apache License 2.0
Topics: content, extraction, java, metadata, tika
Created: May 21, 2009
Updated: May 27, 2026
Last push: May 27, 2026