apache/tika

The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).

Stars: 3,693
Forks: 921
Open issues: 59
Watchers: 3,693
Size: 340.2 MB
Language: Java
License: Apache License 2.0
Topics: content, extraction, java, metadata, tika
Created: May 21, 2009
Updated: Apr 14, 2026
Last push: Apr 13, 2026