opendatalab/MinerU-HTML

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

Stars: 232
Forks: 24
Open issues: 1
Watchers: 232
Size: 3.2 MB
Language: Python
License: Apache License 2.0
Topics: article-extractor, corpus-tools, nlp, rag, scraping, text-extraction, trafilatura, web-scraping, webagent
Created: Nov 26, 2025
Updated: Apr 13, 2026
Last push: Mar 27, 2026