opendatalab/MinerU-HTML

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

Stars: 248
Forks: 24
Open issues: 4
Watchers: 248
Size: 3.2 MB
Language: Python
License: Apache License 2.0
Topics: article-extractor, corpus-tools, nlp, rag, scraping, text-extraction, trafilatura, web-scraping, webagent
Created: Nov 26, 2025
Updated: May 28, 2026
Last push: Mar 27, 2026