MinerU-HTML
opendatalab/MinerU-HTML
MinerU-HTML: An HTML main-content extractor powered by a small language model (SLM) that outputs clean HTML bodies. Suited to deep research agents, RAG applications, and training data generation.
Stars: 209
Forks: 23
Open issues: 2
Watchers: 209
Size: 0.1 MB
Language: HTML
License: Apache License 2.0
Topics: article-extractor, corpus-tools, nlp, rag, scraping, text-extraction, trafilatura, web-scraping, webagent
Created: Nov 26, 2025
Updated: Feb 23, 2026
Last push: Dec 25, 2025