MinerU-HTML
opendatalab/MinerU-HTML
MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.
Stars
232
Forks
24
Open issues
1
Watchers
232
Size
3.2 MB
Language: Python
License: Apache License 2.0
Topics: article-extractor, corpus-tools, nlp, rag, scraping, text-extraction, trafilatura, web-scraping, webagent
Created: Nov 26, 2025
Updated: Apr 13, 2026
Last push: Mar 27, 2026