kovidgoyal/html5-parser
Fast C-based HTML 5 parsing for Python