huggingface/datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Stars: 2,910
Forks: 246
Open issues: 83
Watchers: 2,910
Size: 33.7 MB
Language: Python
License: Apache License 2.0
Created: Jun 14, 2023
Updated: Feb 28, 2026
Last push: Feb 25, 2026