Pythagora-io/eval-tool
Evaluation tool for testing prompts against multiple LLMs