Tufalabs/UnsupervisedJudgeBench

UnsupervisedJudgeBench

Judge Bench

This is a framework for evaluating the accuracy of judges on unsupervised tasks. Each judge's verdicts are compared with the ground truth.
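As a rough illustration of what "comparing with the ground truth" means, the sketch below scores a judge as the fraction of examples where its verdict matches the ground-truth label. The function name `judge_accuracy`, the callable-judge shape, and the boolean labels are assumptions made for this example, not the repository's actual interface.

```python
from typing import Callable, Sequence


def judge_accuracy(
    judge: Callable[[str], bool],
    examples: Sequence[str],
    labels: Sequence[bool],
) -> float:
    """Return the fraction of examples where the judge agrees with the label.

    Hypothetical helper: `judge` is any callable mapping an example to a
    True/False verdict; `labels` holds the ground-truth verdicts.
    """
    if len(examples) != len(labels):
        raise ValueError("examples and labels must have the same length")
    if not examples:
        raise ValueError("need at least one example")
    # Count verdicts that match the ground truth.
    correct = sum(judge(x) == y for x, y in zip(examples, labels))
    return correct / len(examples)


# Toy judge (assumption): accepts any answer string that ends in "4".
toy_judge = lambda answer: answer.endswith("4")
examples = ["2+2=4", "2+2=5", "1+3=4"]
labels = [True, False, False]  # the third label is deliberately a mismatch
accuracy = judge_accuracy(toy_judge, examples, labels)
```

Here the toy judge agrees with two of the three labels, so `accuracy` is 2/3.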

Usage

python eval.py

Edit eval.py to change the judge type, model name, and other settings.

Add new judges in the judges/ folder.
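A new judge might look something like the sketch below. The base class `Judge`, the method name `judge`, and the majority-vote strategy are all hypothetical, chosen only to show the general shape; check the existing files in judges/ for the real interface before copying this.

```python
from abc import ABC, abstractmethod
from collections import Counter
from typing import Sequence


class Judge(ABC):
    """Hypothetical base class; the repository's real interface may differ."""

    @abstractmethod
    def judge(self, question: str, answer: str) -> bool:
        """Return True if the answer is judged correct."""


class MajorityVoteJudge(Judge):
    """Illustrative label-free judge: accepts an answer only if it matches
    the most common answer among a set of sampled answers."""

    def __init__(self, samples: Sequence[str]) -> None:
        if not samples:
            raise ValueError("need at least one sampled answer")
        # Normalise whitespace, then keep the most frequent answer.
        counts = Counter(s.strip() for s in samples)
        self.majority = counts.most_common(1)[0][0]

    def judge(self, question: str, answer: str) -> bool:
        return answer.strip() == self.majority


# Usage with made-up samples: "4" appears twice after stripping, "5" once.
mv_judge = MajorityVoteJudge(["4", "4 ", "5"])
verdict_ok = mv_judge.judge("What is 2+2?", "4")
verdict_bad = mv_judge.judge("What is 2+2?", "5")
```

Keeping every judge behind one small method makes it straightforward for an evaluation script to swap judge types without other changes.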

LLMJudge
