This is a framework for evaluating the accuracy of judges on unsupervised tasks. Each judge's verdicts are compared against the ground truth.
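The comparison with ground truth reduces to an accuracy score. The sketch below assumes each judge emits one label per item and that ground truth is a parallel list of labels. The function name `judge_accuracy` is hypothetical and is not taken from `eval.py`.

```python
from typing import Sequence


def judge_accuracy(judge_labels: Sequence[str], ground_truth: Sequence[str]) -> float:
    """Hypothetical helper: fraction of items where the judge's verdict matches the ground truth."""
    if len(judge_labels) != len(ground_truth):
        raise ValueError("judge_labels and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty set")
    # Count positions where the judge agrees with the reference label.
    matches = sum(j == g for j, g in zip(judge_labels, ground_truth))
    return matches / len(ground_truth)


print(judge_accuracy(["A", "B", "A", "A"], ["A", "B", "B", "A"]))  # 0.75
```

If your judges produce scores rather than discrete labels, you would need a different metric, such as a correlation or a thresholded comparison.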
Run the evaluation with `python eval.py`. Edit `eval.py` to change the judge type, model name, and other settings.
New judges can be added in the `judges/` folder.
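This README does not document the interface a judge in `judges/` must implement, or how `eval.py` discovers judges. The sketch below is therefore only an illustration of one possible shape. `Judge`, its `judge()` method, `LengthJudge`, and the `min_chars` parameter are all hypothetical names. Check the existing files in `judges/` for the real contract.

```python
from abc import ABC, abstractmethod


class Judge(ABC):
    """Hypothetical base class; the repository's actual judge interface may differ."""

    @abstractmethod
    def judge(self, question: str, answer: str) -> str:
        """Return a verdict label for one (question, answer) pair."""


class LengthJudge(Judge):
    """Toy hypothetical judge: labels an answer 'good' if it has at least min_chars characters."""

    def __init__(self, min_chars: int = 10) -> None:
        self.min_chars = min_chars

    def judge(self, question: str, answer: str) -> str:
        return "good" if len(answer) >= self.min_chars else "bad"


judge = LengthJudge(min_chars=5)
print(judge.judge("What is 2 + 2?", "Four"))        # bad
print(judge.judge("What is 2 + 2?", "It is four"))  # good
```

A model-backed judge would replace the length check with a call to the model selected in `eval.py`.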