Model Performance vs. Price

AI performance on a set of Ph.D.-level science questions

GPQA Diamond accuracy
