"Maybe it’s just that the bombastic name of the benchmark rubbed me the wrong way."
I think a lot of people are with you on this. My own tiny gripe: if you're going to name something Humanity's Last Exam, don't have multiple choice questions. It turns whatever the problem was into a search problem, and potentially a very trivial one.
Great post! Really insightful breakdown.
"Maybe it’s just that the bombastic name of the benchmark rubbed me the wrong way."
I think a lot of people are with you on this. My own tiny gripe: if you're going to name something Humanity's Last Exam, don't have multiple choice questions. It turns whatever the problem was into a search problem, and potentially a very trivial one.