AI benchmarks are broken. Here’s what we need instead.

2 Apr 2026

...

It soon became clear that the benchmark tests on which medical AI models are assessed do not capture how medical decisions are actually made. Hospitals rely on multidisciplinary teams—radiologists, oncologists, physicists, nurses—who jointly review patients. Treatment planning rarely hinges on a static decision; it evolves as new information emerges over days or weeks. Decisions often arise through constructive debate and trade-offs between professional standards, patient preferences, and the shared goal of long-term patient well-being. No wonder even highly scored AI models struggle to deliver the promised performance once they encounter the complex, collaborative processes of real clinical care. ...

When high benchmark scores fail to translate into real-world performance, even the most highly scored AI is soon abandoned to what I call the “AI graveyard.” ...

HAIC benchmarks reframe current benchmarking in four ways:

1. From individual and single-task performance to team and workflow performance (shifting the unit of analysis)

2. From one-off testing with right/wrong answers to long-term impacts (expanding the time horizon)

3. From correctness and speed to organizational outcomes, coordination quality, and error detectability (expanding outcome measures)

4. From isolated outputs to upstream and downstream consequences (system effects)

...

See the full story here: https://www.technologyreview.com/2026/03/31/1134833/ai-benchmarks-are-broken-heres-what-we-need-instead/

philip lelyveld The world of entertainment technology
