A look at the more challenging AI evaluations emerging in response to the rapid progress of models, including FrontierMath, Humanity's Last Exam, and RE-Bench (Tharin Pillay/Time)


Tharin Pillay / Time:

Despite their expertise, AI developers don’t always know what their most advanced systems are capable of, at least not at first.
