SAT Time. Robots, Boot Yourselves

May 12, 2026

ScienceDaily reports, “Scientists Built The Hardest AI Test Ever And the Results Are Surprising,” or, in other words, they decided to put AI through standardized testing. The hardest test in the world for AI was designed by nearly 1,000 researchers. It is called “Humanity’s Last Exam” (HLE), and it has 2,500 questions. The subjects covered are specialized academic fields, ancient languages, natural sciences, humanities, and mathematics. The test was made to remind us that intelligence is more than pattern recognition and memorization. It is about specialized expertise, context, and depth.

HLE’s questions were tested against the Big AI Tech (BAIT) chatbots. If any of them answered a question correctly, that specific question was removed. That ensured the test remained difficult and right at the edge of the chatbots’ capabilities. Here’s what that accomplished:

“Early testing confirmed that the strategy worked. Even powerful AI models struggled with the exam. GPT-4o achieved a score of 2.7 percent, while Claude 3.5 Sonnet reached 4.1 percent. OpenAI’s o1 model performed somewhat better with 8 percent. The most capable systems so far, including Gemini 3.1 Pro and Claude Opus 4.6, have reached accuracy levels between about 40 percent and 50 percent.”
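For readers who want to see the mechanics, the filtering step described above (drop any question that any chatbot gets right) can be sketched in a few lines of Python. This is a toy illustration, not the HLE team's actual pipeline: the function names, the exact-match grading, and the two stand-in "models" are all assumptions made up for the example.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical types for this sketch: a question is (prompt, reference answer),
# and a "model" is any function that maps a prompt to an answer string.
Question = Tuple[str, str]
Model = Callable[[str], str]


def is_correct(answer: str, reference: str) -> bool:
    """Assumed grading rule: case-insensitive exact match."""
    return answer.strip().lower() == reference.strip().lower()


def keep_unsolved(questions: List[Question], models: Dict[str, Model]) -> List[Question]:
    """Keep only the questions that none of the given models answers correctly."""
    kept = []
    for prompt, reference in questions:
        solved = any(is_correct(model(prompt), reference) for model in models.values())
        if not solved:
            kept.append((prompt, reference))
    return kept


def accuracy(questions: List[Question], model: Model) -> float:
    """Fraction of questions the model answers correctly (0.0 for an empty set)."""
    if not questions:
        return 0.0
    correct = sum(is_correct(model(prompt), reference) for prompt, reference in questions)
    return correct / len(questions)


# Toy stand-ins for chatbots: each one "knows" exactly one answer.
def model_a(prompt: str) -> str:
    return {"2+2": "4"}.get(prompt, "unknown")


def model_b(prompt: str) -> str:
    return {"capital of France": "paris"}.get(prompt, "unknown")


questions = [("2+2", "4"), ("capital of France", "Paris"), ("hard question", "42")]
kept = keep_unsolved(questions, {"a": model_a, "b": model_b})
```

In this toy run only the question neither stand-in can answer survives, so both score zero on the filtered set by construction. That is the point of the design: the filter models start near the floor, and any later gains have to come from newer systems.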

Why did this particular test return what seem to be shocking results? Most of the AI tests are a bit like the butcher in Campinas, Brazil, in 1953. My mother would specify an amount of meat, and the friendly person behind the counter used a thumb to prove that the amount ordered was indeed on the scale.

Was this intentional? You bet your life, as Groucho Marx once said. Fiddling data is the name of the game in some fancy software systems. Oh, the results don’t look right. Let’s change this threshold value. Oh, the model is providing information about self-harm. Let’s filter that before displaying a result. You get the idea. AI absolutely has to be shaped. Why? It outputs errors. It makes up information. It presents content that makes the human dependent, overconfident, or captured by wonky outputs.

Making AI appear smart is about making money and gaining control. Creating tests that prove how smart AI is works until someone says, “Hey, let’s make a test that does not pander to the big AI tech companies.”

Whitney Grace, May 12, 2026
