[SPECIAL] Scientist vs. Storyteller: Benchmarking GPT 5.2, Claude 4.6, and Gemini 3.1 on Scientific Rigor

AI Unraveled: Latest AI News, ChatGPT, Gemini, Claude, DeepS... by Etienne Noumen

S33 · E82

Mar 6, 2026

33:34

Episode notes

🚀 Welcome to an AI Unraveled Special Report.

In this episode, we move beyond the "vibe check," and beyond poetry and creative writing, to ask the most important question in AI today: can these models actually reason under strict scientific constraints?

We put four titans (Gemini 3.1 Pro, Claude Sonnet 4.6, GPT 5.1, and GPT 5.2) to the test on a structured scientific synthesis task involving the TRAPPIST-1 system, Richard Feynman’s methodology, and the physics of liquid water. The results reveal a massive divide between models that produce "fluent text" and models that demonstrate "genuine reasoning."

This episode is made possible by our sponsors:

🛑 AIRIA: As OpenAI secures $110 billion to build "stateful runtime environments" and Block ...

Keywords

Scientific Reasoning AI