What Mr. Spark Says About GPT-5.2

Recently, GPT-5.2 was released. I’ve put it to the test, and this is the truth.

It’s an excellent model. On every benchmark, it outperforms all earlier models. However, it might not really matter if you are a regular small business owner with many responsibilities.

As AI researchers, we were already getting excellent results from models like Claude 4.5, Gemini 3, and GPT-5.1. The responses we get now are only marginally better.

The Issue of Hallucination

Compared to its predecessor, GPT-5.2 “Thinking” has 30% fewer hallucinations. On paper, that sounds fantastic, but let’s examine the numbers.

• 30% Drop (Improvement): Reduction in hallucinations vs. previous models.
• 6.2% (Failure Rate): It still gets 6 out of 100 things wrong.

Unfortunately, 6.2% is still too high.

It is not low enough to blindly trust the AI with critical issues like legal contracts, specific business advice, or medical questions. It is getting better, but we are not at “autopilot” level yet.
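
To put the two figures in context, here is a rough back-of-the-envelope sketch in Python. It rests on two assumptions that the numbers above do not state: that the 6.2% rate is what remains after the 30% relative drop, and that errors in separate responses are independent of each other. The function names are illustrative only.

```python
def implied_previous_rate(current_rate: float, relative_drop: float) -> float:
    """Rate before the drop, assuming current = previous * (1 - drop)."""
    return current_rate / (1.0 - relative_drop)


def chance_of_at_least_one_error(error_rate: float, n_responses: int) -> float:
    """Probability that at least one of n independent responses contains an error."""
    return 1.0 - (1.0 - error_rate) ** n_responses


current = 0.062  # 6.2% failure rate quoted above
drop = 0.30      # 30% relative reduction quoted above

# Assumption: the 6.2% figure is the rate after the 30% reduction.
previous = implied_previous_rate(current, drop)
print(f"Implied previous rate: {previous:.1%}")

# Assumption: each response fails independently of the others.
for n in (1, 10, 50):
    p = chance_of_at_least_one_error(current, n)
    print(f"{n:>2} responses -> {p:.0%} chance of at least one error")
```

Under these assumptions, the chance of hitting at least one error across ten responses is a little under one in two, which is why a single-digit failure rate still calls for human review on critical work.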

Knowledge Work & Trust

GPT-5.2 “Thinking” can handle longer, more complex tasks. Expert evaluators prefer its output to that of human experts 70.9% of the time.

But here is the catch: If you are not an expert, you cannot judge the output.

You won’t know if the response is brilliant or if it contains subtle hallucinations. You won’t know if the recommendation is the right one for your specific context. Evals don’t tell the full picture.

Only you can evaluate if the AI is actually helping you move the needle.

Where It Actually Excels

Some fields, like Marketing or Strategy, are vague. Others have clear “Right” and “Wrong” answers, and this is where GPT-5.2 truly shines.

Hard Sciences

• Mathematics: Solving complex proofs and equations.
• Science: Answering graduate-level physics and biology questions.
• Vision: Understanding technical diagrams and charts.

Engineering

• Software Engineering: Writing and debugging complex codebases.
• Tool Calling: Connecting with external APIs reliably.
• Abstract Reasoning: Logic puzzles and multi-step deductions.

Organization Beats Intelligence

LLMs are not mind readers.

If you want to use them reliably, you need to be organized. Every knowledge task needs a Standard Operating Procedure (SOP).

AI models are brilliant interns. But if you have to teach them from scratch every single time you open a chat window, you are wasting more time than you save. Reliable, repeatable work requires structure.
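
One way to avoid re-teaching the model in every chat is to store the SOP as a reusable template. The following is a minimal sketch in Python, not a prescribed method: the `SOP` class, its fields, and the example task are all hypothetical, and the code only builds the prompt text rather than calling any particular model API.

```python
from dataclasses import dataclass, field


@dataclass
class SOP:
    """A hypothetical container for one repeatable knowledge task."""
    task: str
    context: str
    steps: list[str] = field(default_factory=list)
    output_format: str = "Plain text"

    def to_prompt(self, request: str) -> str:
        """Render the stored procedure plus today's request as one prompt."""
        numbered = "\n".join(
            f"{i}. {step}" for i, step in enumerate(self.steps, start=1)
        )
        return (
            f"Task: {self.task}\n"
            f"Context: {self.context}\n"
            f"Steps:\n{numbered}\n"
            f"Output format: {self.output_format}\n"
            f"Request: {request}"
        )


# Illustrative example; the business details are made up.
weekly_report = SOP(
    task="Summarize weekly sales",
    context="Small retail shop, figures come from a spreadsheet export",
    steps=[
        "List total revenue",
        "Flag products with falling sales",
        "Suggest one action",
    ],
    output_format="Three short bullet points",
)

prompt = weekly_report.to_prompt("Here are this week's numbers: ...")
print(prompt)
```

The SOP is written once and reused; only the request changes from one chat to the next, so the model gets the same context and steps every time.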
