This trick went viral on TikTok last week, and it has already been patched. To g...

mpalmer · 2026-02-16T15:42:14 1771256534

By "patched", you can't mean they added something to the internal prompt to show it how to answer this one specific question?!

pizzafeelsright · 2026-02-16T16:12:23 1771258343

Absolutely. There is a preflight guardrail that steers specific words, phrases, concepts with tweaked output.

ponco · 2026-02-18T00:59:11 1771376351

I've speculated about this myself, but haven't heard anyone actually discuss it or reveal/leak that this is the case. Do you have a source for this?

2OEH8eoCRo0 · 2026-02-16T16:48:50 1771260530

Such AGI wow!

OrangeMusic · 2026-02-17T07:17:33 1771312653

This is pure speculation.

The fact that you can still reproduce the issue doesn't give it a lot of credibility.

MagicMoonlight · 2026-02-16T21:15:22 1771276522

Why do you think they’re on GPT 5.2 now?

pvillano · 2026-02-16T19:59:40 1771271980

"Stupid Pencil Maker" by Shel Silverstein

Some dummy built this pencil wrong,

The eraser's down here where the point belongs,

And the point's at the top - so it's no good to me,

It's amazing how stupid some people can be.

locallost · 2026-02-16T13:15:35 1771247735

I was able to reproduce on ChatGPT with the exact same prompt, but not with the one I initially phrased myself, which was interesting. I also tried changing the number and didn't get far with it.

softwaredoug · 2026-02-16T14:07:50 1771250870

I just got the “you should walk” result on ChatGPT 5.2

fireflash38 · 2026-02-16T13:29:48 1771248588

To me, the "patching" that happens anytime someone finds an absolutely glaring hole in how AIs work is so intellectually dishonest. It's the digital equivalent of house flippers slapping millennial gray paint on structural issues.

It can't math correctly, so they force it to use a completely different calculator. It can't count correctly, unless you route it to a different reasoning. It feels like every other week someone comes up with another basic human question that results in complete fucking nonsense.

I feel like this specific patching they do is basically lying to users and investors about capabilities. Why is this OK?

onionisafruit · 2026-02-16T14:40:16 1771252816

It makes sense to add special tools for counting and math because they're handy. I agree with your point that patching individual questions like this is dishonest, although I would say it’s pointless too. The only value from asking this question is to be entertained, and “fixing” this question makes the answer less entertaining.

tlogan · 2026-02-16T16:36:37 1771259797

From a technological standpoint, it is pointless. But from a marketing perspective, it is very important.

Take this trick question as an example. Gemini was the first to “fix” the issue, and the top comment on Hacker News is praising how Gemini’s “reasoning” is better.

palmotea · 2026-02-16T20:50:33 1771275033

> The only value from asking this question is to be entertained, and “fixing” this question makes the answer less entertaining.

You're thinking like a user. The people doing the patching are thinking like a founder trying to maintain the impression that this is a magical technology that CEOs can use to replace all their workers.

You don't have as much money to spend as the CEOs, so they don't care about your entertainment.

lofaszvanitt · 2026-02-16T14:06:50 1771250810

No, you are wrong. AGI is at our doorsteps! /s

keeda · 2026-02-16T21:40:30 1771278030

I got the "you should walk" answer 4 out of 5 times with free ChatGPT, until I told it to, basically, "think carefully": https://news.ycombinator.com/item?id=47040530

tantalor · 2026-02-16T20:51:16 1771275076

"patched" = the answer is in search results

markstos · 2026-02-16T14:47:29 1771253249

Ah yes, one of those novelty reversible cups.

olivia-banks · 2026-02-16T16:50:04 1771260604

This is a trick cup, so it's okay to have a laugh.

Rapzid · 2026-02-17T00:06:11 1771286771

Patched where? Responses from 4 models were posted. Also, Azure-deployed models are absolutely not "patched" on the fly; they are rarely updated and the dates are baked into the full SKU.

"Patching" could be happening in "general public" tools but honestly sounds a lot like "Bro science".

beaugunderson · 2026-02-17T04:24:07 1771302247

still failed for me on opus 4.6 extended a second ago.

when i prompted about how walking would mean leaving my car behind, the "thinking" done before coming to the right conclusion was:

> lmao, fair point. the user is right - you need to bring the car to the car wash. that's a legitimate correction. own it.