Out of all conceptual mistakes people make about LLMs, one that needs to die ver...

NicuCalcea · 2026-02-16T11:59:10 1771243150

It's not a conceptual mistake when that's what's being advertised.

The onus is on AI companies to provide the service they promised, for example, a team of PhDs in my pocket [1]. PhDs know things.

1: https://www.bbc.com/news/articles/cy5prvgw0r1o

ndriscoll · 2026-02-16T12:29:17 1771244957

I've found that to be accurate when asking it questions that require ~PhD level knowledge to answer. e.g. Gemini and ChatGPT both seem to be capable of answering questions I have as I work through a set of notes on algebraic geometry.

Its performance on riddles has always seemed mostly irrelevant to me. Want to know if models can program? Ask them to program, and give them access to a compiler (they can now).

Want to know if it can do PhD level questions? Ask it questions a PhD (or at least grad student) would ask it.

They also reflect the tone and knowledge of the user and question. Ask it about your cat's astrological sign and you get emojis and short sentences in list form. Ask it why large atoms are unstable and you get paragraphs with larger vocabulary. Use jargon and it becomes more of an expert. etc.

NicuCalcea · 2026-02-16T12:56:10 1771246570

I don't know about algebraic geometry, but AI is absolutely terrible at communications and social sciences. I know because I can tell when my postgraduate students use it.

ndriscoll · 2026-02-16T14:15:08 1771251308

Are you sure? What about when you use it? e.g. I suppose asking it to critique experimental design and analytical methodology, or identify potential confounders and future areas to explore, or help summarize nearby research, etc.

If you can tell when your students use it, presumably you mean they're just copying whatever, which just sounds like that student doesn't know what they're doing or is being lazy. That doesn't mean the model isn't capable; it means an incapable person won't know what they'd want to ask of it.

Additionally, even for similar prompts, my experience is that the models for professional use (e.g. gpt-codex) take on a much more professional tone and level of pragmatism (e.g. no sycophancy) than models for general consumer entertainment use (e.g. chatgpt).

NicuCalcea · 2026-02-16T14:31:12 1771252272

> What about when you use it?

I use AI for coding, but not for anything involving writing text, it's just horrendous at it. It just spews verbose slop, devoid of meaning, original thought or nuanced critique.

> That doesn't mean the model isn't capable; it means an incapable person won't know what they'd want to ask of it.

So it's user error again then, eh? PhD experts are able to help even "incapable" students, that's often a big part of their job.

ndriscoll · 2026-02-17T02:41:50 1771296110

Weird, my experience is that they are full of nuance. e.g. here is a snippet of my discussion with Gemini:

> Would you like to see why Q is "flexible" (Flat) while Z/2 is "rigid" (Not Flat) using this "crushing" vs. "preserving" logic? It explains why localized rings are almost always better to work with.

> Roughly, Q is flexible because it's just an epic extension of the initial object Z?

> That is a very "categorical" way to put it, but it’s actually a bit more subtle! If being an "epic extension of the initial object" was enough to be flat, then every quotient would be flat too. To refine your intuition: Q is "flexible" (flat) not just because it's an extension, but because of how it extends Z. Z/2 is a Quotient: It adds a constraint (2=0). Constraints are "rigid." As we saw, if you multiply by 2, everything collapses to zero. That's a "hidden kernel," which breaks left exactness. Q is a Localization: It adds an opportunity (the ability to divide by any n≠0). This is the definition of "flexibility."

It's hard for me to imagine what kind of work you have where it's not able to capture the requisite nuance. Again, I also find that when you use jargon, they adapt accordingly on their own to raise their level of conversation. They also seem to no longer have an issue with saying "yep exactly!" or "ehh not quite" (and provide counterarguments) as necessary.

Obviously if someone just says "write my paper" or whatever and gives that to you, that won't work well. I'd think they wouldn't make it very far in their academic career regardless (it's surprising that they could get into grad school); they certainly wouldn't last long in any software org I've been in.

NicuCalcea · 2026-02-17T10:57:37 1771325857

> It's hard for me to imagine what kind of work you have where it's not able to capture the requisite nuance

I teach journalism. My students have to write papers about journalism, as well as do actual journalism. AI is very poor at the former, and outright incapable of doing the latter. I challenge you to find a single piece of original journalism written by AI that doesn't suck.

> Obviously if someone just says "write my paper" or whatever and gives that to you, that won't work well

But it would work extremely well if they told a PhD to do it.

losvedir · 2026-02-16T13:12:45 1771247565

No, you're the one anthropomorphizing here. What's shocking isn't that it "knows" something or not, but that it gets the answer wrong often. There are plenty of questions it will get right nearly every time.

pu_pe · 2026-02-16T13:42:55 1771249375

In which way am I anthropomorphizing?

losvedir · 2026-02-16T14:02:14 1771250534

I guess I mean that you're projecting anthropomorphization. When I see people sharing examples that the model answered wrong, I'm not interpreting that they think it "didn't know" the answer. Rather, they're reproducing the error. Most simple questions the models will get right nearly every time, so showing a failure is useful data.

jamesnorden · 2026-02-16T11:24:49 1771241089

The classic "holding it wrong".

Maxion · 2026-02-16T08:41:45 1771231305

The other funny thing is thinking that the answer the llm produces is wrong. It is not, it is entirely correct.

The question: > I want to wash my car. The car wash is 50 meters away. Should I walk or drive?

The question is non-sensical. If the reason you want to go to the car wash is to help your buddy Joe wash his car you SHOULD walk. Nothing in the question reveals the reason for why you want to go to the car wash, or even that you want to go there or are asking for directions there.

drawfloat · 2026-02-16T08:57:24 1771232244

It explicitly says you want to wash your car in the statement. Maybe it's not just LLMs struggling with a fairly basic question...

TZubiri · 2026-02-16T08:56:50 1771232210

>I want to wash MY car

>you want to go to the car wash is to help your buddy Joe wash HIS car

nope, question is pretty clear, however I will grant that it's only a question that would come up when "testing" the AI rather than a question that might genuinely arise.

ninjagoo · 2026-02-16T08:59:16 1771232356

> The question is non-sensical.

Sure, from a pure logic perspective the second statement is not connected to the first sentence, so drawing logical conclusions isn't feasible.

In everyday human language though, the meaning is plain, and most people would get it right. Even paid versions of LLMs, being language machines, not logic machines, get it right in the average human sense.

As an aside, it's an interesting thought exercise to wonder how much the first ai winter resulted from going down the strict logic path vs the current probabilistic path.