
> A study from METR found that when developers used AI tools, they estimated that they were working 20% faster, yet in reality they worked 19% slower. That is nearly a 40% difference between perceived and actual times!

It’s not. It’s either 33% slower than perceived or perception overestimates speed by 50%. I don’t know how to trust the author if stuff like this is wrong.




> I don’t know how to trust the author if stuff like this is wrong.

She's not wrong.

A good way to do this calculation is with the log-ratio, a centered measure of proportional difference. It's symmetric, and it's widely used in economics and statistics for exactly this reason:

ln(1.2/0.81) = ln(1.2) - ln(0.81) ≈ 0.393

That's nearly 40%, as the post says.
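
A quick sketch of that calculation in Python, assuming speed factors of 1.2 for "20% faster" and 0.81 for "19% slower":

    import math

    perceived_speed = 1.20  # self-reported: "20% faster"
    actual_speed = 0.81     # measured: "19% slower"

    # Log-ratio: a symmetric, centered measure of proportional difference
    log_ratio = math.log(perceived_speed / actual_speed)
    print(round(log_ratio, 3))  # 0.393 -> "nearly 40%"

    # Symmetry: swapping the two quantities only flips the sign
    print(round(math.log(actual_speed / perceived_speed), 3))  # -0.393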


So if the numbers were “99% slower than without AI, but they thought they would be 99% faster”, you’d call that “they were 529% slower”, even though it doesn’t make sense to be more than 100% slower? And you’d not only expect everyone to understand that, but you really think it’s more likely that a random person on the internet used a logarithmic scale than that they just did bad math?
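
For reference, that 529% does fall out of the same log-ratio convention, assuming speed factors of 1.99 for "99% faster" and 0.01 for "99% slower":

    import math
    print(math.log(1.99 / 0.01))  # ~5.29, i.e. "529%"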

Well, this random person we are referring to happens to have a PhD in math from Duke.

I find that satisfying.


I personally get caught up in this math as well. Is a charitable interpretation of the throwaway line that they were off by that many “percentage points”?

That would be correct, but also not very informative. A 50pp gap means something different depending on the base: 50% vs. 100%, 75% vs. 125%, and 100% vs. 150% are all 50pp apart but very different ratios.
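
A quick sketch of why the base matters:

    # A fixed 50-percentage-point gap corresponds to very different ratios
    for low, high in [(0.50, 1.00), (0.75, 1.25), (1.00, 1.50)]:
        print(round(high / low, 2))  # 2.0, 1.67, 1.5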

Can you elaborate? This seems like a simple mistake if they are incorrect; I'm not sure where the 33% or 50% come from here.

Their math is 120% - 80% = 40%, while the correct math is (80 - 120)/120 = -33% or (120 - 80)/80 = +50%.

It’s more obvious if you take more extreme numbers. Say they estimated it would take 99% less time with AI, but it took 99% more time: the difference is not 198%, but 19,800% (the actual time is 199x the estimate). Suddenly you’re off by two orders of magnitude.
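
The conventional relative-difference arithmetic, as a sketch (speed factors 1.2 and 0.8 for the main example; time factors 0.01 and 1.99 for the extreme one):

    # Main example: perceived speed 1.2x vs. actual speed 0.8x
    print((0.8 - 1.2) / 1.2)  # ~-0.33: 33% slower than perceived
    print((1.2 - 0.8) / 0.8)  # ~+0.50: perception overstates speed by 50%

    # Extreme example: estimated time 0.01x vs. actual time 1.99x
    print(1.99 / 0.01)           # ~199: actual time is 199x the estimate
    print((1.99 - 0.01) / 0.01)  # ~198: a 19,800% relative difference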


It's not a mistake. It's correct, and is an excellent way to present this information.

Isn't the study a year old by now? Things have evolved very quickly in the last few months.

Yes, and it was done with people using Cursor at the time, and it already had a few caveats back then about who was actually experienced with the tool, etc.

Still an interesting observation. It was also on brownfield open-source projects, which IMO explains a bit why people building new stuff have vastly different experiences.


The exact numbers would certainly be different today, but you would probably still see the same effect: an overestimation of productivity.

Yes. No agents, no deep research, no tools, and just Sonnet 3.5 and 3.7. I’d love to see the same study today with Opus 4.6 and Codex 5.3.

Probably 38% slower now...

Please don’t project. :)


