It’s shocking to me that people even ask this type of question. How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer?
Because I've seen the results? Failure modes of LLMs are unintuitive, and the ability to grasp the big picture is limited (mostly by context, I'd say), but I find CC to follow instructions better than 80% of the people I've worked with. And consider the amount of mental stamina it would take to grok that much context even when you know the system, versus what these systems can do in minutes.
As for the hallucinations - you're there to keep the system grounded. Well, the compiler is, then the tests, then you. It works surprisingly well if you monitor the process and don't let the LLM wander off when it gets confused.
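Roughly the loop I mean, as a hypothetical sketch (build(), run_tests(), ask_llm_for_patch() and apply_patch() are placeholders for whatever tooling you actually run, not any real agent's API):

    import subprocess

    def build():
        # Stand-in for your compile / type-check step; swap in the real command.
        p = subprocess.run(["make", "build"], capture_output=True, text=True)
        return p.returncode == 0, p.stdout + p.stderr

    def run_tests():
        # Stand-in for the test suite.
        p = subprocess.run(["make", "test"], capture_output=True, text=True)
        return p.returncode == 0, p.stdout + p.stderr

    def grounded_iteration(ask_llm_for_patch, apply_patch, max_rounds=3):
        """Accept an LLM patch only once the build and the tests both pass."""
        feedback = ""
        for _ in range(max_rounds):
            apply_patch(ask_llm_for_patch(feedback))
            ok, out = build()
            if not ok:
                feedback = "Build failed:\n" + out   # the compiler keeps it grounded
                continue
            ok, out = run_tests()
            if not ok:
                feedback = "Tests failed:\n" + out   # then the tests
                continue
            return True   # then you: the patch is now worth a human review
        return False      # it kept wandering off - stop and intervene yourself

Nothing the model produces counts as done until the compiler and the test suite have both signed off, and a human still reviews whatever survives.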
Because humans also make stupid random mistakes, and if your test suite and defensive practices don't catch them, the only difference is the rate of errors.
It may be that you've done the risk management and deemed the risk acceptable (accepting the risk, in risk management terms) with human developers, and that vibe coding changes the maths.
But that is still an admission that your test suite has gaping holes. If that's been allowed to happen consciously, recorded in your risk register, and you all understand the consequences, that can be entirely fine.
But the problem then isn't vibe coding itself, but a risk management choice you made to paper over test suite holes with an assumed level of human diligence.
> How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human...
Your claim here is that humans can't hallucinate something random. Clearly they can and do.
> ... that will logic through things and find the correct answer.
But humans do not find the correct answer 100% of the time.
The way that we address human fallibility is to create a system that does not accept the input of a single human as "truth". Even these systems only achieve "very high probability" but not 100% correctness. We can employ these same systems with AI.
Almost all current software engineering practices and projects rely on humans doing ongoing "informal" verification. The engineers' knowledge is an integral part of it, and using LLMs exposes this "vulnerability" (if you want to call it that). Making LLMs usable would require such a degree of formalization (of which integration and end-to-end tests are a part) that entire software categories would become unviable. Nobody would pay for an accounting suite that cost 10-20x more.
Which, interestingly, is the meat of this article. The key point isn’t that “vibe coding is bad” but that the design and experience of these tools is actively blinding and seductive in a way that impairs our ability to judge their effectiveness.
Basically, instead of developers developing, they've been half-elevated to the management class, where they manage really dumb but really fast interns (LLMs).
But they don't get the management pay, and they are 100% responsible for the LLMs under them. Whereas real managers get paid more and can lay blame on and fire the people under them.
Humans who fail to do so find the list of tasks they’re allowed to do suddenly curtailed. I’m sure there is a degree of this with LLMs but the fanboys haven’t started admitting it yet.
> It’s shocking to me that people even ask this type of question. How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer?
I would like to work with the humans you describe who, implicitly from your description, don't hallucinate something random when they don't know the answer.
I mean, I only recently finished dealing with around 18 months of an entire customer service department full of people who couldn't comprehend that they'd put a non-existent postal address and the wrong person on the bills they were sending, that it was therefore their own fault the bills weren't getting paid, and that other people in their own team had already admitted this, apologised to me, promised they'd fixed it, while actually still continuing to send letters to the same non-existent address.
Don't get me wrong, I'm not saying AI is magic (at best it's just one more pair of eyes no matter how many models you use), but humans are also not magic.
Humans are accountable to each other. Humans can be shamed in a code review and reprimanded and threatened with consequences for sloppy work. Most humans, once reprimanded, will not make the same kind of mistake twice.
> Humans can be shamed in a code review and reprimanded and threatened with consequences for sloppy work.
I had to not merely threaten to involve the Ombudsman, but actually involve the Ombudsman.
That was after I had already escalated several times and gotten as far as raising it with the Data Protection Officer of their parent company.
> Most humans, once reprimanded, will not make the same kind of mistake twice.
To quote myself:
other people in their own team had already admitted this, apologised to me, promised they'd fixed it, while actually still continuing to send letters to the same non-existent address.
> How do you not see the difference between a machine that will hallucinate something random if it doesn’t know the answer vs a human that will logic through things and find the correct answer?
I see this argument over and over again when it comes to LLMs and vibe coding. I find it a laughable one, having worked in software for 20 years. I am 100% certain humans are just as capable as, if not better than, LLMs at generating spaghetti code, bugs, and nonsensical errors.
It's shocking to me that people make this claim as if humans, especially in some legacy accounting system, would somehow be much better at (1) recognizing their mistakes, and (2) even when they don't, not fudge-fingering their implementation. The criticisms of agents are valid, but the incredulity that they will ever be used in production or high-risk systems is, to me, just as incredible. Of course they will -- where is Opus 4.6 compared to Sonnet 4? We've hit an inflection point where replacing hand coding with an agent and interacting only via prompt is not only doable, highly skilled people are already routinely doing it. Companies are already _requiring_ that people do it. We will then hit an inflection point, sometime soon, where the incredulity at using agents even in the highest-stakes applications will age really, really poorly. Let's see!
Your point is the speculative one, though. We know humans can build, and have built, incredibly complex and reliable systems. We do not have the same level of proof for LLMs.
Claims like yours should wait at least 2-3 years, if not 5.
That is also speculative. Well, let's just wait and see :) but the writing is on the wall. If your criticism is about where we're at _now_, and whether or not you should be vibe coding in highly complex systems _today_, I would say: why not? As long as you hold that code to the same standard as human-written code, what is the problem? If you say "well, reviews don't catch everything", OK, but the same is true for humans. Yes, large teams of people (and maybe smaller teams of highly skilled people) have built wonderfully complex systems far out of reach of today's coding agents. But your median programmer is not going to be able to do that.
Your comment is shocking to me. AI coding works. I saw it with my own eyes last week and again today.
I can therefore only assume that you have not coded with the latest models. If your experience is with GPT-4o or earlier, or you have only used the mini or light models, then I can totally understand where you’re coming from. Those models can do a lot, but they aren’t good enough to run on their own.
The latest models absolutely are; I have seen it with my own eyes. AI moves fast.