
I suspect many people here have tried it, but they expected it to one-shot any prompt, and when it didn't, it confirmed what they wanted to be true and they responded with "hah, see?" and then washed their hands of it.

So it's not that they're too stupid. There are various motivations for this: clinging to familiarity, resistance to what feels like yet another tool, anti-AI koolaid, being earnestly underwhelmed without understanding how much better it can be, reacting to what they perceive as incessant cheerleading, etc.

It's kind of like anti-JavaScript posts on HN 10+ years ago. These people weren't too stupid to understand how you could steelman Node.js, they just weren't curious enough to ask, and maybe it turned out they hadn't even used JavaScript since "DHTML" was a term, except to do $(".box").toggle().

I wish there were more curiosity on HN.




So what do I do differently then?

Hypothetically, you have a simple out-of-bounds indexing error because a function is receiving an empty string and then does something like `""[5]`.

Opus will add a bunch of length and nil checks to "fix" this, but the actual issue is that the string should never be empty. The nil checks are just papering over a deeper problem: you probably need a schema-level check for minimum string length.
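The contrast could be sketched like this in Python (hypothetical code; `first_initial` is an invented stand-in for the crashing function, and the `User`/`username` names are illustrative):

```python
from dataclasses import dataclass

# Papering over: a guard that silences the crash but lets bad data flow on.
def first_initial(name: str) -> str:
    if name:           # length/nil check bolted on to "fix" the crash
        return name[0]
    return "?"         # the empty string quietly propagates downstream

# Root cause: enforce the invariant at the boundary instead,
# the way a schema-level minimum-length check would.
@dataclass
class User:
    username: str

    def __post_init__(self):
        if not self.username:
            raise ValueError("username must be non-empty")
```

The first version makes the symptom disappear; the second makes the invalid state unrepresentable past the constructor.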

At that point do you just tell it like "no delete all that, the string should never be empty" and let it figure that out, or do I basically need to pseudo code "add a check for empty strings to this file on line 145", or do I just YOLO and know the issue is gone now so it is no longer my problem?

My bigger point is: how does an LLM know that this seemingly small problem is indicative of some larger failure? Let's say this string is a `user.username`, which means users can set their name to empty, which means an entire migration is probably necessary. All the AI is going to do is smoosh the error messages and kick the can.


1. I'm working in Rust, so it's a very safe and low-defect language. I suspect that has a tremendous amount to do with my successes. "nulls" (Option<T>) and "errors" (Result<T,E>) must be handled, and the AST encodes a tremendous amount about the state, flow, and how to deal with things. I do not feel as comfortable with Claude Code's TypeScript and React outputs - they do work, but it can be much more imprecise. And I only trust it with greenfield Python, editing existing Python code has been sloppy. The Rust experience is downright magical.

2. I architecturally describe every change I want made. I don't leave it up to the LLM to guess. My prompts might be overkill, but they result in 70-80ish% correctness in one shot. (I haven't measured this, and I'm actually curious.) I'll paste in file paths, method names, struct definitions and ask Claude for concrete changes. I'll expand "plumb foo field through the query and API layers" into as much detail as necessary. My prompts can be several paragraphs in length.

3. I don't attempt an entire change set or PR with a single prompt. I work iteratively as I would naturally work, just at a higher level and with greater and broader scope. You get a sense of what granularity and scope Claude can be effective at after a while.

You can't one-shot stuff. You have to work iteratively. A single PR might be multiple round trips of incremental change. It's like being a "film director" or "pair programmer" writing code. I have exacting specifications and directions.

The power is in how fast these changes can be made and how closely they map to your expectations. And also in how little it drains your energy and focus.

This also gives me a chance to code review at every change, which means by the time I review the final PR, I've read the change set multiple times.


I hope you're not 100% serious.

Otherwise you should switch to haskal since it makes logic errors and bugs mathematically impossible.


I have encountered the exact same kind of frustration, and no amount of prompting seems to prevent it from "randomly" happening.

`the error is on line #145 fix it with XYZ and add a check that no string should ever be blank`

It's the randomness that is frustrating, and the fact that the fix would be quicker to type manually is what drives me crazy. I fear that all the "rules" I add to claude.md are wasting my available tokens, leaving it without enough room to process my request.


Yup, this is why I firmly believe true productivity (as in, the tool actually making you faster) is limited by the speed of review.

I think Claude makes me faster, but the struggle is always centered around retaining my own context and reviewing code fully: reviewing fully to make sure it's correct and the way I want it, and retaining context so reviews stay fast and I don't get lost.

I firmly believe people who are seeing massive gains are simply ignoring some percentage of the lines of code. There's an argument to be made for that being acceptable, but it's a risk analysis problem currently. Not one I subscribe to.


Use planning+execution rather than one-shotting, it'll let you push back on stuff like this. I recommend brainstorming everything with https://github.com/obra/superpowers, at least to start with.

Then work on making sure the LLM has all the info it needs. In this example it sounds like perhaps your hypothetical data model would need to be better typed and/or documented.

But yeah as of today it won't pick up on smells as you do, at least not without extra skills/prompting. You'll find that comforting or annoying depending on where you stand...


Always start an implementation in Claude Code plan mode. It's much more comprehensive than going straight to impl. I never read their prompt for plan mode before, but it deep-dives the code, peripheral files, callsites, documentation, existing tests, etc.

You get a better solution but also a plan file that you can review. And, also important, have another agent review. I've found that Codex is really good at reviewing plans.

I have an AGENTS.md prompt that explains that plan file review involves ranking the top findings by severity, explaining the impact, and recommending a fix to each one. And finally recommend a simpler directional pivot if one exists for the plan.
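A hypothetical fragment of such an AGENTS.md instruction might read like this (the wording is illustrative, not the commenter's actual file):

```markdown
## Plan review

When asked to review a plan file:
1. Rank the top findings by severity (critical / major / minor).
2. For each finding, explain its concrete impact.
3. Recommend a specific fix for each finding.
4. Finally, suggest a simpler directional pivot for the plan, if one exists.
```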

So, start the plan in Claude Code, type "Review this plan: <path>" in Codex (or another Claude Code agent), and cycle the findings back into Claude Code to refine the plan. When the plan is updated, write "Plan updated" to the reviewer agent.

You should get much better results with this: it's capable of architecture-level changes rather than narrow, topical fixes.

If that's still not working sufficiently for you, maybe the model could use more support, like a type system and more explicit goals in AGENTS.md?


IMO, plan mode is pretty useless. For bug fixes and small improvements, I already know where to edit (and can do it quickly with vim-fu).

For new features, I spend a bit of time thinking, and I can usually break it down in smaller tasks that are easy to code and verify. No need to wrangle with Plan mode and a big markdown file.

I can usually get things one-shotted by that point if I bother with the agent.


My manager and I have been experimenting with it for some stuff, and our most recent attempt at using plan mode was a refactor to change a data structure and make some conversion code unnecessary, then delete it. The plan looked fine, but after it ran the data structure change was incomplete, most of the conversion code was still there, and it introduced several bugs by changing lines it shouldn't have touched at all. Also removed several "why" style comments and arbitrarily changed variable names to be less clear in code it otherwise didn't change.

This was the costliest one we had access to, chosen as an experiment - took $20 over almost a half hour to run.


Did you do the plan review cycles like I suggested? It's a critical point.

Plan mode gives you a plan file, then you refine that, and impl derives from it.

Also, do you know it cost $20 because you're using the Claude API? I'd definitely use a subscription for interactive/development use.


We reviewed the plan manually, asked it a few questions to clarify parts, and manually tweaked other parts.

I didn't catch what it was, some web dashboard that showed the cost per prompt. We could see it going up as it ran. We were just using the plan our company provided.


Not the person you're replying to, but yes, sometimes I do tell the agent to remove the cruft. Then I back up a few messages in the context and reword my request. Instead of just saying "fix this crash", or whatever, I say "this is crashing because the string is empty, however it shouldn't be empty, figure out why it's empty". And I might have it add some tests to ensure that code is not returning or passing along empty strings.
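Those guard tests might look something like this (a hypothetical Python sketch; `get_username` is an invented stand-in for whatever lookup was returning the empty string):

```python
def get_username(user_id: int) -> str:
    """Stand-in for the real lookup; assumed to validate its own output."""
    name = {1: "alice"}.get(user_id, "")
    if not name:
        # Fail loudly instead of letting "" propagate to callers.
        raise LookupError(f"no username for user {user_id}")
    return name

def test_username_is_never_empty():
    assert get_username(1) != ""

def test_missing_user_raises_instead_of_returning_empty():
    try:
        get_username(2)
        assert False, "expected LookupError"
    except LookupError:
        pass
```

The point of the second test is to pin down the root-cause behavior, so a future "fix" that reintroduces silent empty strings fails loudly.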


