AI Coders will take your job

I now find myself handling a codebase of three projects, hundreds of thousands of lines, bigger and more complex than the one I oversaw five years ago with a team of 25 developers. One Apple app, Mirror Mirror, is already live, another Apple app is about to launch (details soon!), and a very complex system is on the horizon (not quite brain science, but close). Two years ago that would have been unthinkable, especially since I'm more of a tech-savvy executive than a hard-core developer. AI coders make it possible. This is unthinkably quick progress.

And yet, these tools are deeply, horribly flawed. They hallucinate code that doesn’t exist, rewrite things you never asked them to touch, forget what you just told them two minutes ago, and hit hard memory limits that make real-world projects feel like pushing an elephant through a drinking straw. They move with incredible speed, but that just means they can create a giant pile of broken code in seconds instead of hours. Left alone, they will happily engineer themselves straight into disaster.

Over time, I’ve worked with every major AI coding tool—Cursor, Claude Code, Codex—and every major model—GPT-4o/o3/5, Gemini, Grok, Claude. Each comes with its own strengths and weaknesses.

This post is not a model-by-model comparison nor a look at any individual tool. Instead, it’s about the general tendencies of AI coders: their good, their bad, and their ugly.

TLDR: AI for coding is like AI everywhere else. It is fast, generally accurate, and extremely useful, but NOT something you can put on autopilot without risking a crash of the project and your company. It has to be guided and watched closely. Companies that ignore AI coding will be left in the dust. Companies that rely too heavily on it risk ending up with an over-engineered doom loop.

The real promise and danger comes when you use multiple AI tools together. Usually, they catch each other’s flaws. But occasionally, they amplify each other’s worst mistakes.

General Lessons:

AI Coding wants to do things its way, and it will be difficult to make it do otherwise and almost impossible to make the change stick: I liken AI to an old vinyl record, the kind with grooves. Those grooves are the AI's training, the way it wants to do things. If what you want fits the AI's groove, your experience will be magical: quick, correct, and melodious. If what you want doesn't match a groove, you will have a tough time getting the AI to do it at all, and even if you somehow get it working outside its groove, at the first chance it'll pop back into the groove it's familiar with.

For instance, in one project I was dealing with colors. In computers, colors are often described by the amounts of red, green, and blue they contain, also called RGB. Many image-processing tools, however, internally use the reverse order: blue, green, red (BGR). While (255, 0, 0) is bright red in RGB, it's pure blue in BGR; the exact same value has a very different meaning. No matter how much I prompted, the AI coder could not accept that I wanted it to use BGR format and insisted on coding for RGB (worse, it was inconsistent: sometimes it did one, sometimes the other). Eventually, I gave up fighting the AI coder and had it use RGB almost everywhere, converting right before calling the image processing. Since the AI insisted on going into its groove, I let it, and I haven't had a hint of trouble with it since.
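A minimal sketch of that boundary-conversion approach (the function names here are illustrative, not from my actual codebase): the rest of the code speaks RGB, and the channel order gets flipped at the one place the image-processing library is called.

```python
def rgb_to_bgr(color):
    """Reverse the channel order: (R, G, B) -> (B, G, R)."""
    r, g, b = color
    return (b, g, r)

# Everywhere else in the code, colors are plain RGB.
BRIGHT_RED = (255, 0, 0)

def send_to_image_processor(color_rgb):
    """The ONLY place the BGR conversion happens: right at the boundary."""
    color_bgr = rgb_to_bgr(color_rgb)
    # hypothetical library call that expects BGR, e.g.:
    # image_lib.fill(color_bgr)
    return color_bgr
```

Keeping the conversion in a single boundary function means the AI coder can stay in its RGB groove everywhere else without corrupting the output.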

AI is not infallible: We've grown to think of computers as machines that don't make mistakes. To be successful with AI, we must change this mindset. AI is only about as good as a mid-level coder, and expecting perfection is asking for disappointment. Of course, AI is so quick that while the percentage of flawed code might be the same as a mid-level coder's, the sheer volume of code written means many more actual bugs created.

It’ll take multiple iterations: AI coders are as fallible as human coders, but much, much faster. If your code is even moderately complex, don’t expect them to get everything right on the first try. For anything except minimal changes, it’s not unusual for me to take a dozen, or even fifty or more (!!!), tries with the AI coder before I get the result I want. Each attempt must be carefully reviewed, with feedback given to the AI coder. Often I will also have other AI models/coders evaluate the change.

I feel like the AI coder is a slot machine in Las Vegas. I pull the lever and hope it comes up a winner. Often the results look good at first, only to be spoiled when the last cherry doesn’t appear and, instead of the jackpot, I get nothing. Occasionally it hits the jackpot, the code works, and I rejoice, praising this amazing technology. Of course, unlike a slot machine, AI coding usually gets better over time as you give it feedback and guide it toward the goal, but many times I watch the code being built and think it’ll be a great result, only to be deeply disappointed.

It isn’t as fast as you think: While AI coders handle simple tasks quickly, they can take anywhere from five to 45 minutes for a single iteration of a more complex task. Given that it often takes multiple iterations to get things right, a moderate-level change can take a few DAYS to implement.

For instance, right now I’m working on a change where I had to add several layers to my data structure. In some ways it’s a simple change, but it has to be propagated through the entire codebase. The AI coder proudly told me after about 20 minutes that it had successfully updated every place in the system. Three days later, I’m STILL finding spots that need to be changed and tests that are failing. This is what I expected. Of course, had I made this change without an AI coder, I wouldn’t even be finished with the initial pass by now, let alone have most of the code working well. In short, it’s faster than coding it yourself, but not as fast as you might assume.

It will tell you everything is great when it isn’t: I like to compare AI to management consultants: they will always give you an opinion, sound very, very confident about their results, and give you an encyclopedia of reasons why their opinion is right, only to be proved totally wrong by reality. So when the AI coder tells you everything is “working great,” it should carry the same weight as when you ask a coworker “How goes?” A positive answer tells you nothing; a negative answer is cause for concern!

It is highly suggestible: AI coders, like all AI, really just want to be liked. If you give one a suggestion on how to do things, even a bad suggestion, it will usually take it. So if you say, “I think we might be better off deleting all the user data and starting over,” the system is likely to agree with you, even though your users will surely not think this is such a great idea.

In one recent example, I told the AI coder it had to figure out why a certain value wasn’t $5,763, and that I didn’t want it stopping until it got that number. When I came back an hour later, it jubilantly told me it had figured out how to get the number I wanted! Instead of finding the root cause, what it had actually done was hard-code the number so it would always appear!

One helpful tool here is to frame your ideas as hypotheses, where the goal is for the AI to help you decide whether each one is a good idea. This is so powerful that I’d suggest making it a general practice: ask the AI for the pros and cons of an idea so that together you can think everything through before any code is changed.

It isn’t great at coding for AI: Ironically, one thing AI isn’t great at is coding AI calls. You’ll see this everywhere, from using old APIs to call AI (sometimes deprecated ones that don’t even work) to letting huge datasets flow into AI calls. Writing good prompts is a particular weak point of AIs.

Moreover, AI coders tend to opt for non-AI solutions. In one case I was creating an Intent Identifier to determine what a user wanted to do with their legal contracts, then put that intent into a JSON structure the rest of the code could understand. This seemed straightforward to me: feed an LLM the JSON structure, its meaning, and some examples. I kicked off the AI coder and went to a long lunch. The AI coder built the system and tested it with a bunch of examples I fed it. When I came back a few hours later, I found that instead of adding examples to help the LLM build the JSON better, it had built a complex series of if/then statements and text parsing to try to determine intent. This old-school approach was fragile and difficult to maintain. I tried repeatedly to get the AI coder to think “AI first” in building this, but it kept falling back on an old-fashioned, deterministic approach.
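The “AI first” version I had in mind looks roughly like this sketch. Everything here is illustrative: `call_llm` stands in for whatever client your stack actually uses, and the intents and schema are simplified examples, not my real system.

```python
import json

INTENT_PROMPT = """You classify a user's request about their legal contracts.
Respond ONLY with JSON matching this structure:
  {{"intent": "<one of: summarize, find_clause, compare, other>",
    "target": "<the contract or clause mentioned, or null>"}}

Examples:
User: "Summarize my NDA with Acme"
{{"intent": "summarize", "target": "NDA with Acme"}}
User: "Where's the termination clause in my lease?"
{{"intent": "find_clause", "target": "termination clause in lease"}}

User: "{user_text}"
"""

def identify_intent(user_text, call_llm):
    """Ask the LLM to classify the request, then parse its JSON reply.

    call_llm: any function taking a prompt string and returning the
    model's text response (a stand-in for a real LLM client).
    """
    reply = call_llm(INTENT_PROMPT.format(user_text=user_text))
    return json.loads(reply)
```

The point of the design: when the classifier gets something wrong, you fix it by adding another example to the prompt, not by writing more parsing code.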

It loves to tinker with things for no reason: I learned the hard way not to give the AI access to my configuration files (YAML) or AI prompts. When it’s in those files for other reasons, it will often make ghost edits to simplify prompts or, more mystifyingly, rename variables in configuration files. More than once I’ve slaved away on a prompt for hours, finding just the right wording, only to have the AI come through and decide it would “help” by simplifying it. The simplified prompt was never as good.

Feeding the results of one AI into another can be particularly helpful: Having one AI check another can quickly identify most issues with new code. My go-to at the moment is Claude Code running Opus 4.1 for the first pass, then feeding the result into GPT-5 Pro (previously o3 Pro) for review and modification. Then I look over all the results myself to make sure nothing looks seriously problematic.

Fallbacks: All the AIs I deal with LOVE fallbacks, which is to say: if a piece of code fails, find a way to continue without throwing an error. For instance, say your program calls Google to get a list of websites. If that search fails, the fallback might be to try Bing instead.

This can be great, as it handles routine issues without the entire system failing, but it can also mask problems. For instance, if your API key for Google has expired, the call can never work, so the system will always fall back to Bing and you’ll never see the real error. I either tell my AI coders not to use fallbacks (although, per the ghost edits above, they will still often try to add them) or make sure there’s a very clear warning in the logs, and that the logs actually get looked at. More than half of the hard-to-find issues I hit are because the AI added a fallback that masked the real problem.
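Here is a sketch of what a “loud” fallback looks like, with hypothetical search functions standing in for the real Google and Bing calls; the point is the prominent warning, not the search code itself.

```python
import logging

logger = logging.getLogger("search")

def search(query, primary_search, fallback_search):
    """Try the primary search; on failure, fall back LOUDLY.

    primary_search / fallback_search: stand-ins for real API clients
    (e.g. Google and Bing). Each takes a query string and returns a
    list of result URLs.
    """
    try:
        return primary_search(query)
    except Exception as exc:
        # A fallback that fires silently will hide an expired API key
        # forever. Log the failure where it cannot be missed.
        logger.warning("PRIMARY SEARCH FAILED (%s); using fallback. "
                       "Investigate before trusting these results.", exc)
        return fallback_search(query)
```

If the warning fires on every single request, that’s your signal the primary path is dead, not that the fallback is “working great.”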

The Bad:

The bad for AI coding is the same as the bad for all AI: what I call the four horsemen of the AIpocalypse.

Hallucinations – AI coders are programmed to always provide an answer, even if they don’t know the right one. This leads to them fabricating details. In practice, I see this almost every time I use them: variable names will suddenly change mid-code (max_intensity becomes intensity_max or maximum_intensity) or the system will confidently produce libraries and functions that don’t even exist. Sometimes the AI will catch itself in the next response, sometimes it won’t, and often you’ll only discover the hallucination after debugging hours later. Just as dangerous, the AI explains these hallucinations with great confidence, sounding authoritative even when it’s dead wrong.

Ghost Editing – AI coders just cannot resist tinkering. They’ll silently rewrite variable names, configuration files, or even entire prompts you didn’t ask them to touch. For example, you might ask them to optimize one function and later find subtle, unauthorized changes scattered throughout unrelated files. At its worst, it can undo hours of careful prompt design or rewrite code into a style you specifically didn’t want. This means you often need to lock down files, or only allow the AI to suggest edits instead of directly making them.

Catastrophic Forgetting – Like an uncle with Alzheimer’s, AI coders often just lose the thread. In longer sessions, they’ll abruptly forget all context. One moment they’re building a complex monitoring tool, the next they’re giving you boilerplate analytics code with no connection to the prior discussion. You’re forced to re-teach them everything just to get back on track. I can’t help but be a bit pissed off by this. Supposedly, AI’s great strength is remembering all the little details and evolving with you through the coding process. Great most of the time, but right now, that strength regularly collapses.

Token Limits – Every AI has a hard ceiling on how much information it can remember at once. While context windows have expanded from a few thousand tokens to hundreds of thousands, they’re still nowhere near what real-world projects demand. A medium-sized dataset or codebase can blow past limits instantly. The workaround is to carefully curate what the AI sees—splitting tasks into smaller chunks, feeding summaries, or using external vector stores. But this undercuts AI’s promise: instead of AI figuring out what’s important, you’re forced to do that heavy lifting yourself.
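The chunking workaround can be as simple as this sketch, which uses word count as a rough proxy for tokens (an assumption for illustration; real pipelines would measure with the model’s own tokenizer):

```python
def chunk_text(text, max_words=500):
    """Split text into pieces that each stay under a rough word budget.

    Word count is only a crude proxy for tokens; a production version
    would count tokens with the target model's tokenizer instead.
    """
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]
```

Each chunk is then summarized or processed separately, and the summaries fed back in, which is exactly the heavy lifting the AI was supposed to do for you.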

Conclusion:

AI Coders are so powerful that any company that does any coding and isn’t using them will quickly find itself outcompeted. On the other hand, those who think they can get rid of all their programmers and rely on AI alone are in for an even worse fate, as the AI needs adult supervision and strategic guidance to be effective.

Given competent guidance, however, AI coding can fuel your company to well over 10X performance. As with any tool, humans learning over time how best to handle it is essential.


Discover more from Lowry On Leadership

Subscribe to get the latest posts sent to your email.

