A few years into the LLM era, a strange thing has happened in the enterprise. Companies are spending more on AI than ever before. They have AI strategies, AI committees, AI roadmaps, AI town halls. Their slide decks are full of bots and copilots and agents. And underneath almost all of it, in the part nobody likes to talk about, there is very little actual AI engineering happening.
What there is, instead, is a lot of low-code.
Drag a box. Connect it to another box. Pick "summarise this" from a dropdown. Wire the output into an email. Demo it on Tuesday. Put it on the AI roadmap by Friday. Three weeks later, declare it a "deployed AI capability" in the quarterly board update.
I don't want to be misunderstood here. Low-code platforms are not bad. They're useful tools. The problem isn't the tools. The problem is that companies are mistaking using a tool for building a capability — and the gap between those two things is where most enterprise AI value is going to live or die over the next five years.
Three lies in one
The low-code AI story, as it gets told inside most companies right now, is actually three different self-deceptions stacked on top of each other.
The first is theater. Drag-and-drop AI builders, no-code agent platforms, citizen-developer suites, "AI Studio"-flavoured offerings inside every major SaaS — these tools let a company truthfully say it has deployed AI without doing any of the hard work that real AI deployment entails. There is a working flow. There is a screenshot. There is a metric, somewhere, that has gone up. Whether the system actually does what it claims, at the scale and reliability required, is a question for later. Theater is not the same as production. Theater photographs better.
I have watched this pattern play out repeatedly. A major financial institution rapidly stands up a customer-support chatbot on a low-code platform. The prototype demos beautifully. Early metrics look promising. Leadership celebrates. Then peak business hours hit, transaction volumes spike, the chatbot starts failing to resolve real requests, complaints escalate, and the team scrambles to put humans back in the loop. The platform did exactly what it was sold to do — it shipped quickly. What it could not do was carry production weight. That is the gap theater hides until it stops being able to.
The second is the "fine starting point, terrible destination" problem. Low-code AI is genuinely good for prototyping. It is genuinely good for letting a business analyst test an idea without filing a six-week ticket with the engineering team. It collapses the time between "what if?" and "let's see." That is real value. But the same properties that make it a good prototype tool (opacity, abstraction, vendor-managed plumbing) make it a bad destination. Companies that ride low-code from prototype straight into production end up with systems they cannot debug, cannot extend, cannot port, and cannot govern. The honeymoon ends the first time a real customer hits a real edge case at 2 a.m.
The third is the most expensive: fake progress that delays real investment. When the executive team can point at thirty-seven low-code AI workflows and call them an AI strategy, the pressure to actually build AI engineering capability evaporates. Why would you hire ML engineers, build data infrastructure, develop evaluation frameworks, and invest in observability when you've already "shipped AI"? The low-code dashboard becomes the alibi for not doing the harder work. By the time the company realises this, two years have passed, the workflows have become load-bearing, nobody owns them, and the people who could have built the real thing are now busy maintaining the appearance of one.
What "real AI development" actually looks like
To be concrete: when I say low-code is not enough, I'm not saying companies need to train their own foundation models. Almost no one needs to do that, and almost no one should. The bar is much lower than that, but it is real, and it is consistently being skipped.
AI development that scales — the kind that survives the move from cute pilot to load-bearing system — looks like this:
- A real data layer. Not a connector to a vendor's pre-baked context window. Pipelines you control, with provenance, freshness guarantees, and access policies that survive audit.
- An evaluation framework. Test sets, regression suites, performance baselines. The boring discipline that lets you say "version 2 is better than version 1" with evidence, not vibes.
- Observability and logging. Every input, every output, every cost, every latency, every error — captured, queryable, retained for as long as compliance requires. Not what your low-code platform shows you in its dashboard. What you can prove independently.
- Failure handling that is yours. Retries, fallbacks, degradation modes, timeouts, rate limit handling. Production-grade, not vendor-default.
- An integration layer staffed by humans who understand both sides — the model and the system it lives in. Not a vendor support contract.
- An exit strategy. If your AI vendor doubles their prices, gets acquired, or sunsets the product, what happens to you? Real engineering builds in optionality. Low-code, by design, removes it.
None of this is exotic. It's just software engineering, applied to AI. The thing low-code platforms quietly elide is that the engineering doesn't disappear when you move it behind a UI. It just gets owned by someone else, on terms you can't see, until the day it isn't.
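To make a few of the items above less abstract, here is a minimal sketch in Python of what "failure handling that is yours", independent logging, and a small amount of vendor optionality can look like in one place. Everything named here is hypothetical and invented for illustration: `call_with_fallback`, `CallRecord`, the two toy providers, and the in-memory `audit_log` standing in for a real log store. It is a sketch under those assumptions, not a production design.

```python
import json
import time
import uuid
from dataclasses import asdict, dataclass
from typing import Callable, List, Optional, Tuple

# A "provider" is any function that maps a prompt to text. Keeping the
# interface this small is what makes swapping vendors possible later.
Provider = Callable[[str], str]

DEGRADED_REPLY = "Sorry, I can't help right now. Routing you to a person."


@dataclass
class CallRecord:
    """One line of the audit log: enough to reconstruct what happened."""
    call_id: str
    provider: str
    model_version: str
    prompt: str
    output: Optional[str]
    error: Optional[str]
    attempt: int
    latency_s: float


def call_with_fallback(
    prompt: str,
    providers: List[Tuple[str, str, Provider]],  # (name, version, fn), in priority order
    log: List[str],
    max_attempts: int = 3,
    base_delay_s: float = 0.0,
) -> str:
    """Try each provider in order, retrying with exponential backoff.

    Every attempt, successful or not, is appended to `log` as a JSON line.
    If every provider fails, return a fixed degraded reply instead of raising.
    """
    for name, version, fn in providers:
        for attempt in range(1, max_attempts + 1):
            start = time.monotonic()
            output, error = None, None
            try:
                output = fn(prompt)
            except Exception as exc:  # broad on purpose: log it, then decide
                error = f"{type(exc).__name__}: {exc}"
            record = CallRecord(
                call_id=str(uuid.uuid4()),
                provider=name,
                model_version=version,
                prompt=prompt,
                output=output,
                error=error,
                attempt=attempt,
                latency_s=time.monotonic() - start,
            )
            log.append(json.dumps(asdict(record)))
            if error is None:
                return output
            if attempt < max_attempts:
                time.sleep(base_delay_s * 2 ** (attempt - 1))
    return DEGRADED_REPLY


# Toy providers standing in for real vendor clients (assumptions, not real APIs).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("upstream timed out")


def stable_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


audit_log: List[str] = []
reply = call_with_fallback(
    "Where is my refund?",
    [("primary", "v1", flaky_primary), ("secondary", "v1", stable_secondary)],
    audit_log,
    max_attempts=2,
)
```

The interesting part is not the retry loop. It is that the retry policy, the fallback order, and the log format all live in code you own, so changing any of them is a commit rather than a support ticket.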
How to spot the theater inside your own company
A few diagnostic questions that tend to expose the gap quickly:
- Can your team produce a log of every AI-generated output that was shown to a customer in the last 90 days, with the exact prompt and model version that produced it?
- If your AI vendor raised its prices 5x next quarter, what would your migration plan be?
- What is the regression test suite for your most-used AI workflow? When was it last run?
- Who, by name, is responsible for the data quality of the inputs flowing into your AI features?
- What percentage of your AI's outputs in production have been independently evaluated for accuracy, not just "does it return a response"?
If the answers to these questions are mostly some version of "the platform handles that" or "we'd have to ask the vendor" or "we don't currently track that" — you don't have an AI capability. You have an AI subscription. They are not the same thing.
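The regression-suite question is the easiest one to start answering. As a rough sketch, assuming the simplest possible grader (a required substring) and two toy stand-in models, a regression check can be this small. The names `Case`, `pass_rate`, `check_regression`, `model_v1`, and `model_v2` are all hypothetical; a real suite would use richer graders and real model calls.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Case:
    prompt: str
    must_contain: str  # simplest possible check; real suites need better graders


def pass_rate(model: Callable[[str], str], cases: List[Case]) -> float:
    """Fraction of cases whose output contains the required text."""
    if not cases:
        raise ValueError("an empty test set proves nothing")
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def check_regression(
    candidate: Callable[[str], str],
    baseline_rate: float,
    cases: List[Case],
    tolerance: float = 0.0,
) -> bool:
    """True if the candidate is at least as good as the recorded baseline."""
    return pass_rate(candidate, cases) >= baseline_rate - tolerance


# Toy test set and toy models, for illustration only.
cases = [
    Case("What is the refund window?", "30 days"),
    Case("Which currency are invoices in?", "EUR"),
]


def model_v1(prompt: str) -> str:
    return "Refunds are accepted within 30 days."


def model_v2(prompt: str) -> str:
    if "refund" in prompt.lower():
        return "Refunds are accepted within 30 days."
    return "Invoices are issued in EUR."


baseline = pass_rate(model_v1, cases)
v2_is_safe_to_ship = check_regression(model_v2, baseline, cases)
```

With something like this in version control and run on every prompt or model change, "version 2 is better than version 1" becomes a number with a date on it.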
The reckoning that's coming
I think about this a lot, because the pattern is so familiar from previous cycles. Every wave of enterprise technology produces the same arc: euphoria, low-code shortcuts, premature declarations of victory, and then a quiet period of expensive remediation when the shortcuts hit their limits.
Cloud went through it. Mobile went through it. Big data went through it. AI is going through it now, faster than any of them, because the low-code abstractions are more seductive than ever — drag a box and get something that looks like cognition.
But the pattern isn't deterministic. Not every company that adopted cloud, mobile, or big data ended up in the remediation phase. In the early days of containerisation, some organisations refused the easy "just buy the managed service" path and invested up front in real DevOps capability. They got the speed of the new tools and the long-term reliability of systems they actually understood. The same option exists with AI right now. The companies that take it will look indistinguishable from the rest for about eighteen months, then pull away.
The companies that come out of this period with real, durable AI capability won't be the ones with the most workflows in their AI Studio. They'll be the ones that treated low-code as what it is — useful for prototypes, dangerous as infrastructure — and quietly built the underlying engineering muscle while everyone else was making screenshots.
The lip service is loud right now. The reckoning will be louder.