This blog is brought to you by our very own Liam Hunt, eyeDP’s Technical Product Owner, who leads the development of our data extraction and configurable rule-set decisioning. Liam has 10 years of experience working with leading RegTech providers.
Over the past few weeks, several headlines have thrown cold water on the generative AI hype. A new MIT-sponsored study reported that about 95 % of corporate generative AI pilots never scale or deliver meaningful financial impact. A companion Forbes article argues that a key reason is that many organisations avoid friction: in other words, they prioritise a seamless “user experience” over necessary checks, controls, and governance, and that becomes the undoing of their pilots.
That may sound alarming or discouraging. But it’s not a verdict against AI; it’s a warning signal. The real lesson is that success in enterprise AI depends less on model architecture or algorithmic prowess, and more on how well an organisation handles integration, risk, governance, and scale.
Why 95 % of GenAI Pilots Fail: Lessons from MIT & Forbes
This is what the research and commentary tell us about where most AI pilots go wrong and why they rarely become lasting assets.
- The “Learning Gap” and poor enterprise integration
The MIT report describes a “learning gap”: the failure to connect AI tools meaningfully into workflows, systems, and decision loops is a central reason pilots stall. In many organisations, generative models are tested in isolation or in sandbox environments, but real-world business operations (messy data, systems, exceptions) are far more complex.
Generic tools like ChatGPT or off-the-shelf models may work beautifully for individual tasks or prototypes, but they don’t inherently adapt to the constraints, logic, and guardrails of enterprise systems.
- Avoiding friction means removing critical guardrails
The Forbes article bluntly argues that many firms interpret friction as failure. They remove validation checks, override human supervision, or bypass safety measures in pursuit of smoothness. But that “friction” often represents critical decision gates; points where a model’s uncertainty, edge cases, or regulatory risk require human oversight, explanation, or rollback.
By removing those, you increase the risk of errors, hallucinations, ungoverned bias, or misalignment, all of which can undermine trust and force a rollback.
- Misaligned focus: visibility over impact
Many organisations allocate AI budgets to sales, marketing, or superficial user-facing features (often because they seem more visible or “sexy”), but MIT found that the highest ROI often sits in internal back-office functions like process automation, cost reduction, compliance, or fraud detection.
When pilots are built around high-gloss, low-impact use cases, scaling becomes harder, internal champions fade, and results become harder to justify.
- DIY vs. partnering pitfalls
The MIT study suggests that organisations that buy from or partner with specialist vendors tend to see roughly twice the success rate of internally built AI systems (numerous media articles put the figure at around 67 %). Internal builds often stall due to missing expertise, over-ambition, or a lack of precedents and reuse.
- Drift, monitoring, maintenance & governance neglect
Even a well-performing pilot will degrade over time if you don’t monitor for model drift, feedback loops, bias, or changing data distributions. Many pilots don’t budget or plan for continuous monitoring, auditing, or maintenance, so performance slides without detection.
Also, governance (audit trails, explainability, human override, compliance checks) is often treated as an afterthought, not as a foundation.
- Cultural, organizational & stakeholder resistance
Technology alone doesn’t change habits or incentives. Pilots often fail because business users don’t trust the model, compliance teams balk at opaque decisions, or because there’s no clear accountability or ownership. Without strong cross-functional alignment (ops, risk, legal, compliance, IT), AI projects often get orphaned or deprioritised.
The 95 % failure rate isn’t a technical indictment of generative AI. It’s a reflection of how companies approach integration, risk, scaling, and governance. The 5 % that succeed typically do the hard work of embedding AI safely into their core operations.

How to Avoid the Pitfalls: An Approach to Sustainable AI
The answer is to build architecture, process, and organisation around exactly those failure modes. Here’s how to make AI real, reliable, and scalable, especially in regulated domains like KYC, AML, document verification, and fraud detection:
- Friction where it matters, not friction everywhere
Strategic friction is a feature, not a bug. At decision points (low confidence predictions, ambiguous outputs, suspicious patterns), route to human review, demand explanation, or surface audit triggers. That ensures that edge cases don’t become silent errors.
In practice, that means defining confidence thresholds, review-escalation logic, and fallback mechanisms upfront, so clients never have to retrofit governance later.
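To make that concrete, here is a minimal sketch of what threshold-based routing to human review can look like. The thresholds, field names, and reason codes are illustrative assumptions for this post, not our production logic.

```python
from dataclasses import dataclass

# Illustrative thresholds; real values would be calibrated per use case.
AUTO_APPROVE_THRESHOLD = 0.95
AUTO_REJECT_THRESHOLD = 0.20

@dataclass
class Decision:
    outcome: str        # "approve", "reject", or "human_review"
    confidence: float
    reasons: list

def route_prediction(confidence: float, red_flags: list) -> Decision:
    """Route a model output to an automated outcome or to human review."""
    # Any red flag forces review, regardless of how confident the model is.
    if red_flags:
        return Decision("human_review", confidence, red_flags)
    if confidence >= AUTO_APPROVE_THRESHOLD:
        return Decision("approve", confidence, ["high_confidence"])
    if confidence <= AUTO_REJECT_THRESHOLD:
        return Decision("reject", confidence, ["low_confidence"])
    # The ambiguous middle band is exactly where strategic friction belongs.
    return Decision("human_review", confidence, ["ambiguous_confidence"])
```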
- Domain-first models + layered rules & logic
Rather than using a general-purpose model and hoping it works, build (or fine-tune) models specifically for document verification, fraud detection, identity matching, and compliance logic. Combine that with deterministic rules, red-flag heuristics, regulatory constraints, and contextual logic so the system isn’t purely probabilistic.
The result is higher precision, fewer false positives/negatives, and more predictable behaviour.
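As a rough illustration of that layering, the sketch below combines a model score with deterministic red-flag rules. The document fields, rules, and threshold are placeholders, not an actual rule set.

```python
from datetime import date

def rule_checks(document: dict) -> list:
    """Deterministic red-flag heuristics layered on top of the model score.
    Field names and rules are illustrative, not a real rule set."""
    flags = []
    expiry = document.get("expiry_date")  # ISO date string, e.g. "2026-05-01"
    if expiry is not None and expiry < date.today().isoformat():
        flags.append("expired_document")
    if document.get("mrz_checksum_valid") is False:
        flags.append("mrz_checksum_failure")
    return flags

def verify(document: dict, model_score: float) -> str:
    """Combine a probabilistic model score with deterministic rules so the
    outcome is never purely probabilistic."""
    flags = rule_checks(document)
    if flags:
        return "escalate"  # rules override the model
    return "pass" if model_score >= 0.90 else "escalate"
```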
- Deep integration & low-friction adoption
Don’t treat solutions as bolt-on tools; embed them. We provide APIs, low-code connectors, plugins, and ready integrations with compliance systems, workflow engines, CRM, onboarding flows, and more. This allows adoption without reworking entire stacks.
Because adoption is easier, friction is lower for users and operations, and value is realized faster.
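As a purely hypothetical illustration (the endpoint, payload, and field names below are invented for this sketch and are not eyeDP’s actual API), calling a verification service from an existing onboarding flow can be as small as:

```python
import requests  # assumes the 'requests' library is installed

# Hypothetical endpoint and payload, invented for this sketch.
ENDPOINT = "https://api.example.com/v1/document-checks"

def submit_document_check(document_id: str, api_key: str) -> dict:
    """Call a verification service from an existing onboarding flow,
    instead of rebuilding the flow around a new tool."""
    response = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {api_key}"},
        json={"document_id": document_id,
              "checks": ["authenticity", "mrz", "face_match"]},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
```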
- Observability, monitoring & feedback loops
From day one, bake in:
- Logging and audit trails (who saw what, when, and via which decision path), which are essential for compliance and trust
- Confidence metrics, error tracking, and anomaly detection, so drift or degradation is easy to spot
- Retraining triggers and feedback pipelines so model updates can respond to evolving data
- Dashboards and alerts so operations, compliance, and engineering can stay aligned
This continuous loop ensures the system adapts rather than decays.
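A minimal sketch of two of those ingredients, an audit-trail entry and a deliberately simple drift check, might look like the following. The structure, field names, and tolerance value are illustrative assumptions.

```python
import statistics
from datetime import datetime, timezone

def log_decision(audit_log: list, case_id: str, outcome: str, confidence: float) -> None:
    """Append an audit-trail entry: what was decided, when, and with what confidence."""
    audit_log.append({
        "case_id": case_id,
        "outcome": outcome,
        "confidence": confidence,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

def confidence_drift(baseline: list, recent: list, tolerance: float = 0.05) -> bool:
    """Flag drift when mean confidence on recent traffic drops more than the
    tolerance below the baseline; a deliberately simple proxy for drift."""
    if not baseline or not recent:
        return False
    return statistics.mean(recent) < statistics.mean(baseline) - tolerance
```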
- Privacy, security & regulatory compliance by design
Working in KYC, AML, fraud, and identity means regulation isn’t optional, so embed:
- End-to-end encryption
- Access controls, role-based permissions
- Data anonymisation / pseudonymisation where needed
- Compliance with GDPR, ISO 27001 (or equivalent), data residency constraints
- Explainability and traceability to support audits and regulatory scrutiny
Don’t treat compliance as an afterthought but as a core architectural constraint.
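Of the controls above, pseudonymisation is the easiest to show in a few lines. The sketch below uses a keyed hash so records stay linkable for analytics without exposing raw identifiers; key management, rotation, and legal review are deliberately out of scope, and the field names are illustrative.

```python
import hashlib
import hmac

# A managed secret would come from a vault or KMS, not source code.
SECRET_KEY = b"replace-with-a-managed-secret"

def pseudonymise(value: str) -> str:
    """Return a stable pseudonym for a personal identifier, so records can be
    linked for analytics without exposing the raw value."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

record = {"name": "Jane Doe", "passport_no": "X1234567"}
safe_record = {field: pseudonymise(value) for field, value in record.items()}
```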
- Iterative scaling, strong governance & stakeholder alignment
Rather than launching a monolithic AI revolution:
- Begin with high-impact, well-bounded use cases (e.g. document fraud detection)
- Use pilot results and metrics to build trust and internal buy-in
- Involve business, compliance, legal, risk, and operations teams early
- Define clear KPIs (reduction of false positives, time savings, cost per decision, error rates)
- Assign ownership, decision rights, and governance structures
- Expand incrementally (more document types, edge cases, geographies, complexity)
That prevents overreach, resistance, or failure from scaling too fast.
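Two of the KPIs listed above are worth pinning down precisely, since they anchor the business case. A minimal sketch, with illustrative numbers only:

```python
def false_positive_rate(false_positives: int, true_negatives: int) -> float:
    """FPR = FP / (FP + TN): the share of legitimate cases wrongly flagged."""
    return false_positives / (false_positives + true_negatives)

def cost_per_decision(total_operating_cost: float, decisions: int) -> float:
    """Blended cost across automated and human-reviewed decisions."""
    return total_operating_cost / decisions

# Illustrative numbers only.
print(false_positive_rate(false_positives=40, true_negatives=960))      # 0.04
print(cost_per_decision(total_operating_cost=12500.0, decisions=5000))  # 2.5
```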
Final thoughts
If you’re exploring AI pilots (especially for document processing, KYC, AML, fraud), here’s how you can avoid the common traps and accelerate real adoption:
- Risk-aware pilot scoping
Don’t build grandiose proofs-of-concept with flimsy foundations. Instead, co-design a pilot scope that balances ambition, safety, and measurability. Governance, human review, and fallback are built into the pilot from day one.
- Plug-and-play integration
You don’t need a full rewrite of your tech stack. Connect via APIs, plugins, low-code adapters, or standard integration routes so you can adopt quickly and with confidence.
- Transparent decisioning and auditability
Ensure visibility into every decision path: model confidence, rules applied, fallback triggers, human reactions. That builds trust with risk, compliance, and business teams.
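One way to make that visibility concrete is to capture each decision path as a single auditable record. The sketch below is a minimal example; the field names are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionRecord:
    """One auditable record per decision path; field names are illustrative."""
    case_id: str
    model_confidence: float
    rules_applied: list = field(default_factory=list)
    fallback_triggered: bool = False
    human_reviewer: Optional[str] = None
    human_outcome: Optional[str] = None

record = DecisionRecord(
    case_id="KYC-2024-000123",
    model_confidence=0.62,
    rules_applied=["mrz_checksum_failure"],
    fallback_triggered=True,
    human_reviewer="analyst_42",
    human_outcome="reject",
)
```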
- Ongoing support, monitoring & adjustment
Launch isn’t the end: plan for monitoring, drift detection, retraining, alerting, anomaly detection, and continuous tuning as your data and environment evolve.
- Governance framework & stakeholder onboarding
Set up governance bodies (AI review board, audit committees, compliance oversight), define roles, workflows, approval thresholds, and stakeholder engagement. Train your teams so AI becomes part of how you work, not a black box.
- Scale with confidence
Once you’ve proven value in one domain, you can replicate to new document types, geographies, fraud patterns, or adjacent workflows, without needing a fresh build from scratch.
The 95 % failure rate is a sobering statistic, but it’s not a permanent fate. It’s a market signal that most AI projects are under-designed: they skip the messy work of integration, governance, monitoring, and organisational alignment.
At eyeDP, we’ve built our systems, processes, and partnerships precisely to stay on the right side of that statistic.