Why most AI projects die as a demo (and the five questions that predict it)
Most AI projects do not fail because the model is bad. They fail because the demo was never connected to the way work actually happens. The first version looks clever in a meeting, everyone nods, and then the project slowly disappears because nobody can use it on a messy Tuesday.
A demo is allowed to be narrow. It is allowed to use clean inputs. It is allowed to show the happy path. A business system is not. It has to survive missing fields, unclear instructions, impatient users, slow approvals, and the boring parts of the job nobody included in the prompt.
The danger is not a bad demo. The danger is mistaking a good demo for evidence that the work has changed.
Before you fund, buy, or build an AI project, ask five questions. They are simple, but they expose most of the hidden risk.
1. What job changes if this works?
Start with the job, not the tool. “Use AI for customer support” is too broad. “Draft the first response to refund requests so a human can approve or edit it in under two minutes” is a job. It names the user, the action, the handoff, and the speed target.
If nobody can describe the before and after of the job, the project is still an idea. A useful test is to ask someone on the team to draw the current workflow on a whiteboard. Then ask where the AI system enters, where it exits, and who is accountable when it is wrong.
- Weak answer: “It helps the team work faster.”
- Stronger answer: “It turns a raw support email into a proposed reply, refund decision, and policy citation for a human reviewer.”
- Best answer: “It removes the first 12 minutes of triage while keeping the final decision with the support lead.”
2. Where does the truth come from?
Every serious AI system needs a source of truth. That might be policy documents, product catalogs, ticket history, contracts, call notes, or a database. If the system is expected to know facts, it must know where to look and when not to guess.
The common failure is a beautiful chat box with no authority. It answers confidently because confident language is easy. But if it cannot point to the document, field, or record behind an answer, it becomes a rumor machine with a friendly interface.
Ask what information the system is allowed to use, how fresh it must be, who owns it, and what happens when two sources disagree. A legal policy updated last week should beat an old training example. A current inventory record should beat a product description. The rules matter before the model does.
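The precedence rules above can be written down before any model is chosen. The following is a minimal sketch, not a prescribed design: the `SourceRecord` type, the `authority` ranking, the source names, and the 30-day freshness limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class SourceRecord:
    source: str      # hypothetical source name, e.g. "inventory_db"
    value: str       # the fact this source asserts
    updated: date    # when the source was last updated
    authority: int   # assumed ranking: lower number = more authoritative


def resolve(records: list[SourceRecord], today: date,
            max_age_days: int) -> Optional[SourceRecord]:
    """Return the record to trust, or None when the system should not guess."""
    # Freshness rule: ignore anything older than the agreed limit.
    fresh = [r for r in records if (today - r.updated).days <= max_age_days]
    if not fresh:
        return None
    # Most authoritative source wins; ties go to the most recent update.
    return min(fresh, key=lambda r: (r.authority, -r.updated.toordinal()))


# Illustrative data only: a current inventory record vs. a product description.
today = date(2024, 6, 10)
records = [
    SourceRecord("product_description", "In stock", date(2024, 6, 1), authority=2),
    SourceRecord("inventory_db", "Out of stock", date(2024, 6, 9), authority=1),
]
chosen = resolve(records, today, max_age_days=30)
```

Here `chosen` is the inventory record, because it outranks the product description. When every record is stale, `resolve` returns `None`, which is the signal to hand off to a person rather than answer.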
3. Who checks the work?
Human review is not a sign of failure. It is how most useful systems earn trust. The right question is not whether a person is involved; it is where their judgment adds the most value.
In a healthy workflow, the AI system handles repeatable preparation and the human handles judgment, exceptions, and accountability. For a hypothetical finance team, the system might categorize expense notes and flag policy conflicts. The manager still approves the reimbursement. For a recruiting team, the system might summarize interview notes. The hiring decision stays with people.
Do not automate the decision before you have improved the evidence that supports the decision.
Write down the review path. Who sees the output? What can they change? What must they approve? What gets logged? If the answer is “the user will just know,” the project is not ready for real use.
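One way to make the review path concrete is to decide what a single reviewed case looks like as a record. This is a rough sketch under assumed names (`ReviewRecord`, `approve`, and the log format are hypothetical), showing who saw the output, what they changed, what they approved, and what gets logged.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ReviewRecord:
    case_id: str
    draft: str                 # what the AI system proposed
    reviewer: str              # who sees the output
    final: str = ""            # what was actually approved
    approved: bool = False
    log: list[str] = field(default_factory=list)

    def approve(self, final_text: str) -> None:
        """Record the reviewer's approval and whether they edited the draft."""
        self.final = final_text
        self.approved = True
        edited = final_text != self.draft
        stamp = datetime.now(timezone.utc).isoformat()
        self.log.append(f"{stamp} {self.reviewer} approved (edited={edited})")

    @property
    def was_edited(self) -> bool:
        # True only once a reviewer has approved text that differs from the draft.
        return self.approved and self.final != self.draft


# Illustrative case: the support lead edits the draft before approving it.
record = ReviewRecord(case_id="case-001", draft="Refund approved.",
                      reviewer="support_lead")
record.approve("Refund approved under the 30-day policy.")
```

If a team cannot fill in these fields for a real case, that is usually the same gap as "the user will just know."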
4. What number decides whether it worked?
A demo dies when success means applause. A pilot survives when success means a number chosen in advance. The number should be boring: time per case, error rate, cost per completed task, backlog age, response time, rework rate.
Pick one main number and one guardrail. The main number says what should improve. The guardrail says what must not get worse. For example, “reduce time to draft a support reply” is incomplete unless you also track complaint reopen rate or policy mistakes.
- Main number: minutes of human work per request.
- Guardrail: percentage of replies that require correction for policy or tone.
- Stop condition: if corrections rise beyond the agreed limit, the pilot does not pass even if it saves time.
This keeps the team honest. AI that makes a bad answer faster is not productivity. It is faster cleanup.
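The main number, guardrail, and stop condition can be agreed as a small rule before the pilot starts. The sketch below is one possible shape, with hypothetical names (`PilotResult`, `evaluate`) and made-up figures; the 10% correction limit is an assumption, not a recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PilotResult:
    baseline_minutes: float   # human minutes per request before the pilot
    pilot_minutes: float      # human minutes per request during the pilot
    corrected: int            # replies needing correction for policy or tone
    total: int                # replies reviewed


def evaluate(result: PilotResult, max_correction_rate: float) -> str:
    """Apply the guardrail first, then the main number."""
    if result.total == 0:
        return "no data"
    correction_rate = result.corrected / result.total
    # Stop condition: a breached guardrail fails the pilot even if time is saved.
    if correction_rate > max_correction_rate:
        return "fail: guardrail breached"
    if result.pilot_minutes < result.baseline_minutes:
        return "pass"
    return "fail: no time saved"


# Illustrative figures only.
fast_but_sloppy = PilotResult(baseline_minutes=12.0, pilot_minutes=5.0,
                              corrected=18, total=100)
fast_and_clean = PilotResult(baseline_minutes=12.0, pilot_minutes=5.0,
                             corrected=6, total=100)
verdict_sloppy = evaluate(fast_but_sloppy, max_correction_rate=0.10)
verdict_clean = evaluate(fast_and_clean, max_correction_rate=0.10)
```

Both pilots save the same seven minutes per request, but only the second passes: the first has an 18% correction rate against a 10% limit. Checking the guardrail before the main number is the design choice that keeps a time saving from hiding a quality drop.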
5. What happens on day 31?
The last question is the one teams skip: if the pilot works, what must be true for it to keep running? Someone must own the workflow, the data access, the cost, the updates, the review rules, and the failure path. Without that owner, the project becomes a folder of promising screenshots.
Day 31 questions are practical. Who pays the software bill? Who updates the source documents? Who handles a user complaint? Who notices if quality drops? Who can turn the system off? If those answers are missing, you do not have an operating plan.
A simple pre-mortem
Before you approve the next AI project, run this checklist in a 45-minute meeting:
- Write the exact job that changes.
- Name the source of truth and the freshness rule.
- Draw the human review path.
- Choose one success number and one guardrail.
- Name the day-31 owner and operating routine.
If the team cannot answer these questions, do not cancel the project. Make it smaller. A narrow workflow with clear truth, clear review, and a clear number is worth more than a broad demo nobody trusts.
The best AI projects are rarely the flashiest. They are the ones that quietly remove friction from real work, prove it with honest measurement, and leave behind a system someone can own.