The Seven-Factor App for AI Agents
Seven production patterns that turn an Agentic MVP into a governable system.
The AI agent demo usually works.
That is what makes it dangerous.
Not because the demo is fake. Not because the builders are careless. Not because the model is useless.
The demo works because the environment is still small enough to behave.
One user. One workflow. One clean objective. One narrow set of tools. One engineer watching the logs. One person who knows how to restart the thing when it gets stuck.
That is not a criticism.
Staying small is exactly what an MVP is supposed to do.
An Agentic MVP proves that a workflow can be partially delegated to an AI system. It proves the agent can reason over context, call tools, produce useful output, and reduce human effort in a controlled path.
That is valuable.
But it is not production.
Production asks a different question.
Not:
Can the agent complete the task once?
But:
Can the agent complete the task safely, repeatedly, cheaply, observably, and under clear ownership when the environment stops cooperating?
That is the gap between an Agentic MVP and a production agent.
An MVP proves the workflow is possible.
Production proves the workflow is governable.
And governability is where most agent projects get exposed.
The Demo Is Not the System
The mistake is not building Agentic MVPs.
MVPs are useful. They help teams discover where agents create leverage. They reveal which workflows are repetitive, context-heavy, and worth redesigning. They show where tools, data, and reasoning can combine into something useful.
The problem starts when the MVP becomes too convincing.
The agent answers correctly.
The tool call succeeds.
The workflow completes.
The stakeholder sees the magic.
Everyone says:
This is ready to scale.
But what actually worked was a thin slice of the real system.
The MVP did not prove that the agent can handle stale data. It did not prove that it can manage retries. It did not prove that it can control cost. It did not prove that memory is safe. It did not prove that approval boundaries are clear. It did not prove that incident response exists.
It proved one thing:
The workflow has potential.
That is enough for an MVP.
It is not enough for production.
A demo can succeed because the environment is controlled. Production is different because the environment fights back.
Users ask unclear things. Tools return incomplete data. Permissions get complicated. Memory becomes stale. Costs grow quietly. Approvals get delayed. Owners are unavailable. The agent retries when it should stop.
That is when the real architecture shows up.
Or fails to.
The Production Layer
A production agent needs more than a better prompt.
It needs a production layer around it.
That layer has seven parts: tool authority, memory governance, trajectory evaluation, autonomy budgets, trace fabric, named ownership, and runtime shutdown.
Those may sound like separate controls, but they are really one idea:
The agent needs boundaries.
It needs boundaries around what it can do, what it can remember, how long it can keep working, what it can spend, what it must record, who owns the outcome, and how it can be stopped.
Without those boundaries, an agent is not really operating in production.
It is operating on trust.
That may be fine for a demo. It is not enough for an enterprise system.
Tool access is the easiest place to see the problem.
In an MVP, the agent often gets broad access because the team wants to prove momentum. Let it search docs. Let it read tickets. Let it create tasks. Let it post messages. Let it update records.
That works until you realize tool access is not just access.
It is authority.
If the agent can update a customer record, it has customer-data authority. If it can trigger a deployment, it has operational authority. If it can send an external message, it has communication authority. If it can close tickets, it has workflow authority.
So the production question is not:
What tools does the agent need?
The production question is:
What authority should this agent be allowed to exercise, under which conditions, and with what evidence?
That shift matters.
A prompt can tell the agent not to do something unsafe. A production system should make unsafe actions impossible unless the right policy, approval, and trace exist.
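One way to picture that is a gate that every tool call has to pass through. The sketch below is a minimal illustration, not a reference design: ToolPolicy, ToolGate, AuthorityError, and the tool and authority names are all hypothetical, and a real system would load policies from a policy service rather than a dict.

```python
from dataclasses import dataclass, field


class AuthorityError(Exception):
    """Raised when a call falls outside policy, grant, or approval."""


@dataclass(frozen=True)
class ToolPolicy:
    authority: str                   # hypothetical label, e.g. "customer-data"
    requires_approval: bool = False  # True means a named human must approve


@dataclass
class ToolGate:
    policies: dict                   # tool name -> ToolPolicy
    granted: set                     # authorities this agent has been granted
    trace: list = field(default_factory=list)

    def call(self, tool, fn, *args, approved_by=None):
        policy = self.policies.get(tool)
        if policy is None:
            decision = "denied: no policy for tool"
        elif policy.authority not in self.granted:
            decision = f"denied: agent lacks '{policy.authority}' authority"
        elif policy.requires_approval and approved_by is None:
            decision = "denied: approval required"
        else:
            decision = "allowed"
        # Record every decision, including denials, before anything runs.
        self.trace.append(
            {"tool": tool, "decision": decision, "approved_by": approved_by}
        )
        if decision != "allowed":
            raise AuthorityError(f"{tool}: {decision}")
        return fn(*args)


gate = ToolGate(
    policies={
        "search_docs": ToolPolicy("read"),
        "update_record": ToolPolicy("customer-data", requires_approval=True),
    },
    granted={"read", "customer-data"},
)
result = gate.call("search_docs", lambda q: f"results for {q}", "refund policy")
```

The design choice worth noticing is that the default is denial. A tool with no policy cannot be called at all, and the trace entry is written whether or not the call goes through.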
The same is true for memory.
In an MVP, memory feels like a feature. The agent remembers context. It personalizes the workflow. It gets better over time.
But in production, memory is not just context.
Memory is stored influence.
It shapes future behavior. It can preserve wrong assumptions. It can retain sensitive information. It can become stale. It can leak across users, workflows, or agents.
An agent should not simply remember because remembering is useful.
It should remember under policy.
Some context should last only for the current task. Some can last for the current workflow. Some may be safe to store longer. Some should never be stored at all.
Bad memory is not only an accuracy problem.
It is a governance problem.
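A small sketch can make "remember under policy" concrete. It assumes four retention scopes that mirror the ones above (task, workflow, longer term, never) and made-up retention limits; Scope, TTL_SECONDS, and GovernedMemory are hypothetical names, and the numbers are placeholders rather than recommendations.

```python
import time
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    TASK = "task"
    WORKFLOW = "workflow"
    LONG_TERM = "long_term"
    NEVER = "never"


# Assumed retention limits in seconds; real values are a policy decision.
TTL_SECONDS = {
    Scope.TASK: 15 * 60,
    Scope.WORKFLOW: 24 * 3600,
    Scope.LONG_TERM: 90 * 24 * 3600,
}


@dataclass
class MemoryItem:
    value: str
    scope: Scope
    written_at: float


class GovernedMemory:
    def __init__(self, clock=time.time):
        self._items = {}     # (owner, key) -> MemoryItem
        self._clock = clock  # injectable so expiry can be tested

    def write(self, key, value, scope, owner):
        if scope is Scope.NEVER:
            return False     # refuse to store; nothing is persisted
        self._items[(owner, key)] = MemoryItem(value, scope, self._clock())
        return True

    def read(self, key, owner):
        # Keys are partitioned by owner, so one user's memory
        # is never visible to another user's run.
        item = self._items.get((owner, key))
        if item is None:
            return None
        if self._clock() - item.written_at > TTL_SECONDS[item.scope]:
            del self._items[(owner, key)]  # stale memory is dropped on read
            return None
        return item.value
```

Expiry and owner partitioning address staleness and leakage. They do nothing about wrong assumptions that were stored correctly, which still need review.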
Then comes evaluation.
Most Agentic MVPs are evaluated by vibes. The demo looked good. The stakeholder liked it. The output was mostly right. The team tried a few examples and the agent seemed useful.
That is fine for discovery.
Production needs a different standard.
For agents, the path matters as much as the output.
An agent can produce a good final answer through a bad trajectory. It can call the wrong tools, retrieve sensitive context, spend too much, skip approvals, retry unnecessarily, and still generate something that looks correct.
That is why production evaluation cannot only grade the final response.
It has to test behavior.
Did the agent choose the right tool? Did it retrieve the right context? Did it stop at the right time? Did it escalate when required? Did it stay inside budget? Did it leave the right audit trail?
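Several of those questions can be turned into checks on the recorded path rather than on the final answer. The following is a rough sketch under simple assumptions: each step is reduced to a tool name, a cost, and an approval flag, and Step and evaluate_trajectory are hypothetical names, not part of any evaluation framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    tool: str
    cost_usd: float = 0.0
    approved: bool = False


def evaluate_trajectory(steps, *, allowed_tools, approval_required,
                        max_cost_usd, max_steps):
    """Return a list of violations; an empty list means the path passed."""
    violations = []
    if len(steps) > max_steps:
        violations.append(f"too many steps: {len(steps)} > {max_steps}")
    total = sum(step.cost_usd for step in steps)
    if total > max_cost_usd:
        violations.append(f"over budget: {total:.2f} > {max_cost_usd:.2f}")
    for i, step in enumerate(steps):
        if step.tool not in allowed_tools:
            violations.append(f"step {i}: tool '{step.tool}' not allowed")
        if step.tool in approval_required and not step.approved:
            violations.append(f"step {i}: '{step.tool}' ran without approval")
    return violations
```

A run whose final answer looks correct can still fail this check, which is the point: the grade attaches to the path, not only to the output.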
This is where autonomy budgets become important.
Agents do not only answer. They loop. They retry. They retrieve. They summarize. They call tools. They recover from failure. They keep going.
And “keeps going” is one of the most expensive failure modes in agentic systems.
A production agent needs limits before it starts: how many tool calls, how many retries, how much time, how much spend, how much scope, and when to escalate to a human.
Without those limits, persistence becomes risk.
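Those limits can be written down as an object the agent loop has to charge before each step. This is a minimal sketch with assumed field names and limits; AutonomyBudget and BudgetExceeded are hypothetical, and the escalation itself (paging a human, opening a ticket) is left to the caller.

```python
import time
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    def __init__(self, limit):
        super().__init__(f"autonomy budget exceeded: {limit}; escalate to a human")
        self.limit = limit


@dataclass
class AutonomyBudget:
    max_tool_calls: int
    max_retries: int
    max_seconds: float
    max_spend_usd: float
    tool_calls: int = 0
    retries: int = 0
    spend_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def charge(self, *, tool_calls=0, retries=0, spend_usd=0.0):
        """Record usage, then fail fast if any limit has been crossed."""
        self.tool_calls += tool_calls
        self.retries += retries
        self.spend_usd += spend_usd
        self.check()

    def check(self):
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool_calls")
        if self.retries > self.max_retries:
            raise BudgetExceeded("retries")
        if self.spend_usd > self.max_spend_usd:
            raise BudgetExceeded("spend_usd")
        if time.monotonic() - self.started_at > self.max_seconds:
            raise BudgetExceeded("seconds")
```

The budget is created before the run starts and raises instead of warning, so a loop that "keeps going" is stopped by the runtime rather than by whoever happens to be watching.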
Observability is the next gap.
Logs are not enough.
For agents, the system needs to preserve the path from request to action. What was the agent asked to do? What context did it use? What memory did it read or write? Which tool did it call? What policy allowed the call? What did the tool return? What changed downstream? Who approved the action?
That is trace fabric.
Without it, debugging becomes archaeology.
Someone checks logs. Someone checks prompts. Someone checks tool responses. Someone searches Slack. Someone tries to reconstruct the run from screenshots and memory.
That is not observability.
That is a postmortem scavenger hunt.
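As a rough illustration of what a trace record could carry, the sketch below gives each question above its own field. The schema is an assumption made for this example, not a standard; TraceEvent and TraceLog are hypothetical names, and an in-memory list stands in for whatever durable store a real system would use.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TraceEvent:
    run_id: str
    request: str                  # what the agent was asked to do
    tool: str                     # which tool it called
    policy: str                   # which policy allowed the call
    tool_result: str              # what the tool returned
    context_refs: list = field(default_factory=list)   # context it used
    memory_reads: list = field(default_factory=list)
    memory_writes: list = field(default_factory=list)
    downstream_change: Optional[str] = None             # what changed downstream
    approved_by: Optional[str] = None                   # who approved the action
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class TraceLog:
    def __init__(self):
        self._events = []

    def record(self, event):
        self._events.append(event)

    def for_run(self, run_id):
        """Return every event for one run, in the order it happened."""
        return [e for e in self._events if e.run_id == run_id]

    def to_jsonl(self, run_id):
        """Serialize one run as JSON Lines for export or review."""
        return "\n".join(json.dumps(asdict(e)) for e in self.for_run(run_id))
```

With one run_id tying the events together, reconstructing a run becomes a query instead of a search through logs, prompts, and chat history.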
And even with good traces, the agent still needs an owner.
The agent cannot be accountable. The framework cannot be accountable. The model cannot be accountable. The vendor cannot be the final owner of your business outcome.
Someone inside the organization has to own the mission, the authority, the cost, the memory policy, the approvals, the incident response, and the business judgment.
This is where many production conversations get uncomfortable.
Because the agent crosses boundaries.
Product owns the experience. Engineering owns the service. Platform owns the runtime. Security owns permissions. Data owns the source. Compliance owns regulated boundaries. Business owns the process.
Then the agent acts across all of them.
So who owns the outcome?
Until that answer is named, the system is not production-ready.
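Even this can be checked mechanically, at least in part. The sketch below assumes a simple manifest with one field per responsibility listed above; OwnershipManifest and unowned are hypothetical names, and a filled-in name proves only that someone was named, not that they have accepted the role.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class OwnershipManifest:
    # Each field holds the name of one accountable person, not a team alias.
    mission: str = ""
    authority: str = ""
    cost: str = ""
    memory_policy: str = ""
    approvals: str = ""
    incident_response: str = ""
    business_judgment: str = ""


def unowned(manifest):
    """Return the responsibilities that still have no named owner."""
    return [
        f.name for f in fields(manifest)
        if not getattr(manifest, f.name).strip()
    ]
```

A deployment check could refuse to promote an agent while unowned returns anything at all.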
Finally, a production agent needs a way to stop safely.
MVP shutdown is simple. Turn off the script. Stop the container. Disable the demo. Tell the engineer to pause it.
Production shutdown is harder.
The agent may have active runs, queued work, memory writes, tool credentials, downstream actions, delegated tasks, approvals in progress, and other systems depending on its output.
So the question is not:
Can we turn it off?
The question is:
Can we stop it safely?
Agents fail differently from traditional software.
Traditional software often fails by stopping.
Agents can fail by continuing.
That is why runtime shutdown has to be designed before production, not invented during an incident.
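A cooperative stop can be sketched in a few lines. The version below is deliberately simplified and makes several assumptions: work is a queue of callables, credential revocation is a boolean flag standing in for a real call to a secrets system, and AgentRuntime is a hypothetical name. It shows the order of operations, not a complete runtime.

```python
import threading
from collections import deque


class AgentRuntime:
    def __init__(self):
        self._stop = threading.Event()
        self.queue = deque()
        self.completed = []
        self.credentials_revoked = False

    def submit(self, task):
        if self._stop.is_set():
            raise RuntimeError("runtime is shutting down; not accepting work")
        self.queue.append(task)

    def run(self):
        # The stop flag is checked between steps, so a step in progress
        # finishes cleanly instead of being cut off halfway through.
        while self.queue and not self._stop.is_set():
            task = self.queue.popleft()
            self.completed.append(task())

    def shutdown(self):
        """Stop intake, park queued work, and revoke tool credentials."""
        self._stop.set()
        parked = list(self.queue)        # handed back for human review
        self.queue.clear()
        self.credentials_revoked = True  # placeholder for real revocation
        return parked
```

Queued work is returned rather than discarded, so someone can decide what to resume. Pending approvals, delegated tasks, and downstream systems would need the same treatment and are not modeled here.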
The Real Production Test
The best way to test an Agentic MVP is not to ask whether the demo worked.
The demo is supposed to work.
The better question is:
What happens when the environment stops cooperating?
What happens when the tool returns stale data?
What happens when memory conflicts with fresh context?
What happens when the user asks for something just outside policy?
What happens when the agent hits its budget?
What happens when approval is delayed?
What happens when the agent loops?
What happens when the owner is unavailable?
What happens when we need to stop it immediately?
If the answer is “the engineer will watch it,” the system is still an MVP.
If the answer is “the prompt tells it not to do that,” the system is still an MVP.
If the answer is “we will handle that manually,” the system is still an MVP.
That is fine.
Just do not call it production.
The New Rule
An Agentic MVP proves the workflow is possible.
A production agent proves the workflow is governable.
That is the shift.
Not from simple to complex.
From demo path to operating model.
From prompt to policy.
From tool access to authority.
From memory as a feature to memory under governance.
From output checks to behavior testing.
From unmetered usage to autonomy budgets.
From logs to trace fabric.
From team ownership to named accountability.
From shutdown hope to runtime control.
The teams that win will not be the ones that build the flashiest agent demos.
They will be the ones that learn how to turn useful demos into governed systems.
Because production is not just where the agent works.
Production is where the agent has to keep working when the workflow is messy, the data is imperfect, the user is ambiguous, the system is under pressure, and nobody is watching every step.
Agentic MVPs are not the problem.
They are the beginning.
The mistake is treating the beginning like the architecture.


