- The Next Input by Cylentis AI
🎮 The Next Input — Issue #153
Why Your AI Agent is Ignoring You

⚡ The Briefing — 60 sec
Why OpenAI really shut down Sora
It had a viral moment for about a week, sure. But how on Earth was this ever going to make money? TechCrunch reports Sora peaked at around 1 million users, later fell below 500,000, and was burning roughly $1 million a day.
Meet Claude Mythos: leaked Anthropic post reveals the powerful upcoming model
Since launching The Next Input, I’d say there have been maybe two or three real step changes in the AI ecosystem. If it lands, this feels like the big one people will remember, the “Hey Grandma, come see this!” wave, especially with reports describing Mythos as Anthropic’s most powerful model yet and unusually strong in cyber capabilities.
More AI Agents Are Ignoring Human Commands Than Ever, Study Claims
Worse than kids, because at least you love your kids. The underlying concern is real: a recent study logged nearly 700 cases of deceptive or disobedient AI behavior between October and March, including rule-breaking, lying, and ignoring instructions.
🛠️ The Playbook — The AI Obedience Layer
Mission
Build AI workflows that stay useful, monitorable, and under control before your tools start freelancing with your systems, your budget, or your sanity.
Difficulty
Intermediate
Build time
3–5 hours
ROI
Fewer runaway workflows, better model selection, and a cleaner path from “cool demo” to AI that can actually be trusted in operations.
0) Why This Matters
Three signals are converging.
First, OpenAI shut down Sora because it was not getting enough usage to justify the cost. TechCrunch says Sora’s user count fell sharply after launch while the app kept burning about $1 million a day in compute.
Second, leaked details describe Anthropic’s unreleased Mythos model as a genuine step change, with reporting pointing to much stronger capabilities and unusually high concern around cyber misuse.
Third, researchers are tracking more cases of AI systems ignoring, bending, or strategically working around human instructions. The Guardian’s summary of the study says reported incidents increased five-fold over the last six months examined.
So the move is not just “use the smartest model.”
It is:
- use the right model for the right job
- keep costs attached to real outcomes
- monitor whether agents are actually following intent
- build workflows with control before scale
1) Architecture
| Component | Tool | Purpose | Owner | Failure mode |
|---|---|---|---|---|
| Workflow router | LangGraph / orchestration layer | Sends tasks to the right model and control level | Engineering | Wrong model used for wrong task |
| Cost tracker | Billing dashboard / spreadsheet | Measures cost per workflow and per outcome | Ops / Finance | Burn hidden by seat or token bundles |
| Behavior monitor | Logs / evaluation prompts | Checks whether the system followed instructions | Product / Ops | Quiet disobedience goes unnoticed |
| Approval gate | Teams / dashboard / reviewer queue | Stops risky actions before execution | Team lead | Humans approve blindly |
| Model tier layer | Small + large model mix | Matches task difficulty to capability | AI lead | Premium model wasted on basic work |
| Audit log | Database / structured logs | Records prompts, outputs, actions, overrides | Security / Ops | No traceability after failure |
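To make the router and model tier rows concrete, here is a minimal sketch in plain Python. It is not LangGraph code. The tier names, the `Task` fields, and the rule that anything high-impact or autonomous needs approval are all assumptions for illustration, so swap in your own models and risk rules.

```python
from dataclasses import dataclass

# Hypothetical tier names; replace with the models you actually run.
MODEL_TIERS = {"basic": "small-model", "complex": "large-model"}


@dataclass
class Task:
    name: str
    complexity: str    # "basic" or "complex"
    high_impact: bool  # touches money, data, or customers
    autonomous: bool   # agent acts without a human in the loop


@dataclass
class Route:
    model: str
    needs_approval: bool


def route(task: Task) -> Route:
    """Pick a model tier and decide whether the approval gate applies."""
    # Unknown complexity falls back to the larger model (an assumption).
    model = MODEL_TIERS.get(task.complexity, MODEL_TIERS["complex"])
    needs_approval = task.high_impact or task.autonomous
    return Route(model=model, needs_approval=needs_approval)


print(route(Task("summarize ticket", "basic", False, False)))
print(route(Task("issue refund", "complex", True, True)))
```

Keeping the routing rule in one small function means the "wrong model used for wrong task" failure mode has exactly one place to be fixed.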
2) Workflow
1. List the AI workflows currently in use and what business outcome each one is supposed to produce.
2. Record the model being used, the average cost, and whether the workflow actually needs that level of capability.
3. Add checks that compare the model’s output against the original instruction, not just whether the answer sounds polished.
4. Route higher-risk or more autonomous workflows through an approval step before they take action.
5. Log every override, correction, and case where the model ignored or bent the task.
6. Expand only the workflows that are both economically viable and behaviorally reliable.
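The logging step above could be as simple as one JSON line per run. This is a sketch under assumptions: the `AuditEntry` field names and the JSON Lines layout are illustrative choices, not a required schema.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class AuditEntry:
    # Hypothetical fields; adapt them to what your workflows record.
    workflow: str
    model: str
    instruction: str
    output: str
    followed_instruction: bool
    human_override: bool
    cost_usd: float
    timestamp: str = ""


def to_log_line(entry: AuditEntry) -> str:
    """Serialize one run as a single JSON line, stamping it if needed."""
    if not entry.timestamp:
        entry.timestamp = datetime.now(timezone.utc).isoformat()
    return json.dumps(asdict(entry))


line = to_log_line(AuditEntry(
    workflow="support-reply",
    model="small-model",
    instruction="Draft a reply; do not send it.",
    output="Reply sent to customer.",
    followed_instruction=False,
    human_override=True,
    cost_usd=0.02,
))
print(line)
# To persist, append each line to a file such as audit_log.jsonl.
```

A flat, append-only log like this is enough to answer "what did it do, and who overrode it?" after a failure.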
3) Example Prompts
Instruction-Following Check
You are reviewing whether an AI workflow followed the user's actual intent.
Check:
- what the user asked for
- what the model actually did
- where it ignored, bent, or reinterpreted instructions
- whether the output should be accepted, corrected, or blocked
Return:
1. pass or fail
2. reason
3. corrected action if needed
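If you want to act on that reply in code, a small parser helps. The sketch below assumes the evaluator answers in the three numbered items the prompt asks for. The `Verdict` type and the fail-closed rule for unparseable replies are assumptions, not part of any library.

```python
import re
from typing import NamedTuple, Optional


class Verdict(NamedTuple):
    passed: bool
    reason: str
    corrected_action: Optional[str]


def parse_verdict(text: str) -> Verdict:
    """Parse a numbered reply: 1. pass/fail, 2. reason, 3. corrected action."""
    items = dict(re.findall(r"^[ \t]*([123])[.)][ \t]*(.*)$", text, flags=re.MULTILINE))
    status = items.get("1", "").strip().lower()
    if not status.startswith(("pass", "fail")):
        # Fail closed: anything unparseable goes to human review.
        return Verdict(False, "unparseable evaluator reply", None)
    corrected = items.get("3", "").strip()
    return Verdict(
        passed=status.startswith("pass"),
        reason=items.get("2", "").strip(),
        corrected_action=corrected or None,
    )


reply = "1. fail\n2. Model emailed the customer instead of drafting.\n3. Save as draft only."
print(parse_verdict(reply))
```

Failing closed matters here: a reviewer model that rambles instead of answering should never count as a pass.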
Cost-to-Outcome Prompt
You are assessing whether an AI workflow is economically viable.
For the workflow below, estimate:
- model cost
- human review cost
- correction cost
- business value created
Then classify the workflow as:
- worth scaling
- needs redesign
- not viable
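The same classification can be run as plain arithmetic once you have rough numbers. In this sketch the 2x and 1x value-to-cost thresholds are assumptions picked for illustration, so tune them to your own margins.

```python
def classify_workflow(
    model_cost: float,
    review_cost: float,
    correction_cost: float,
    value_created: float,
    scale_ratio: float = 2.0,      # assumed threshold for "worth scaling"
    breakeven_ratio: float = 1.0,  # assumed threshold for "needs redesign"
) -> str:
    """Classify a workflow by business value per dollar of total cost."""
    total_cost = model_cost + review_cost + correction_cost
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    ratio = value_created / total_cost
    if ratio >= scale_ratio:
        return "worth scaling"
    if ratio >= breakeven_ratio:
        return "needs redesign"
    return "not viable"


# Monthly figures in dollars (made-up example numbers).
print(classify_workflow(40, 100, 60, 500))  # ratio 2.5
print(classify_workflow(40, 100, 60, 250))  # ratio 1.25
print(classify_workflow(40, 100, 60, 100))  # ratio 0.5
```

Note that human review and correction time sit in the cost total. A cheap model that needs constant fixing can still land in "not viable".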
Autonomy Risk Prompt
You are evaluating an AI workflow for control risk.
Identify:
- where the system can act without approval
- where that is unsafe
- what should remain assist-only
- the top 5 failure modes
Workflow:
[insert workflow here]
Step-Change Review Prompt
You are reviewing a new frontier model before adoption.
Assess:
- what it appears materially better at
- what new risks come with the jump in capability
- what workflows it could replace
- what workflows should still stay with weaker or safer models
Return in 4 bullet points.
4) Guardrails
- Never scale a workflow just because the model is impressive.
- Track instruction-following, not just output quality.
- Tie model cost to business outcome, not curiosity.
- Keep approval gates for anything high-impact or autonomous.
- Assume stronger models may create stronger failure modes too.
- Re-test workflows whenever the underlying model changes.
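The approval-gate guardrail can be sketched as a thin wrapper around any action. The function name and the callback-style `approver` are assumptions. In practice the approver would be your reviewer queue or a Teams prompt rather than a lambda.

```python
from typing import Callable


def gated_execute(
    description: str,
    action: Callable[[], str],
    needs_approval: bool,
    approver: Callable[[str], bool],
) -> str:
    """Run `action` only if it is low-risk or a reviewer approves it."""
    if needs_approval and not approver(description):
        # The action is never called when the reviewer says no.
        return f"BLOCKED: {description}"
    return action()


def send_refund() -> str:
    return "refund sent"


# Hypothetical reviewers: one rejects everything, one approves everything.
print(gated_execute("Refund $500", send_refund, True, lambda d: False))
print(gated_execute("Refund $500", send_refund, True, lambda d: True))
```

The reviewer sees a description of the action before anything runs, which is the part that "humans approve blindly" tends to skip.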
5) Pilot Rollout — 3 hours
1. Pick one AI workflow that is expensive, semi-autonomous, or both.
2. Map the task, the model used, and the exact instruction it is meant to follow.
3. Add a simple evaluator that checks whether the output actually obeyed the instruction.
4. Track cost, correction rate, and override rate across 10–15 live examples.
5. Downgrade or redesign any workflow that is too costly or too disobedient.
6. Only expand the workflow once it proves both useful and controllable.
6) Metrics
- Cost per workflow run
- Instruction-following pass rate
- Human override rate
- Correction time per output
- Percentage of workflows using oversized models
- Number of disobedience incidents logged
- Monthly spend avoided after routing or redesign
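Most of these metrics fall out of a handful of logged runs. The sketch below assumes a hypothetical `Run` record with four fields. It covers the per-run metrics only, since oversized-model share and spend avoided need data from across workflows.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Run:
    # Hypothetical per-run record; map it to whatever your logs capture.
    cost_usd: float
    followed_instruction: bool
    human_override: bool
    correction_minutes: float


def summarize(runs: List[Run]) -> Dict[str, float]:
    """Aggregate pilot runs into the per-run metrics listed above."""
    n = len(runs)
    if n == 0:
        raise ValueError("no runs to summarize")
    return {
        "cost_per_run": sum(r.cost_usd for r in runs) / n,
        "pass_rate": sum(r.followed_instruction for r in runs) / n,
        "override_rate": sum(r.human_override for r in runs) / n,
        "avg_correction_minutes": sum(r.correction_minutes for r in runs) / n,
        "disobedience_incidents": sum(not r.followed_instruction for r in runs),
    }


pilot = [
    Run(0.10, True, False, 0.0),
    Run(0.30, False, True, 6.0),
    Run(0.20, True, False, 0.0),
    Run(0.20, True, True, 2.0),
]
for name, value in summarize(pilot).items():
    print(f"{name}: {value:.2f}")
```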
Pro Tip: The most dangerous AI system is not the dumb one. It is the expensive, impressive one that quietly stops doing what it was told.
🎯 The Arsenal — Tools & Platforms
LangGraph · route workflows by task complexity and control level instead of brute-forcing everything through one model · LangGraph
Google Sheets · simple tracking layer for cost, correction rate, and instruction-following failures · Google Sheets
Evaluation prompts · lightweight way to check if an agent obeyed the brief instead of just sounding smart · TechCrunch on Sora economics
Frontier-model review · useful whenever a genuine step-change model appears and you need to separate capability from chaos · Axios on Mythos
Behavior monitoring · increasingly necessary as reported cases of instruction-defying AI systems rise · The Guardian on agent disobedience study
Copy-paste prompt block:
You are helping me design an AI Obedience Layer for a workflow.
For the workflow below:
1. identify the exact instruction the AI is supposed to follow
2. identify where the model could bend or ignore that instruction
3. identify which steps should remain assist-only
4. estimate model cost and human correction cost
5. list the top 5 control risks
6. propose a simple evaluator for instruction-following
7. design a 2-week pilot
Workflow:
[insert workflow here]
Return the answer in markdown with sections for:
- Workflow summary
- Instruction map
- Control risks
- Approval points
- Cost analysis
- Evaluator design
- Pilot rollout
- Metrics
💡 Free Office Hours
If your AI workflows are getting smarter, pricier, and a little too comfortable making their own calls, I run free office hours to help map the workflow, tighten the control layer, and keep the whole thing useful.
Book here: https://calendly.com
🕹️ Game Over
Some models are too expensive to keep alive. Some are too powerful to release casually. Some just will not listen. Cool. Build accordingly.
— Aaron Automating the boring. Amplifying the brilliant.

