🎮 The Next Input — Issue #068

Your New "Invisible Intern"

Aaron Bost
October 08, 2025


⚡ The Briefing — 60 sec

Google DeepMind unveils Gemini’s “Computer Use” model. New day, new model: Gemini can now operate a UI by clicking, typing, and scrolling through what it sees on screen.
Anthropic expands global operations to India. The Claude empire goes east.
Wall Street explains how AMD’s own stock will fund OpenAI’s chip bill. The game keeps changing—finance meets silicon.

🛠️ The Playbook — AI Computer Use Agent: The “Invisible Intern”

Mission: Deploy an AI that can autonomously handle on-screen workflows (filling forms, navigating apps, sending emails, and updating dashboards) with no API integrations required.
Difficulty: Expert | Build time: 3–5 hours (pilot)
ROI: Saves ≈ 15–25 h/week of repetitive manual tasks across ops, admin, and support functions.

0) Why This Matters

Gemini’s “Computer Use” model narrows the gap between “talking” and “doing”: it works through an interface much as you do, reading screenshots, clicking buttons, typing, and scrolling. At launch, Google describes it as optimized mainly for web browsers, with desktop OS-level control not yet a focus, so treat full desktop automation as something to verify in your own pilot.

1) Architecture

Layer	Tooling	Purpose
Vision Input	Screen capture (Gemini / Rewind / Cursor)	See what’s on screen
Reasoning Model	Gemini Computer Use / GPT-4V	Understand interface layout & intent
Controller	PyAutoGUI / Selenium / Playwright	Execute clicks, drags, keystrokes
Memory	Supabase / Redis	Track past actions, window states
Orchestrator	LangChain / AgentKit	Decide multi-step plans
Human-in-Loop	Slack / CLI Approval	Confirm risky actions (send, delete, submit)

2) Workflow

Trigger
- User says: “Update our customer dashboard and email today’s metrics.”
Observe
- Model takes screenshot of screen / browser → identifies open apps.
Plan
- LLM breaks task into steps:
  1. Open Excel sheet.
  2. Copy metrics.
  3. Paste into dashboard CMS.
  4. Export summary → attach → email team.
Execute
- Controller performs actions in order, logging every move.
Verify
- Slack message: “✅ Dashboard updated. Draft email below—send?”
Memory Update
- Logs what was done, where, and why for reuse tomorrow.
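
The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `Step`, `run_task`, `stub_plan`, and `stub_execute` are hypothetical names, the planner is a stub standing in for a model call, and the executor stands in for a controller such as PyAutoGUI or Playwright.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Assumption: these mirror the risky actions named in the architecture table.
RISKY_ACTIONS = {"send", "delete", "submit"}


@dataclass
class Step:
    tool: str    # illustrative label such as "excel" or "email"
    action: str  # e.g. "open", "copy", "paste", "send"
    target: str  # what the action applies to


def run_task(
    command: str,
    plan: Callable[[str], List[Step]],
    execute: Callable[[Step], str],
    approve: Callable[[Step], bool],
) -> List[Dict[str, object]]:
    """Plan, execute in order, gate risky steps, and log every move."""
    log: List[Dict[str, object]] = []
    for number, step in enumerate(plan(command), start=1):
        if step.action in RISKY_ACTIONS and not approve(step):
            result = "skipped: awaiting approval"
        else:
            result = execute(step)
        log.append({"step": number, "tool": step.tool,
                    "action": step.action, "result": result})
    return log


def stub_plan(command: str) -> List[Step]:
    # Stand-in for the model call; a real planner would read a screenshot.
    return [
        Step("excel", "open", "metrics sheet"),
        Step("excel", "copy", "metrics"),
        Step("cms", "paste", "dashboard"),
        Step("email", "send", "team"),
    ]


def stub_execute(step: Step) -> str:
    # Stand-in for a controller; nothing is actually clicked here.
    return "ok"


log = run_task(
    "Update our customer dashboard and email today's metrics.",
    stub_plan,
    stub_execute,
    approve=lambda step: False,  # simulate a human who has not approved yet
)
for entry in log:
    print(entry)
```

With approval withheld, the first three steps run and the email send is logged as skipped, which is the behavior the Verify step relies on.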

3) Example Prompt

SYSTEM: You are a computer operator agent.
INPUT: Screenshot + user command.
GOAL: Complete the request using on-screen applications.
RULES:
- Never click destructive actions without user confirmation.
- Always log {step, tool, action, result}.
- When uncertain, ask for clarification.
OUTPUT: JSON plan of clicks/keystrokes + summary of expected outcome.
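
One way to package this prompt with its two inputs is sketched below. The payload shape (`system`, `screenshot_base64`, `command`) and the `build_request` helper are assumptions made for illustration; they are not any vendor's request format, so map the fields onto whatever API you actually call.

```python
import base64
import json

SYSTEM_PROMPT = """You are a computer operator agent.
GOAL: Complete the request using on-screen applications.
RULES:
- Never click destructive actions without user confirmation.
- Always log {step, tool, action, result}.
- When uncertain, ask for clarification.
OUTPUT: JSON plan of clicks/keystrokes + summary of expected outcome."""


def build_request(screenshot_png: bytes, command: str) -> dict:
    """Bundle the system prompt, a screenshot, and the user command.

    The field names are hypothetical; only the base64 step is standard
    practice for sending image bytes inside JSON.
    """
    return {
        "system": SYSTEM_PROMPT,
        "screenshot_base64": base64.b64encode(screenshot_png).decode("ascii"),
        "command": command,
    }


# Placeholder bytes stand in for a real screenshot capture.
request = build_request(
    b"\x89PNG placeholder bytes",
    "Update our customer dashboard and email today's metrics.",
)
print(json.dumps(request)[:80])
```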

4) Guardrails

Safety Checks:
- Disable file deletions by default.
- Require approval for financial actions and for emails to external recipients.
Compliance:
- Screen logging must redact PII before upload.
Boundaries:
- Cap runtime to <10 minutes per command.
- Limit accessible apps to a whitelist.
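
These guardrails can be expressed as a single pre-flight check that runs before every action. The sketch below is illustrative only: the app names, action names, and the `check_action` and `redact_emails` helpers are hypothetical, and the redaction covers email addresses in text, which is just one narrow slice of PII (it does nothing for pixels in a screenshot).

```python
import re
import time
from typing import Optional

ALLOWED_APPS = {"excel", "cms", "email"}                # hypothetical app allowlist
BLOCKED_ACTIONS = {"delete_file"}                       # disabled by default
APPROVAL_ACTIONS = {"send_external_email", "payment"}   # need human sign-off
MAX_RUNTIME_SECONDS = 10 * 60                           # runtime cap per command

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def check_action(app: str, action: str, started_at: float,
                 now: Optional[float] = None) -> str:
    """Return a verdict string for one proposed action."""
    now = time.monotonic() if now is None else now
    if now - started_at >= MAX_RUNTIME_SECONDS:
        return "stop: runtime cap reached"
    if app not in ALLOWED_APPS:
        return "deny: app not on allowlist"
    if action in BLOCKED_ACTIONS:
        return "deny: action disabled by default"
    if action in APPROVAL_ACTIONS:
        return "needs_approval"
    return "allow"


def redact_emails(text: str) -> str:
    """Replace email addresses in log text before it leaves the machine."""
    return EMAIL_RE.sub("[REDACTED_EMAIL]", text)


started = time.monotonic()
print(check_action("excel", "copy", started))
print(check_action("email", "send_external_email", started))
print(redact_emails("Contact jane.doe@example.com today"))
```

Returning a verdict instead of raising keeps the decision loggable, so every denied or escalated action shows up in the same trail as the allowed ones.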

5) Pilot Rollout — 3 Hours

1. Choose one repeatable task (e.g., updating a CRM record or a dashboard).
2. Run the Gemini Computer Use model in observation mode: it proposes actions, but nothing is clicked yet.
3. Check how accurately it extracts the steps in this read-only mode.
4. Enable control mode with human confirmation.
5. Log time saved versus manual execution.
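
The observation-then-control progression can be sketched as a thin wrapper around whatever controller you use. `Controller` and `step_accuracy` are hypothetical names, and the position-by-position comparison is just one simple way to score step extraction; a real pilot might need fuzzier matching.

```python
from typing import Callable, List


class Controller:
    """Hypothetical wrapper that only acts when live and confirmed."""

    def __init__(self, live: bool = False,
                 confirm: Callable[[str], bool] = lambda action: False):
        self.live = live
        self.confirm = confirm
        self.proposed: List[str] = []
        self.performed: List[str] = []

    def act(self, action: str) -> str:
        self.proposed.append(action)
        if not self.live:
            return "observed"   # observation mode: record, never click
        if not self.confirm(action):
            return "rejected"   # control mode, but the human said no
        self.performed.append(action)  # a real controller would click or type here
        return "performed"


def step_accuracy(predicted: List[str], expected: List[str]) -> float:
    """Fraction of expected steps proposed at the same position."""
    if not expected:
        return 0.0
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)


# Steps a human recorded for the task, versus what the model proposed.
expected = ["open crm", "search account", "update status"]
observer = Controller(live=False)
for action in ["open crm", "search account", "update owner"]:
    observer.act(action)

accuracy = step_accuracy(observer.proposed, expected)
print(f"step accuracy: {accuracy:.2f}")
```

Flipping `live=True` and passing a real `confirm` callback (a Slack or CLI prompt, say) is the move from the observation step to the control step.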

6) Metrics

Time saved per task.
Number of steps automated per workflow.
Accuracy of UI element selection.
Manual review interventions per week.
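
A small roll-up like the one below is enough to track these four numbers during a pilot. The `TaskRun` fields and the sample values are made up for illustration; the sketch assumes you pass in one week of runs so the intervention count is weekly.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskRun:
    manual_minutes: float   # how long the task takes by hand
    agent_minutes: float    # how long the agent took
    steps_automated: int    # steps the agent completed on its own
    clicks_total: int       # UI elements the agent tried to select
    clicks_correct: int     # selections that hit the intended element
    interventions: int      # times a human had to step in


def summarize(runs: List[TaskRun]) -> Dict[str, float]:
    """Aggregate one week of runs into the four pilot metrics."""
    if not runs:
        raise ValueError("need at least one run")
    n = len(runs)
    total_clicks = sum(r.clicks_total for r in runs)
    correct_clicks = sum(r.clicks_correct for r in runs)
    return {
        "avg_minutes_saved_per_task":
            sum(r.manual_minutes - r.agent_minutes for r in runs) / n,
        "avg_steps_automated": sum(r.steps_automated for r in runs) / n,
        "ui_selection_accuracy":
            correct_clicks / total_clicks if total_clicks else 0.0,
        "interventions_this_week": float(sum(r.interventions for r in runs)),
    }


# Illustrative sample data, not real results.
week = [
    TaskRun(20, 5, 8, 10, 9, 1),
    TaskRun(30, 10, 12, 10, 8, 0),
]
summary = summarize(week)
for name, value in summary.items():
    print(f"{name}: {value}")
```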

Pro tip: Pair with AgentKit to make your Computer Use agent a “multi-surface” powerhouse—browsing, clicking, and API-calling seamlessly.

🎯 The Arsenal — Tools & Prompts

Asset	What it does	Link
Gemini Computer Use Model	AI that controls real apps visually.	https://blog.google/technology/google-deepmind/gemini-computer-use-model/
Playwright / Selenium	Browser automation at code level.	https://playwright.dev
Supabase	Tracks agent logs & session states.	https://supabase.com
Prompt · UI Action Plan	Screenshot → step-by-step plan.

From screenshot + command, list UI actions in JSON:
[{step, element, action, target, expected_result}]
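
Before handing a plan like this to a controller, it is worth validating the JSON. The sketch below checks the five keys from the prompt; `parse_action_plan` and the sample plan are hypothetical, and the model's real output may need extra cleanup (stray prose around the JSON, for instance) that this does not handle.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = {"step", "element", "action", "target", "expected_result"}


def parse_action_plan(raw: str) -> List[Dict[str, Any]]:
    """Parse a JSON action plan and reject items with missing keys."""
    plan = json.loads(raw)
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON array")
    for index, item in enumerate(plan):
        if not isinstance(item, dict):
            raise ValueError(f"item {index} is not an object")
        missing = REQUIRED_KEYS - item.keys()
        if missing:
            raise ValueError(f"item {index} is missing: {sorted(missing)}")
    # Execute in the order the model numbered the steps.
    return sorted(plan, key=lambda item: item["step"])


# Illustrative model output, deliberately out of order.
raw_plan = """
[
  {"step": 2, "element": "Save button", "action": "click",
   "target": "dashboard CMS", "expected_result": "metrics saved"},
  {"step": 1, "element": "Metrics field", "action": "type",
   "target": "dashboard CMS", "expected_result": "field shows new value"}
]
"""
plan = parse_action_plan(raw_plan)
for item in plan:
    print(item["step"], item["action"], item["element"])
```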

💡 Free Office Hours

Want an AI that clicks, types, and thinks like your best intern?
Book a free 15-minute Office Hours slot—no sales pitch, just workflows solved.

→ Grab a slot: https://calendly.com/aaron-cylentis/the-next-input-office-hours

How Canva, Perplexity and Notion turn feedback chaos into actionable customer intelligence

Support tickets, reviews, and survey responses pile up faster than you can read.

Enterpret unifies all feedback, auto-tags themes, and ties insights to revenue, CSAT, and NPS, helping product teams find high-impact opportunities.

→ Canva: created VoC dashboards that aligned all teams on top issues.
→ Perplexity: set up an AI agent that caught revenue‑impacting issues, cutting diagnosis time by hours.
→ Notion: generated monthly user insights reports 70% faster.

Stop manually tagging feedback in spreadsheets. Keep all customer interactions in one hub and turn them into clear priorities that drive roadmap, retention, and revenue.

Get a personalized demo

🕹️ Game Over

Ship one “Computer Use” agent today—tomorrow your AI will literally work beside you.
Share your win; you could headline Issue #069.

— Aaron
Automating the boring. Amplifying the brilliant.
