🎮 The Next Input · Issue #068

Your New "Invisible Intern"

⚡ The Briefing · 60 sec

šŸ› ļø The Playbook — AI Computer Use Agent: The ā€œInvisible Internā€

Mission: Deploy an AI that autonomously handles on-screen workflows (filling forms, navigating apps, sending emails, and updating dashboards) with no API integrations required.
Difficulty: Expert | Build time: 3–5 hours (pilot)
ROI: Saves ≈ 15–25 h/week of repetitive manual tasks across ops, admin, and support functions.

0) Why This Matters

Gemini’s “Computer Use” model is a notable step: an AI can now carry out on-screen tasks the way you do, by moving the mouse, clicking buttons, reading screens, and reasoning across windows. Google describes the model as optimized mainly for web browsers, so confirm desktop-app coverage during your pilot. It’s the missing piece between “talking” and “doing.”

1) Architecture

Layer           | Tooling                                   | Purpose
Vision Input    | Screen capture (Gemini / Rewind / Cursor) | See what’s on screen
Reasoning Model | Gemini Computer Use / GPT-4V              | Understand interface layout & intent
Controller      | PyAutoGUI / Selenium / Playwright         | Execute clicks, drags, keystrokes
Memory          | Supabase / Redis                          | Track past actions, window states
Orchestrator    | LangChain / AgentKit                      | Decide multi-step plans
Human-in-Loop   | Slack / CLI Approval                      | Confirm risky actions (send, delete, submit)
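
To make the Controller layer concrete, here is a minimal Python sketch that maps model-emitted actions onto a backend. The action shape (`type`, `x`, `y`, `text`) and the names `Backend`, `RecordingBackend`, and `execute` are hypothetical and not part of any Gemini or PyAutoGUI API. A real backend could wrap calls such as `pyautogui.click(x, y)` and `pyautogui.write(text)`; the recording backend below only exists so the dispatch logic can be tested without touching a real screen.

```python
from typing import Protocol


class Backend(Protocol):
    """Hypothetical interface the Controller layer talks to."""

    def click(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


class RecordingBackend:
    """Test double that records calls instead of touching the real screen."""

    def __init__(self) -> None:
        self.calls: list[tuple] = []

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", x, y))

    def type_text(self, text: str) -> None:
        self.calls.append(("type", text))


def execute(action: dict, backend: Backend) -> str:
    """Dispatch one action dict (assumed shape) to the backend."""
    kind = action.get("type")
    if kind == "click":
        backend.click(int(action["x"]), int(action["y"]))
        return "clicked"
    if kind == "type":
        backend.type_text(str(action["text"]))
        return "typed"
    # Unknown actions fail loudly rather than being silently skipped.
    raise ValueError(f"unsupported action type: {kind!r}")


backend = RecordingBackend()
results = [
    execute({"type": "click", "x": 120, "y": 48}, backend),
    execute({"type": "type", "text": "Q3 metrics"}, backend),
]
```

Keeping the backend behind an interface means the same dispatch code can be pointed at PyAutoGUI, Selenium, or Playwright later without changing the planner.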

2) Workflow

  1. Trigger

    • User says: “Update our customer dashboard and email today’s metrics.”

  2. Observe

    • Model takes screenshot of screen / browser → identifies open apps.

  3. Plan

    • LLM breaks task into steps:

      1. Open Excel sheet.

      2. Copy metrics.

      3. Paste into dashboard CMS.

      4. Export summary → attach → email team.

  4. Execute

    • Controller performs actions in order, logging every move.

  5. Verify

    • Slack message: “✅ Dashboard updated. Draft email below. Send?”

  6. Memory Update

    • Logs what was done, where, and why for reuse tomorrow.
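
The six steps above can be sketched as one loop. This is an illustrative outline, not a definitive implementation: `run_workflow`, `RunLog`, and the callback parameters are hypothetical names, the lambdas stand in for the real vision, planning, and controller layers, and the `approve` callback stands in for the Slack confirmation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class RunLog:
    """In-memory stand-in for the Memory layer (e.g. Supabase or Redis)."""

    entries: list[dict] = field(default_factory=list)


def run_workflow(
    command: str,                              # 1. Trigger
    observe: Callable[[], str],
    plan: Callable[[str, str], list[str]],
    act: Callable[[str], str],
    approve: Callable[[str], bool],
    log: RunLog,
) -> bool:
    screen = observe()                         # 2. Observe
    steps = plan(command, screen)              # 3. Plan
    for number, step in enumerate(steps, start=1):
        result = act(step)                     # 4. Execute, logging every move
        log.entries.append({"step": number, "action": step, "result": result})
    # 5. Verify: ask a human before the run counts as done.
    approved = approve(f"{len(steps)} steps done for: {command}")
    # 6. Memory update: record the outcome for reuse tomorrow.
    log.entries.append({"command": command, "approved": approved})
    return approved


log = RunLog()
ok = run_workflow(
    "Update dashboard and email metrics",
    observe=lambda: "excel, browser",
    plan=lambda cmd, screen: ["open sheet", "copy metrics", "paste into CMS"],
    act=lambda step: "ok",
    approve=lambda summary: True,
    log=log,
)
```

Passing each layer in as a callable keeps the loop testable with stubs before any real clicks are enabled.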

3) Example Prompt

SYSTEM: You are a computer operator agent.
INPUT: Screenshot + user command.
GOAL: Complete the request using on-screen applications.
RULES:
- Never click destructive actions without user confirmation.
- Always log {step, tool, action, result}.
- When uncertain, ask for clarification.
OUTPUT: JSON plan of clicks/keystrokes + summary of expected outcome.
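
The logging rule in the prompt is easier to enforce in code than to trust to the model. A minimal sketch, assuming log entries arrive as Python dicts; `REQUIRED_KEYS` and `make_log_line` are hypothetical helper names:

```python
import json

# The four fields named in the prompt's logging rule.
REQUIRED_KEYS = {"step", "tool", "action", "result"}


def make_log_line(entry: dict) -> str:
    """Return one JSON line, refusing entries that miss a required key."""
    missing = REQUIRED_KEYS - entry.keys()
    if missing:
        raise ValueError(f"log entry missing keys: {sorted(missing)}")
    return json.dumps(entry, sort_keys=True)


line = make_log_line(
    {"step": 1, "tool": "browser", "action": "click Export", "result": "ok"}
)
```

One JSON object per line keeps the log easy to append to and easy to load into whatever store backs the Memory layer.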

4) Guardrails

  • Safety Checks:

    • Disable file deletions by default.

    • Require approval for financial or external emails.

  • Compliance:

    • Screen logging must redact PII before upload.

  • Boundaries:

    • Cap runtime to <10 minutes per command.

    • Limit accessible apps to a whitelist.
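
These guardrails can be expressed as a small policy check that runs before every action. The sketch below is one possible shape under stated assumptions: the app and action names are made up for illustration, and the redaction covers email addresses only, which is just one kind of PII.

```python
import re

ALLOWED_APPS = {"excel", "browser", "mail"}                  # hypothetical allowlist
BLOCKED_ACTIONS = {"delete_file"}                            # deletions off by default
NEEDS_APPROVAL = {"send_external_email", "submit_payment"}   # human must confirm
MAX_RUNTIME_SECONDS = 10 * 60                                # cap per command

EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def check_action(app: str, action: str, elapsed_seconds: float) -> str:
    """Return 'deny', 'ask', or 'allow' for a proposed action."""
    if elapsed_seconds >= MAX_RUNTIME_SECONDS:
        return "deny"
    if app not in ALLOWED_APPS or action in BLOCKED_ACTIONS:
        return "deny"
    if action in NEEDS_APPROVAL:
        return "ask"
    return "allow"


def redact(text: str) -> str:
    """Mask email addresses before a log line leaves the machine."""
    return EMAIL_PATTERN.sub("[REDACTED_EMAIL]", text)


decision = check_action("mail", "send_external_email", elapsed_seconds=42)
clean = redact("Sent to jane.doe@example.com today")
```

Checking the deny conditions first means an action that is both blocked and approval-gated is never offered to a human for sign-off.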

5) Pilot Rollout — 3 Hours

  1. Choose one repeatable task (e.g., updating CRM or dashboard).

  2. Run Gemini Computer Use model in observation mode (no clicks yet).

  3. Test “read-only” step extraction accuracy.

  4. Enable control mode with human confirmation.

  5. Log time saved vs manual execution.
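
For the read-only test in step 3, one simple way to score the model is to compare its proposed steps against a list a human wrote by hand. A rough sketch: `step_accuracy` is a hypothetical helper, the exact-match comparison is a deliberately strict assumption, and the sample steps are illustrative.

```python
def step_accuracy(predicted: list[str], expected: list[str]) -> float:
    """Fraction of expected steps the model proposed at the same position."""
    if not expected:
        return 0.0
    matches = sum(
        1
        for got, want in zip(predicted, expected)
        if got.strip().lower() == want.strip().lower()
    )
    return matches / len(expected)


# Human-written reference steps versus what the model proposed (made-up data).
expected = ["open excel sheet", "copy metrics", "paste into dashboard cms"]
predicted = ["Open Excel sheet", "Copy metrics", "Email team"]
score = step_accuracy(predicted, expected)  # two of three steps match
```

A score that stays high across several runs is a reasonable signal that it is safe to move on to control mode with human confirmation.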

6) Metrics

  • Time saved per task.

  • Number of steps automated per workflow.

  • Accuracy of UI element selection.

  • Manual review interventions per week.
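
The four metrics above can be rolled up from per-run records. A minimal sketch, assuming you record the fields below for each task run; `TaskRun`, `summarize`, and the sample numbers are illustrative, not measured results.

```python
from dataclasses import dataclass


@dataclass
class TaskRun:
    manual_minutes: float      # how long a person takes
    agent_minutes: float       # how long the agent took
    steps_automated: int
    correct_selections: int    # UI elements picked correctly
    total_selections: int
    human_interventions: int


def summarize(runs: list[TaskRun]) -> dict:
    """Aggregate the four pilot metrics across a set of runs."""
    total_sel = sum(r.total_selections for r in runs)
    correct_sel = sum(r.correct_selections for r in runs)
    return {
        "minutes_saved": sum(r.manual_minutes - r.agent_minutes for r in runs),
        "steps_automated": sum(r.steps_automated for r in runs),
        "selection_accuracy": correct_sel / total_sel if total_sel else 0.0,
        "interventions": sum(r.human_interventions for r in runs),
    }


# Two made-up runs for illustration only.
week = [
    TaskRun(30, 6, 12, 11, 12, 1),
    TaskRun(20, 5, 8, 8, 8, 0),
]
report = summarize(week)
```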

Pro tip: Pair with AgentKit to make your Computer Use agent a “multi-surface” powerhouse that browses, clicks, and calls APIs seamlessly.

🎯 The Arsenal · Tools & Prompts

Asset                     | What it does                         | Link
Gemini Computer Use Model | AI that controls real apps visually. | https://blog.google/technology/google-deepmind/gemini-computer-use-model/
Playwright / Selenium     | Browser automation at code level.    | https://playwright.dev
Supabase                  | Tracks agent logs & session states.  | https://supabase.com
Prompt · UI Action Plan   | Screenshot → step-by-step plan.      | From screenshot + command, list UI actions in JSON: [{step, element, action, target, expected_result}]
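
If you use the UI Action Plan prompt, it helps to parse the model's reply into typed records before anything gets clicked. A minimal sketch, assuming the model returns a JSON array with exactly the five fields named in the prompt; `UIAction` and `parse_plan` are hypothetical names and the sample reply is made up.

```python
import json
from dataclasses import dataclass


@dataclass
class UIAction:
    step: int
    element: str
    action: str
    target: str
    expected_result: str


def parse_plan(raw: str) -> list[UIAction]:
    """Parse the model's JSON array; raises on malformed or incomplete items."""
    items = json.loads(raw)
    if not isinstance(items, list):
        raise ValueError("plan must be a JSON array")
    # Missing or unexpected keys raise TypeError from the dataclass constructor.
    return [UIAction(**item) for item in items]


raw_reply = json.dumps([
    {"step": 1, "element": "Export button", "action": "click",
     "target": "dashboard toolbar", "expected_result": "CSV download starts"},
])
plan = parse_plan(raw_reply)
```

Rejecting a malformed plan up front is cheaper than discovering halfway through a run that a step has no target.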

💡 Free Office Hours

Want an AI that clicks, types, and thinks like your best intern?
Book a free 15-minute Office Hours slot—no sales pitch, just workflows solved.

šŸ•¹ļø Game Over

Ship one ā€œComputer Useā€ agent today—tomorrow your AI will literally work beside you.
Share your win; you could headline Issue #069.

— Aaron
Automating the boring. Amplifying the brilliant.
