Agent UX: window/tab confusion + doctrine delivery

BlitzOS · findings & proposals from two agent self-tests

2026-06-24 · branch blitz-v1 · sources: Agent 5 (workflow + window typing) and Agent 3 (“log in to reddit”)

root cause

Both headline failures (the agent treating a window connection like a browsable tab, and the agent asking for a login in prose instead of raising a handoff card) come from one cause: the operating doctrine is delivered to a spawned agent as a thin pointer to fetch on demand instead of being inlined. The doctrine already contains both rules; the agent simply never had it in context.

1 · Context

Agent 5 — workflow + window typing

Replicated, fixed, and field-verified in a packaged VM build (real TCC).

Item	Status
`spawn claude ENOENT` (workflows)	fixed
`cg_key` keys + modifiers (End, cmd+End)	fixed
`action:'paste'`	fixed
`connection_reveal` for windows	fixed
screenshot (was empty / TCC)	fixed
helper redeploy (CFBundleVersion bump)	fixed

Agent 3 — “log in to reddit”

A cascade of failures. The only connection available was a Safari window (showing a Google Doc), and the VM had no Chrome. The agent tried to browse with that window: it tried to navigate, then keystroked a URL, then opened the URL via shell, which spawned a new window that was not connected. From then on every screenshot showed the wrong page, and it asked for credentials in prose instead of raising a handoff card.

This was, in effect, the handoff-card test previously flagged as untested, and it surfaced the discoverability and window-state-mismatch problems below.

2 · The shared root cause

A spawned agent's bootstrap inlines identity, the relay address, a couple of hard rules (web, progress), the duty, and the event loop. The full doctrine is a pointer:

bootstrap.txt: identity · relay · web/progress rules · duty · /events loop
  → “thin pointer”: “guide is at $B/agents.md … fetch it only afterward”
  → agents.md (lazy): handoff rule + window-vs-tab rule live here, only if curl'd

// agent-runtime.mjs:7   "the served blitzos-agents.md is the source of truth; this is a thin pointer"
// guide fragment (duty agent):
//   "fetch the guide (curl …/agents.md) only afterward … Do not let reading the guide delay your first action."

The doctrine already answers both failures. agents.md:60 reads: “Connecting a browser as a window gives only the toolbar AX tree, not the page, so use a tab for the web.” The handoff rule (“never prose, call request_handoff”) is in the same guide. The agent never fetched it.

3 · Cause A — window vs tab proposal

Finding

A window connection drives a native app (read / act / reveal via AX, keystrokes, screenshot). It cannot navigate or run_js. Web work needs a tab (Blitz Chrome, or a connected browser tab).
connection_list does return type and capabilities per connection, but nothing in it reads as “this can't be browsed”. connection_navigate (documented as TAB-only) fails on a window with only the bare error `"navigate" is not supported`, which says neither why it failed nor what to use instead.
With no doctrine and no browsable source, the agent forced it: keystroke URL → shell open → a second, unconnected window → wrong screenshots for the rest of the session.
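To make “this can't be browsed” explicit, a caller could derive it from the type and capabilities that connection_list already returns. The sketch below is a minimal version under stated assumptions. It assumes each entry looks like `{ id, type: "tab" | "window", capabilities: [...] }` and that the tab-only operations appear as `"navigate"` / `"run_js"` capability strings. That shape, the capability names, and the `isBrowsable` / `describeConnections` helpers are hypothetical, not the real connection-ops.mjs schema.

```javascript
// ASSUMED shape of a connection_list entry (hypothetical, not the real schema):
//   { id: string, type: "tab" | "window", capabilities: string[] }

// Operations that only make sense on a browsable tab.
const TAB_ONLY_OPS = ["navigate", "run_js"];

// A connection is browsable only if it is a tab AND advertises every tab-only op.
function isBrowsable(conn) {
  return conn.type === "tab" && TAB_ONLY_OPS.every((op) => conn.capabilities.includes(op));
}

// Annotate each entry with a plain-language hint the agent cannot misread.
function describeConnections(list) {
  return list.map((conn) => ({
    ...conn,
    browsable: isBrowsable(conn),
    hint: isBrowsable(conn)
      ? "browsable tab: navigate / run_js supported"
      : "native app window: read / act / reveal / screenshot only; cannot be browsed",
  }));
}

// Modeled on the Agent 3 session: a single Safari window showing a Google Doc.
const annotated = describeConnections([
  { id: "w1", type: "window", capabilities: ["read", "act", "reveal", "screenshot"] },
]);
console.log(annotated[0].browsable); // false
```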

Proposal

Inline the doctrine (below) — it already carries the window-vs-tab rule.
Make connection_navigate / run_js on a window return a useful error: “this is a window connection (a native app), not a browsable tab — to open a URL use Blitz Chrome (blitz_chrome_open) or ask the user to open it in their browser and connect that tab.”
Add the “no browsable source” path to the doctrine: with no Chrome and only a window/native connection, don't force navigation — ask the user to open the site and connect the tab (or sign in). That is the precise hole Agent 3 fell into.
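Both proposals can be sketched together, assuming the same hypothetical connection shape.

- `guardTabOnlyOp` returns the proposed explanatory error instead of the bare rejection.
- `pickWebRoute` encodes the “no browsable source” path.

The function names, the return objects, and the `chromeAvailable` flag are illustrative assumptions. Only the tool names (connection_navigate, run_js, blitz_chrome_open) and the gist of the error wording come from this report.

```javascript
// Ops that require a browsable tab (connection_navigate, run_js).
const TAB_ONLY = new Set(["navigate", "run_js"]);

// Hypothetical guard: null means "proceed"; otherwise an actionable error.
function guardTabOnlyOp(conn, op) {
  if (!TAB_ONLY.has(op) || conn.type === "tab") return null;
  return {
    error:
      "this is a window connection (a native app), not a browsable tab; to open a URL use " +
      "Blitz Chrome (blitz_chrome_open) or ask the user to open it in their browser and connect that tab.",
  };
}

// Hypothetical router for web work: never force navigation through a window.
function pickWebRoute(connections, chromeAvailable) {
  const tab = connections.find((c) => c.type === "tab");
  if (tab) return { action: "use_tab", id: tab.id };
  if (chromeAvailable) return { action: "blitz_chrome_open" };
  // Only window/native connections and no Chrome (Agent 3's situation):
  // ask the user to open the site and connect the tab (or sign in).
  return { action: "ask_user_to_open_and_connect" };
}
```

The fallback deliberately ends with the user rather than with another tool attempt. In Agent 3's session, every extra attempt (keystroke URL, shell open) made the window state worse.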

4 · Cause B — handoff & doctrine delivery proposal

Finding

The agent asked for credentials in chat prose. Doctrine is explicit: “Do NOT write prose telling them to go sign in — call request_handoff instead.”
The root cause is delivery, not the rule: the handoff rule sits in the lazily fetched agents.md, so it wasn't in context when the login wall appeared.
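Because this is a delivery failure, a cheap regression check can pin it: assert that the text a spawned agent actually receives contains the rules, not merely a pointer to them. This is a minimal sketch. The marker phrases are taken from the doctrine quotes in this report, while `missingRules` and the sample strings are hypothetical.

```javascript
// Doctrine phrases that must be in context on turn 1 (quoted in this report).
const REQUIRED_RULES = {
  handoff: "call request_handoff",
  windowVsTab: "use a tab for the web",
};

// Return the names of required rules absent from the text delivered to the agent.
function missingRules(bootstrapText) {
  return Object.entries(REQUIRED_RULES)
    .filter(([, phrase]) => !bootstrapText.includes(phrase))
    .map(([name]) => name);
}

// Today's thin pointer carries neither rule, so both are reported missing.
const thin = "guide is at $B/agents.md … fetch it only afterward";
console.log(missingRules(thin)); // [ 'handoff', 'windowVsTab' ]
```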

Proposal — inline the full doctrine into the bootstrap

bootstrap.txt: identity · relay · full agents.md (read from the live file) · duty · /events loop
  ✓ every agent, turn 1: handoff rule + window/tab + connection routing all in context

Read the same blitzos-agents.md that is served at /agents.md when the bootstrap is built: no copy, no drift, single source of truth preserved.
Size is ~5 KB, trivial for the context window. The lazy fetch is exactly why the agent missed both rules.
Keep one line: “this is your full guide; re-fetch $B/agents.md only if you want a fresh copy.”
Optional belt-and-suspenders: add a one-line version of the handoff rule to the connection_read / connection_act tool descriptions too.
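The build step above might look like the following sketch, which assumes the bootstrap is assembled from string sections at spawn time in the order the proposal lists. `buildBootstrap`, its parameter names, the section labels, and `withHandoffHint` are hypothetical. The doctrine reader is injected to keep the sketch self-contained and testable. In agent-runtime.mjs it would presumably be something like `() => readFileSync(doctrinePath, "utf8")` (Node's `fs.readFileSync`) pointed at the same blitzos-agents.md served at /agents.md.

```javascript
// Hypothetical bootstrap builder: inlines the live doctrine instead of a pointer.
// readDoctrine is injected; in production it would read blitzos-agents.md from
// disk at build time, so there is still exactly one source of truth.
function buildBootstrap({ identity, relay, duty, eventsLoop, baseUrl, readDoctrine }) {
  const doctrine = readDoctrine(); // read at build time: no copy, no drift
  return [
    identity,
    `Relay: ${relay}`,
    "=== Operating guide (full text) ===",
    doctrine.trim(),
    `This is your full guide; re-fetch ${baseUrl}/agents.md only if you want a fresh copy.`,
    duty,
    eventsLoop,
  ].join("\n\n");
}

// Optional belt-and-suspenders: repeat the handoff rule in two tool descriptions.
const HANDOFF_HINT =
  "If you hit a login wall, call request_handoff; never ask for credentials in prose.";

function withHandoffHint(tools) {
  return tools.map((t) =>
    t.name === "connection_read" || t.name === "connection_act"
      ? { ...t, description: `${t.description} ${HANDOFF_HINT}` }
      : t,
  );
}
```

Ordering the guide before the duty and event loop is a judgment call. It puts the rules in context before the first action, which is the gap the lazy fetch left open.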

Trade-off: the bootstrap comment says it “stays a thin pointer” on purpose (small bootstrap, single source). I'd deliberately reverse that: reading the live file keeps a single source of truth, and the size cost is negligible.

5 · Deferred (verify in code first)

Item	Why deferred
D7 · screenshot staleness (identical MD5)	could be a real cache, or the connected window genuinely never changed (Reddit was in a different window). Check the capture path before adding `fresh:true`.
D8 · premature handoff resolve	“tab navigated” fires on any URL change, including the initial load. Gate resolution on “navigated off the login route / settled.”
connection_act silent success	helper can't always know the semantic outcome of a synthetic key, so any richer effect report would be heuristic.
AX-vs-screenshot mismatch warning · claim-next-window	deeper heuristics / a new feature; scope separately.
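For D8, the gate could be a small state machine that ignores the initial load and resolves only once the tab has left the login route and stayed put. This is a sketch under assumptions that still need checking in code first:

- the navigation event carries the new URL, a timestamp, and an initial-load flag;
- “settled” means a quiet period rather than a load-complete signal.

`HandoffGate`, `isLoginRoute`, the route patterns, and the default quiet period are all hypothetical.

```javascript
// Hypothetical login-route matcher; real patterns would need to be per-site.
const LOGIN_PATTERNS = [/\/log-?in\b/i, /\/sign-?in\b/i];
const isLoginRoute = (url) => LOGIN_PATTERNS.some((re) => re.test(new URL(url).pathname));

// Resolves a pending handoff only after the tab leaves the login route and settles.
class HandoffGate {
  constructor(settleMs = 1500) {
    this.settleMs = settleMs; // assumed quiet period, not a measured value
    this.candidate = null;    // last off-login navigation, if any
  }

  // Call on every "tab navigated" event; each navigation resets the candidate.
  onNavigated(url, at, initialLoad = false) {
    if (initialLoad) return; // D8: the initial load must never count
    this.candidate = isLoginRoute(url) ? null : { url, at };
  }

  // True once the off-login URL has been stable for settleMs.
  isResolved(now) {
    return this.candidate !== null && now - this.candidate.at >= this.settleMs;
  }
}
```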

6 · Recommended order

1. Inline the doctrine into the bootstrap (fixes the root of A and B, ~5 of 8 failures). [proposal]
2. Helpful window navigate/run_js error + the doctrine “no browsable source” path. [proposal]
3. Then investigate D7 (screenshot freshness) and D8 (handoff completion signal). [verify first]

Generated for review. All findings are grounded in src/main/agent-runtime.mjs, blitzos-agents.md, connection-ops.mjs, and os-tools.mjs. The Agent 5 fixes are committed and field-verified; the Cause A/B items are proposals pending a go-ahead.