# Operator Panel Walkthrough — Task Briefing for LLM Agents

This doc is a self-contained briefing for an LLM agent (Claude, GLM, Kimi, OpenCode, Qwen, Gemini, etc.) to walk every screen of the member-console operator panel and produce a structured assessment. It is part of Milestone 7 phase 7a (see `status/operator-ux.md`), sub-tracks **7a.2 Deterministic capture** and **7a.3 Subjective walkthrough (multi-LLM)**.

**You are not expected to read the codebase to do this task.** You walk the screens in a live browser. This is a UX walkthrough, not a code review.

---

## Why this exists

7a.1 was code archaeology — what screens exist, what they do, how they depend on each other. It cannot tell us *how the panel feels to use*. That gap matters for 7b (information architecture), 7c (design system), and 7e (form and action patterns), which all need a screen-by-screen baseline of what the panel actually looks like, what breaks, and what could be better.

Two sub-phases run independently:

- **Phase A — Deterministic capture.** Run **once**, by whichever agent runs first. Produces factual evidence (screenshots, console errors, Lighthouse JSON, DOM a11y dumps, tab-order maps, HTMX timings, forced error-state and empty-state responses) that is reproducible and not subject to LLM judgment.
- **Phase B — Subjective walkthrough.** Run **once per LLM**. Each agent reads the Phase A evidence and walks the live UI themselves, then writes a structured assessment. Multiple agents run independently so that convergence/divergence is itself signal for 7b/7c.

If you are running Phase A, check `docs/operator-ux-walkthrough-evidence/` first — if it exists and is recent, skip Phase A and go straight to Phase B.

---

## Output locations and naming

- **Phase A** → `docs/operator-ux-walkthrough-evidence/`, containing one subdirectory per screen (see Screen inventory below for slugs), plus a top-level `INDEX.md` listing the captured screens and the chrome-devtools-mcp run timestamp.
- **Phase B** → `docs/operator-ux-walkthroughs/{agent-id}.md`. Suggested ids: `claude-opus-4-7.md`, `claude-sonnet-4-6.md`, `glm-4-6.md`, `kimi-k2.md`, `qwen3-coder.md`, `gemini-2-5-pro.md`. Use a stable id so re-runs overwrite cleanly. Do not edit another agent's Phase B file. Do not edit Phase A evidence — if it needs updating, re-run the whole capture.

---

## Scope

**In scope.** Visiting every operator screen listed below, capturing the evidence in Phase A or writing the rubric in Phase B.

**Out of scope.** No code changes. No design proposals. No IA recommendations (that is 7b's job). No spec rewrites. No "I would refactor this" notes — describe what is there and what is friction, not what should replace it. The point of multi-LLM Phase B is to gather independent judgment, not converge on a fix.

If you find a bug that looks like a regression rather than UX friction (500 error, missing data, broken HTMX swap that prevents you from continuing), note it in Phase B under "Bugs encountered" and continue the walkthrough. Do not stop to fix it.

---

## Prerequisites

You need:

1. The repository checked out and your working directory at the repo root.
2. Docker + Docker Compose installed and the daemon running.
3. Go toolchain matching the repo's `go.mod`.
4. `chrome-devtools-mcp` available as an MCP tool, and the `chrome-debug` wrapper at `~/.local/bin/chrome-debug` (launches Chrome on `:9222`).
5. A reasonable terminal setup — bootstrap, `docker compose up`, the Go process, and the chrome-devtools session all need to coexist.

### Demo data — opt-in (REQUIRED for detail/action slugs)

The default stack comes up empty (no products, no plan ladders, no entitlement sets, no grants), which means about 11 of the 26 operator slugs are unreachable. Before starting Phase A capture, **run the demo seeder**:

```sh
./test/seed-demo.sh
```

This is idempotent; re-running is a no-op. See [`test/seed/member-console-demo/README.md`](../test/seed/member-console-demo/README.md) for what it populates and why.
### Demo persons — only alice needs to log in

The seeder pre-creates bob/carlos/diana via `provisioning.AutoProvision` keyed on the pinned Keycloak UUIDs (see `2026-05-11-demo-seeder-persons` and `2026-05-11-keycloak-id-pinning-fix`). The People tab is populated before any login.

**Just log in as `alice` / `password`** to drive the walkthrough — her row is created lazily on first login (this exercises the lazy-creation path on every fresh stack and is intentional). After alice's first login, re-run `./test/seed-demo.sh` to attach the `demo-baseline` grant to her personal org.

---

## Stack bootstrap

Follow `test/AGENTS.md`. Summary:

```sh
cd test
./bootstrap-stack.sh   # idempotent; writes test/.env with isolated host ports
docker compose up -d   # brings up postgres, valkey, keycloak, temporal, fedwiki + one-shot seeders
set -a; . .env; set +a
go run .. start --config mc-config.yaml
```

`bootstrap-stack.sh` prints the URL/port table. **Use those URLs, not the defaults in `mc-config.yaml`** — host ports are per-worktree. `keycloak-seed` and `fedwiki-render` run automatically on compose up; wait for both to complete before logging in.

Teardown when finished: `./teardown-stack.sh` from the `test/` directory.

---

## Chrome session

In a separate terminal:

```sh
chrome-debug &
```

The chrome-devtools MCP attaches to `:9222`. If attach fails, run `chrome-debug &` again — the single source of truth is the wrapper at `~/.local/bin/chrome-debug`. Use `mcp__chrome-devtools__new_page` with the member-console base URL printed by bootstrap.

---

## Login

Test user (from `test/seed/keycloak/seed-keycloak.sh`):

- **Username:** `alice`
- **Password:** `password`
- **Email:** `alice@example.com`
- **Role:** `operator-member` (this is the role gate for the operator panel; see the `OperatorRole` constant in `internal/server/operator_partials.go`).

Other seeded users (`bob`, `carlos`, etc.) do **not** have the operator role and will see "access denied" if they try to load `/operator`.
That's expected and is its own screen worth capturing in Phase A (see Screen inventory: *access-denied*).

After login, navigate to `/operator`. This is the shell page; everything else loads as an HTMX partial swapped into `#operator-content` based on `?tab=...` query state and `data-switch-tab` clicks.

---

## Screen inventory

Walk every screen in this list. Slugs are the directory names for Phase A evidence. The "Reach by" column is the URL or click path; if it changes, re-derive the inventory from `internal/server/operator_partials.go` and update this doc.

> **Inventory snapshot date: 2026-05-10.** To check for drift:
> ```sh
> grep -rEhn 'mux\.HandleFunc\("(GET|POST|PUT|PATCH|DELETE) /(partials/)?operator' internal/server/*.go | sort -u
> ```

### Catalog axis (entity types you define before runtime)

| Slug | Screen | Reach by |
|------|--------|----------|
| `org-types` | Org Types list | `/operator?tab=org-types` |
| `products` | Products list | `/operator?tab=products` |
| `products-edit` | Product edit form | click "Edit" on a product row |
| `products-prices` | Product prices nested view | click "Prices" on a product row |
| `entitlement-sets` | Entitlement Sets list | `/operator?tab=entitlement-sets` |
| `entitlement-sets-edit` | Entitlement Set edit form | click "Edit" on a set row |
| `entitlement-sets-rules` | Rules nested view | click "Rules" on a set row |
| `plan-ladders` | Plan Ladders list | `/operator?tab=plan-ladders` |
| `plan-ladders-edit` | Ladder edit form | click "Edit" on a ladder row |
| `plan-ladders-tiers` | Ladder tiers nested view (includes rank reorder) | click "Tiers" on a ladder row |
| `plan-ladders-validation` | Ladder validation surface | follow link from tiers view if present |

### Runtime axis (per-person, per-org state)

| Slug | Screen | Reach by |
|------|--------|----------|
| `users` | Persons list (canonical entry tab) | `/operator?tab=users` (default landing for many builds) |
| `organizations` | Organizations list | `/operator?tab=organizations` |
| `organizations-enrollment` | Per-org enrollment surface | click "Enrollment" on an org row |
| `grants` | Grants list | `/operator?tab=grants` |
| `grants-create-plan` | Plan-grant creation flow | use "Issue grant" with a plan-typed product |
| `grants-create-non-plan` | Legacy grant creation flow | use "Create grant" with a non-plan product |
| `grants-revoke` | Revoke flow | click "Revoke" on a grant row |
| `grants-revoke-and-transition` | Revoke-and-transition flow | the alternate path on a plan grant |

### Billing surfaces

| Slug | Screen | Reach by |
|------|--------|----------|
| `billing-accounts` | Stripe billing accounts | `/operator?tab=billing-accounts` |
| `billing-subscriptions` | Subscriptions | `/operator?tab=billing-subscriptions` |
| `billing-invoices` | Invoices | `/operator?tab=billing-invoices` |
| `billing-payments` | Payments | `/operator?tab=billing-payments` |

### Integrations axis

| Slug | Screen | Reach by |
|------|--------|----------|
| `sites` | FedWiki sites (today's only integration) | `/operator?tab=sites` |

### Cross-cutting / negative

| Slug | Screen | Reach by |
|------|--------|----------|
| `landing` | First view after `/operator` with no `?tab=` | navigate to `/operator` directly |
| `access-denied` | Non-operator user hitting `/operator` | log out, log in as `bob`, navigate to `/operator` |

---

## Phase A — Deterministic capture procedure

For **each screen slug** above, capture the following into `docs/operator-ux-walkthrough-evidence/{slug}/`:

1. **`screenshot.png`** — `mcp__chrome-devtools__take_screenshot` at the default desktop viewport (1280×800 or whatever the chrome-devtools default is — do not resize). Full page, not viewport-only.
2. **`console.json`** — `mcp__chrome-devtools__list_console_messages` filtered for the visit, captured as a JSON array. Include all severities (info, warn, error).
3. **`network.json`** — `mcp__chrome-devtools__list_network_requests` filtered to requests during the screen load, captured as JSON. Highlight any 4xx/5xx in a top-level `failures: [...]` field.
4. **`csp-violations.txt`** — any CSP-violation console messages copied verbatim (strict CSP is in force; `report-only` violations are evidence of patterns the design system needs to fix in 7c).
5. **`lighthouse.json`** — `mcp__chrome-devtools__lighthouse_audit` raw output. Run at minimum the accessibility category; performance is a bonus.
6. **`a11y-tree.json`** — use `mcp__chrome-devtools__evaluate_script` with a script that walks `document` and emits the heading hierarchy, landmark roles, label/control associations, and any element with `aria-*` attributes. This is the structural a11y dump.
7. **`tab-order.json`** — use `mcp__chrome-devtools__evaluate_script` to enumerate focusable elements (`tabindex >= 0`, native form/anchor/button) in DOM order, capturing their tag, role, and accessible name.
8. **`htmx-timings.json`** — if the screen loaded via HTMX swap (any `?tab=` navigation), capture `htmx:beforeSwap` and `htmx:afterSwap` timings via an injected listener. A single number: ms between request start and swap complete.
9. **`empty-state.json`** *(only if the screen has a list)* — note whether the empty state is rendered when the underlying table is empty. If the seed produces data, you cannot easily empty it; in that case set `empty_state_seen: false` with a note explaining why.
10. **`forced-error.json`** *(only if the screen has a mutating form)* — submit the form with invalid data (empty required fields, wrong types) and capture the HTTP response plus the rendered error surface. Note whether the error landed inline, as a toast, as an alert, or as a CSP-violating inline script.
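For the structural a11y dump (item 6), a script along these lines could be passed to `mcp__chrome-devtools__evaluate_script`. This is a sketch, not a prescribed implementation: the `summarizeNode` helper and the output shape are assumptions made for illustration.

```javascript
// Sketch of the a11y-tree.json dump (item 6 above). The helper name
// and output shape are illustrative assumptions, not project API.

// Collect role, aria-* attributes, and heading level for one element.
function summarizeNode(attrs, tagName) {
  const tag = tagName.toLowerCase();
  const out = { tag, aria: {} };
  for (const { name, value } of attrs) {
    if (name === 'role') out.role = value;
    else if (name.startsWith('aria-')) out.aria[name] = value;
  }
  const h = /^h([1-6])$/.exec(tag);
  if (h) out.headingLevel = Number(h[1]);
  return out;
}

// Browser-only wiring: dump headings, landmarks, labels, and any
// element carrying aria-* attributes, in document order.
if (typeof document !== 'undefined') {
  const interesting = document.querySelectorAll(
    'h1, h2, h3, h4, h5, h6, main, nav, header, footer, aside, ' +
    'label, [role], [aria-label], [aria-labelledby], [aria-describedby]'
  );
  const dump = [...interesting].map((el) =>
    summarizeNode([...el.attributes], el.tagName)
  );
  console.log(JSON.stringify(dump, null, 2)); // paste into a11y-tree.json
}
```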
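The tab-order enumeration (item 7) could take a shape like the following. Again a sketch under assumptions: `describeFocusable`, the selector list, and the crude accessible-name approximation (aria-label, then aria-labelledby target text, then visible text) are all illustrative, not the real accessible-name computation.

```javascript
// Sketch of the tab-order.json dump (item 7 above). Helper name,
// selectors, and name heuristic are illustrative assumptions.

// Approximate an element's accessible name: aria-label first, then
// the text of the element referenced by aria-labelledby, then text.
function describeFocusable(el, lookup) {
  const labelledBy = el.getAttribute('aria-labelledby');
  const name =
    el.getAttribute('aria-label') ||
    (labelledBy && lookup && lookup(labelledBy)) ||
    (el.textContent || '').trim();
  return {
    tag: el.tagName.toLowerCase(),
    role: el.getAttribute('role') || null,
    name,
  };
}

// Browser-only wiring: natively focusable elements plus anything
// with tabindex >= 0, kept in DOM order.
if (typeof document !== 'undefined') {
  const els = document.querySelectorAll(
    'a[href], button, input, select, textarea, [tabindex]'
  );
  const order = [...els]
    .filter((el) => el.tabIndex >= 0)
    .map((el) =>
      describeFocusable(el, (id) =>
        (document.getElementById(id)?.textContent || '').trim()
      )
    );
  console.log(JSON.stringify(order, null, 2)); // paste into tab-order.json
}
```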
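For the HTMX timings (item 8), the injected listener could look like this. `htmx:beforeRequest` and `htmx:afterSwap` are real htmx events; the recorder shape and the `window.__htmxTimings` handle are assumptions for this sketch.

```javascript
// Sketch of an injected HTMX timing recorder (item 8 above). Event
// names are real htmx events; the recorder and the global handle
// are illustrative assumptions.

function makeSwapTimer(now) {
  let start = null;
  const timings = [];
  return {
    begin() { start = now(); },                  // on htmx:beforeRequest
    end() {                                      // on htmx:afterSwap
      if (start !== null) {
        timings.push(Math.round(now() - start)); // ms request -> swap done
        start = null;
      }
    },
    results() { return timings.slice(); },
  };
}

// Browser-only wiring: attach before clicking the ?tab= navigation,
// then read window.__htmxTimings.results() after the swap settles.
if (typeof document !== 'undefined') {
  const timer = makeSwapTimer(() => performance.now());
  document.body.addEventListener('htmx:beforeRequest', () => timer.begin());
  document.body.addEventListener('htmx:afterSwap', () => timer.end());
  window.__htmxTimings = timer;
}
```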
After all slugs are captured, write `docs/operator-ux-walkthrough-evidence/INDEX.md` listing:

- The chrome-devtools-mcp run start/end timestamps
- The agent id that ran Phase A
- A table of every slug with a one-word capture status (`ok`, `partial`, `skipped` + reason)
- Any inventory drift discovered (routes that exist in code but produced no UI to capture, or UI that has no slug in this doc)

Phase A is done when `INDEX.md` exists and every slug is at least `partial` with a reason.

---

## Phase B — Subjective walkthrough rubric

Each LLM produces one file: `docs/operator-ux-walkthroughs/{agent-id}.md`. Open each screen yourself in chrome-devtools-mcp (don't just read Phase A evidence — feel the screen). Use the template below verbatim. **Keep judgments anchored to specific observations** ("the destructive 'Delete' button has the same visual weight as 'Edit' next to it" rather than "destructive actions are unclear").

### Per-screen template (repeat for every slug)

```markdown
## {slug}

**Reach by:** {URL or click path}
**Phase A evidence:** docs/operator-ux-walkthrough-evidence/{slug}/

### Visual hierarchy
- Where does the eye land first? Is that the right thing for the operator's task here?
- Are primary actions distinguishable from secondary?
- Is whitespace doing useful work, or is the screen cramped/sparse?

### Copy
- Page title — accurate, scannable?
- Column headers / labels — domain terms consistent with the rest of the panel?
- Error/empty state copy (from Phase A) — helpful or generic?
- Confirm-action copy — does it name the affected entity?

### Destructive affordances
- Are destructive actions (delete, revoke, retire) visually distinct from non-destructive ones?
- Are they guarded by confirmation? Does the confirmation say what will happen?
- Could an operator click one by accident? Why or why not?

### Pattern drift vs. other screens
- What does this screen do differently from the rest of the panel? (Form layout, table style, badge colors, modal pattern, button placement.) Note three drift items max.
- Does the drift have a reason you can see, or does it look incidental?

### Friction
- Walking the most likely workflow on this screen, what slowed you down?
- Did anything require you to remember context not on this screen?
- Did anything require leaving and coming back?

### Bugs encountered
- Anything that broke. Distinguish clearly from UX friction.

### Three things that could be better
- Bullet exactly three. Bounded so outputs across agents are comparable.
```

### Hot-path workflows (walk these end-to-end)

After the per-screen sections, add a **Workflows** section. Walk each of the three hot paths from 7a.1 end-to-end as a single linear task. Stopwatch yourself; note every click and every place you paused.

1. **Per-org composite lookup.** "What is the current state of organization X?" — find an org and answer: who are its members, what is its current plan/tier, what is the active billing state, and are there any active grants. Report total clicks, total seconds, the number of tabs you had to visit, and what context you had to hold in your head between tabs.
2. **Tier rank reorder cascade.** Reorder two tiers within a plan ladder. Report: did you see a preview of what would change? Did you understand the per-org impact before confirming? Was the confirmation reversible-looking or final?
3. **Grant adjustment.** Find a member, find their grants, revoke one, issue another. Note whether the plan-vs-non-plan distinction was visible to you, and whether the legacy `CreateGrant` path and the plan-aware `IssueGrant` path felt like the same flow or different flows.

### Final sections

```markdown
## Cross-cutting observations
Three to five themes that recur across multiple screens (form patterns, table patterns, terminology, navigation, feedback timing, accessibility, etc.). Each anchored to at least two screen slugs where you saw it.

## Single biggest surprise
One paragraph. The thing that was most different from what you expected after reading 7a.1 (`docs/operator-ux-research.md`).

## Three biggest opportunities for 7b/7c
Exactly three. These feed downstream phase prioritization. Anchored to screen slugs and Phase A evidence.
```

---

## Submission checklist

Before declaring done, verify:

- [ ] If Phase A: `docs/operator-ux-walkthrough-evidence/INDEX.md` exists and every slug has a capture status.
- [ ] If Phase B: every slug in the inventory has a section in `{agent-id}.md`, even if short.
- [ ] No code changes were made.
- [ ] No `docs/operator-ux-research.md` or `status/` files were modified.
- [ ] Stack was torn down (`test/teardown-stack.sh`) if you brought it up yourself.
- [ ] Chrome session closed.

---

## Notes for future agents

This briefing is intentionally LLM-agnostic. If you find something about the procedure that doesn't work for your harness, note it in your Phase B file under a top-level **Procedure issues** section rather than editing this doc — the next agent will diff your note against this brief.

If the panel structurally changes after this briefing was written (routes added/removed, new tabs, IA shift from the 7b landing), re-derive the screen inventory from `internal/server/operator_partials.go` and propose an update to this doc in a follow-up. Do not silently expand the walkthrough.
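Besides re-running the grep over `internal/server/operator_partials.go`, inventory drift can also be spotted from the live shell, since this doc says tab navigation hangs off `data-switch-tab` clicks. A sketch under assumptions: the `diffSlugs` helper is invented here, and the `documentedTabs` list below is a deliberately truncated subset for illustration — a real check would paste in every tab-style slug from the inventory above.

```javascript
// Sketch of a live-DOM drift check. The data-switch-tab attribute is
// the one this doc describes for the shell; the helper and the
// (truncated) slug list are illustrative assumptions.

function diffSlugs(found, documented) {
  const f = new Set(found);
  const d = new Set(documented);
  return {
    undocumented: [...f].filter((s) => !d.has(s)), // in the UI, not this doc
    missing: [...d].filter((s) => !f.has(s)),      // in this doc, not the UI
  };
}

// Browser-only wiring: run via evaluate_script on the /operator shell.
if (typeof document !== 'undefined') {
  const found = [...document.querySelectorAll('[data-switch-tab]')].map(
    (el) => el.getAttribute('data-switch-tab')
  );
  const documentedTabs = ['org-types', 'products', 'users', 'grants']; // subset
  console.log(JSON.stringify(diffSlugs(found, documentedTabs), null, 2));
}
```

Anything in `undocumented` or `missing` belongs in the INDEX.md drift notes, not silently in an expanded walkthrough.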