
Christian Galo fd7c61d594 Add Operator Panel IA decision record and spec delta

Introduce docs/operator-ia.md and an OpenSpec change "operator-ia" with
design, proposal, spec delta, and tasks; update status to mark M7b In
progress

2026-05-12 15:51:42 -05:00



Issues

Tracked items structured for eventual migration to Gitea issues.

Open

Operator panel: heading hierarchy is broken (H1 → H6, no H2/H3/H4/H5)

Labels: bug, a11y, frontend

Operator pages jump straight from H1 ("Operator") to H6 for section titles and table headers, with one stray H5 (the "Confirm" modal). This violates WCAG 1.3.1 (Info and Relationships) and 2.4.6 (Headings and Labels). Discovered in docs/operator-ux-walkthrough-evidence/landing/; the problem is cross-cutting across the operator surface. Candidate for the M7-7c design-system foundation.

Operator forms: inputs missing `autocomplete` attribute

Labels: bug, a11y, frontend

Chrome flagged 3 inputs on the operator landing page as missing the autocomplete attribute. This violates WCAG 1.3.5 (Identify Input Purpose). The gap is cross-cutting across operator forms. Candidate for M7-7c.

Operator template has at least one inline event-handler attribute

Labels: bug, security, frontend

Observed during the Phase A v1 walkthrough: a strict CSP would reject the inline handler unless an exception is configured. Worth grepping templates for on*= attributes and moving the handlers into external scripts. Candidate for M7-7c.
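The grep can be scripted. A minimal sketch in Go, assuming the templates live under a templates/ directory with an .html extension (both assumptions; adjust to the repo layout). The regular expression is deliberately coarse, so each hit still needs a manual look before its handler is lifted into a script file.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"regexp"
	"strings"
)

// inlineHandler matches an attribute whose name starts with "on" (onclick=,
// onsubmit=, ...). Requiring whitespace before the name keeps "data-onload="
// and htmx's "hx-on:click=" from matching.
var inlineHandler = regexp.MustCompile(`(?i)\son[a-z]+\s*=`)

// findInlineHandlers returns the attribute prefixes (e.g. "onclick=") in src.
func findInlineHandlers(src string) []string {
	var hits []string
	for _, m := range inlineHandler.FindAllString(src, -1) {
		hits = append(hits, strings.TrimSpace(m))
	}
	return hits
}

func main() {
	root := "templates" // assumption: location of the operator templates
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() || filepath.Ext(path) != ".html" {
			return nil
		}
		b, err := os.ReadFile(path)
		if err != nil {
			return err
		}
		for _, h := range findInlineHandlers(string(b)) {
			fmt.Printf("%s: %s\n", path, h)
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```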

Operator URL/route/code naming drift

Labels: bug, frontend, dx

The visible tab label, the ?tab= URL parameter, the route slug, and the partial filename disagree. Examples: tab label "People" → ?tab=people → route/partial users; tab label "Organizations" → ?tab=orgs (but ?tab=organizations does NOT switch tabs, because operator-tabs.js only handles the short slug). Bookmarking fails silently when a user infers the long name from the tab label. Discovered in Phase A v1 and reconfirmed in v2. Candidate for M7-7b/7d.

IA decision (2026-05-12, OpenSpec change operator-ia): canonical route slugs are now declared in docs/operator-ia.md. The ?tab= short-slug shape disappears with M7-7d's MPA rewrite, which consumes the IA's route hierarchy as input. Leave open until 7d closes it.

FedWiki Sites operator tab empty under full seed

Labels: bug, fedwiki, seed

The compose seed provisions FedWiki fixtures in the FedWiki service, but local fedwiki.sites rows are populated by integration workflows, not by the seed, so a "fully seeded" stack still shows an empty Sites tab. This is related to "FedWiki sync does not populate DB from existing disk sites" (already filed) but distinct: that issue is about post-redeploy DB sync; this one is about first-run seed coverage. Candidate for M10 (Integration architecture), or earlier if the friction recurs.

Operator entitlement-set rules: no UI to add boolean rules

Labels: bug, frontend

partials/operator/entitlement-sets/{id}/rules exposes only the "Add Limit Rule" form (resource key + value + stacking + per-unit). The schema and seeded data include boolean rules (e.g. demo.feature-x), but no form path creates one. Discovered in Phase A v2. Candidate for M7-7b/7c.

Operator revoke-and-transition: empty product name in confirm body

Labels: bug, frontend, copy

The confirm modal on the per-org Enrollment "Revoke" button interpolates {{ .ProductName }} into the body text. When the grant target is an entitlement set (not a product), ProductName is empty and the modal reads "Revoke  for this organization? The pool returns to its org-type default and a downgrade or end transition is recorded." (note the double space and the dangling preposition). Fix: either fall back to the set/grant name, or template the entire phrase conditionally. Discovered in Phase A v2. Candidate for the M7-7e action-confirm pattern.

Operator product edit: `lifecycle_status` not exposed

Labels: bug, frontend

The schema added lifecycle_status (draft/published/retired) per M6a, and 7a.1 explicitly called out retire-vs-delete as the right archive primitive. The product edit form (/partials/operator/products/{id}/edit) shows an Active checkbox but no lifecycle-state transitions, so operators cannot retire a product through the UI today. Discovered in Phase A v2. Candidate for M7-7b/7c (covered by the existing "Product retirement and Stripe-mapping visibility in operator UI" issue; leave that as the umbrella).

Operator plan-ladder Tiers: "no products to add" alert is misleading when ladder is full

Labels: bug, frontend, copy

With all plan-typed products already in the ladder, the alert reads "No available plan products to add. Products must be published with no product type assigned." The first sentence is true (no candidates remain), but the second advises a remedy that does not apply: the existing products are correctly typed and are already on the ladder. Copy bug. Candidate for M7-7e.

Operator grant-issuance: two non-overlapping surfaces with no cross-link

Labels: bug, frontend, ia

The global Grants tab and the per-org Enrollment "Issue Grant" form expose different products: the global form lets the operator pick any product (or an entitlement set), while the per-org Enrollment form limits the Product dropdown to plan-typed products attached to the ladder. The two surfaces have no cross-link or "looking for X? try Y" guidance, so the operator must already know the distinction to choose correctly. Discovered in Phase A v2. Candidate for M7-7b IA.

IA decision (2026-05-12, OpenSpec change operator-ia): both issuance forms move to the per-org composite view at /operator/organizations/{orgID}, labeled by intent ("Grant a plan" → IssueGrant; "Grant a non-plan product" → CreateGrant). Both code paths are kept (they align with the structural-vs-labeled product kind split per membcons-db Doc 35); only the surface is consolidated. Global Grants becomes read-only. 7d implements.

Two grant-revoke paths are non-equivalent and indistinguishable in UI

Labels: bug, frontend, correctness

Global Grants tab "Revoke" → POST /grants/{id}/revoke (simple; body: "entitlements will be recalculated"). Per-org Enrollment "Revoke" → POST /grants/{id}/revoke-and-transition (composite; body: "pool returns to its org-type default and a downgrade or end transition is recorded"). The UI gives the operator no hint that these are non-equivalent, which is a silent-correctness risk. Already flagged in docs/operator-ux-research.md as the "two revoke paths" 7a.1 finding; this entry is the issue-tracker pointer so 7b can scope consolidation work. Candidate for M7-7b.

IA decision (2026-05-12, OpenSpec change operator-ia): the per-org composite view at /operator/organizations/{orgID} is the sole UI entry point for grant revocation, using the composite revoke-and-transition behavior; the global Grants Revoke affordance is removed. (The broader IA also moves all grant action affordances, namely issue, extend, and revoke, to the per-org view; the global Grants surface becomes read-only browse.) Implementation lands with M7-7d's MPA rewrite; leave open until 7d closes it.

Member-console reads `products.product_type` directly instead of `billing.product_kinds` view

Labels: tech-debt, correctness, backend

Per upstream membcons-db Doc 35 (Product Kind Taxonomy), billing.product_kinds is the single authoritative read path for "what kind is this product?". It derives 'plan' structurally from plan_ladder_tiers membership and delegates the three labeled kinds (addon, usage, one_time) to products.product_type. The view's WHEN plan branch is evaluated first; reading product_type directly misses the structural plan derivation because plans now carry product_type = NULL.

The member-console code reads product_type directly in many places (e.g. operator_plan_ladders.go lines 223/431/469/487 use product_type IS NULL / IS NOT NULL as the plan-vs-non-plan discriminator). This works today because the application enforces the structural invariant (a non-plan product never has tier rows), but it duplicates the view's discrimination logic and is structurally fragile: any future kind that develops its own structural derivation (Doc 35 §5 sketches one_time via prices.recurring_interval, usage via prices.usage_type, and addon via a future relational construct) will silently break direct product_type readers while the view stays correct.

Fix: audit reads of products.product_type in Go and SQL, and route product-kind discrimination through billing.product_kinds everywhere the application asks "what kind is this?". Writes that declare a non-plan kind continue to set product_type (per the Doc 35 §7 reading guide).

Related: this is the underlying schema-side reason the two grant-issuance code paths exist (IssueGrant for plans via the ladder; CreateGrant for non-plan products via the labeled kind). The M7-7b IA decision keeps both paths and labels the issuance forms by intent; this issue tracks the read-discipline cleanup independently.
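To make the evaluation order concrete, here is an illustrative Go restatement of the discrimination described above, next to the direct-read shortcut. It is not the view's SQL and is not meant to be copied into the application (the fix is to query the view); the function names and the empty-string result for the unlabeled, off-ladder case are assumptions.

```go
package main

import "fmt"

// productKind restates the order described for billing.product_kinds:
// tier-row membership is checked first and yields "plan"; otherwise
// products.product_type supplies the labeled kind.
func productKind(hasTierRows bool, productType *string) string {
	if hasTierRows {
		return "plan" // structural derivation from plan_ladder_tiers
	}
	if productType != nil {
		return *productType // labeled kinds: addon, usage, one_time
	}
	return "" // assumption: the view's result for this case is not described here
}

// isPlanByNullType is the direct-read shortcut the issue flags:
// "product_type IS NULL" standing in for "is a plan".
func isPlanByNullType(productType *string) bool {
	return productType == nil
}

func main() {
	addon := "addon"
	fmt.Println(productKind(true, nil))    // plan
	fmt.Println(productKind(false, &addon)) // addon
	// The two readers disagree for a NULL-typed product with no tier rows.
	fmt.Println(productKind(false, nil) == "plan", isPlanByNullType(nil)) // false true
}
```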

Operator SPA partial eager-fetch

Labels: tech-debt, frontend, perf

On /operator initial page load, all 12 operator partials (users, organizations, org-types, grants, products, plan-ladders, entitlement-sets, billing accounts/subscriptions/invoices/payments, sites) are fetched eagerly via HTMX; the tabs then show and hide already-rendered DOM with CSS. Confirmed by network capture in docs/operator-ux-walkthrough-evidence/landing/network.json. The architectural cause is what M7-7d (SPA → MPA conversion) exists to retire; it is tracked here so it survives M7 phase scoping. Already noted in docs/operator-ux-research.md.

IA decision (2026-05-12, OpenSpec change operator-ia): the IA in docs/operator-ia.md replaces the flat-tab SPA model with a three-layer route hierarchy and a curated landing surface, and M7-7d's rewrite consumes that hierarchy as input. Leave open until 7d closes it.

Container image should not run as root

Labels: security, infrastructure

Better configuration handling

Labels: dx

Validate config at boot with meaningful error messages.
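One shape for boot-time validation is to collect every problem and report them together, so a bad deploy surfaces all missing values in a single failed start. A sketch with hypothetical fields and environment variable names (not member-console's actual configuration), using errors.Join from Go 1.20+.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"os"
)

// Config holds a few illustrative settings; names are hypothetical.
type Config struct {
	DatabaseURL   string
	OIDCIssuerURL string
	SessionSecret string
}

// Validate returns every problem at once (nil if there are none).
func (c Config) Validate() error {
	var errs []error
	if c.DatabaseURL == "" {
		errs = append(errs, errors.New("DATABASE_URL is required"))
	}
	u, err := url.Parse(c.OIDCIssuerURL)
	if err != nil || (u.Scheme != "https" && u.Scheme != "http") || u.Host == "" {
		errs = append(errs, fmt.Errorf("OIDC_ISSUER_URL must be an absolute http(s) URL, got %q", c.OIDCIssuerURL))
	}
	if len(c.SessionSecret) < 32 {
		errs = append(errs, fmt.Errorf("SESSION_SECRET must be at least 32 bytes, got %d", len(c.SessionSecret)))
	}
	return errors.Join(errs...)
}

func main() {
	cfg := Config{
		DatabaseURL:   os.Getenv("DATABASE_URL"),
		OIDCIssuerURL: os.Getenv("OIDC_ISSUER_URL"),
		SessionSecret: os.Getenv("SESSION_SECRET"),
	}
	if err := cfg.Validate(); err != nil {
		// errors.Join separates the messages with newlines.
		fmt.Fprintf(os.Stderr, "invalid configuration:\n%v\n", err)
		os.Exit(1)
	}
}
```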

Temporal schedule management on redeploy

Labels: operations

Old schedules are not cleaned up when config changes.
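The cleanup reduces to a diff between the schedule IDs the current config wants and the IDs already on the server. A sketch of only that diff step; listing, creating, and deleting schedules through the Temporal SDK is left out, the ID prefix used to mark schedules as owned by member-console is an assumption, and schedules whose spec changed under an unchanged ID would still need a separate update.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reconcileSchedules returns the IDs to create and to delete so the server
// matches the config. Only existing IDs carrying prefix are considered ours,
// so schedules owned by other services are never deleted. Desired IDs are
// expected to carry the prefix as well.
func reconcileSchedules(prefix string, desired, existing []string) (toCreate, toDelete []string) {
	want := make(map[string]bool, len(desired))
	for _, id := range desired {
		want[id] = true
	}
	have := make(map[string]bool, len(existing))
	for _, id := range existing {
		if !strings.HasPrefix(id, prefix) {
			continue // not managed by member-console
		}
		have[id] = true
		if !want[id] {
			toDelete = append(toDelete, id)
		}
	}
	for id := range want {
		if !have[id] {
			toCreate = append(toCreate, id)
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	create, del := reconcileSchedules("member-console-",
		[]string{"member-console-fedwiki-sync", "member-console-billing-sync"},
		[]string{"member-console-fedwiki-sync", "member-console-legacy-report", "other-service-cleanup"})
	fmt.Println("create:", create) // create: [member-console-billing-sync]
	fmt.Println("delete:", del)    // delete: [member-console-legacy-report]
}
```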

Session/CSRF secret generation and rotation strategy

Labels: security
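One common rotation pattern is a key ring: sign with the newest key, accept any retained key on verify, and drop the oldest after an overlap window. A sketch using HMAC-SHA256 from the standard library with keys from crypto/rand; the KeyRing type and the two-key retention are assumptions, and if the session library already accepts multiple keys, the same newest-first ordering applies there.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// newSecret returns n bytes from the OS CSPRNG.
func newSecret(n int) ([]byte, error) {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		return nil, err
	}
	return b, nil
}

// KeyRing signs with keys[0] and verifies against every retained key.
type KeyRing struct {
	keys [][]byte // newest first
}

func (k *KeyRing) Sign(msg []byte) []byte {
	m := hmac.New(sha256.New, k.keys[0])
	m.Write(msg)
	return m.Sum(nil)
}

func (k *KeyRing) Verify(msg, tag []byte) bool {
	for _, key := range k.keys {
		m := hmac.New(sha256.New, key)
		m.Write(msg)
		if hmac.Equal(m.Sum(nil), tag) { // constant-time comparison
			return true
		}
	}
	return false
}

// Rotate makes fresh the signing key and retains at most keep keys.
func (k *KeyRing) Rotate(fresh []byte, keep int) {
	k.keys = append([][]byte{fresh}, k.keys...)
	if len(k.keys) > keep {
		k.keys = k.keys[:keep]
	}
}

func main() {
	current, err := newSecret(32)
	if err != nil {
		panic(err)
	}
	ring := &KeyRing{keys: [][]byte{current}}
	tag := ring.Sign([]byte("session-id"))
	next, err := newSecret(32)
	if err != nil {
		panic(err)
	}
	ring.Rotate(next, 2)                                // old key still verifies during the overlap
	fmt.Println(ring.Verify([]byte("session-id"), tag)) // true
}
```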

Auth setup review

Labels: security, auth

Scope: remove Keycloak-specific code; review backchannel logout, session timeout, and rate limiting.

`/login` state-overwrite race on parallel requests

Labels: auth, bug

The OIDC login handler (internal/auth/auth.go LoginHandler) generates a fresh state/nonce/code-verifier on every call and unconditionally writes them to the session. When two requests hit /login concurrently in the same session (e.g. an unauthenticated page load that fans out into a /favicon.ico fetch, both bouncing through the auth middleware's redirect to /login), the second call overwrites the first's state before the user finishes authenticating at the IdP. The IdP then returns the first request's state, but the session holds the second request's, producing "State mismatch" 400s on /callback. The symptom recurs whenever a new asset path slips out from under the public-paths allowlist (this is at least the second time we've hit it).

Long-term fix: make /login idempotent. If an unconsumed state exists and is recent (e.g. < 5 min old), reuse it instead of clobbering it; clear it on /callback success or expiration. This defends against the favicon case AND multi-tab login attempts.

Short-term mitigation (already shipped): /favicon.ico was added to the public-paths allowlist.

Serve HTMX assets locally instead of from CDN

Labels: security, frontend

Include SRI hashes.
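An SRI value is the hash algorithm name, a hyphen, and the base64 of the digest of the exact file bytes. A sketch that computes a sha384 value for a vendored copy of HTMX; the static/js/htmx.min.js path is an assumption about where the file would live.

```go
package main

import (
	"crypto/sha512"
	"encoding/base64"
	"fmt"
	"os"
)

// sriSHA384 returns "sha384-" followed by the standard (padded) base64
// encoding of the SHA-384 digest of content.
func sriSHA384(content []byte) string {
	sum := sha512.Sum384(content)
	return "sha384-" + base64.StdEncoding.EncodeToString(sum[:])
}

func main() {
	path := "static/js/htmx.min.js" // hypothetical vendored location
	b, err := os.ReadFile(path)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf(`<script src="/%s" integrity="%s"></script>`+"\n", path, sriSHA384(b))
}
```

Because the hash covers the exact bytes, it has to be regenerated whenever the vendored file is upgraded.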

Custom error pages

Labels: frontend

Database backup before migrations

Labels: operations

Add middleware tests

Labels: testing

Cover CSRF, logging, compression, recovery, request ID, timeout, secure headers, and CORS.

HTMX handler file structure cleanup

Labels: refactor

Temporal auth race on first boot

Labels: bug, operations

After a fresh docker compose up -d, the first member-console start fails with a Temporal auth error; the second attempt works. Fix: add a Temporal healthcheck to compose, or retry logic to the client connection.

FedWiki sync does not populate DB from existing disk sites

Labels: bug, fedwiki

After a DB nuke and redeploy, the fedwiki.sites table is empty even though site directories exist on disk (e.g. test/data/fedwiki/). The sync workflow does not rediscover existing FedWiki sites from the filesystem to repopulate the database, so both the operator panel (ListAllSites) and member views (ListSitesByWorkspace) show no sites.

Keycloak seed ID pinning assumption is fragile (RESOLVED — see OpenSpec change `2026-05-11-keycloak-id-pinning-fix`)

Labels: bug, fedwiki, testing, resolved

Resolved: Keycloak 26.x silently drops the id field on POST /admin/realms/{realm}/users. The fix swaps user creation in seed-keycloak.sh to use POST /admin/realms/{realm}/partialImport, which preserves the pinned id. Verified empirically: a partialImport with id=f0000099-... round-trips through GET /users?username=... with an exact id match. See OpenSpec change 2026-05-11-keycloak-id-pinning-fix for the rewrite and rationale.

Historical context (kept for reference): test/seed/keycloak/seed-keycloak.sh pinned user IDs (e.g. a0000001-... for Alice) so that FedWiki seed data (owner.json) could reference them deterministically. Keycloak did not honor the requested IDs on POST /users — proof at the time: gnu.localtest.me/status/owner.json had "id": "e0249a3c-..." instead of the pinned a0000001-....

Downstream consumers that benefit from the fix:

FedWiki owner.json references (original report).
Member-console demo seeder (OpenSpec change 2026-05-10-member-console-demo-seeder) — its current "do not seed person rows" workaround can be revisited in a follow-up change to re-add seedPersons with deterministic UUIDs.

Role extraction doesn't check resource_access

Labels: bug, auth

extractRoles in auth.go doesn't check resource_access.<client-id>.roles. Investigate the best pattern for IdP-agnostic role mapping.

Cross-module queries are undocumented

Labels: design-feedback

See the 2026-03-26 entitlement sets log for discussion of raw SQL vs. shared interfaces vs. service-layer orchestration.

Directory structure for integrations

Labels: design-feedback

If more integrations arrive beyond FedWiki, consider internal/integrations/fedwiki/. Not urgent: the database schema namespace already provides separation.

Migration orchestration mechanics

Labels: design-feedback

Per-module migrations need a boot sequence that collects the migrations from each module's embedded FS in dependency order.

Shared trigger function ownership

Labels: design-feedback

update_updated_at_column() is used by all modules. It probably belongs in a shared migration or in the db package.

provider_configs should not be per-organization

Labels: design-feedback

The integration architecture models provider_configs as per-organization configuration, but member-console runs a single Stripe account for the cooperative; there is no multi-org Stripe Connect topology. provider_configs should be a single-row app-level config (or env-based), not org-scoped. Discovered during Stripe integration planning (2026-04-02).

Use `<meter>` for quotas, not for limits

Labels: design-feedback, frontend

<meter> is the right element for showing usage against a quota (e.g. 3 out of 17 sites used). It should not be used to display a limit value on its own: showing "17 sites included" as a bar is harder to read than plain text. Use <meter> only where there is a current usage value to compare against a maximum.

FedWiki-only integration assumption pervades UI and data patterns

Labels: design-feedback, architecture

Many UI templates (e.g. fedwiki_sites.html), handler names (FedWikiHandler, FedWikiPartialsHandler), and entitlement display logic assume FedWiki is the only integration. Resource keys (sites, storage_mb) are FedWiki-specific but rendered without integration context. Before adding a second integration, review: template structure (per-integration partials vs. generic), handler registration patterns, entitlement display (resource keys need human-friendly labels and integration attribution), and the dashboard layout (currently a single "FedWiki Sites" card, which won't scale). Discovered during M5 phase 5c exploration (2026-04-11).

Integration / extension architecture (and operator IA placement)

Labels: design-feedback, architecture, ux

The operator panel today places FedWiki sites at the same IA level as Products, Plan Ladders, Org Types, and Billing. This entrenches the assumption that FedWiki is the only external service the member-console will ever integrate with, and that assumption is wrong. The intended trajectory has the member-console acting as a hub for multiple external services (FedWiki today; NextCloud, Discourse, and others to come). Each is structurally a different concern from the catalog and runtime layers: they own their own resources, admin surfaces, and provisioning patterns, and they plug into the entitlements/billing model rather than being part of it.

Two distinct pieces of work fall out of this:

M7 IA (immediate): the operator panel should not have a top-level "Sites" tab. Integrations live in their own section (e.g. /operator/integrations/...) so adding a second integration does not require re-thinking IA again. M7 only needs to not entrench the FedWiki-as-first-class assumption; it does not need to design the extension contract.
A dedicated milestone for the integration / extension model (next-or-later): a standardized contract for how external services plug into the member-console — extension manifest, resource-key namespacing, per-integration entitlement displays, admin-surface registration patterns, and the boundary between member-console-owned state and integration-owned state. Discourse and NextCloud are the concrete drivers that will exercise the contract. See proposed milestone in status/milestones.md.

Discovered during M7 phase 7a (2026-05-08); supersedes the immediate scope of "FedWiki-only integration assumption" above by carving out the IA work and the architecture work as separate pieces.

Org type CRUD when non-personal types arrive

Labels: design-feedback, organization, ux

The organization.org_types table supports more than the seeded 'personal' row (the schema includes display_name, description, is_active, default_product_id, and default_plan_ladder_id), but the operator UI deliberately exposes no create/edit affordances. This is the right call today (operator misuse risk; the only consumer is the personal-org auto-provisioning flow), but when non-personal org types arrive (a real "organization" type beyond the per-person workspace), the IA needs to grow:

A create/edit surface for org types (with cascade visibility — changing default_product_id does not retroactively backfill; see "Auto-provisioning does not backfill existing orgs when default_product_id changes" above).
A way for org-creation flows (whatever introduces a non-personal org) to pick the org type, instead of the implicit 'personal' default in current code.
Disambiguation in URL/IA: /operator/organizations/... lists actual orgs; /operator/org-types/... lists the type schema.

Until that work is scheduled, M7 should leave the schema alone and not surface CRUD. Discovered during M7 phase 7a (2026-05-08).

Product retirement and Stripe-mapping visibility in operator UI

Labels: design-feedback, billing, ux

The operator product UI today has no archive/delete affordance, so products that go out of fashion accumulate. Two facts shape the right answer:

Local products are mapped 1:1 to Stripe products via stripe.product_mappings(product_id, stripe_product_id, sync_status). Deleting a local product silently breaks that mapping; even if the operator is OK with the local row going away, the Stripe-side product (which Stripe typically archives rather than deletes) and the mapping row need a coherent story.
M6a already plans a lifecycle_status column on billing.products (draft / published / retired). That is the right primitive: published products are sellable; retired products cannot be granted to new orgs but existing grants survive; draft products are operator-visible only.

What 7b/7c should do:

Surface the Stripe mapping in the product UI — operators need to see which local product maps to which Stripe product, and the sync status of that mapping. Today this relationship is invisible from the panel.
Replace any future "delete product" affordance with a "retire product" action that flips lifecycle_status to retired (gated on whether any active grants exist; prevent retire if so, or offer a clear cascade preview).
Same logic applies to entitlement sets that are referenced by products and to plan ladders that have orgs enrolled.

Discovered during M7 phase 7a (2026-05-08).

Operator panel tabs are not HTMX-idiomatic: stale state and lost position on reload ✓ Resolved

Labels: bug, frontend, ux

Each tab pane in the operator panel uses hx-trigger="revealed" to load its partial once, on first reveal. After that, content is cached in the DOM and never refreshed, so a change made in one tab (e.g. creating a new entitlement set) is invisible in another tab (e.g. the Products form's entitlement-set dropdown) until the whole page is reloaded. Worse, a page reload always resets to the first tab (the Bootstrap JS default), losing the operator's position.

Root causes:

hx-trigger="revealed" fires only on first reveal — no cross-tab invalidation mechanism exists.
Tab state lives entirely in Bootstrap JS memory, not in the URL, so it cannot survive a reload.

HTMX-idiomatic fixes to consider:

URL-based tab state: add hx-push-url (or hx-replace-url) on each tab button so the active tab is reflected in the URL fragment or a query param; on load, scroll/activate the matching tab.
Cross-tab refresh via HTMX events: change the trigger to hx-trigger="revealed, <custom-htmx-event>" so that a mutation in one tab can fire a named event (for example through the HX-Trigger response header) that causes dependent tabs to re-fetch their partials.
Polling or out-of-band swap alternative: for low-frequency mutations, an OOB swap (hx-swap-oob) from the mutating partial can push updated data into sibling containers without a full tab reload.

Plan concept needs depth evaluation before M6 ✓ Resolved

Labels: design-feedback, billing

product_type = 'plan' was a label with no behavioral depth: the system treated plans identically to other product types. M5 phase 5c intentionally deferred first-class plan treatment (one-active-plan constraint, upgrade/downgrade rules, plan comparison logic). Discovered during M5 phase 5c exploration (2026-04-11).

Resolved by design commit 732197a (latest design import): product_type='plan' is now structurally enforced via plan_ladders, plan_ladder_tiers, and the product_type='plan' ↔ plan_ladder_id IS NOT NULL CHECK; mutual exclusion per (pool, ladder) is enforced by the GiST exclusion constraint on entitlements.pool_provision_ladders; upgrade/downgrade semantics are expressed through rank. Remaining work (transition primitive, dormant status, transition audit, operator UI) is scheduled as the new Milestone 6 "Plan Management Foundation" — see status/milestones.md. Follow-up design feedback items: "Dormant provision status for supersession" and "provision_transitions table for plan audit history" below.

`pool_provisions.status` needs a `dormant` value for superseded provisions ✓ Superseded

Labels: design-feedback, entitlements, billing

The ladder-based mutual-exclusion model in the latest design commit (732197a) enforces at most one active provision per (pool, ladder) via a GiST exclusion constraint, but does not specify what happens to a pre-existing provision when a higher-ranked provision activates. The original proposal was to add a dormant status on pool_provisions so a superseded provision could reversibly sleep and wake up when the superseder ends.

The design team's review (2026-04-18) agreed with the placement argument but refined it in two ways: (1) dormancy is not universal (a superseded trial subscription should end rather than sleep, while a superseded baseline grant should go dormant); (2) the GiST exclusion predicate must be audited to confirm dormant is treated as vacant. Accommodating (1) required a new pool_provisions.supersession_behavior column plus per-source policy encoded in the transition primitive.

On reconsideration (2026-04-18), this approach was abandoned because the accreting complexity (new status value, new column, per-source policy, GiST predicate audit, canonicalization rules against subscription_changes) outweighed its value compared with a simpler alternative. Superseded by the "end-and-re-apply on reversal" approach below.

Supersession via end-and-re-apply (no `dormant` status needed)

Labels: design-feedback, entitlements, billing

After reviewing the per-source-dormancy complexity that the design team's feedback surfaced, the member-console team concluded that a simpler mechanism without a new provision status is preferable.

Approach:

On supersession (upgrade), the existing provision and its ladder row transition to status='ended'. No new statuses. A new subscription-backed (or higher-ranked) provision and ladder row are created alongside.
On reversal (downgrade), the top provision and its ladder row end. The transition primitive then invokes a new pool-scoped ReapplyDefaultsForPool(ctx, tx, pool_id) operation that reuses the M5a-era auto-provisioning logic: look up the owning org's org_types.default_product_id, call entitlements.CreateGrantInTx(...) with grant_reason='default' to issue a fresh grant + provision + ladder attachment.

Why this is simpler than the dormant approach:

No new pool_provisions.status value and no new columns.
No per-source policy table (trial-vs-baseline distinction disappears because everything ends on supersession; reversal re-applies the policy from org_types rather than reversing a stored state).
No GiST predicate audit burden — ended is already excluded from the constraint whatever the predicate shape.
No plan-stacking semantics pressure — materialization sees at most one active provision per ladder per pool, which is exactly what the GiST constraint already enforces.
Reuses existing entitlements.CreateGrantInTx (already extracted from M5a for in-transaction use).

Required member-console work (scheduled as M6b):

A new ReapplyDefaultsForPool(ctx, tx, pool_id) primitive wrapping CreateGrantInTx. Looks up org via pool → org_id, reads org_types.default_product_id, issues the grant. ~20 lines around existing logic.
The transition primitive delegates to this on any downgrade that leaves the ladder empty.
Audit row in pool_provision_transitions records the transition; see enumeration below for the transition_type selection rules.

ReapplyDefaultsForPool contract (explicit):

If org_types.default_product_id IS NULL: the primitive records a single pool_provision_transitions row with transition_type='end', to_rank=NULL, and returns. No grant, provision, or ladder row is created. The pool is left legitimately off-ladder. The downgrade is still observable in audit; the re-application is a no-op by policy, not a silent swallow.
If default_product_id is set: call entitlements.CreateGrantInTx(...) with that product, grant_reason='default'. Record a pool_provision_transitions row with transition_type='downgrade' (or 'initiate' if the pool had no prior rank) from the superseding rank to the re-applied rank.
Caller (the transition primitive) records the end-of-superseding-provision transition separately; ReapplyDefaultsForPool only records the re-application event itself.

Audit semantics:

A pool that cycles Public → Standard → Public over time accumulates both ended provisions and fresh grant rows. Each grant row carries its own valid_from, granted_by_person_id (NULL for system), grant_reason, and quantity — they are the per-issuance audit record and intentionally distinct from their antecedents.
"What tier was this pool on at time T?" remains reconstructible by joining pool_provisions with pool_provision_ladders on the active window containing T; "who and why at time T?" by joining the chronological pool_provision_transitions record.

Trade-offs — accepted deliberately:

Grant-row accumulation. CreateGrantInTx unconditionally creates a new grant row (see internal/entitlements/grants.go:91). An org cycling Public ↔ Standard five times produces five Public-tier default grant rows in addition to the subscription-derived grants. We considered reusing an earlier matching grant (looking up (org_id, product_id, grant_reason='default') and creating only a new provision + ladder row against it) and rejected it for three reasons: (1) CreateGrantInTx is the single call-site for all grant creation — subscription renewals flow through the same path and also create fresh rows, so special-casing defaults would fragment the code path; (2) each grant is a discrete issuance event with its own metadata, which is the audit shape we want, not a bug to optimize away; (3) operators observe accumulation naturally via the M6e enrollment/transition audit UI and can reason about cycle counts from it. Accepting accumulation is the deliberate choice, not a fallout of call-site convenience.
Provision-row accumulation. Likewise, a new provision row per downgrade cycle (vs. the single reversible row under dormant). For an org that upgrades/downgrades five times, five extra ended provisions. Same audit-value argument as for grants.
Re-applying reads current org_types.default_product_id, not the original grant's product, which means an org that held a custom default grant would be re-applied from the org-type default on downgrade rather than from the original custom grant. Noted explicitly; custom per-org defaults are not currently a supported concept.

Conceptual model reinforced:

Grants are durable catalog entries ("this org is entitled to this product under this policy"); provisions are current materializations of those catalog entries. When conditions change, re-materialize from the catalog.
No new conceptual primitive is introduced; the system stays within the language it already speaks.

Reconsidered on 2026-04-18 after design-team review surfaced the accreting complexity of per-source dormancy.

New `entitlements.pool_provision_transitions` table for plan transition audit history

Labels: design-feedback, entitlements

The GLOSSARY update in design commit 732197a frames billing.subscription_changes as "the general audit log for all subscription lifecycle events; not specific to plan transitions," leaving plan-level transitions (grant lifecycle events, grant-to-subscription supersession, repeated tier changes over an org's lifetime) without a dedicated audit record.

Scenarios the existing tables cannot fully reconstruct:

An org cycles Public → Standard → Public → Standard over time. pool_provisions.{activated_at, ended_at} and pool_provision_ladders.{activated_at, ended_at} record per-row activation windows, but reconstructing the chronological enrollment trajectory (with actor attribution and reasons) requires an event log.
A trial grant at rank 1 issued by an operator, later superseded by a paid subscription, is a pure entitlements-lifecycle event and leaves no subscription_changes row.
Operator audit UI ("why is Alice's org on Standard right now?") needs a chronological, actor-attributed record that spans grants, subscriptions, and purchases uniformly.

Proposed: add entitlements.pool_provision_transitions (renamed from provision_transitions per the design team's prefix-convention note to match pool_provision_ladders):

```sql
CREATE TABLE entitlements.pool_provision_transitions (
  transition_id    UUID PRIMARY KEY,
  pool_id          UUID NOT NULL REFERENCES entitlements.resource_pools(pool_id),
  provision_id     UUID NOT NULL REFERENCES entitlements.pool_provisions(provision_id),
  plan_ladder_id   UUID REFERENCES billing.plan_ladders(plan_ladder_id),
  from_rank        INTEGER,
  to_rank          INTEGER,
  transition_type  VARCHAR(50) NOT NULL,
  actor_type       VARCHAR(50) NOT NULL,
  actor_id         UUID,
  reason           TEXT,
  effective_at     TIMESTAMPTZ NOT NULL,
  created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
  CHECK (actor_type != 'operator' OR actor_id IS NOT NULL)
);
```

Field semantics:

plan_ladder_id is NULL for off-ladder transitions (e.g., add-on grant lifecycle).
from_rank is NULL for the initial attach; to_rank is NULL for detach/end.
transition_type is strictly scoped to ladder-position changes: {initiate, upgrade, downgrade, end}. Suspension/resumption lifecycle stays with subscription_changes plus pool_provisions.status; it is intentionally not recorded here.
actor_type ∈ {operator, system, webhook}.
actor_id is required when actor_type = 'operator' (enforced by the CHECK constraint). For system and webhook actors it may be NULL today, leaving room to populate it if webhook authentication later adds actor attribution.

Canonicalization rule (to be added to GLOSSARY alongside the existing subscription_changes entry):

pool_provision_transitions is canonical for plan-position history of a pool: what tier a pool held, when, who changed it, why.
subscription_changes is canonical for commercial mutations of a subscription: status transitions (trialing, past_due, canceled), period boundaries, amount changes.
Subscription-driven ladder attachments are recorded in both tables — intentionally — because they answer different questions. An operator audit UI should query pool_provision_transitions for enrollment history and subscription_changes for billing lifecycle; neither is a substitute for the other.

Audit module framing:

This table is designed as a view-shaped projection of a future generic audit.log. Its columns map directly to a standard audit shape: {resource_type='pool_provision', resource_id=provision_id, actor_type, actor_id, action=transition_type, occurred_at=effective_at, recorded_at=created_at, payload={pool_id, plan_ladder_id, from_rank, to_rank, reason}}.
Consequence: when the generic audit module graduates from the backlog, absorption is mechanical — a UNION ALL view across pool_provision_transitions (and any peer specialized logs) yields the generic log without semantic rewriting. This table can also be deprecated into a view over audit.log at that point if desired.
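The column mapping can be expressed as a pure projection. This Go sketch assumes an AuditEvent shape for the not-yet-built audit.log; TransitionRow, AuditEvent, orNil, and ToAuditEvent are hypothetical. What it illustrates is that every field is either renamed or moved into the payload, with nothing derived, which is what would keep a UNION ALL view mechanical.

```go
package main

import "time"

// TransitionRow is a hypothetical Go mirror of one
// entitlements.pool_provision_transitions row (UUIDs as strings,
// a NULL reason modelled as "").
type TransitionRow struct {
	TransitionID   string
	PoolID         string
	ProvisionID    string
	PlanLadderID   *string
	FromRank       *int
	ToRank         *int
	TransitionType string
	ActorType      string
	ActorID        *string
	Reason         string
	EffectiveAt    time.Time
	CreatedAt      time.Time
}

// AuditEvent is an assumed shape for the future generic audit.log.
type AuditEvent struct {
	ResourceType string
	ResourceID   string
	ActorType    string
	ActorID      *string
	Action       string
	OccurredAt   time.Time
	RecordedAt   time.Time
	Payload      map[string]any
}

// orNil stores plain values in the payload: a nil pointer becomes an
// untyped nil, anything else is dereferenced.
func orNil[T any](p *T) any {
	if p == nil {
		return nil
	}
	return *p
}

// ToAuditEvent applies the column mapping one-to-one.
func ToAuditEvent(r TransitionRow) AuditEvent {
	return AuditEvent{
		ResourceType: "pool_provision",
		ResourceID:   r.ProvisionID,
		ActorType:    r.ActorType,
		ActorID:      r.ActorID,
		Action:       r.TransitionType,
		OccurredAt:   r.EffectiveAt,
		RecordedAt:   r.CreatedAt,
		Payload: map[string]any{
			"pool_id":        r.PoolID,
			"plan_ladder_id": orNil(r.PlanLadderID),
			"from_rank":      orNil(r.FromRank),
			"to_rank":        orNil(r.ToRank),
			"reason":         r.Reason,
		},
	}
}
```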

Scope and boundaries:

Lives in entitlements because transitions are a pool/provision-level concern; the cross-module reference to billing.plan_ladders follows the same direction already established by products.entitlement_set_id and pool_provisions.subscription_id.
Complements, does not replace, billing.subscription_changes.

Alternatives considered and rejected:

Deriving transition history from timestamps on pool_provisions and pool_provision_ladders: works when provisions are long-lived single rows, but does not capture actor attribution or reason, and quickly loses readability when a pool accumulates multiple ended provisions across upgrade/downgrade cycles.
Extending billing.subscription_changes into a generic provision_changes table: overloads an existing stable table, conflicts with its scope statement in the GLOSSARY, and pulls subscription-scoped schema into the entitlements module's primary-key graph.

Discovered during M6 planning exploration (2026-04-18); refined after design-team review (2026-04-18) to add the actor_type='operator' ⇒ actor_id IS NOT NULL CHECK, the prefix-convention rename, the canonicalization rule for GLOSSARY, the explicit ladder-position-only scope for transition_type, and the view-shaped contract for the future audit module. The simplified transition_type enum (initiate | upgrade | downgrade | end) also reflects the supersession approach change above (end-and-re-apply instead of dormant), which removed the need for supersede and reactivate as distinct types.

Auto-provisioning does not backfill existing orgs when `default_product_id` changes

Labels: design-feedback, entitlements, organization

Auto-provisioning today runs only on first login, during org creation (see internal/provisioning/provisioning.go:163-183). When an operator configures or changes org_types.default_product_id after deployment, pre-existing orgs of that type are unaffected: their pools retain whatever grants (or lack thereof) they had at creation time.

This is acceptable in the narrow M5a scope (new deployments configure default product once before users arrive) but becomes a concrete gap in M6:

Supersession via end-and-re-apply (see entry above) assumes org_types.default_product_id is always populated when a pool downgrades off the ladder. If the operator changes the default mid-deployment, existing orgs that downgrade later get the new default, but orgs that never upgraded never receive any grant adjustment.
Operator UI in M6 needs to communicate this reality and offer an explicit remediation path.

Proposed member-console-level additions (M6d, operator auto-provisioning config UI):

When an operator changes org_types.default_product_id, show a warning: "This applies to newly created orgs only. Existing orgs of this type are unchanged. Click 'Backfill existing orgs' to apply the new default retroactively."
A "Backfill existing orgs" operator action: for each existing org of the type whose default pool does not currently have an active grant-sourced provision on the ladder, invoke the ReapplyDefaultsForPool(ctx, tx, pool_id) primitive. The operation is idempotent with respect to pools already holding an active baseline provision.
Backfill is recorded in pool_provision_transitions with actor_type='operator' (so actor_id must carry the acting operator, per the table's CHECK), transition_type='initiate', and a reason indicating the retroactive application.

Design-layer implication: no schema change is required to support backfill — the operation is a loop over existing pools invoking a primitive that M6b adds. However, the design should explicitly acknowledge that auto-provisioning policy changes are not automatically retroactive, and that the retroactive pathway is an operator-driven action rather than a system trigger.

Discovered during M6 planning exploration (2026-04-18).
