feat(costs): add billing, quota, and budget control plane

2026-03-14 22:00:12 -05:00
parent 656b4659fc
commit 76e6cc08a6
91 changed files with 22406 additions and 769 deletions
--- a/doc/plans/2026-03-14-billing-ledger-and-reporting.md
+++ b/doc/plans/2026-03-14-billing-ledger-and-reporting.md
@@ -0,0 +1,468 @@
+# Billing Ledger and Reporting
+
+## Context
+
+Paperclip currently stores model spend in `cost_events` and operational run state in `heartbeat_runs`.
+That split is fine, but the current reporting code tries to infer billing semantics by mixing both tables:
+
+- `cost_events` knows provider, model, tokens, and dollars
+- `heartbeat_runs.usage_json` knows some per-run billing metadata
+- `heartbeat_runs.usage_json` does **not** currently carry enough normalized billing dimensions to support honest provider-level reporting
+
+Reporting inferred this way becomes incorrect as soon as a company uses more than one provider, more than one billing channel, or more than one billing mode.
+
+Examples:
+
+- direct OpenAI API usage
+- Claude subscription usage with zero marginal dollars
+- subscription overage with dollars and tokens
+- OpenRouter billing where the biller is OpenRouter but the upstream provider is Anthropic or OpenAI
+
+The system needs to support:
+
+- dollar reporting
+- token reporting
+- subscription-included usage
+- subscription overage
+- direct metered API usage
+- future aggregator billing such as OpenRouter
+
+## Product Decision
+
+`cost_events` becomes the canonical billing and usage ledger for reporting.
+
+`heartbeat_runs` remains an operational execution log. It may keep mirrored billing metadata for debugging and transcripts, but reporting must not reconstruct billing semantics from `heartbeat_runs.usage_json`.
+
+## Decision: One Ledger Or Two
+
+We do **not** need two tables to solve the current PR's problem.
+For request-level inference reporting, `cost_events` is enough if it carries the right dimensions:
+
+- upstream provider
+- biller
+- billing type
+- model
+- token fields
+- billed amount
+
+That is why the first implementation pass extends `cost_events` instead of introducing a second table immediately.
+
+However, if Paperclip needs to account for the full billing surface of aggregators and managed AI platforms, then `cost_events` alone is not enough.
+Some charges are not cleanly representable as a single model inference event:
+
+- account top-ups and credit purchases
+- platform fees charged at purchase time
+- BYOK platform fees that are account-level or threshold-based
+- prepaid credit expirations, refunds, and adjustments
+- provisioned throughput commitments
+- fine-tuning, training, model import, and storage charges
+- gateway logging or other platform overhead that is not attributable to one prompt/response pair
+
+So the decision is:
+
+- near term: keep `cost_events` as the inference and usage ledger
+- next phase: add `finance_events` for non-inference financial events
+
+This is a deliberate split between:
+
+- usage and inference accounting
+- account-level and platform-level financial accounting
+
+That separation keeps request reporting honest without forcing us to fake invoice semantics onto rows that were never request-scoped.
+
+## External Motivation And Sources
+
+The need for this model is not theoretical.
+It follows directly from the billing systems of providers and aggregators Paperclip needs to support.
+
+### OpenRouter
+
+Source URLs:
+
+- https://openrouter.ai/docs/faq#credit-and-billing-systems
+- https://openrouter.ai/pricing
+
+Relevant billing behavior as of March 14, 2026:
+
+- OpenRouter passes through underlying inference pricing and deducts request cost from purchased credits.
+- OpenRouter charges a 5.5% fee with a $0.80 minimum when purchasing credits.
+- Crypto payments are charged a 5% fee.
+- BYOK has its own fee model after a free request threshold.
+- OpenRouter billing is aggregated at the OpenRouter account level even when the upstream provider is Anthropic, OpenAI, Google, or another provider.
+
+Implication for Paperclip:
+
+- request usage belongs in `cost_events`
+- credit purchases, purchase fees, BYOK fees, refunds, and expirations belong in `finance_events`
+- `biller=openrouter` must remain distinct from `provider=anthropic|openai|google|...`
+
+### Cloudflare AI Gateway Unified Billing
+
+Source URL:
+
+- https://developers.cloudflare.com/ai-gateway/features/unified-billing/
+
+Relevant billing behavior as of March 14, 2026:
+
+- Unified Billing lets users call multiple upstream providers while receiving a single Cloudflare bill.
+- Usage is paid from Cloudflare-loaded credits.
+- Cloudflare supports manual top-ups and auto top-up thresholds.
+- Spend limits can stop request processing on daily, weekly, or monthly boundaries.
+- Unified Billing traffic can use Cloudflare-managed credentials rather than the user's direct provider key.
+
+Implication for Paperclip:
+
+- request usage needs `biller=cloudflare`
+- upstream provider still needs to be preserved separately
+- Cloudflare credit loads and related account-level events are not inference rows and should not be forced into `cost_events`
+- quota and limits reporting must support biller-level controls, not just upstream provider limits
+
+### Amazon Bedrock
+
+Source URL:
+
+- https://aws.amazon.com/bedrock/pricing/
+
+Relevant billing behavior as of March 14, 2026:
+
+- Bedrock supports on-demand and batch pricing.
+- Bedrock pricing varies by region.
+- Some pricing tiers add premiums or discounts relative to standard pricing.
+- Provisioned throughput is commitment-based rather than request-based.
+- Custom model import uses Custom Model Units billed per minute, with monthly storage charges.
+- Imported model copies are billed in 5-minute windows once active.
+- Customization and fine-tuning introduce training and hosted-model charges beyond normal inference.
+
+Implication for Paperclip:
+
+- normal tokenized inference fits in `cost_events`
+- provisioned throughput, custom model unit charges, training, and storage charges require `finance_events`
+- region and pricing tier need to be first-class dimensions in the financial model
+
+## Ledger Boundary
+
+To keep the system coherent, the table boundary should be explicit.
+
+### `cost_events`
+
+Use `cost_events` for request-scoped usage and inference charges:
+
+- one row per billable or usage-bearing run event
+- provider/model/biller/billingType/tokens/cost
+- optionally tied to `heartbeat_run_id`
+- supports direct APIs, subscriptions, overage, OpenRouter-routed inference, Cloudflare-routed inference, and Bedrock on-demand inference
+
+### `finance_events`
+
+Use `finance_events` for account-scoped or platform-scoped financial events:
+
+- credit purchase
+- top-up
+- refund
+- fee
+- expiry
+- provisioned capacity
+- training
+- model import
+- storage
+- invoice adjustment
+
+These rows may or may not have a related model, provider, or run id.
+Trying to force them into `cost_events` would either create fake request rows or create null-heavy rows that mean something fundamentally different from inference usage.
+
+## Canonical Billing Dimensions
+
+Every persisted billing event should model four separate axes:
+
+1. Usage provider
+   The upstream provider whose model performed the work.
+   Examples: `openai`, `anthropic`, `google`.
+
+2. Biller
+   The system that charged for the usage.
+   Examples: `openai`, `anthropic`, `openrouter`, `cursor`, `chatgpt`.
+
+3. Billing type
+   The pricing mode applied to the event.
+   Initial canonical values:
+   - `metered_api`
+   - `subscription_included`
+   - `subscription_overage`
+   - `credits`
+   - `fixed`
+   - `unknown`
+
+4. Measures
+   Usage and billing must both be storable:
+   - `input_tokens`
+   - `output_tokens`
+   - `cached_input_tokens`
+   - `cost_cents`
+
+These dimensions are independent.
+For example, an event may be:
+
+- provider: `anthropic`
+- biller: `openrouter`
+- billing type: `metered_api`
+- tokens: non-zero
+- cost cents: non-zero
+
+Or:
+
+- provider: `anthropic`
+- biller: `anthropic`
+- billing type: `subscription_included`
+- tokens: non-zero
+- cost cents: `0`
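+
+As a sketch, the four axes could be carried by one record type. `BillingType` mirrors the canonical values listed above; the `CostEventDimensions` name, the camelCase field names, and the token and cent figures are assumptions made for illustration, not the shipped shared types.
+
+```typescript
+// Hypothetical shape for illustration; not the real shared contract.
+type BillingType =
+  | "metered_api"
+  | "subscription_included"
+  | "subscription_overage"
+  | "credits"
+  | "fixed"
+  | "unknown";
+
+interface CostEventDimensions {
+  provider: string; // upstream provider whose model performed the work
+  biller: string; // system that charged for the usage
+  billingType: BillingType; // pricing mode applied to the event
+  inputTokens: number;
+  outputTokens: number;
+  cachedInputTokens: number;
+  costCents: number;
+}
+
+// Anthropic model billed through OpenRouter: tokens and cost are both non-zero.
+// The numbers are made up for the example.
+const viaAggregator: CostEventDimensions = {
+  provider: "anthropic",
+  biller: "openrouter",
+  billingType: "metered_api",
+  inputTokens: 1200,
+  outputTokens: 300,
+  cachedInputTokens: 0,
+  costCents: 4,
+};
+
+// Anthropic subscription usage: tokens are recorded, marginal cost is zero.
+const viaSubscription: CostEventDimensions = {
+  provider: "anthropic",
+  biller: "anthropic",
+  billingType: "subscription_included",
+  inputTokens: 1200,
+  outputTokens: 300,
+  cachedInputTokens: 0,
+  costCents: 0,
+};
+```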
+
+## Schema Changes
+
+Extend `cost_events` with:
+
+- `heartbeat_run_id uuid null references heartbeat_runs.id`
+- `biller text not null default 'unknown'`
+- `billing_type text not null default 'unknown'`
+- `cached_input_tokens int not null default 0`
+
+Keep `provider` as the upstream usage provider.
+Do not overload `provider` to mean biller.
+
+Add a future `finance_events` table for account-level financial events with fields along these lines:
+
+- `company_id`
+- `occurred_at`
+- `event_kind`
+- `direction`
+- `biller`
+- `provider nullable`
+- `execution_adapter_type nullable`
+- `pricing_tier nullable`
+- `region nullable`
+- `model nullable`
+- `quantity nullable`
+- `unit nullable`
+- `amount_cents`
+- `currency`
+- `estimated`
+- `related_cost_event_id nullable`
+- `related_heartbeat_run_id nullable`
+- `external_invoice_id nullable`
+- `metadata_json nullable`
+
+Add indexes:
+
+- `(company_id, biller, occurred_at)`
+- `(company_id, provider, occurred_at)`
+- `(company_id, heartbeat_run_id)` if distinct-run reporting remains common
+
+## Shared Contract Changes
+
+### Shared types
+
+Add a shared billing type union and enrich cost types with:
+
+- `heartbeatRunId`
+- `biller`
+- `billingType`
+- `cachedInputTokens`
+
+Update reporting response types so the provider breakdown reflects the ledger directly rather than inferred run metadata.
+
+### Validators
+
+Extend `createCostEventSchema` to accept:
+
+- `heartbeatRunId`
+- `biller`
+- `billingType`
+- `cachedInputTokens`
+
+Defaults:
+
+- `biller` defaults to `provider`
+- `billingType` defaults to `unknown`
+- `cachedInputTokens` defaults to `0`
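+
+The defaults above can be sketched as a plain normalization step. This is not the real `createCostEventSchema`; `CostEventInput`, `CostEventWithDefaults`, and `applyCostEventDefaults` are hypothetical names, and the sketch skips all other validation.
+
+```typescript
+// Hypothetical input and output shapes, reduced to the new fields.
+interface CostEventInput {
+  provider: string;
+  biller?: string;
+  billingType?: string;
+  cachedInputTokens?: number;
+  heartbeatRunId?: string | null;
+}
+
+interface CostEventWithDefaults {
+  provider: string;
+  biller: string;
+  billingType: string;
+  cachedInputTokens: number;
+  heartbeatRunId: string | null;
+}
+
+function applyCostEventDefaults(input: CostEventInput): CostEventWithDefaults {
+  return {
+    provider: input.provider,
+    biller: input.biller ?? input.provider, // biller defaults to provider
+    billingType: input.billingType ?? "unknown",
+    cachedInputTokens: input.cachedInputTokens ?? 0,
+    heartbeatRunId: input.heartbeatRunId ?? null, // assumption: absent means no run
+  };
+}
+```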
+
+## Adapter Contract Changes
+
+Extend adapter execution results so they can report:
+
+- `biller`
+- richer billing type values
+
+Backwards compatibility:
+
+- existing adapter values `api` and `subscription` are treated as legacy aliases
+- map `api -> metered_api`
+- map `subscription -> subscription_included`
+
+Future adapters may emit the canonical values directly.
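+
+A minimal sketch of that alias mapping follows. The function name `normalizeBillingType` is hypothetical, and falling back to `unknown` for unrecognized or missing values is an assumption rather than something this plan specifies.
+
+```typescript
+const CANONICAL_BILLING_TYPES = [
+  "metered_api",
+  "subscription_included",
+  "subscription_overage",
+  "credits",
+  "fixed",
+  "unknown",
+] as const;
+
+type BillingType = (typeof CANONICAL_BILLING_TYPES)[number];
+
+// Legacy adapter values and the canonical values they map to.
+const LEGACY_ALIASES = new Map<string, BillingType>([
+  ["api", "metered_api"],
+  ["subscription", "subscription_included"],
+]);
+
+function normalizeBillingType(raw: string | null | undefined): BillingType {
+  if (raw == null) return "unknown";
+  const alias = LEGACY_ALIASES.get(raw);
+  if (alias !== undefined) return alias;
+  // Canonical values pass through; anything else becomes "unknown" (assumption).
+  const canonical = CANONICAL_BILLING_TYPES.find((value) => value === raw);
+  return canonical ?? "unknown";
+}
+```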
+
+OpenRouter support will use:
+
+- `provider` = upstream provider when known
+- `biller` = `openrouter`
+- `billingType` = `metered_api` unless OpenRouter later exposes another billing mode
+
+Cloudflare Unified Billing support will use:
+
+- `provider` = upstream provider when known
+- `biller` = `cloudflare`
+- `billingType` = `credits` or `metered_api` depending on the normalized request billing contract
+
+Bedrock support will use:
+
+- `provider` = upstream provider or `aws_bedrock` depending on adapter shape
+- `biller` = `aws_bedrock`
+- `billingType` = request-scoped mode for inference rows
+- `finance_events` for provisioned, training, import, and storage charges
+
+## Write Path Changes
+
+### Heartbeat-created events
+
+When a heartbeat run produces usage or spend:
+
+1. normalize adapter billing metadata
+2. write a ledger row to `cost_events`
+3. attach `heartbeat_run_id`
+4. set `provider`, `biller`, `billing_type`, token fields, and `cost_cents`
+
+The write path should no longer depend on later inference from `heartbeat_runs`.
+
+### Manual API-created events
+
+Manual cost event creation remains supported.
+These events may have `heartbeatRunId = null`.
+
+Rules:
+
+- `provider` remains required
+- `biller` defaults to `provider`
+- `billingType` defaults to `unknown`
+
+## Reporting Changes
+
+### Server
+
+Refactor reporting queries to use `cost_events` only.
+
+#### `summary`
+
+- sum `cost_cents`
+
+#### `by-agent`
+
+- sum costs and token fields from `cost_events`
+- use `count(distinct heartbeat_run_id)` filtered by billing type for run counts
+- use token sums filtered by billing type for subscription usage
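+
+For illustration, the distinct-run count can be expressed in memory as below. The real query would run in SQL; `RunRow` and `countDistinctRuns` are hypothetical names. As with SQL `count(distinct ...)`, rows without a run id are not counted.
+
+```typescript
+interface RunRow {
+  heartbeatRunId: string | null;
+  billingType: string;
+}
+
+// Counts distinct run ids among rows of one billing type.
+function countDistinctRuns(rows: RunRow[], billingType: string): number {
+  const seen = new Set<string>();
+  for (const row of rows) {
+    if (row.billingType === billingType && row.heartbeatRunId !== null) {
+      seen.add(row.heartbeatRunId);
+    }
+  }
+  return seen.size;
+}
+```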
+
+#### `by-provider`
+
+- group by `provider`, `model`
+- sum costs and token fields directly from the ledger
+- derive billing-type slices from `cost_events.billing_type`
+- never pro-rate from unrelated `heartbeat_runs`
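+
+The grouping can be sketched in memory as follows. The production version would be a SQL aggregation over `cost_events`; `LedgerRow`, `ProviderModelTotals`, and `byProvider` are hypothetical names, and the single `subscriptionIncludedTokens` slice stands in for whatever billing-type slices the response type ends up exposing.
+
+```typescript
+interface LedgerRow {
+  provider: string;
+  model: string;
+  billingType: string;
+  inputTokens: number;
+  outputTokens: number;
+  costCents: number;
+}
+
+interface ProviderModelTotals {
+  provider: string;
+  model: string;
+  costCents: number;
+  inputTokens: number;
+  outputTokens: number;
+  subscriptionIncludedTokens: number;
+}
+
+function byProvider(rows: LedgerRow[]): ProviderModelTotals[] {
+  const groups = new Map<string, ProviderModelTotals>();
+  for (const row of rows) {
+    const key = JSON.stringify([row.provider, row.model]);
+    let group = groups.get(key);
+    if (group === undefined) {
+      group = {
+        provider: row.provider,
+        model: row.model,
+        costCents: 0,
+        inputTokens: 0,
+        outputTokens: 0,
+        subscriptionIncludedTokens: 0,
+      };
+      groups.set(key, group);
+    }
+    group.costCents += row.costCents;
+    group.inputTokens += row.inputTokens;
+    group.outputTokens += row.outputTokens;
+    // The slice comes from the row's own billing type, never from another table.
+    if (row.billingType === "subscription_included") {
+      group.subscriptionIncludedTokens += row.inputTokens + row.outputTokens;
+    }
+  }
+  return Array.from(groups.values());
+}
+```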
+
+#### future `by-biller`
+
+- group by `biller`
+- this is the right view for invoice and subscription accountability
+
+#### `window-spend`
+
+- continue to use `cost_events`
+
+#### project attribution
+
+Keep current project attribution logic for now, but prefer `cost_events.heartbeat_run_id` as the join anchor whenever possible.
+
+## UI Changes
+
+### Principles
+
+- spend, usage, and quota are related but distinct
+- a missing quota fetch is not the same as “no quota”
+- provider and biller are different dimensions
+
+### Immediate UI changes
+
+1. Keep the current costs page structure.
+2. Make the provider cards accurate by reading only ledger-backed values.
+3. Show provider quota fetch errors explicitly instead of dropping them.
+
+### Follow-up UI direction
+
+The long-term board UI should expose:
+
+- Spend
+  Dollars by biller, provider, model, agent, project
+- Usage
+  Tokens by provider, model, agent, project
+- Quotas
+  Live provider or biller limits, credits, and reset windows
+- Financial events
+  Credit purchases, top-ups, fees, refunds, commitments, storage, and other non-inference charges
+
+## Migration Plan
+
+Migration behavior:
+
+- add new non-destructive columns with defaults
+- backfill existing rows:
+  - `biller = provider`
+  - `billing_type = 'unknown'`
+  - `cached_input_tokens = 0`
+  - `heartbeat_run_id = null`
+
+Do **not** attempt to backfill historical provider-level subscription attribution from `heartbeat_runs`.
+That data was never stored with the required dimensions.
+
+## Testing Plan
+
+Add or update tests for:
+
+1. heartbeat-created ledger rows persist `heartbeatRunId`, `biller`, `billingType`, and cached tokens
+2. legacy adapter billing values map correctly
+3. provider reporting uses ledger data only
+4. mixed-provider companies do not cross-attribute subscription usage
+5. zero-dollar subscription usage still appears in token reporting
+6. quota fetch failures render explicit UI state
+7. manual cost events still validate and write correctly
+8. biller reporting keeps upstream provider breakdowns separate
+9. OpenRouter-style rows can show `biller=openrouter` with non-OpenRouter upstream providers
+10. Cloudflare-style rows can show `biller=cloudflare` with preserved upstream provider identity
+11. future `finance_events` aggregation handles non-request charges without requiring a model or run id
+
+## Delivery Plan
+
+### Step 1
+
+- land the ledger contract and query rewrite
+- make the current costs page correct
+
+### Step 2
+
+- add biller-oriented reporting endpoints and UI
+
+### Step 3
+
+- wire OpenRouter and any future aggregator adapters to the same contract
+
+### Step 4
+
+- add `executionAdapterType` to persisted cost reporting if adapter-level grouping becomes a product requirement
+
+### Step 5
+
+- introduce `finance_events`
+- add non-inference accounting endpoints
+- add UI for platform/account charges alongside inference spend and usage
+
+## Non-Goals For This Change
+
+- multi-currency support
+- invoice reconciliation
+- provider-specific cost estimation beyond persisted billed cost
+- replacing `heartbeat_runs` as the operational run record
--- a/doc/plans/2026-03-14-budget-policies-and-enforcement.md
+++ b/doc/plans/2026-03-14-budget-policies-and-enforcement.md
@@ -0,0 +1,611 @@
+# Budget Policies and Enforcement
+
+## Context
+
+Paperclip already treats budgets as a core control-plane responsibility:
+
+- `doc/SPEC.md` gives the Board authority to set budgets, pause agents, pause work, and override any budget.
+- `doc/SPEC-implementation.md` says V1 must support monthly UTC budget windows, soft alerts, and hard auto-pause.
+- the current code only partially implements that intent.
+
+Today the system has narrow money-budget behavior:
+
+- companies track `budgetMonthlyCents` and `spentMonthlyCents`
+- agents track `budgetMonthlyCents` and `spentMonthlyCents`
+- `cost_events` ingestion increments those counters
+- when an agent exceeds its monthly budget, the agent is paused
+
+That leaves major product gaps:
+
+- no project budget model
+- no approval generated when budget is hit
+- no generic budget policy system
+- no project pause semantics tied to budget
+- no durable incident tracking to prevent duplicate alerts
+- no separation between enforceable spend budgets and advisory usage quotas
+
+This plan defines the concrete budgeting model Paperclip should implement next.
+
+## Product Goals
+
+Paperclip should let operators:
+
+1. Set budgets on agents and projects.
+2. Understand whether a budget is based on money or usage.
+3. Be warned before a budget is exhausted.
+4. Automatically pause work when a hard budget is hit.
+5. Approve, raise, or resume from a budget stop using obvious UI.
+6. See budget state on the dashboard, `/costs`, and scope detail pages.
+
+The system should make one thing very clear:
+
+- budgets are policy controls
+- quotas are usage visibility
+
+They are related, but they are not the same concept.
+
+## Product Decisions
+
+### V1 Budget Defaults
+
+For the next implementation pass, Paperclip should enforce these defaults:
+
+- agent budgets are recurring monthly budgets
+- project budgets are lifetime total budgets
+- hard-stop enforcement uses billed dollars, not tokens
+- monthly windows use UTC calendar months
+- project total budgets do not reset automatically
+
+This gives a clean mental model:
+
+- agents are ongoing workers, so monthly recurring budget is natural
+- projects are bounded workstreams, so lifetime cap is natural
+
+### Metric To Enforce First
+
+The first enforceable metric should be `billed_cents`.
+
+Reasoning:
+
+- it works across providers, billers, and models
+- it maps directly to real financial risk
+- it handles overage and metered usage consistently
+- it avoids cross-provider token normalization problems
+- it applies cleanly even when future finance events are not token-based
+
+Token budgets should not be the first hard-stop policy.
+They should come later as advisory usage controls once the money-based system is solid.
+
+### Subscription Usage Decision
+
+Paperclip should separate subscription-included usage from billed spend:
+
+- `subscription_included`
+  - visible in reporting
+  - visible in usage summaries
+  - does not count against money budget
+- `subscription_overage`
+  - visible in reporting
+  - counts against money budget
+- `metered_api`
+  - visible in reporting
+  - counts against money budget
+
+This keeps the budget system honest:
+
+- users should not see "spend" rise for usage that did not incur marginal billed cost
+- users should still see the token usage and provider quota state
+
+### Soft Alert Versus Hard Stop
+
+Paperclip should have two threshold classes:
+
+- soft alert
+  - creates visible notification state
+  - does not create an approval
+  - does not pause work
+- hard stop
+  - pauses the affected scope automatically
+  - creates an approval requiring human resolution
+  - prevents additional heartbeats or task pickup in that scope
+
+Default thresholds:
+
+- soft alert at `80%`
+- hard stop at `100%`
+
+These should be configurable per policy later, but they are good defaults now.
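+
+A minimal sketch of the threshold check, using integer cents so no floating-point percentages are involved. The name `classifySpend` is hypothetical. Treating "at 100%" as `>=` and treating a non-positive limit as "no enforceable budget" are both assumptions.
+
+```typescript
+type ThresholdState = "ok" | "soft" | "hard";
+
+function classifySpend(
+  observedCents: number,
+  limitCents: number,
+  warnPercent: number = 80,
+): ThresholdState {
+  // Assumption: a zero or negative limit means there is nothing to enforce.
+  if (limitCents <= 0) return "ok";
+  if (observedCents >= limitCents) return "hard";
+  // observed / limit >= warnPercent / 100, rearranged to stay in integers.
+  if (observedCents * 100 >= limitCents * warnPercent) return "soft";
+  return "ok";
+}
+```
+
+With a limit of 10000 cents and the default thresholds, 7999 cents is `ok`, 8000 cents is `soft`, and 10000 cents is `hard`.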
+
+## Scope Model
+
+### Supported Scope Types
+
+Budget policies should support:
+
+- `company`
+- `agent`
+- `project`
+
+This plan focuses on finishing `agent` and `project` first while preserving the existing company budget behavior.
+
+### Recommended V1.5 Policy Presets
+
+- Company
+  - metric: `billed_cents`
+  - window: `calendar_month_utc`
+- Agent
+  - metric: `billed_cents`
+  - window: `calendar_month_utc`
+- Project
+  - metric: `billed_cents`
+  - window: `lifetime`
+
+Future extensions can add:
+
+- token advisory policies
+- daily or weekly spend windows
+- provider- or biller-scoped budgets
+- inherited delegated budgets down the org tree
+
+## Current Implementation Baseline
+
+The current codebase is not starting from zero, but the existing shape is too ad hoc to extend safely.
+
+### What Exists Today
+
+- company and agent monthly cents counters
+- cost ingestion that updates those counters
+- agent hard-stop pause on monthly budget overrun
+
+### What Is Missing
+
+- project budgets
+- generic budget policy persistence
+- generic threshold crossing detection
+- incident deduplication per scope/window
+- approval creation on hard-stop
+- project execution blocking
+- budget timeline and incident UI
+- distinction between advisory quota and enforceable budget
+
+## Proposed Data Model
+
+### 1. `budget_policies`
+
+Create a new table for canonical budget definitions.
+
+Suggested fields:
+
+- `id`
+- `company_id`
+- `scope_type`
+- `scope_id`
+- `metric`
+- `window_kind`
+- `amount`
+- `warn_percent`
+- `hard_stop_enabled`
+- `notify_enabled`
+- `is_active`
+- `created_by_user_id`
+- `updated_by_user_id`
+- `created_at`
+- `updated_at`
+
+Notes:
+
+- `scope_type` is one of `company | agent | project`
+- `scope_id` is required for agent and project policies; for a company-level policy it may be null only if the scope is implied by `company_id`, otherwise set it explicitly
+- `metric` should start with `billed_cents`
+- `window_kind` starts with `calendar_month_utc | lifetime`
+- `amount` is stored in the natural unit of the metric
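+
+The two initial window kinds could be resolved to concrete bounds as sketched below. `resolveWindow` and `BudgetWindow` are hypothetical names; using an exclusive end bound and representing `lifetime` as null bounds are assumptions.
+
+```typescript
+type WindowKind = "calendar_month_utc" | "lifetime";
+
+interface BudgetWindow {
+  start: Date | null; // inclusive; null means unbounded
+  end: Date | null; // exclusive; null means unbounded
+}
+
+function resolveWindow(kind: WindowKind, now: Date): BudgetWindow {
+  if (kind === "lifetime") return { start: null, end: null };
+  const year = now.getUTCFullYear();
+  const month = now.getUTCMonth();
+  return {
+    start: new Date(Date.UTC(year, month, 1)),
+    // Date.UTC rolls month 12 over into January of the next year.
+    end: new Date(Date.UTC(year, month + 1, 1)),
+  };
+}
+```
+
+Because the window is computed in UTC, an event at `2026-03-31T22:00:00-05:00` falls in the April window: that instant is already `2026-04-01T03:00:00Z`.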
+
+### 2. `budget_incidents`
+
+Create a durable record of threshold crossings.
+
+Suggested fields:
+
+- `id`
+- `company_id`
+- `policy_id`
+- `scope_type`
+- `scope_id`
+- `metric`
+- `window_kind`
+- `window_start`
+- `window_end`
+- `threshold_type`
+- `amount_limit`
+- `amount_observed`
+- `status`
+- `approval_id` nullable
+- `activity_id` nullable
+- `resolved_at` nullable
+- `created_at`
+- `updated_at`
+
+Notes:
+
+- `threshold_type`: `soft | hard`
+- `status`: `open | acknowledged | resolved | dismissed`
+- one open incident per policy per threshold per window prevents duplicate approvals and alert spam
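+
+The deduplication rule can be sketched with an in-memory index keyed on policy, threshold, and window. In practice this would more likely be a uniqueness constraint in the database; `OpenIncidentIndex` and `IncidentScope` are hypothetical names used only for this example.
+
+```typescript
+type ThresholdType = "soft" | "hard";
+
+interface IncidentScope {
+  policyId: string;
+  thresholdType: ThresholdType;
+  windowStart: string | null; // ISO timestamp; null for lifetime windows
+}
+
+class OpenIncidentIndex {
+  private readonly open = new Set<string>();
+
+  private static key(scope: IncidentScope): string {
+    return JSON.stringify([scope.policyId, scope.thresholdType, scope.windowStart]);
+  }
+
+  // Returns true if a new incident was opened, false if one was already open.
+  tryOpen(scope: IncidentScope): boolean {
+    const key = OpenIncidentIndex.key(scope);
+    if (this.open.has(key)) return false;
+    this.open.add(key);
+    return true;
+  }
+
+  // Removes the open marker so a later crossing can open a fresh incident.
+  resolve(scope: IncidentScope): void {
+    this.open.delete(OpenIncidentIndex.key(scope));
+  }
+}
+```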
+
+### 3. Project Pause State
+
+Projects need explicit pause semantics.
+
+Recommended approach:
+
+- extend project status or add a pause field so a project can be blocked by budget
+- preserve whether the project is paused due to budget versus manually paused
+
+Preferred shape:
+
+- keep project workflow status as-is
+- add execution-state fields:
+  - `execution_status`: `active | paused | archived`
+  - `pause_reason`: `manual | budget | system | null`
+
+If that is too large for the immediate pass, a smaller version is:
+
+- add `paused_at`
+- add `pause_reason`
+
+The key requirement is behavioral, not cosmetic:
+Paperclip must know that a project is budget-paused and enforce it.
+
+### 4. Compatibility With Existing Budget Columns
+
+Existing company and agent monthly budget columns should remain temporarily for compatibility.
+
+Migration plan:
+
+1. keep reading existing columns during transition
+2. create equivalent `budget_policies` rows
+3. switch enforcement and UI to policies
+4. later remove or deprecate legacy columns
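+
+Step 2 could look roughly like the sketch below for agents. `LegacyAgent`, `BudgetPolicyDraft`, and `policyFromLegacyAgent` are hypothetical names. Treating a non-positive `budgetMonthlyCents` as "no budget configured" is an assumption that should be checked against the existing column semantics before any backfill.
+
+```typescript
+interface LegacyAgent {
+  id: string;
+  companyId: string;
+  budgetMonthlyCents: number;
+}
+
+interface BudgetPolicyDraft {
+  companyId: string;
+  scopeType: "company" | "agent" | "project";
+  scopeId: string;
+  metric: "billed_cents";
+  windowKind: "calendar_month_utc" | "lifetime";
+  amount: number;
+  warnPercent: number;
+  hardStopEnabled: boolean;
+}
+
+function policyFromLegacyAgent(agent: LegacyAgent): BudgetPolicyDraft | null {
+  // Assumption: zero or negative means no budget was ever set.
+  if (agent.budgetMonthlyCents <= 0) return null;
+  return {
+    companyId: agent.companyId,
+    scopeType: "agent",
+    scopeId: agent.id,
+    metric: "billed_cents",
+    windowKind: "calendar_month_utc", // agent budgets recur monthly
+    amount: agent.budgetMonthlyCents,
+    warnPercent: 80, // default soft alert threshold from this plan
+    hardStopEnabled: true,
+  };
+}
+```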
+
+## Budget Engine
+
+Budget enforcement should move into a dedicated service.
+
+Current logic is buried inside cost ingestion.
+That is too narrow because budget checks must apply at more than one execution boundary.
+
+### Responsibilities
+
+New service: `budgetService`
+
+Responsibilities:
+
+- resolve applicable policies for a cost event
+- compute current window totals
+- detect threshold crossings
+- create incidents, activities, and approvals
+- pause affected scopes on hard-stop
+- provide preflight enforcement checks for execution entry points
+
+### Canonical Evaluation Flow
+
+When a new `cost_event` is written:
+
+1. persist the `cost_event`
+2. identify affected scopes
+   - company
+   - agent
+   - project
+3. fetch active policies for those scopes
+4. compute current observed amount for each policy window
+5. compare to thresholds
+6. create soft incident if soft threshold crossed for first time in window
+7. create hard incident if hard threshold crossed for first time in window
+8. if hard incident:
+   - pause the scope
+   - create approval
+   - create activity event
+   - emit notification state
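+
+Steps 5 through 8 can be sketched as a pure function that returns the actions to perform for one policy. All names here (`PolicySnapshot`, `Action`, `evaluatePolicy`) are hypothetical. The sketch assumes `amount` is positive and that the caller has already loaded which thresholds have an incident in the current window.
+
+```typescript
+type ThresholdType = "soft" | "hard";
+
+type Action =
+  | "open_soft_incident"
+  | "open_hard_incident"
+  | "pause_scope"
+  | "create_approval"
+  | "log_activity"
+  | "notify";
+
+interface PolicySnapshot {
+  amount: number; // limit in the metric's unit, e.g. billed cents
+  warnPercent: number;
+  hardStopEnabled: boolean;
+  incidentsThisWindow: ReadonlySet<ThresholdType>;
+}
+
+function evaluatePolicy(policy: PolicySnapshot, observed: number): Action[] {
+  const actions: Action[] = [];
+  const softCrossed = observed * 100 >= policy.amount * policy.warnPercent;
+  const hardCrossed = policy.hardStopEnabled && observed >= policy.amount;
+
+  // Each threshold opens at most one incident per window.
+  if (softCrossed && !policy.incidentsThisWindow.has("soft")) {
+    actions.push("open_soft_incident");
+  }
+  if (hardCrossed && !policy.incidentsThisWindow.has("hard")) {
+    actions.push("open_hard_incident", "pause_scope", "create_approval", "log_activity", "notify");
+  }
+  return actions;
+}
+```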
+
+### Preflight Enforcement Checks
+
+Budget enforcement cannot rely only on post-hoc cost ingestion.
+
+Paperclip must also block execution before new work starts.
+
+Add budget checks to:
+
+- scheduler heartbeat dispatch
+- manual invoke endpoints
+- assignment-driven wakeups
+- queued run promotion
+- issue checkout or pickup paths where applicable
+
+If a scope is budget-paused:
+
+- do not start a new heartbeat
+- do not let the agent pick up additional work
+- present a clear reason in API and UI
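+
+A preflight check along these lines could be shared by every entry point listed above. The names `preflightBudgetCheck`, `ScopeState`, and the `budget_paused` error code are hypothetical; the point of the sketch is that a blocked call returns an explicit, structured reason instead of silently doing nothing.
+
+```typescript
+type PauseReason = "manual" | "budget" | "system" | null;
+type ScopeType = "company" | "agent" | "project";
+
+interface ScopeState {
+  scopeType: ScopeType;
+  scopeId: string;
+  pauseReason: PauseReason;
+}
+
+type PreflightResult =
+  | { allowed: true }
+  | { allowed: false; code: "budget_paused"; scopeType: ScopeType; scopeId: string; message: string };
+
+// Checks every scope the new work would run under and stops at the first budget pause.
+function preflightBudgetCheck(scopes: ScopeState[]): PreflightResult {
+  for (const scope of scopes) {
+    if (scope.pauseReason === "budget") {
+      return {
+        allowed: false,
+        code: "budget_paused",
+        scopeType: scope.scopeType,
+        scopeId: scope.scopeId,
+        message: `${scope.scopeType} ${scope.scopeId} is paused by budget enforcement`,
+      };
+    }
+  }
+  return { allowed: true };
+}
+```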
+
+### Active Run Behavior
+
+When a hard-stop is triggered while a run is already active:
+
+- mark scope paused immediately for future work
+- request graceful cancellation of the current run
+- allow normal cancellation timeout behavior
+- write activity explaining that pause came from budget enforcement
+
+This mirrors the general pause semantics already expected by the product.
+
+## Approval Model
+
+Budget hard-stops should create a first-class approval.
+
+### New Approval Type
+
+Add approval type:
+
+- `budget_override_required`
+
+Payload should include:
+
+- `scopeType`
+- `scopeId`
+- `scopeName`
+- `metric`
+- `windowKind`
+- `thresholdType`
+- `budgetAmount`
+- `observedAmount`
+- `windowStart`
+- `windowEnd`
+- `topDrivers`
+- `paused`
+
+### Resolution Actions
+
+The approval UI should support:
+
+- raise budget and resume
+- resume once without changing policy
+- keep paused
+
+Optional later action:
+
+- disable budget policy
+
+### Soft Alerts Do Not Need Approval
+
+Soft alerts should create:
+
+- activity event
+- dashboard alert
+- inbox notification or similar board-visible signal
+
+They should not create an approval by default.
+
+## Notification And Activity Model
+
+Budget events need obvious operator visibility.
+
+Required outputs:
+
+- activity log entry on threshold crossings
+- dashboard surface for active budget incidents
+- detail page banner on paused agent or project
+- `/costs` summary of active incidents and policy health
+
+Later channels:
+
+- email
+- webhook
+- Slack or other integrations
+
+## API Plan
+
+### Policy Management
+
+Add routes for:
+
+- list budget policies for company
+- create budget policy
+- update budget policy
+- archive or disable budget policy
+
+### Incident Surfaces
+
+Add routes for:
+
+- list active budget incidents
+- list incident history
+- get incident detail for a scope
+
+### Approval Resolution
+
+Budget approvals should use the existing approval system once the new approval type is added.
+
+Expected flows:
+
+- create approval on hard-stop
+- resolve approval by changing policy and resuming
+- resolve approval by resuming once
+
+### Execution Errors
+
+When work is blocked by budget, the API should return explicit errors.
+
+Examples:
+
+- agent invocation blocked because agent budget is paused
+- issue execution blocked because project budget is paused
+
+Do not silently no-op.
+
+## UI Plan
+
+Budgeting should be visible in the places where operators make decisions.
+
+### `/costs`
+
+Add a budget section that includes:
+
+- active budget incidents
+- policy list with scope, window, metric, and threshold state
+- progress bars for current period or total
+- clear distinction between:
+  - spend budget
+  - subscription quota
+- quick actions:
+  - raise budget
+  - open approval
+  - resume scope if permitted
+
+The page should make this visual distinction obvious:
+
+- Budget
+  - enforceable spend policy
+- Quota
+  - provider or subscription usage window
+
+### Agent Detail
+
+Add an agent budget card:
+
+- monthly budget amount
+- current month spend
+- remaining spend
+- status
+- warning or paused banner
+- link to approval if blocked
+
+### Project Detail
+
+Add a project budget card:
+
+- total budget amount
+- total spend to date
+- remaining spend
+- pause status
+- approval link
+
+Project detail should also show if issue execution is blocked because the project is budget-paused.
+
+### Dashboard
+
+Add a high-signal budget section:
+
+- active budget breaches
+- upcoming soft alerts
+- counts of paused agents and paused projects due to budget
+
+The operator should not have to visit `/costs` to learn that work has stopped.
+
+## Budget Math
+
+### What Counts Toward Budget
+
+For V1.5 enforcement, include:
+
+- `metered_api` cost events
+- `subscription_overage` cost events
+- any future request-scoped cost event with non-zero billed cents
+
+Do not include:
+
+- `subscription_included` cost events with zero billed cents
+- advisory quota rows
+- account-level finance events unless and until company-level financial budgets are added explicitly
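+
+The inclusion rule can be sketched as a filter over request-scoped events. `BudgetableEvent`, `countsTowardMoneyBudget`, and `budgetSpendCents` are hypothetical names. Excluding every `subscription_included` row, even one that somehow carries a non-zero cost, follows the Subscription Usage Decision above and is an assumption for that edge case.
+
+```typescript
+interface BudgetableEvent {
+  billingType: string;
+  costCents: number;
+}
+
+function countsTowardMoneyBudget(event: BudgetableEvent): boolean {
+  // Assumption: subscription-included usage never counts, whatever its cost field says.
+  if (event.billingType === "subscription_included") return false;
+  return event.costCents > 0;
+}
+
+function budgetSpendCents(events: BudgetableEvent[]): number {
+  let total = 0;
+  for (const event of events) {
+    if (countsTowardMoneyBudget(event)) total += event.costCents;
+  }
+  return total;
+}
+```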
+
+### Why Not Tokens First
+
+Token budgets should not be the first hard-stop because:
+
+- providers count tokens differently
+- cached tokens complicate simple totals
+- some future charges are not token-based
+- subscription tokens do not necessarily imply spend
+- money remains the cleanest cross-provider enforcement metric
+
+### Future Budget Metrics
+
+Future policy metrics can include:
+
+- `total_tokens`
+- `input_tokens`
+- `output_tokens`
+- `requests`
+- `finance_amount_cents`
+
+But they should enter only after the money-budget path is stable.
+
+## Migration Plan
+
+### Phase 1: Foundation
+
+- add `budget_policies`
+- add `budget_incidents`
+- add new approval type
+- add project pause metadata
+
+### Phase 2: Compatibility
+
+- backfill policies from existing company and agent monthly budget columns
+- keep legacy columns readable during migration
+
+### Phase 3: Enforcement
+
+- move budget logic into dedicated service
+- add hard-stop incident creation
+- add activity and approval creation
+- add execution guards on heartbeat and invoke paths
+
+### Phase 4: UI
+
+- `/costs` budget section
+- agent detail budget card
+- project detail budget card
+- dashboard incident summary
+
+### Phase 5: Cleanup
+
+- move all reads/writes to `budget_policies`
+- reduce legacy column reliance
+- decide whether to remove old budget columns
+
+## Tests
+
+Required coverage:
+
+- agent monthly budget soft alert at 80%
+- agent monthly budget hard-stop at 100%
+- project lifetime budget soft alert
+- project lifetime budget hard-stop
+- `subscription_included` usage does not consume money budget
+- `subscription_overage` does consume money budget
+- hard-stop creates one incident per threshold per window
+- hard-stop creates approval and pauses correct scope
+- paused project blocks new issue execution
+- paused agent blocks new heartbeat dispatch
+- policy update and resume clears or resolves active incident correctly
+- dashboard and `/costs` surface active incidents
+
+## Open Questions
+
+These should be explicitly deferred unless they block implementation:
+
+- Should project budgets also support monthly mode, or is lifetime enough for the first release?
+- Should company-level budgets eventually include `finance_events` such as OpenRouter top-up fees and Bedrock provisioned charges?
+- Should delegated budget editing be limited by org hierarchy in V1, or remain board-only in the UI even if the data model can support delegation later?
+- Do we need "resume once" immediately, or can the first approval resolution be limited to "raise budget and resume" plus "keep paused"?
+
+## Recommendation
+
+Implement the first coherent budgeting system with these rules:
+
+- Agent budget = monthly billed dollars
+- Project budget = lifetime billed dollars
+- Hard-stop = auto-pause + approval
+- Soft alert = visible warning, no approval
+- Subscription usage = visible quota and token reporting, not money-budget enforcement
+
+This solves the real operator problem without mixing together spend control, provider quota windows, and token accounting.