feat(costs): add billing, quota, and budget control plane
doc/plans/2026-03-14-billing-ledger-and-reporting.md
@@ -0,0 +1,468 @@
# Billing Ledger and Reporting

## Context

Paperclip currently stores model spend in `cost_events` and operational run state in `heartbeat_runs`.
That split is fine, but the current reporting code tries to infer billing semantics by mixing both tables:

- `cost_events` knows provider, model, tokens, and dollars
- `heartbeat_runs.usage_json` knows some per-run billing metadata
- `heartbeat_runs.usage_json` does **not** currently carry enough normalized billing dimensions to support honest provider-level reporting

That inference becomes incorrect as soon as a company uses more than one provider, more than one billing channel, or more than one billing mode.

Examples:

- direct OpenAI API usage
- Claude subscription usage with zero marginal dollars
- subscription overage with dollars and tokens
- OpenRouter billing where the biller is OpenRouter but the upstream provider is Anthropic or OpenAI

The system needs to support:

- dollar reporting
- token reporting
- subscription-included usage
- subscription overage
- direct metered API usage
- future aggregator billing such as OpenRouter

## Product Decision

`cost_events` becomes the canonical billing and usage ledger for reporting.

`heartbeat_runs` remains an operational execution log. It may keep mirrored billing metadata for debugging and transcripts, but reporting must not reconstruct billing semantics from `heartbeat_runs.usage_json`.

## Decision: One Ledger Or Two

We do **not** need two tables to solve the current PR's problem.
For request-level inference reporting, `cost_events` is enough if it carries the right dimensions:

- upstream provider
- biller
- billing type
- model
- token fields
- billed amount

That is why the first implementation pass extends `cost_events` instead of introducing a second table immediately.

However, if Paperclip needs to account for the full billing surface of aggregators and managed AI platforms, then `cost_events` alone is not enough.
Some charges are not cleanly representable as a single model inference event:

- account top-ups and credit purchases
- platform fees charged at purchase time
- BYOK platform fees that are account-level or threshold-based
- prepaid credit expirations, refunds, and adjustments
- provisioned throughput commitments
- fine-tuning, training, model import, and storage charges
- gateway logging or other platform overhead that is not attributable to one prompt/response pair

So the decision is:

- near term: keep `cost_events` as the inference and usage ledger
- next phase: add `finance_events` for non-inference financial events

This is a deliberate split between:

- usage and inference accounting
- account-level and platform-level financial accounting

That separation keeps request reporting honest without forcing us to fake invoice semantics onto rows that were never request-scoped.

## External Motivation And Sources

The need for this model is not theoretical.
It follows directly from the billing systems of providers and aggregators Paperclip needs to support.

### OpenRouter

Source URLs:

- https://openrouter.ai/docs/faq#credit-and-billing-systems
- https://openrouter.ai/pricing

Relevant billing behavior as of March 14, 2026:

- OpenRouter passes through underlying inference pricing and deducts request cost from purchased credits.
- OpenRouter charges a 5.5% fee with a $0.80 minimum when purchasing credits.
- Crypto payments are charged a 5% fee.
- BYOK has its own fee model after a free request threshold.
- OpenRouter billing is aggregated at the OpenRouter account level even when the upstream provider is Anthropic, OpenAI, Google, or another provider.

Implication for Paperclip:

- request usage belongs in `cost_events`
- credit purchases, purchase fees, BYOK fees, refunds, and expirations belong in `finance_events`
- `biller=openrouter` must remain distinct from `provider=anthropic|openai|google|...`

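The OpenRouter purchase fee is simple arithmetic, and it shows why the fee is a `finance_events` row rather than a `cost_events` row: it depends on the credit purchase, not on any request. This is a minimal sketch that assumes card purchases (the 5% crypto fee is not modeled) and rounds up to the next cent. The `eventKind` and `direction` strings are hypothetical, since the plan does not define them.

```typescript
// Hypothetical draft shape for finance_events rows. Field names follow the
// plan's proposed columns, but the string values are assumptions.
type FinanceEventKind = "credit_purchase" | "fee";

interface FinanceEventDraft {
  eventKind: FinanceEventKind;
  direction: "debit" | "credit"; // assumed vocabulary for `direction`
  biller: string;
  provider: string | null; // account-level events have no upstream provider
  amountCents: number;
  currency: "USD";
  estimated: boolean;
}

// 5.5% of the purchase with a $0.80 minimum (OpenRouter FAQ, as of 2026-03-14).
// Rounding up to whole cents is an assumption. Integer math avoids float drift.
function openRouterCardFeeCents(creditCents: number): number {
  const percentFee = Math.ceil((creditCents * 55) / 1000);
  return Math.max(percentFee, 80);
}

// A credit purchase yields two account-level events and no cost_events row.
function openRouterPurchaseEvents(creditCents: number): FinanceEventDraft[] {
  return [
    {
      eventKind: "credit_purchase",
      direction: "debit",
      biller: "openrouter",
      provider: null,
      amountCents: creditCents,
      currency: "USD",
      estimated: false,
    },
    {
      eventKind: "fee",
      direction: "debit",
      biller: "openrouter",
      provider: null,
      amountCents: openRouterCardFeeCents(creditCents),
      currency: "USD",
      // Computed locally rather than read from an OpenRouter invoice.
      estimated: true,
    },
  ];
}

const purchase = openRouterPurchaseEvents(10_000); // a $100.00 top-up
console.log(purchase.map((e) => `${e.eventKind}: ${e.amountCents}`).join(", "));
// prints "credit_purchase: 10000, fee: 550"
```

On small purchases the minimum fee dominates. A $10.00 top-up would produce an 80-cent fee row, not 55 cents.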
### Cloudflare AI Gateway Unified Billing

Source URL:

- https://developers.cloudflare.com/ai-gateway/features/unified-billing/

Relevant billing behavior as of March 14, 2026:

- Unified Billing lets users call multiple upstream providers while receiving a single Cloudflare bill.
- Usage is paid from Cloudflare-loaded credits.
- Cloudflare supports manual top-ups and auto top-up thresholds.
- Spend limits can stop request processing on daily, weekly, or monthly boundaries.
- Unified Billing traffic can use Cloudflare-managed credentials rather than the user's direct provider key.

Implication for Paperclip:

- request usage needs `biller=cloudflare`
- upstream provider still needs to be preserved separately
- Cloudflare credit loads and related account-level events are not inference rows and should not be forced into `cost_events`
- quota and limits reporting must support biller-level controls, not just upstream provider limits

### Amazon Bedrock

Source URL:

- https://aws.amazon.com/bedrock/pricing/

Relevant billing behavior as of March 14, 2026:

- Bedrock supports on-demand and batch pricing.
- Bedrock pricing varies by region.
- Some pricing tiers add premiums or discounts relative to standard pricing.
- Provisioned throughput is commitment-based rather than request-based.
- Custom model import uses Custom Model Units billed per minute, with monthly storage charges.
- Imported model copies are billed in 5-minute windows once active.
- Customization and fine-tuning introduce training and hosted-model charges beyond normal inference.

Implication for Paperclip:

- normal tokenized inference fits in `cost_events`
- provisioned throughput, custom model unit charges, training, and storage charges require `finance_events`
- region and pricing tier need to be first-class dimensions in the financial model

## Ledger Boundary

To keep the system coherent, the table boundary should be explicit.

### `cost_events`

Use `cost_events` for request-scoped usage and inference charges:

- one row per billable or usage-bearing run event
- provider/model/biller/billingType/tokens/cost
- optionally tied to `heartbeat_run_id`
- supports direct APIs, subscriptions, overage, OpenRouter-routed inference, Cloudflare-routed inference, and Bedrock on-demand inference

### `finance_events`

Use `finance_events` for account-scoped or platform-scoped financial events:

- credit purchase
- top-up
- refund
- fee
- expiry
- provisioned capacity
- training
- model import
- storage
- invoice adjustment

These rows may or may not have a related model, provider, or run id.
Trying to force them into `cost_events` would either create fake request rows or create null-heavy rows that mean something fundamentally different from inference usage.

## Canonical Billing Dimensions

Every persisted billing event should model four separate axes:

1. Usage provider

   The upstream provider whose model performed the work.
   Examples: `openai`, `anthropic`, `google`.

2. Biller

   The system that charged for the usage.
   Examples: `openai`, `anthropic`, `openrouter`, `cursor`, `chatgpt`.

3. Billing type

   The pricing mode applied to the event.
   Initial canonical values:

   - `metered_api`
   - `subscription_included`
   - `subscription_overage`
   - `credits`
   - `fixed`
   - `unknown`

4. Measures

   Usage and billing must both be storable:

   - `input_tokens`
   - `output_tokens`
   - `cached_input_tokens`
   - `cost_cents`

These dimensions are independent.
For example, an event may be:

- provider: `anthropic`
- biller: `openrouter`
- billing type: `metered_api`
- tokens: non-zero
- cost cents: non-zero

Or:

- provider: `anthropic`
- biller: `anthropic`
- billing type: `subscription_included`
- tokens: non-zero
- cost cents: `0`

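Expressed as a type, the four axes might look like the sketch below. The `BillingType` values come from the list above. The interface, its camelCase field names, and the token and cent numbers are illustrative assumptions, not the shipped shared types. Grouping the same two example rows by `provider` and then by `biller` shows why the axes must stay separate.

```typescript
// Canonical billing types from the plan.
type BillingType =
  | "metered_api"
  | "subscription_included"
  | "subscription_overage"
  | "credits"
  | "fixed"
  | "unknown";

// Illustrative shape of one ledger row's billing dimensions.
interface BillingDimensions {
  provider: string; // upstream provider whose model did the work
  biller: string; // system that charged for the usage
  billingType: BillingType;
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens: number;
  costCents: number;
}

// Made-up numbers for the two examples above.
const routedViaOpenRouter: BillingDimensions = {
  provider: "anthropic",
  biller: "openrouter",
  billingType: "metered_api",
  inputTokens: 1_200,
  outputTokens: 300,
  cachedInputTokens: 0,
  costCents: 4,
};

const coveredBySubscription: BillingDimensions = {
  provider: "anthropic",
  biller: "anthropic",
  billingType: "subscription_included",
  inputTokens: 1_200,
  outputTokens: 300,
  cachedInputTokens: 800,
  costCents: 0,
};

// Sum billed cents along one axis.
function centsBy(rows: BillingDimensions[], axis: "provider" | "biller"): Map<string, number> {
  const totals = new Map<string, number>();
  for (const row of rows) {
    totals.set(row[axis], (totals.get(row[axis]) ?? 0) + row.costCents);
  }
  return totals;
}

const sample = [routedViaOpenRouter, coveredBySubscription];
console.log(Object.fromEntries(centsBy(sample, "provider"))); // { anthropic: 4 }
console.log(Object.fromEntries(centsBy(sample, "biller"))); // { openrouter: 4, anthropic: 0 }
```

Grouped by provider, all spend lands on Anthropic. Grouped by biller, the same spend belongs to OpenRouter's invoice. Collapsing the two columns would lose one of those views.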
## Schema Changes

Extend `cost_events` with:

- `heartbeat_run_id uuid null references heartbeat_runs.id`
- `biller text not null default 'unknown'`
- `billing_type text not null default 'unknown'`
- `cached_input_tokens int not null default 0`

Keep `provider` as the upstream usage provider.
Do not overload `provider` to mean biller.

Add a future `finance_events` table for account-level financial events with fields along these lines:

- `company_id`
- `occurred_at`
- `event_kind`
- `direction`
- `biller`
- `provider` nullable
- `execution_adapter_type` nullable
- `pricing_tier` nullable
- `region` nullable
- `model` nullable
- `quantity` nullable
- `unit` nullable
- `amount_cents`
- `currency`
- `estimated`
- `related_cost_event_id` nullable
- `related_heartbeat_run_id` nullable
- `external_invoice_id` nullable
- `metadata_json` nullable

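In TypeScript terms, a `finance_events` row could be typed as below. This is a sketch under several assumptions: the camelCase field names, the `eventKind` strings (derived from the categories under "Ledger Boundary"), the `debit`/`credit` values for `direction`, and the example storage and refund rows are all hypothetical. The sketch shows that a storage charge needs no model, run, or request. It also shows how `direction` could net refunds against charges.

```typescript
// Assumed event_kind vocabulary, one value per category in "Ledger Boundary".
type FinanceEventKind =
  | "credit_purchase"
  | "top_up"
  | "refund"
  | "fee"
  | "expiry"
  | "provisioned_capacity"
  | "training"
  | "model_import"
  | "storage"
  | "invoice_adjustment";

interface FinanceEventRow {
  companyId: string;
  occurredAt: Date;
  eventKind: FinanceEventKind;
  direction: "debit" | "credit"; // assumed: debit = money out, credit = money back
  biller: string;
  provider: string | null;
  executionAdapterType: string | null;
  pricingTier: string | null;
  region: string | null;
  model: string | null;
  quantity: number | null;
  unit: string | null;
  amountCents: number;
  currency: string;
  estimated: boolean;
  relatedCostEventId: string | null;
  relatedHeartbeatRunId: string | null;
  externalInvoiceId: string | null;
  metadataJson: Record<string, unknown> | null;
}

// Fill every nullable column with null so callers only state what they know.
function financeEvent(
  required: Pick<FinanceEventRow, "companyId" | "eventKind" | "direction" | "biller" | "amountCents">,
  extra: Partial<FinanceEventRow> = {},
): FinanceEventRow {
  return {
    occurredAt: new Date(),
    provider: null,
    executionAdapterType: null,
    pricingTier: null,
    region: null,
    model: null,
    quantity: null,
    unit: null,
    currency: "USD",
    estimated: false,
    relatedCostEventId: null,
    relatedHeartbeatRunId: null,
    externalInvoiceId: null,
    metadataJson: null,
    ...required,
    ...extra,
  };
}

// Net account-level amount: debits add, credits subtract.
function netFinanceCents(rows: FinanceEventRow[]): number {
  return rows.reduce((sum, row) => sum + (row.direction === "debit" ? row.amountCents : -row.amountCents), 0);
}

// Hypothetical rows: a Bedrock storage charge and an OpenRouter refund.
const storageCharge = financeEvent(
  { companyId: "co_1", eventKind: "storage", direction: "debit", biller: "aws_bedrock", amountCents: 195 },
  { region: "us-east-1" },
);
const refundRow = financeEvent({ companyId: "co_1", eventKind: "refund", direction: "credit", biller: "openrouter", amountCents: 50 });
console.log(netFinanceCents([storageCharge, refundRow])); // 145
```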
Add indexes on `cost_events`:

- `(company_id, biller, occurred_at)`
- `(company_id, provider, occurred_at)`
- `(company_id, heartbeat_run_id)` if distinct-run reporting remains common

## Shared Contract Changes

### Shared types

Add a shared billing type union and enrich cost types with:

- `heartbeatRunId`
- `biller`
- `billingType`
- `cachedInputTokens`

Update reporting response types so the provider breakdown reflects the ledger directly rather than inferred run metadata.

### Validators

Extend `createCostEventSchema` to accept:

- `heartbeatRunId`
- `biller`
- `billingType`
- `cachedInputTokens`

Defaults:

- `biller` defaults to `provider`
- `billingType` defaults to `unknown`
- `cachedInputTokens` defaults to `0`

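The defaults above amount to a small normalization step. The sketch below uses plain TypeScript because it does not assume which validation library backs `createCostEventSchema`. The input interface is hypothetical and lists only the fields relevant here. `provider`, `model`, and `costCents` stand in for the existing required fields.

```typescript
type BillingType =
  | "metered_api"
  | "subscription_included"
  | "subscription_overage"
  | "credits"
  | "fixed"
  | "unknown";

// Hypothetical request body; other existing cost event fields are omitted.
interface CreateCostEventInput {
  provider: string;
  model: string;
  costCents: number;
  heartbeatRunId?: string | null;
  biller?: string;
  billingType?: BillingType;
  cachedInputTokens?: number;
}

interface CostEventInsert {
  provider: string;
  model: string;
  costCents: number;
  heartbeatRunId: string | null;
  biller: string;
  billingType: BillingType;
  cachedInputTokens: number;
}

function applyCostEventDefaults(input: CreateCostEventInput): CostEventInsert {
  if (input.provider.trim() === "") {
    throw new Error("provider is required");
  }
  return {
    provider: input.provider,
    model: input.model,
    costCents: input.costCents,
    heartbeatRunId: input.heartbeatRunId ?? null, // manual events may have no run
    biller: input.biller ?? input.provider, // biller defaults to provider
    billingType: input.billingType ?? "unknown",
    cachedInputTokens: input.cachedInputTokens ?? 0,
  };
}

// A manual event that supplies only the required fields.
console.log(applyCostEventDefaults({ provider: "openai", model: "example-model", costCents: 12 }));
```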
## Adapter Contract Changes

Extend adapter execution results so they can report:

- `biller`
- richer billing type values

Backwards compatibility:

- existing adapter values `api` and `subscription` are treated as legacy aliases
- map `api -> metered_api`
- map `subscription -> subscription_included`

Future adapters may emit the canonical values directly.

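A minimal normalizer for the legacy aliases might look like the following. Trimming and lower-casing the input, and collapsing unrecognized strings to `unknown`, are assumptions of this sketch rather than rules stated above.

```typescript
type BillingType =
  | "metered_api"
  | "subscription_included"
  | "subscription_overage"
  | "credits"
  | "fixed"
  | "unknown";

const CANONICAL_BILLING_TYPES: readonly string[] = [
  "metered_api",
  "subscription_included",
  "subscription_overage",
  "credits",
  "fixed",
  "unknown",
];

// Legacy adapter values and their canonical replacements.
// A Map avoids matching inherited keys such as "toString" on a plain object.
const LEGACY_BILLING_ALIASES = new Map<string, BillingType>([
  ["api", "metered_api"],
  ["subscription", "subscription_included"],
]);

function normalizeBillingType(raw: string | null | undefined): BillingType {
  if (raw == null) return "unknown";
  const value = raw.trim().toLowerCase();
  const aliased = LEGACY_BILLING_ALIASES.get(value);
  if (aliased !== undefined) return aliased;
  return CANONICAL_BILLING_TYPES.includes(value) ? (value as BillingType) : "unknown";
}

// "per_seat" is a made-up value standing in for anything unrecognized.
console.log(["api", "subscription", "subscription_overage", "per_seat", undefined].map(normalizeBillingType));
```

Canonical values pass through unchanged, so new adapters that emit them directly take the same code path as legacy ones.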
OpenRouter support will use:

- `provider` = upstream provider when known
- `biller` = `openrouter`
- `billingType` = `metered_api` unless OpenRouter later exposes another billing mode

Cloudflare Unified Billing support will use:

- `provider` = upstream provider when known
- `biller` = `cloudflare`
- `billingType` = `credits` or `metered_api` depending on the normalized request billing contract

Bedrock support will use:

- `provider` = upstream provider or `aws_bedrock` depending on adapter shape
- `biller` = `aws_bedrock`
- `billingType` = request-scoped mode for inference rows
- `finance_events` for provisioned, training, import, and storage charges

## Write Path Changes

### Heartbeat-created events

When a heartbeat run produces usage or spend:

1. normalize adapter billing metadata
2. write a ledger row to `cost_events`
3. attach `heartbeat_run_id`
4. set `provider`, `biller`, `billing_type`, token fields, and `cost_cents`

The write path should no longer depend on later inference from `heartbeat_runs`.

### Manual API-created events

Manual cost event creation remains supported.
These events may have `heartbeatRunId = null`.

Rules:

- `provider` remains required
- `biller` defaults to `provider`
- `billingType` defaults to `unknown`

## Reporting Changes

### Server

Refactor reporting queries to use `cost_events` only.

#### `summary`

- sum `cost_cents`

#### `by-agent`

- sum costs and token fields from `cost_events`
- use `count(distinct heartbeat_run_id)` filtered by billing type for run counts
- use token sums filtered by billing type for subscription usage

#### `by-provider`

- group by `provider`, `model`
- sum costs and token fields directly from the ledger
- derive billing-type slices from `cost_events.billing_type`
- never pro-rate from unrelated `heartbeat_runs`

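The sketch below illustrates the `by-provider` rules, plus the distinct-run count from `by-agent`, in memory. The server would express the same logic as a `GROUP BY` over `cost_events`. The sketch groups ledger rows by provider and model, derives the subscription slice from `billingType`, and counts distinct non-null run ids. The row shape and the choice to count input plus output tokens in the subscription slice are assumptions.

```typescript
type BillingType =
  | "metered_api"
  | "subscription_included"
  | "subscription_overage"
  | "credits"
  | "fixed"
  | "unknown";

interface LedgerRow {
  heartbeatRunId: string | null;
  provider: string;
  model: string;
  billingType: BillingType;
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens: number;
  costCents: number;
}

interface ProviderModelTotals {
  provider: string;
  model: string;
  costCents: number;
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens: number;
  subscriptionIncludedTokens: number; // input + output on subscription_included rows
  runCount: number; // count(distinct heartbeat_run_id), nulls excluded
}

// Reads only ledger rows; nothing is pro-rated from heartbeat runs.
function byProvider(rows: LedgerRow[]): ProviderModelTotals[] {
  const groups = new Map<string, { totals: ProviderModelTotals; runs: Set<string> }>();
  for (const row of rows) {
    const key = JSON.stringify([row.provider, row.model]);
    let group = groups.get(key);
    if (group === undefined) {
      group = {
        totals: {
          provider: row.provider,
          model: row.model,
          costCents: 0,
          inputTokens: 0,
          outputTokens: 0,
          cachedInputTokens: 0,
          subscriptionIncludedTokens: 0,
          runCount: 0,
        },
        runs: new Set<string>(),
      };
      groups.set(key, group);
    }
    const totals = group.totals;
    totals.costCents += row.costCents;
    totals.inputTokens += row.inputTokens;
    totals.outputTokens += row.outputTokens;
    totals.cachedInputTokens += row.cachedInputTokens;
    if (row.billingType === "subscription_included") {
      totals.subscriptionIncludedTokens += row.inputTokens + row.outputTokens;
    }
    if (row.heartbeatRunId !== null) group.runs.add(row.heartbeatRunId);
  }
  return Array.from(groups.values(), ({ totals, runs }) => ({ ...totals, runCount: runs.size }));
}
```

Because each group only sees its own rows, one provider's subscription usage cannot leak into another provider's card. That is the cross-attribution failure the testing plan guards against.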
#### future `by-biller`

- group by `biller`
- this is the right view for invoice and subscription accountability

#### `window-spend`

- continue to use `cost_events`

#### project attribution

Keep current project attribution logic for now, but prefer `cost_events.heartbeat_run_id` as the join anchor whenever possible.

## UI Changes

### Principles

- spend, usage, and quota are related but distinct
- a missing quota fetch is not the same as “no quota”
- provider and biller are different dimensions

### Immediate UI changes

1. Keep the current costs page structure.
2. Make the provider cards accurate by reading only ledger-backed values.
3. Show provider quota fetch errors explicitly instead of dropping them.

### Follow-up UI direction

The long-term board UI should expose:

- Spend: dollars by biller, provider, model, agent, and project
- Usage: tokens by provider, model, agent, and project
- Quotas: live provider or biller limits, credits, and reset windows
- Financial events: credit purchases, top-ups, fees, refunds, commitments, storage, and other non-inference charges

## Migration Plan

Migration behavior:

- add new non-destructive columns with defaults
- backfill existing rows:
  - `biller = provider`
  - `billing_type = 'unknown'`
  - `cached_input_tokens = 0`
  - `heartbeat_run_id = null`

Do **not** attempt to backfill historical provider-level subscription attribution from `heartbeat_runs`.
That data was never stored with the required dimensions.

## Testing Plan

Add or update tests for:

1. heartbeat-created ledger rows persist `heartbeatRunId`, `biller`, `billingType`, and cached tokens
2. legacy adapter billing values map correctly
3. provider reporting uses ledger data only
4. mixed-provider companies do not cross-attribute subscription usage
5. zero-dollar subscription usage still appears in token reporting
6. quota fetch failures render explicit UI state
7. manual cost events still validate and write correctly
8. biller reporting keeps upstream provider breakdowns separate
9. OpenRouter-style rows can show `biller=openrouter` with non-OpenRouter upstream providers
10. Cloudflare-style rows can show `biller=cloudflare` with preserved upstream provider identity
11. future `finance_events` aggregation handles non-request charges without requiring a model or run id

## Delivery Plan

### Step 1

- land the ledger contract and query rewrite
- make the current costs page correct

### Step 2

- add biller-oriented reporting endpoints and UI

### Step 3

- wire OpenRouter and any future aggregator adapters to the same contract

### Step 4

- add `executionAdapterType` to persisted cost reporting if adapter-level grouping becomes a product requirement

### Step 5

- introduce `finance_events`
- add non-inference accounting endpoints
- add UI for platform/account charges alongside inference spend and usage

## Non-Goals For This Change

- multi-currency support
- invoice reconciliation
- provider-specific cost estimation beyond persisted billed cost
- replacing `heartbeat_runs` as the operational run record

doc/plans/2026-03-14-budget-policies-and-enforcement.md
@@ -0,0 +1,611 @@
# Budget Policies and Enforcement

## Context

Paperclip already treats budgets as a core control-plane responsibility:

- `doc/SPEC.md` gives the Board authority to set budgets, pause agents, pause work, and override any budget.
- `doc/SPEC-implementation.md` says V1 must support monthly UTC budget windows, soft alerts, and hard auto-pause.
- the current code only partially implements that intent.

Today the system has narrow money-budget behavior:

- companies track `budgetMonthlyCents` and `spentMonthlyCents`
- agents track `budgetMonthlyCents` and `spentMonthlyCents`
- `cost_events` ingestion increments those counters
- when an agent exceeds its monthly budget, the agent is paused

That leaves major product gaps:

- no project budget model
- no approval generated when budget is hit
- no generic budget policy system
- no project pause semantics tied to budget
- no durable incident tracking to prevent duplicate alerts
- no separation between enforceable spend budgets and advisory usage quotas

This plan defines the concrete budgeting model Paperclip should implement next.

## Product Goals

Paperclip should let operators:

1. Set budgets on agents and projects.
2. Understand whether a budget is based on money or usage.
3. Be warned before a budget is exhausted.
4. Automatically pause work when a hard budget is hit.
5. Approve, raise, or resume from a budget stop using obvious UI.
6. See budget state on the dashboard, `/costs`, and scope detail pages.

The system should make one thing very clear:

- budgets are policy controls
- quotas are usage visibility

They are related, but they are not the same concept.

## Product Decisions

### V1 Budget Defaults

For the next implementation pass, Paperclip should enforce these defaults:

- agent budgets are recurring monthly budgets
- project budgets are lifetime total budgets
- hard-stop enforcement uses billed dollars, not tokens
- monthly windows use UTC calendar months
- project total budgets do not reset automatically

This gives a clean mental model:

- agents are ongoing workers, so monthly recurring budget is natural
- projects are bounded workstreams, so lifetime cap is natural

### Metric To Enforce First

The first enforceable metric should be `billed_cents`.

Reasoning:

- it works across providers, billers, and models
- it maps directly to real financial risk
- it handles overage and metered usage consistently
- it avoids cross-provider token normalization problems
- it applies cleanly even when future finance events are not token-based

Token budgets should not be the first hard-stop policy.
They should come later as advisory usage controls once the money-based system is solid.

### Subscription Usage Decision

Paperclip should separate subscription-included usage from billed spend:

- `subscription_included`
  - visible in reporting
  - visible in usage summaries
  - does not count against money budget
- `subscription_overage`
  - visible in reporting
  - counts against money budget
- `metered_api`
  - visible in reporting
  - counts against money budget

This keeps the budget system honest:

- users should not see "spend" rise for usage that did not incur marginal billed cost
- users should still see the token usage and provider quota state

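The rule reduces to a filter on the billing type: subscription-included rows stay in usage totals but never add to spend. The sketch uses a hypothetical row shape. Counting `credits`, `fixed`, and `unknown` rows by their billed cents is an assumption that follows the "non-zero billed cents" rule under "Budget Math".

```typescript
type BillingType =
  | "metered_api"
  | "subscription_included"
  | "subscription_overage"
  | "credits"
  | "fixed"
  | "unknown";

interface BudgetLedgerRow {
  billingType: BillingType;
  costCents: number;
  inputTokens: number;
  outputTokens: number;
}

interface ScopeTotals {
  budgetSpendCents: number; // counts against money budgets
  usageTokens: number; // visible in reporting regardless of billing type
}

function scopeTotals(rows: BudgetLedgerRow[]): ScopeTotals {
  let budgetSpendCents = 0;
  let usageTokens = 0;
  for (const row of rows) {
    usageTokens += row.inputTokens + row.outputTokens;
    // Included usage has no marginal billed cost, so it never raises spend.
    if (row.billingType !== "subscription_included") {
      budgetSpendCents += row.costCents;
    }
  }
  return { budgetSpendCents, usageTokens };
}

console.log(
  scopeTotals([
    { billingType: "metered_api", costCents: 120, inputTokens: 900, outputTokens: 100 },
    { billingType: "subscription_overage", costCents: 30, inputTokens: 400, outputTokens: 100 },
    { billingType: "subscription_included", costCents: 0, inputTokens: 4_000, outputTokens: 1_000 },
  ]),
); // { budgetSpendCents: 150, usageTokens: 6500 }
```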
### Soft Alert Versus Hard Stop

Paperclip should have two threshold classes:

- soft alert
  - creates visible notification state
  - does not create an approval
  - does not pause work
- hard stop
  - pauses the affected scope automatically
  - creates an approval requiring human resolution
  - prevents additional heartbeats or task pickup in that scope

Default thresholds:

- soft alert at `80%`
- hard stop at `100%`

These should be configurable per policy later, but they are good defaults now.

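Detecting a crossing only requires the observed amount before and after a new cost event. This is a minimal sketch that assumes the policy fields proposed later in this plan (`amount`, `warn_percent`, `hard_stop_enabled`). It also assumes that reaching exactly 100% counts as a hard stop. In the real engine, durable incident records rather than this before/after comparison would guarantee "first time in window".

```typescript
type ThresholdType = "soft" | "hard";

interface ThresholdPolicy {
  amount: number; // budget limit in the metric's unit (cents for billed_cents)
  warnPercent: number; // soft alert threshold; the plan's default is 80
  hardStopEnabled: boolean;
}

// Thresholds crossed when the observed total moves from `before` to `after`.
// A threshold counts as crossed when before < limit <= after.
function crossedThresholds(policy: ThresholdPolicy, before: number, after: number): ThresholdType[] {
  const crossed: ThresholdType[] = [];
  const softLimit = (policy.amount * policy.warnPercent) / 100;
  if (before < softLimit && after >= softLimit) crossed.push("soft");
  if (policy.hardStopEnabled && before < policy.amount && after >= policy.amount) {
    crossed.push("hard");
  }
  return crossed;
}

const monthly: ThresholdPolicy = { amount: 10_000, warnPercent: 80, hardStopEnabled: true };
console.log(crossedThresholds(monthly, 7_900, 8_000)); // [ 'soft' ]
console.log(crossedThresholds(monthly, 7_000, 10_500)); // [ 'soft', 'hard' ]
console.log(crossedThresholds(monthly, 8_500, 9_000)); // []
```

A single large cost event can cross both thresholds at once, so the two checks are deliberately independent rather than an if/else.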
## Scope Model

### Supported Scope Types

Budget policies should support:

- `company`
- `agent`
- `project`

This plan focuses on finishing `agent` and `project` first while preserving the existing company budget behavior.

### Recommended V1.5 Policy Presets

- Company
  - metric: `billed_cents`
  - window: `calendar_month_utc`
- Agent
  - metric: `billed_cents`
  - window: `calendar_month_utc`
- Project
  - metric: `billed_cents`
  - window: `lifetime`

Future extensions can add:

- token advisory policies
- daily or weekly spend windows
- provider- or biller-scoped budgets
- inherited delegated budgets down the org tree

## Current Implementation Baseline

Paperclip is not starting from zero, but the existing budget code is too ad hoc to extend safely.

### What Exists Today

- company and agent monthly cents counters
- cost ingestion that updates those counters
- agent hard-stop pause on monthly budget overrun

### What Is Missing

- project budgets
- generic budget policy persistence
- generic threshold crossing detection
- incident deduplication per scope/window
- approval creation on hard-stop
- project execution blocking
- budget timeline and incident UI
- distinction between advisory quota and enforceable budget

## Proposed Data Model

### 1. `budget_policies`

Create a new table for canonical budget definitions.

Suggested fields:

- `id`
- `company_id`
- `scope_type`
- `scope_id`
- `metric`
- `window_kind`
- `amount`
- `warn_percent`
- `hard_stop_enabled`
- `notify_enabled`
- `is_active`
- `created_by_user_id`
- `updated_by_user_id`
- `created_at`
- `updated_at`

Notes:

- `scope_type` is one of `company | agent | project`
- `scope_id` may be null only for a company-level policy, where the company is already implied by `company_id`; in every other case keep it explicit
- `metric` should start with `billed_cents`
- `window_kind` starts with `calendar_month_utc | lifetime`
- `amount` is stored in the natural unit of the metric

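Computing the current window is where UTC matters. This is a minimal sketch for the two starting `window_kind` values. It assumes half-open windows `[start, end)` and represents `lifetime` as unbounded (`null` bounds). The plan does not say how lifetime windows would populate `window_start`/`window_end` on incidents, so that is left open here.

```typescript
type WindowKind = "calendar_month_utc" | "lifetime";

interface BudgetWindow {
  start: Date | null; // inclusive; null means unbounded
  end: Date | null; // exclusive; null means unbounded
}

function budgetWindow(kind: WindowKind, at: Date): BudgetWindow {
  if (kind === "lifetime") return { start: null, end: null };
  // Use UTC accessors so the server's local time zone never shifts the month.
  // Date.UTC normalizes month 12 to January of the next year.
  const year = at.getUTCFullYear();
  const month = at.getUTCMonth();
  return {
    start: new Date(Date.UTC(year, month, 1)),
    end: new Date(Date.UTC(year, month + 1, 1)),
  };
}

function inWindow(range: BudgetWindow, t: Date): boolean {
  const ms = t.getTime();
  return (range.start === null || ms >= range.start.getTime()) && (range.end === null || ms < range.end.getTime());
}

// 00:30 on March 1 at UTC+2 is still February in UTC.
const march = budgetWindow("calendar_month_utc", new Date("2026-03-01T00:30:00+02:00"));
console.log(march.start?.toISOString(), march.end?.toISOString());
// 2026-02-01T00:00:00.000Z 2026-03-01T00:00:00.000Z
```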
### 2. `budget_incidents`

Create a durable record of threshold crossings.

Suggested fields:

- `id`
- `company_id`
- `policy_id`
- `scope_type`
- `scope_id`
- `metric`
- `window_kind`
- `window_start`
- `window_end`
- `threshold_type`
- `amount_limit`
- `amount_observed`
- `status`
- `approval_id` nullable
- `activity_id` nullable
- `resolved_at` nullable
- `created_at`
- `updated_at`

Notes:

- `threshold_type`: `soft | hard`
- `status`: `open | acknowledged | resolved | dismissed`
- allowing at most one open incident per policy, threshold, and window prevents duplicate approvals and alert spam

### 3. Project Pause State

Projects need explicit pause semantics.

Recommended approach:

- extend project status or add a pause field so a project can be blocked by budget
- preserve whether the project is paused due to budget versus manually paused

Preferred shape:

- keep project workflow status as-is
- add execution-state fields:
  - `execution_status`: `active | paused | archived`
  - `pause_reason`: `manual | budget | system | null`

If that is too large for the immediate pass, a smaller version is:

- add `paused_at`
- add `pause_reason`

The key requirement is behavioral, not cosmetic:
Paperclip must know that a project is budget-paused and enforce it.

### 4. Compatibility With Existing Budget Columns

Existing company and agent monthly budget columns should remain temporarily for compatibility.

Migration plan:

1. keep reading existing columns during transition
2. create equivalent `budget_policies` rows
3. switch enforcement and UI to policies
4. later remove or deprecate legacy columns

## Budget Engine

Budget enforcement should move into a dedicated service.

Current logic is buried inside cost ingestion.
That is too narrow because budget checks must apply at more than one execution boundary.

### Responsibilities

New service: `budgetService`

Responsibilities:

- resolve applicable policies for a cost event
- compute current window totals
- detect threshold crossings
- create incidents, activities, and approvals
- pause affected scopes on hard-stop
- provide preflight enforcement checks for execution entry points

### Canonical Evaluation Flow

When a new `cost_event` is written:

1. persist the `cost_event`
2. identify affected scopes:
   - company
   - agent
   - project
3. fetch active policies for those scopes
4. compute the current observed amount for each policy window
5. compare it to the thresholds
6. create a soft incident if the soft threshold is crossed for the first time in the window
7. create a hard incident if the hard threshold is crossed for the first time in the window
8. if a hard incident was created:
   - pause the scope
   - create an approval
   - create an activity event
   - emit notification state

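Steps 3 to 8 can be sketched as an in-memory engine. Everything here is hypothetical: the class and field names, the string window key (for example `2026-03` for a UTC month), and the arrays that stand in for the incident, approval, and activity tables. The point it illustrates is deduplication. Each policy, threshold, and window gets at most one incident, so repeated cost events above a threshold do not create repeated approvals.

```typescript
type ScopeType = "company" | "agent" | "project";
type ThresholdType = "soft" | "hard";

interface BudgetPolicy {
  id: string;
  scopeType: ScopeType;
  scopeId: string;
  amount: number;
  warnPercent: number;
  hardStopEnabled: boolean;
}

interface BudgetIncident {
  policyId: string;
  thresholdType: ThresholdType;
  windowKey: string;
  amountLimit: number;
  amountObserved: number;
}

interface ScopeRef {
  scopeType: ScopeType;
  scopeId: string;
}

class InMemoryBudgetEngine {
  readonly incidents: BudgetIncident[] = [];
  readonly pausedScopes = new Set<string>();
  readonly approvalPolicyIds: string[] = [];
  readonly activity: string[] = [];
  private readonly policies: BudgetPolicy[];

  constructor(policies: BudgetPolicy[]) {
    this.policies = policies;
  }

  // Steps 3-8. `observedFor` supplies the current window total for a policy (step 4).
  evaluate(scopes: ScopeRef[], windowKey: string, observedFor: (policy: BudgetPolicy) => number): BudgetIncident[] {
    const created: BudgetIncident[] = [];
    for (const policy of this.policies) {
      const applies = scopes.some((s) => s.scopeType === policy.scopeType && s.scopeId === policy.scopeId);
      if (!applies) continue;
      const observed = observedFor(policy);
      const thresholds: Array<[ThresholdType, number, boolean]> = [
        ["soft", (policy.amount * policy.warnPercent) / 100, true],
        ["hard", policy.amount, policy.hardStopEnabled],
      ];
      for (const [thresholdType, limit, enabled] of thresholds) {
        if (!enabled || observed < limit) continue;
        const seen = this.incidents.some(
          (i) => i.policyId === policy.id && i.thresholdType === thresholdType && i.windowKey === windowKey,
        );
        if (seen) continue; // already reported in this window
        const incident: BudgetIncident = {
          policyId: policy.id,
          thresholdType,
          windowKey,
          amountLimit: policy.amount,
          amountObserved: observed,
        };
        this.incidents.push(incident);
        created.push(incident);
        this.activity.push(`${thresholdType} budget threshold crossed for ${policy.scopeType} ${policy.scopeId}`);
        if (thresholdType === "hard") {
          this.pausedScopes.add(`${policy.scopeType}:${policy.scopeId}`);
          this.approvalPolicyIds.push(policy.id);
        }
      }
    }
    return created;
  }
}
```

This sketch treats any earlier incident in the same window as a match. The data model phrases the rule in terms of open incidents, and which statuses should re-arm an alert within a window is a policy question the sketch does not settle.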
### Preflight Enforcement Checks

Budget enforcement cannot rely only on post-hoc cost ingestion.

Paperclip must also block execution before new work starts.

Add budget checks to:

- scheduler heartbeat dispatch
- manual invoke endpoints
- assignment-driven wakeups
- queued run promotion
- issue checkout or pickup paths where applicable

If a scope is budget-paused:

- do not start a new heartbeat
- do not let the agent pick up additional work
- present a clear reason in API and UI

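A preflight guard could be a single function shared by every entry point listed above. In this minimal sketch, the `ExecutionScope` shape, the `BudgetBlockedError` class, and the `pauseReason` values (borrowed from the project pause-state proposal) are hypothetical. Throwing a typed error, rather than returning early, lets each API route map the failure to an explicit error response instead of silently doing nothing.

```typescript
type PauseReason = "manual" | "budget" | "system";
type ScopeType = "company" | "agent" | "project";

interface ExecutionScope {
  scopeType: ScopeType;
  scopeId: string;
  pauseReason: PauseReason | null; // null when the scope is active
}

class BudgetBlockedError extends Error {
  readonly scopeType: ScopeType;
  readonly scopeId: string;

  constructor(scope: ExecutionScope) {
    super(`${scope.scopeType} ${scope.scopeId} is paused by budget enforcement`);
    this.name = "BudgetBlockedError";
    this.scopeType = scope.scopeType;
    this.scopeId = scope.scopeId;
  }
}

// Check every scope that would be charged for the work before a heartbeat
// starts or the agent picks up work. The first budget-paused scope in the
// given order is reported. Manual and system pauses need their own checks.
function assertBudgetAllowsExecution(scopes: ExecutionScope[]): void {
  for (const scope of scopes) {
    if (scope.pauseReason === "budget") {
      throw new BudgetBlockedError(scope);
    }
  }
}

try {
  assertBudgetAllowsExecution([
    { scopeType: "company", scopeId: "company-1", pauseReason: null },
    { scopeType: "project", scopeId: "project-7", pauseReason: "budget" },
  ]);
} catch (err) {
  if (err instanceof BudgetBlockedError) console.log(err.message); // project project-7 is paused by budget enforcement
}
```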
### Active Run Behavior

When a hard stop is triggered while a run is already active:

- mark the scope paused immediately so no future work starts
- request graceful cancellation of the current run
- allow normal cancellation timeout behavior
- write activity explaining that the pause came from budget enforcement

This mirrors the general pause semantics already expected by the product.

## Approval Model

Budget hard-stops should create a first-class approval.

### New Approval Type

Add approval type:

- `budget_override_required`

Payload should include:

- `scopeType`
- `scopeId`
- `scopeName`
- `metric`
- `windowKind`
- `thresholdType`
- `budgetAmount`
- `observedAmount`
- `windowStart`
- `windowEnd`
- `topDrivers`
- `paused`

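As a type, the payload could look like the interface below. The field list is the one above. The value types, the ISO-string encoding of window bounds (with `null` for lifetime windows), and the `{ label, amount }` shape of `topDrivers` are assumptions. The helper shows one plausible way to compute `topDrivers`: sum spend per label (for example per model or per issue) and keep the largest few.

```typescript
interface BudgetDriver {
  label: string; // e.g. a model, issue, or agent name; the grouping is an assumption
  amount: number; // in the policy metric's unit (cents for billed_cents)
}

interface BudgetOverrideRequiredPayload {
  scopeType: "company" | "agent" | "project";
  scopeId: string;
  scopeName: string;
  metric: "billed_cents";
  windowKind: "calendar_month_utc" | "lifetime";
  thresholdType: "soft" | "hard";
  budgetAmount: number;
  observedAmount: number;
  windowStart: string | null;
  windowEnd: string | null;
  topDrivers: BudgetDriver[];
  paused: boolean;
}

// Sum amounts per label and return the `limit` largest, biggest first.
function topDrivers(costs: BudgetDriver[], limit = 3): BudgetDriver[] {
  const totals = new Map<string, number>();
  for (const { label, amount } of costs) {
    totals.set(label, (totals.get(label) ?? 0) + amount);
  }
  return Array.from(totals, ([label, amount]) => ({ label, amount }))
    .sort((a, b) => b.amount - a.amount)
    .slice(0, limit);
}

// Hypothetical project that hit its lifetime cap.
const payload: BudgetOverrideRequiredPayload = {
  scopeType: "project",
  scopeId: "project-7",
  scopeName: "Example project",
  metric: "billed_cents",
  windowKind: "lifetime",
  thresholdType: "hard",
  budgetAmount: 50_000,
  observedAmount: 50_400,
  windowStart: null,
  windowEnd: null,
  topDrivers: topDrivers([
    { label: "model-a", amount: 30_000 },
    { label: "model-b", amount: 15_000 },
    { label: "model-a", amount: 5_400 },
  ]),
  paused: true,
};
console.log(payload.topDrivers);
```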
### Resolution Actions

The approval UI should support:

- raise budget and resume
- resume once without changing policy
- keep paused

Optional later action:

- disable budget policy

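The three resolution actions map naturally onto a discriminated union. This sketch rests on stated assumptions. `ScopeBudgetState` and its `overrideActive` flag, which is one way "resume once" might be remembered without editing the policy, are hypothetical. The sketch also adds its own guard: it rejects a raised budget that does not exceed observed spend, so the scope does not hit the hard stop again immediately.

```typescript
type BudgetResolution =
  | { action: "raise_budget_and_resume"; newAmount: number }
  | { action: "resume_once" }
  | { action: "keep_paused" };

interface ScopeBudgetState {
  policyAmount: number;
  observedAmount: number;
  paused: boolean;
  overrideActive: boolean; // hypothetical: resumed without changing the policy
}

function applyResolution(state: ScopeBudgetState, resolution: BudgetResolution): ScopeBudgetState {
  switch (resolution.action) {
    case "raise_budget_and_resume":
      // A limit at or below observed spend would trip the hard stop again immediately.
      if (resolution.newAmount <= state.observedAmount) {
        throw new Error("new budget must exceed observed spend");
      }
      return { ...state, policyAmount: resolution.newAmount, paused: false, overrideActive: false };
    case "resume_once":
      return { ...state, paused: false, overrideActive: true };
    case "keep_paused":
      return { ...state, paused: true };
    default: {
      // Compile-time exhaustiveness check: adding a new action breaks the build here.
      const unreachable: never = resolution;
      throw new Error(`unhandled resolution: ${JSON.stringify(unreachable)}`);
    }
  }
}

const stopped: ScopeBudgetState = { policyAmount: 10_000, observedAmount: 10_200, paused: true, overrideActive: false };
console.log(applyResolution(stopped, { action: "raise_budget_and_resume", newAmount: 15_000 }));
```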
### Soft Alerts Do Not Need Approval

Soft alerts should create:

- activity event
- dashboard alert
- inbox notification or similar board-visible signal

They should not create an approval by default.

## Notification And Activity Model

Budget events need obvious operator visibility.

Required outputs:

- activity log entry on threshold crossings
- dashboard surface for active budget incidents
- detail page banner on paused agent or project
- `/costs` summary of active incidents and policy health

Later channels:

- email
- webhook
- Slack or other integrations

## API Plan

### Policy Management

Add routes for:

- list budget policies for company
- create budget policy
- update budget policy
- archive or disable budget policy

### Incident Surfaces

Add routes for:

- list active budget incidents
- list incident history
- get incident detail for a scope

### Approval Resolution

Budget approvals should use the existing approval system once the new approval type is added.

Expected flows:

- create approval on hard-stop
- resolve approval by changing policy and resuming
- resolve approval by resuming once

### Execution Errors

When work is blocked by budget, the API should return explicit errors.

Examples:

- agent invocation blocked because agent budget is paused
- issue execution blocked because project budget is paused

Do not silently no-op.

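The sketch below pairs an explicit error type with a guard for the heartbeat and invoke paths. `BudgetBlockedError`, `assertNotBudgetPaused`, the `pauseReason` field, and the 409 status are all assumptions. What matters is that a budget-paused scope throws an error naming the scope and incident, instead of returning as though nothing happened.

```typescript
type PauseReason = "budget" | "manual" | null;

// Hypothetical minimal view of an agent or project for the guard.
interface PausableEntity {
  id: string;
  paused: boolean;
  pauseReason: PauseReason;
  incidentId: string | null;
}

class BudgetBlockedError extends Error {
  // 409 Conflict is an assumption; the plan only requires an explicit error.
  readonly status = 409;
  readonly scopeType: "agent" | "project";
  readonly scopeId: string;
  readonly incidentId: string | null;

  constructor(scopeType: "agent" | "project", scopeId: string, incidentId: string | null) {
    super(`${scopeType} ${scopeId} is paused because its budget was exceeded`);
    this.name = "BudgetBlockedError";
    this.scopeType = scopeType;
    this.scopeId = scopeId;
    this.incidentId = incidentId;
  }
}

function isBudgetPaused(e: PausableEntity): boolean {
  return e.paused && e.pauseReason === "budget";
}

// Call before dispatching a heartbeat or starting issue execution; throws instead of silently skipping.
function assertNotBudgetPaused(agent: PausableEntity, project: PausableEntity | null): void {
  if (isBudgetPaused(agent)) {
    throw new BudgetBlockedError("agent", agent.id, agent.incidentId);
  }
  if (project !== null && isBudgetPaused(project)) {
    throw new BudgetBlockedError("project", project.id, project.incidentId);
  }
}
```

Manual pauses are left to whatever guard already handles them, so the budget error is only raised when budget enforcement caused the pause.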
## UI Plan

Budgeting should be visible in the places where operators make decisions.

### `/costs`

Add a budget section that includes:

- active budget incidents
- policy list with scope, window, metric, and threshold state
- progress bars for the current period or lifetime total
- clear distinction between:
  - spend budget
  - subscription quota
- quick actions:
  - raise budget
  - open approval
  - resume scope if permitted

The page should make this visual distinction obvious:

- Budget
  - enforceable spend policy
- Quota
  - provider or subscription usage window

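The sketch below shows how the `/costs` rows could carry the Budget/Quota distinction in the data itself, not only in styling. The row types are hypothetical. Quota usage is left unit-agnostic because provider windows may count tokens or requests. A `null` limit covers providers that expose no limit.

```typescript
interface BudgetRow {
  kind: "budget";
  label: string;
  spentCents: number;
  amountCents: number;
}

interface QuotaRow {
  kind: "quota";
  label: string;
  used: number; // unit depends on the provider window (tokens, requests, ...)
  limit: number | null; // null when the provider does not expose a limit
}

type CostsRow = BudgetRow | QuotaRow;

// Progress bar percentage, capped at 100 so overspend does not overflow the bar.
function progressPercent(used: number, limit: number): number {
  if (limit <= 0) return 0;
  return Math.min(100, Math.round((used * 100) / limit));
}

function describeRow(row: CostsRow): { heading: "Budget" | "Quota"; enforceable: boolean; percent: number | null } {
  if (row.kind === "budget") {
    return { heading: "Budget", enforceable: true, percent: progressPercent(row.spentCents, row.amountCents) };
  }
  // Quota rows are advisory: they never trigger a pause or an approval.
  return {
    heading: "Quota",
    enforceable: false,
    percent: row.limit === null ? null : progressPercent(row.used, row.limit),
  };
}
```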
### Agent Detail

Add an agent budget card:

- monthly budget amount
- current month spend
- remaining spend
- status
- warning or paused banner
- link to approval if blocked

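The agent card values can be derived from counted cost events. This sketch rests on two assumptions: the monthly window is a UTC calendar month, and `events` already excludes anything that should not count toward the money budget. `agentBudgetCard` and its types are hypothetical.

```typescript
interface BilledEvent {
  occurredAt: Date;
  billedCents: number; // assumes events were already filtered to budget-counting billing types
}

type AgentBudgetStatus = "ok" | "warning" | "paused";

interface AgentBudgetCard {
  amountCents: number;
  spentCents: number;
  remainingCents: number;
  status: AgentBudgetStatus;
}

// Assumption: monthly windows are UTC calendar months.
function startOfUtcMonth(now: Date): Date {
  return new Date(Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), 1));
}

function agentBudgetCard(
  events: BilledEvent[],
  amountCents: number,
  warnPercent: number,
  paused: boolean,
  now: Date,
): AgentBudgetCard {
  const from = startOfUtcMonth(now).getTime();
  const to = now.getTime();
  // Sum only events inside the current month window.
  const spentCents = events
    .filter((e) => e.occurredAt.getTime() >= from && e.occurredAt.getTime() <= to)
    .reduce((sum, e) => sum + e.billedCents, 0);
  let status: AgentBudgetStatus = "ok";
  if (paused) status = "paused";
  else if (spentCents * 100 >= amountCents * warnPercent) status = "warning";
  // Remaining never goes negative on the card, even after overspend.
  return { amountCents, spentCents, remainingCents: Math.max(0, amountCents - spentCents), status };
}
```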
### Project Detail

Add a project budget card:

- total budget amount
- total spend to date
- remaining spend
- pause status
- approval link

Project detail should also show whether issue execution is blocked because the project is budget-paused.

### Dashboard

Add a high-signal budget section:

- active budget breaches
- upcoming soft alerts
- counts of paused agents and paused projects due to budget

The operator should not have to visit `/costs` to learn that work has stopped.

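The paused counts should include only budget-caused pauses, so operators do not mistake a manual pause for a budget stop. Here is a hypothetical sketch, assuming each scope records why it was paused:

```typescript
// Hypothetical minimal view of an agent or project for dashboard aggregation.
interface PausableScope {
  kind: "agent" | "project";
  paused: boolean;
  pauseReason: "budget" | "manual" | null;
}

function budgetPauseCounts(scopes: PausableScope[]): { pausedAgents: number; pausedProjects: number } {
  let pausedAgents = 0;
  let pausedProjects = 0;
  for (const scope of scopes) {
    // Manual pauses are not budget signals, so they are excluded from these counts.
    if (!scope.paused || scope.pauseReason !== "budget") continue;
    if (scope.kind === "agent") pausedAgents += 1;
    else pausedProjects += 1;
  }
  return { pausedAgents, pausedProjects };
}
```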
## Budget Math

### What Counts Toward Budget

For V1.5 enforcement, include:

- `metered_api` cost events
- `subscription_overage` cost events
- any future request-scoped cost event with non-zero billed cents

Do not include:

- `subscription_included` cost events with zero billed cents
- advisory quota rows
- account-level finance events unless and until company-level financial budgets are added explicitly

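The inclusion rules above reduce to a small predicate over cost events. This TypeScript sketch makes three assumptions: each event has a `billingType` string, each has integer `billedCents` (both field names hypothetical), and advisory quota rows and account-level finance events are stored outside the list being summed.

```typescript
// Hypothetical minimal cost event; billingType values follow the names used in this plan.
interface CostEvent {
  billingType: string; // e.g. "metered_api", "subscription_included", "subscription_overage"
  billedCents: number;
}

function countsTowardMoneyBudget(e: CostEvent): boolean {
  // subscription_included usage has zero marginal dollars, so it never consumes a money budget.
  if (e.billingType === "subscription_included") return false;
  // metered_api, subscription_overage, and future request-scoped kinds count when they bill money.
  return e.billedCents > 0;
}

function budgetSpendCents(events: CostEvent[]): number {
  return events.filter(countsTowardMoneyBudget).reduce((sum, e) => sum + e.billedCents, 0);
}
```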
### Why Not Tokens First

Token budgets should not be the first hard-stop metric because:

- providers count tokens differently
- cached tokens complicate simple totals
- some future charges are not token-based
- subscription tokens do not necessarily imply spend
- money remains the cleanest cross-provider enforcement metric

### Future Budget Metrics

Future policy metrics can include:

- `total_tokens`
- `input_tokens`
- `output_tokens`
- `requests`
- `finance_amount_cents`

They should be added only after the money-budget path is stable.

## Migration Plan

### Phase 1: Foundation

- add `budget_policies`
- add `budget_incidents`
- add the new budget approval type
- add project pause metadata

### Phase 2: Compatibility

- backfill policies from existing company and agent monthly budget columns
- keep legacy columns readable during migration

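Here is a sketch of the Phase 2 backfill. It assumes the legacy columns hold a per-company and per-agent monthly amount in cents (`budgetMonthlyCents` is a placeholder name, not a confirmed column) and that a non-positive value means no budget. Passing the keys of scopes that already have a policy makes the backfill safe to rerun. The function only reads the legacy values, so the columns stay usable during migration.

```typescript
// Hypothetical legacy row shapes; the real column names may differ.
interface LegacyCompany {
  id: string;
  budgetMonthlyCents: number;
}

interface LegacyAgent {
  id: string;
  companyId: string;
  budgetMonthlyCents: number;
}

interface BackfilledPolicy {
  companyId: string;
  scopeType: "company" | "agent";
  scopeId: string;
  window: "calendar_month";
  metric: "billed_cents";
  amountCents: number;
  warnPercent: number;
}

// Assumption: 80% default, matching the soft-alert percentage used in the test plan.
const DEFAULT_WARN_PERCENT = 80;

// existingScopeKeys holds "scopeType:scopeId" for scopes that already have a policy,
// so rerunning the backfill does not create duplicates.
function backfillPolicies(
  companies: LegacyCompany[],
  agents: LegacyAgent[],
  existingScopeKeys: Set<string>,
): BackfilledPolicy[] {
  const out: BackfilledPolicy[] = [];
  const add = (companyId: string, scopeType: "company" | "agent", scopeId: string, amountCents: number): void => {
    // Assumption: a non-positive legacy value means no budget was configured.
    if (amountCents <= 0 || existingScopeKeys.has(`${scopeType}:${scopeId}`)) return;
    out.push({
      companyId,
      scopeType,
      scopeId,
      window: "calendar_month",
      metric: "billed_cents",
      amountCents,
      warnPercent: DEFAULT_WARN_PERCENT,
    });
  };
  for (const c of companies) add(c.id, "company", c.id, c.budgetMonthlyCents);
  for (const a of agents) add(a.companyId, "agent", a.id, a.budgetMonthlyCents);
  return out;
}
```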
### Phase 3: Enforcement

- move budget logic into a dedicated service
- add hard-stop incident creation
- add activity and approval creation
- add execution guards on heartbeat and invoke paths

### Phase 4: UI

- `/costs` budget section
- agent detail budget card
- project detail budget card
- dashboard incident summary

### Phase 5: Cleanup

- move all reads/writes to `budget_policies`
- reduce legacy column reliance
- decide whether to remove old budget columns

## Tests

Required coverage:

- agent monthly budget soft alert at 80%
- agent monthly budget hard-stop at 100%
- project lifetime budget soft alert
- project lifetime budget hard-stop
- `subscription_included` usage does not consume money budget
- `subscription_overage` does consume money budget
- hard-stop creates one incident per threshold per window
- hard-stop creates an approval and pauses the correct scope
- paused project blocks new issue execution
- paused agent blocks new heartbeat dispatch
- policy update and resume clears or resolves the active incident correctly
- dashboard and `/costs` surface active incidents

## Open Questions

These should be explicitly deferred unless they block implementation:

- Should project budgets also support monthly mode, or is lifetime enough for the first release?
- Should company-level budgets eventually include `finance_events` such as OpenRouter top-up fees and Bedrock provisioned charges?
- Should delegated budget editing be limited by org hierarchy in V1, or remain board-only in the UI even if the data model can support delegation later?
- Do we need "resume once" immediately, or can first approval resolution be "raise budget and resume" plus "keep paused"?

## Recommendation

Implement the first coherent budgeting system with these rules:

- Agent budget = monthly billed dollars
- Project budget = lifetime billed dollars
- Hard-stop = auto-pause + approval
- Soft alert = visible warning, no approval
- Subscription usage = visible quota and token reporting, not money-budget enforcement

This solves the real operator problem without mixing together spend control, provider quota windows, and token accounting.