Merge remote-tracking branch 'public-gh/master' into paperclip-subissues
* public-gh/master: Fix budget incident resolution edge cases Fix agent budget tab routing Fix budget auth and monthly spend rollups Harden budget enforcement and migration startup Add budget tabs and sidebar budget indicators feat(costs): add billing, quota, and budget control plane refactor(quota): move provider quota logic into adapter layer, add unit tests fix(costs): replace non-null map assertions with nullish coalescing, clarify weekData guard fix(costs): guard byProject against duplicate null keys, memoize ProviderQuotaCard row aggregations fix(costs): align byAgent run filter to startedAt, tighten providerTabItems memo deps, stabilize byProject row keys feat(costs): add agent model breakdown, harden date validation, sync CostByProject type, fix quota threshold and tab-gated queries fix(costs): harden company auth check, fix frozen date memo, hide empty quota rows fix(costs): guard routes, fix DST ranges, sync provider state, wire live updates feat(costs): consolidate /usage into /costs with Spend + Providers tabs feat(usage): add subscription quota windows per provider on /usage page address greptile review: per-provider deficit notch, startedAt filter, weekRange refresh, deduplicate providerDisplayName feat(ui): add resource and usage dashboard (/usage route) # Conflicts: # packages/db/src/migration-runtime.ts # packages/db/src/migrations/meta/0031_snapshot.json # packages/db/src/migrations/meta/_journal.json
This commit is contained in:
468
doc/plans/2026-03-14-billing-ledger-and-reporting.md
Normal file
468
doc/plans/2026-03-14-billing-ledger-and-reporting.md
Normal file
@@ -0,0 +1,468 @@
|
||||
# Billing Ledger and Reporting
|
||||
|
||||
## Context
|
||||
|
||||
Paperclip currently stores model spend in `cost_events` and operational run state in `heartbeat_runs`.
|
||||
That split is fine, but the current reporting code tries to infer billing semantics by mixing both tables:
|
||||
|
||||
- `cost_events` knows provider, model, tokens, and dollars
|
||||
- `heartbeat_runs.usage_json` knows some per-run billing metadata
|
||||
- `heartbeat_runs.usage_json` does **not** currently carry enough normalized billing dimensions to support honest provider-level reporting
|
||||
|
||||
This becomes incorrect as soon as a company uses more than one provider, more than one billing channel, or more than one billing mode.
|
||||
|
||||
Examples:
|
||||
|
||||
- direct OpenAI API usage
|
||||
- Claude subscription usage with zero marginal dollars
|
||||
- subscription overage with dollars and tokens
|
||||
- OpenRouter billing where the biller is OpenRouter but the upstream provider is Anthropic or OpenAI
|
||||
|
||||
The system needs to support:
|
||||
|
||||
- dollar reporting
|
||||
- token reporting
|
||||
- subscription-included usage
|
||||
- subscription overage
|
||||
- direct metered API usage
|
||||
- future aggregator billing such as OpenRouter
|
||||
|
||||
## Product Decision
|
||||
|
||||
`cost_events` becomes the canonical billing and usage ledger for reporting.
|
||||
|
||||
`heartbeat_runs` remains an operational execution log. It may keep mirrored billing metadata for debugging and transcripts, but reporting must not reconstruct billing semantics from `heartbeat_runs.usage_json`.
|
||||
|
||||
## Decision: One Ledger Or Two
|
||||
|
||||
We do **not** need two tables to solve the current PR's problem.
|
||||
For request-level inference reporting, `cost_events` is enough if it carries the right dimensions:
|
||||
|
||||
- upstream provider
|
||||
- biller
|
||||
- billing type
|
||||
- model
|
||||
- token fields
|
||||
- billed amount
|
||||
|
||||
That is why the first implementation pass extends `cost_events` instead of introducing a second table immediately.
|
||||
|
||||
However, if Paperclip needs to account for the full billing surface of aggregators and managed AI platforms, then `cost_events` alone is not enough.
|
||||
Some charges are not cleanly representable as a single model inference event:
|
||||
|
||||
- account top-ups and credit purchases
|
||||
- platform fees charged at purchase time
|
||||
- BYOK platform fees that are account-level or threshold-based
|
||||
- prepaid credit expirations, refunds, and adjustments
|
||||
- provisioned throughput commitments
|
||||
- fine-tuning, training, model import, and storage charges
|
||||
- gateway logging or other platform overhead that is not attributable to one prompt/response pair
|
||||
|
||||
So the decision is:
|
||||
|
||||
- near term: keep `cost_events` as the inference and usage ledger
|
||||
- next phase: add `finance_events` for non-inference financial events
|
||||
|
||||
This is a deliberate split between:
|
||||
|
||||
- usage and inference accounting
|
||||
- account-level and platform-level financial accounting
|
||||
|
||||
That separation keeps request reporting honest without forcing us to fake invoice semantics onto rows that were never request-scoped.
|
||||
|
||||
## External Motivation And Sources
|
||||
|
||||
The need for this model is not theoretical.
|
||||
It follows directly from the billing systems of providers and aggregators Paperclip needs to support.
|
||||
|
||||
### OpenRouter
|
||||
|
||||
Source URLs:
|
||||
|
||||
- https://openrouter.ai/docs/faq#credit-and-billing-systems
|
||||
- https://openrouter.ai/pricing
|
||||
|
||||
Relevant billing behavior as of March 14, 2026:
|
||||
|
||||
- OpenRouter passes through underlying inference pricing and deducts request cost from purchased credits.
|
||||
- OpenRouter charges a 5.5% fee with a $0.80 minimum when purchasing credits.
|
||||
- Crypto payments are charged a 5% fee.
|
||||
- BYOK has its own fee model after a free request threshold.
|
||||
- OpenRouter billing is aggregated at the OpenRouter account level even when the upstream provider is Anthropic, OpenAI, Google, or another provider.
|
||||
|
||||
Implication for Paperclip:
|
||||
|
||||
- request usage belongs in `cost_events`
|
||||
- credit purchases, purchase fees, BYOK fees, refunds, and expirations belong in `finance_events`
|
||||
- `biller=openrouter` must remain distinct from `provider=anthropic|openai|google|...`
|
||||
|
||||
### Cloudflare AI Gateway Unified Billing
|
||||
|
||||
Source URL:
|
||||
|
||||
- https://developers.cloudflare.com/ai-gateway/features/unified-billing/
|
||||
|
||||
Relevant billing behavior as of March 14, 2026:
|
||||
|
||||
- Unified Billing lets users call multiple upstream providers while receiving a single Cloudflare bill.
|
||||
- Usage is paid from Cloudflare-loaded credits.
|
||||
- Cloudflare supports manual top-ups and auto top-up thresholds.
|
||||
- Spend limits can stop request processing on daily, weekly, or monthly boundaries.
|
||||
- Unified Billing traffic can use Cloudflare-managed credentials rather than the user's direct provider key.
|
||||
|
||||
Implication for Paperclip:
|
||||
|
||||
- request usage needs `biller=cloudflare`
|
||||
- upstream provider still needs to be preserved separately
|
||||
- Cloudflare credit loads and related account-level events are not inference rows and should not be forced into `cost_events`
|
||||
- quota and limits reporting must support biller-level controls, not just upstream provider limits
|
||||
|
||||
### Amazon Bedrock
|
||||
|
||||
Source URL:
|
||||
|
||||
- https://aws.amazon.com/bedrock/pricing/
|
||||
|
||||
Relevant billing behavior as of March 14, 2026:
|
||||
|
||||
- Bedrock supports on-demand and batch pricing.
|
||||
- Bedrock pricing varies by region.
|
||||
- some pricing tiers add premiums or discounts relative to standard pricing
|
||||
- provisioned throughput is commitment-based rather than request-based
|
||||
- custom model import uses Custom Model Units billed per minute, with monthly storage charges
|
||||
- imported model copies are billed in 5-minute windows once active
|
||||
- customization and fine-tuning introduce training and hosted-model charges beyond normal inference
|
||||
|
||||
Implication for Paperclip:
|
||||
|
||||
- normal tokenized inference fits in `cost_events`
|
||||
- provisioned throughput, custom model unit charges, training, and storage charges require `finance_events`
|
||||
- region and pricing tier need to be first-class dimensions in the financial model
|
||||
|
||||
## Ledger Boundary
|
||||
|
||||
To keep the system coherent, the table boundary should be explicit.
|
||||
|
||||
### `cost_events`
|
||||
|
||||
Use `cost_events` for request-scoped usage and inference charges:
|
||||
|
||||
- one row per billable or usage-bearing run event
|
||||
- provider/model/biller/billingType/tokens/cost
|
||||
- optionally tied to `heartbeat_run_id`
|
||||
- supports direct APIs, subscriptions, overage, OpenRouter-routed inference, Cloudflare-routed inference, and Bedrock on-demand inference
|
||||
|
||||
### `finance_events`
|
||||
|
||||
Use `finance_events` for account-scoped or platform-scoped financial events:
|
||||
|
||||
- credit purchase
|
||||
- top-up
|
||||
- refund
|
||||
- fee
|
||||
- expiry
|
||||
- provisioned capacity
|
||||
- training
|
||||
- model import
|
||||
- storage
|
||||
- invoice adjustment
|
||||
|
||||
These rows may or may not have a related model, provider, or run id.
|
||||
Trying to force them into `cost_events` would either create fake request rows or create null-heavy rows that mean something fundamentally different from inference usage.
|
||||
|
||||
## Canonical Billing Dimensions
|
||||
|
||||
Every persisted billing event should model four separate axes:
|
||||
|
||||
1. Usage provider
|
||||
The upstream provider whose model performed the work.
|
||||
Examples: `openai`, `anthropic`, `google`.
|
||||
|
||||
2. Biller
|
||||
The system that charged for the usage.
|
||||
Examples: `openai`, `anthropic`, `openrouter`, `cursor`, `chatgpt`.
|
||||
|
||||
3. Billing type
|
||||
The pricing mode applied to the event.
|
||||
Initial canonical values:
|
||||
- `metered_api`
|
||||
- `subscription_included`
|
||||
- `subscription_overage`
|
||||
- `credits`
|
||||
- `fixed`
|
||||
- `unknown`
|
||||
|
||||
4. Measures
|
||||
Usage and billing must both be storable:
|
||||
- `input_tokens`
|
||||
- `output_tokens`
|
||||
- `cached_input_tokens`
|
||||
- `cost_cents`
|
||||
|
||||
These dimensions are independent.
|
||||
For example, an event may be:
|
||||
|
||||
- provider: `anthropic`
|
||||
- biller: `openrouter`
|
||||
- billing type: `metered_api`
|
||||
- tokens: non-zero
|
||||
- cost cents: non-zero
|
||||
|
||||
Or:
|
||||
|
||||
- provider: `anthropic`
|
||||
- biller: `anthropic`
|
||||
- billing type: `subscription_included`
|
||||
- tokens: non-zero
|
||||
- cost cents: `0`
|
||||
|
||||
## Schema Changes
|
||||
|
||||
Extend `cost_events` with:
|
||||
|
||||
- `heartbeat_run_id uuid null references heartbeat_runs.id`
|
||||
- `biller text not null default 'unknown'`
|
||||
- `billing_type text not null default 'unknown'`
|
||||
- `cached_input_tokens int not null default 0`
|
||||
|
||||
Keep `provider` as the upstream usage provider.
|
||||
Do not overload `provider` to mean biller.
|
||||
|
||||
Add a future `finance_events` table for account-level financial events with fields along these lines:
|
||||
|
||||
- `company_id`
|
||||
- `occurred_at`
|
||||
- `event_kind`
|
||||
- `direction`
|
||||
- `biller`
|
||||
- `provider nullable`
|
||||
- `execution_adapter_type nullable`
|
||||
- `pricing_tier nullable`
|
||||
- `region nullable`
|
||||
- `model nullable`
|
||||
- `quantity nullable`
|
||||
- `unit nullable`
|
||||
- `amount_cents`
|
||||
- `currency`
|
||||
- `estimated`
|
||||
- `related_cost_event_id nullable`
|
||||
- `related_heartbeat_run_id nullable`
|
||||
- `external_invoice_id nullable`
|
||||
- `metadata_json nullable`
|
||||
|
||||
Add indexes:
|
||||
|
||||
- `(company_id, biller, occurred_at)`
|
||||
- `(company_id, provider, occurred_at)`
|
||||
- `(company_id, heartbeat_run_id)` if distinct-run reporting remains common
|
||||
|
||||
## Shared Contract Changes
|
||||
|
||||
### Shared types
|
||||
|
||||
Add a shared billing type union and enrich cost types with:
|
||||
|
||||
- `heartbeatRunId`
|
||||
- `biller`
|
||||
- `billingType`
|
||||
- `cachedInputTokens`
|
||||
|
||||
Update reporting response types so the provider breakdown reflects the ledger directly rather than inferred run metadata.
|
||||
|
||||
### Validators
|
||||
|
||||
Extend `createCostEventSchema` to accept:
|
||||
|
||||
- `heartbeatRunId`
|
||||
- `biller`
|
||||
- `billingType`
|
||||
- `cachedInputTokens`
|
||||
|
||||
Defaults:
|
||||
|
||||
- `biller` defaults to `provider`
|
||||
- `billingType` defaults to `unknown`
|
||||
- `cachedInputTokens` defaults to `0`
|
||||
|
||||
## Adapter Contract Changes
|
||||
|
||||
Extend adapter execution results so they can report:
|
||||
|
||||
- `biller`
|
||||
- richer billing type values
|
||||
|
||||
Backwards compatibility:
|
||||
|
||||
- existing adapter values `api` and `subscription` are treated as legacy aliases
|
||||
- map `api -> metered_api`
|
||||
- map `subscription -> subscription_included`
|
||||
|
||||
Future adapters may emit the canonical values directly.
|
||||
|
||||
OpenRouter support will use:
|
||||
|
||||
- `provider` = upstream provider when known
|
||||
- `biller` = `openrouter`
|
||||
- `billingType` = `metered_api` unless OpenRouter later exposes another billing mode
|
||||
|
||||
Cloudflare Unified Billing support will use:
|
||||
|
||||
- `provider` = upstream provider when known
|
||||
- `biller` = `cloudflare`
|
||||
- `billingType` = `credits` or `metered_api` depending on the normalized request billing contract
|
||||
|
||||
Bedrock support will use:
|
||||
|
||||
- `provider` = upstream provider or `aws_bedrock` depending on adapter shape
|
||||
- `biller` = `aws_bedrock`
|
||||
- `billingType` = request-scoped mode for inference rows
|
||||
- `finance_events` for provisioned, training, import, and storage charges
|
||||
|
||||
## Write Path Changes
|
||||
|
||||
### Heartbeat-created events
|
||||
|
||||
When a heartbeat run produces usage or spend:
|
||||
|
||||
1. normalize adapter billing metadata
|
||||
2. write a ledger row to `cost_events`
|
||||
3. attach `heartbeat_run_id`
|
||||
4. set `provider`, `biller`, `billing_type`, token fields, and `cost_cents`
|
||||
|
||||
The write path should no longer depend on later inference from `heartbeat_runs`.
|
||||
|
||||
### Manual API-created events
|
||||
|
||||
Manual cost event creation remains supported.
|
||||
These events may have `heartbeatRunId = null`.
|
||||
|
||||
Rules:
|
||||
|
||||
- `provider` remains required
|
||||
- `biller` defaults to `provider`
|
||||
- `billingType` defaults to `unknown`
|
||||
|
||||
## Reporting Changes
|
||||
|
||||
### Server
|
||||
|
||||
Refactor reporting queries to use `cost_events` only.
|
||||
|
||||
#### `summary`
|
||||
|
||||
- sum `cost_cents`
|
||||
|
||||
#### `by-agent`
|
||||
|
||||
- sum costs and token fields from `cost_events`
|
||||
- use `count(distinct heartbeat_run_id)` filtered by billing type for run counts
|
||||
- use token sums filtered by billing type for subscription usage
|
||||
|
||||
#### `by-provider`
|
||||
|
||||
- group by `provider`, `model`
|
||||
- sum costs and token fields directly from the ledger
|
||||
- derive billing-type slices from `cost_events.billing_type`
|
||||
- never pro-rate from unrelated `heartbeat_runs`
|
||||
|
||||
#### future `by-biller`
|
||||
|
||||
- group by `biller`
|
||||
- this is the right view for invoice and subscription accountability
|
||||
|
||||
#### `window-spend`
|
||||
|
||||
- continue to use `cost_events`
|
||||
|
||||
#### project attribution
|
||||
|
||||
Keep current project attribution logic for now, but prefer `cost_events.heartbeat_run_id` as the join anchor whenever possible.
|
||||
|
||||
## UI Changes
|
||||
|
||||
### Principles
|
||||
|
||||
- Spend, usage, and quota are related but distinct
|
||||
- a missing quota fetch is not the same as “no quota”
|
||||
- provider and biller are different dimensions
|
||||
|
||||
### Immediate UI changes
|
||||
|
||||
1. Keep the current costs page structure.
|
||||
2. Make the provider cards accurate by reading only ledger-backed values.
|
||||
3. Show provider quota fetch errors explicitly instead of dropping them.
|
||||
|
||||
### Follow-up UI direction
|
||||
|
||||
The long-term board UI should expose:
|
||||
|
||||
- Spend
|
||||
Dollars by biller, provider, model, agent, project
|
||||
- Usage
|
||||
Tokens by provider, model, agent, project
|
||||
- Quotas
|
||||
Live provider or biller limits, credits, and reset windows
|
||||
- Financial events
|
||||
Credit purchases, top-ups, fees, refunds, commitments, storage, and other non-inference charges
|
||||
|
||||
## Migration Plan
|
||||
|
||||
Migration behavior:
|
||||
|
||||
- add new non-destructive columns with defaults
|
||||
- backfill existing rows:
|
||||
- `biller = provider`
|
||||
- `billing_type = 'unknown'`
|
||||
- `cached_input_tokens = 0`
|
||||
- `heartbeat_run_id = null`
|
||||
|
||||
Do **not** attempt to backfill historical provider-level subscription attribution from `heartbeat_runs`.
|
||||
That data was never stored with the required dimensions.
|
||||
|
||||
## Testing Plan
|
||||
|
||||
Add or update tests for:
|
||||
|
||||
1. heartbeat-created ledger rows persist `heartbeatRunId`, `biller`, `billingType`, and cached tokens
|
||||
2. legacy adapter billing values map correctly
|
||||
3. provider reporting uses ledger data only
|
||||
4. mixed-provider companies do not cross-attribute subscription usage
|
||||
5. zero-dollar subscription usage still appears in token reporting
|
||||
6. quota fetch failures render explicit UI state
|
||||
7. manual cost events still validate and write correctly
|
||||
8. biller reporting keeps upstream provider breakdowns separate
|
||||
9. OpenRouter-style rows can show `biller=openrouter` with non-OpenRouter upstream providers
|
||||
10. Cloudflare-style rows can show `biller=cloudflare` with preserved upstream provider identity
|
||||
11. future `finance_events` aggregation handles non-request charges without requiring a model or run id
|
||||
|
||||
## Delivery Plan
|
||||
|
||||
### Step 1
|
||||
|
||||
- land the ledger contract and query rewrite
|
||||
- make the current costs page correct
|
||||
|
||||
### Step 2
|
||||
|
||||
- add biller-oriented reporting endpoints and UI
|
||||
|
||||
### Step 3
|
||||
|
||||
- wire OpenRouter and any future aggregator adapters to the same contract
|
||||
|
||||
### Step 4
|
||||
|
||||
- add `executionAdapterType` to persisted cost reporting if adapter-level grouping becomes a product requirement
|
||||
|
||||
### Step 5
|
||||
|
||||
- introduce `finance_events`
|
||||
- add non-inference accounting endpoints
|
||||
- add UI for platform/account charges alongside inference spend and usage
|
||||
|
||||
## Non-Goals For This Change
|
||||
|
||||
- multi-currency support
|
||||
- invoice reconciliation
|
||||
- provider-specific cost estimation beyond persisted billed cost
|
||||
- replacing `heartbeat_runs` as the operational run record
|
||||
611
doc/plans/2026-03-14-budget-policies-and-enforcement.md
Normal file
611
doc/plans/2026-03-14-budget-policies-and-enforcement.md
Normal file
@@ -0,0 +1,611 @@
|
||||
# Budget Policies and Enforcement
|
||||
|
||||
## Context
|
||||
|
||||
Paperclip already treats budgets as a core control-plane responsibility:
|
||||
|
||||
- `doc/SPEC.md` gives the Board authority to set budgets, pause agents, pause work, and override any budget.
|
||||
- `doc/SPEC-implementation.md` says V1 must support monthly UTC budget windows, soft alerts, and hard auto-pause.
|
||||
- the current code only partially implements that intent.
|
||||
|
||||
Today the system has narrow money-budget behavior:
|
||||
|
||||
- companies track `budgetMonthlyCents` and `spentMonthlyCents`
|
||||
- agents track `budgetMonthlyCents` and `spentMonthlyCents`
|
||||
- `cost_events` ingestion increments those counters
|
||||
- when an agent exceeds its monthly budget, the agent is paused
|
||||
|
||||
That leaves major product gaps:
|
||||
|
||||
- no project budget model
|
||||
- no approval generated when budget is hit
|
||||
- no generic budget policy system
|
||||
- no project pause semantics tied to budget
|
||||
- no durable incident tracking to prevent duplicate alerts
|
||||
- no separation between enforceable spend budgets and advisory usage quotas
|
||||
|
||||
This plan defines the concrete budgeting model Paperclip should implement next.
|
||||
|
||||
## Product Goals
|
||||
|
||||
Paperclip should let operators:
|
||||
|
||||
1. Set budgets on agents and projects.
|
||||
2. Understand whether a budget is based on money or usage.
|
||||
3. Be warned before a budget is exhausted.
|
||||
4. Automatically pause work when a hard budget is hit.
|
||||
5. Approve, raise, or resume from a budget stop using obvious UI.
|
||||
6. See budget state on the dashboard, `/costs`, and scope detail pages.
|
||||
|
||||
The system should make one thing very clear:
|
||||
|
||||
- budgets are policy controls
|
||||
- quotas are usage visibility
|
||||
|
||||
They are related, but they are not the same concept.
|
||||
|
||||
## Product Decisions
|
||||
|
||||
### V1 Budget Defaults
|
||||
|
||||
For the next implementation pass, Paperclip should enforce these defaults:
|
||||
|
||||
- agent budgets are recurring monthly budgets
|
||||
- project budgets are lifetime total budgets
|
||||
- hard-stop enforcement uses billed dollars, not tokens
|
||||
- monthly windows use UTC calendar months
|
||||
- project total budgets do not reset automatically
|
||||
|
||||
This gives a clean mental model:
|
||||
|
||||
- agents are ongoing workers, so monthly recurring budget is natural
|
||||
- projects are bounded workstreams, so lifetime cap is natural
|
||||
|
||||
### Metric To Enforce First
|
||||
|
||||
The first enforceable metric should be `billed_cents`.
|
||||
|
||||
Reasoning:
|
||||
|
||||
- it works across providers, billers, and models
|
||||
- it maps directly to real financial risk
|
||||
- it handles overage and metered usage consistently
|
||||
- it avoids cross-provider token normalization problems
|
||||
- it applies cleanly even when future finance events are not token-based
|
||||
|
||||
Token budgets should not be the first hard-stop policy.
|
||||
They should come later as advisory usage controls once the money-based system is solid.
|
||||
|
||||
### Subscription Usage Decision
|
||||
|
||||
Paperclip should separate subscription-included usage from billed spend:
|
||||
|
||||
- `subscription_included`
|
||||
- visible in reporting
|
||||
- visible in usage summaries
|
||||
- does not count against money budget
|
||||
- `subscription_overage`
|
||||
- visible in reporting
|
||||
- counts against money budget
|
||||
- `metered_api`
|
||||
- visible in reporting
|
||||
- counts against money budget
|
||||
|
||||
This keeps the budget system honest:
|
||||
|
||||
- users should not see "spend" rise for usage that did not incur marginal billed cost
|
||||
- users should still see the token usage and provider quota state
|
||||
|
||||
### Soft Alert Versus Hard Stop
|
||||
|
||||
Paperclip should have two threshold classes:
|
||||
|
||||
- soft alert
|
||||
- creates visible notification state
|
||||
- does not create an approval
|
||||
- does not pause work
|
||||
- hard stop
|
||||
- pauses the affected scope automatically
|
||||
- creates an approval requiring human resolution
|
||||
- prevents additional heartbeats or task pickup in that scope
|
||||
|
||||
Default thresholds:
|
||||
|
||||
- soft alert at `80%`
|
||||
- hard stop at `100%`
|
||||
|
||||
These should be configurable per policy later, but they are good defaults now.
|
||||
|
||||
## Scope Model
|
||||
|
||||
### Supported Scope Types
|
||||
|
||||
Budget policies should support:
|
||||
|
||||
- `company`
|
||||
- `agent`
|
||||
- `project`
|
||||
|
||||
This plan focuses on finishing `agent` and `project` first while preserving the existing company budget behavior.
|
||||
|
||||
### Recommended V1.5 Policy Presets
|
||||
|
||||
- Company
|
||||
- metric: `billed_cents`
|
||||
- window: `calendar_month_utc`
|
||||
- Agent
|
||||
- metric: `billed_cents`
|
||||
- window: `calendar_month_utc`
|
||||
- Project
|
||||
- metric: `billed_cents`
|
||||
- window: `lifetime`
|
||||
|
||||
Future extensions can add:
|
||||
|
||||
- token advisory policies
|
||||
- daily or weekly spend windows
|
||||
- provider- or biller-scoped budgets
|
||||
- inherited delegated budgets down the org tree
|
||||
|
||||
## Current Implementation Baseline
|
||||
|
||||
The current codebase is not starting from zero, but the existing shape is too ad hoc to extend safely.
|
||||
|
||||
### What Exists Today
|
||||
|
||||
- company and agent monthly cents counters
|
||||
- cost ingestion that updates those counters
|
||||
- agent hard-stop pause on monthly budget overrun
|
||||
|
||||
### What Is Missing
|
||||
|
||||
- project budgets
|
||||
- generic budget policy persistence
|
||||
- generic threshold crossing detection
|
||||
- incident deduplication per scope/window
|
||||
- approval creation on hard-stop
|
||||
- project execution blocking
|
||||
- budget timeline and incident UI
|
||||
- distinction between advisory quota and enforceable budget
|
||||
|
||||
## Proposed Data Model
|
||||
|
||||
### 1. `budget_policies`
|
||||
|
||||
Create a new table for canonical budget definitions.
|
||||
|
||||
Suggested fields:
|
||||
|
||||
- `id`
|
||||
- `company_id`
|
||||
- `scope_type`
|
||||
- `scope_id`
|
||||
- `metric`
|
||||
- `window_kind`
|
||||
- `amount`
|
||||
- `warn_percent`
|
||||
- `hard_stop_enabled`
|
||||
- `notify_enabled`
|
||||
- `is_active`
|
||||
- `created_by_user_id`
|
||||
- `updated_by_user_id`
|
||||
- `created_at`
|
||||
- `updated_at`
|
||||
|
||||
Notes:
|
||||
|
||||
- `scope_type` is one of `company | agent | project`
|
||||
- `scope_id` is nullable only for company-level policy if company is implied; otherwise keep it explicit
|
||||
- `metric` should start with `billed_cents`
|
||||
- `window_kind` starts with `calendar_month_utc | lifetime`
|
||||
- `amount` is stored in the natural unit of the metric
|
||||
|
||||
### 2. `budget_incidents`
|
||||
|
||||
Create a durable record of threshold crossings.
|
||||
|
||||
Suggested fields:
|
||||
|
||||
- `id`
|
||||
- `company_id`
|
||||
- `policy_id`
|
||||
- `scope_type`
|
||||
- `scope_id`
|
||||
- `metric`
|
||||
- `window_kind`
|
||||
- `window_start`
|
||||
- `window_end`
|
||||
- `threshold_type`
|
||||
- `amount_limit`
|
||||
- `amount_observed`
|
||||
- `status`
|
||||
- `approval_id` nullable
|
||||
- `activity_id` nullable
|
||||
- `resolved_at` nullable
|
||||
- `created_at`
|
||||
- `updated_at`
|
||||
|
||||
Notes:
|
||||
|
||||
- `threshold_type`: `soft | hard`
|
||||
- `status`: `open | acknowledged | resolved | dismissed`
|
||||
- one open incident per policy per threshold per window prevents duplicate approvals and alert spam
|
||||
|
||||
### 3. Project Pause State
|
||||
|
||||
Projects need explicit pause semantics.
|
||||
|
||||
Recommended approach:
|
||||
|
||||
- extend project status or add a pause field so a project can be blocked by budget
|
||||
- preserve whether the project is paused due to budget versus manually paused
|
||||
|
||||
Preferred shape:
|
||||
|
||||
- keep project workflow status as-is
|
||||
- add execution-state fields:
|
||||
- `execution_status`: `active | paused | archived`
|
||||
- `pause_reason`: `manual | budget | system | null`
|
||||
|
||||
If that is too large for the immediate pass, a smaller version is:
|
||||
|
||||
- add `paused_at`
|
||||
- add `pause_reason`
|
||||
|
||||
The key requirement is behavioral, not cosmetic:
|
||||
Paperclip must know that a project is budget-paused and enforce it.
|
||||
|
||||
### 4. Compatibility With Existing Budget Columns
|
||||
|
||||
Existing company and agent monthly budget columns should remain temporarily for compatibility.
|
||||
|
||||
Migration plan:
|
||||
|
||||
1. keep reading existing columns during transition
|
||||
2. create equivalent `budget_policies` rows
|
||||
3. switch enforcement and UI to policies
|
||||
4. later remove or deprecate legacy columns
|
||||
|
||||
## Budget Engine
|
||||
|
||||
Budget enforcement should move into a dedicated service.
|
||||
|
||||
Current logic is buried inside cost ingestion.
|
||||
That is too narrow because budget checks must apply at more than one execution boundary.
|
||||
|
||||
### Responsibilities
|
||||
|
||||
New service: `budgetService`
|
||||
|
||||
Responsibilities:
|
||||
|
||||
- resolve applicable policies for a cost event
|
||||
- compute current window totals
|
||||
- detect threshold crossings
|
||||
- create incidents, activities, and approvals
|
||||
- pause affected scopes on hard-stop
|
||||
- provide preflight enforcement checks for execution entry points
|
||||
|
||||
### Canonical Evaluation Flow
|
||||
|
||||
When a new `cost_event` is written:
|
||||
|
||||
1. persist the `cost_event`
|
||||
2. identify affected scopes
|
||||
- company
|
||||
- agent
|
||||
- project
|
||||
3. fetch active policies for those scopes
|
||||
4. compute current observed amount for each policy window
|
||||
5. compare to thresholds
|
||||
6. create soft incident if soft threshold crossed for first time in window
|
||||
7. create hard incident if hard threshold crossed for first time in window
|
||||
8. if hard incident:
|
||||
- pause the scope
|
||||
- create approval
|
||||
- create activity event
|
||||
- emit notification state
|
||||
|
||||
### Preflight Enforcement Checks
|
||||
|
||||
Budget enforcement cannot rely only on post-hoc cost ingestion.
|
||||
|
||||
Paperclip must also block execution before new work starts.
|
||||
|
||||
Add budget checks to:
|
||||
|
||||
- scheduler heartbeat dispatch
|
||||
- manual invoke endpoints
|
||||
- assignment-driven wakeups
|
||||
- queued run promotion
|
||||
- issue checkout or pickup paths where applicable
|
||||
|
||||
If a scope is budget-paused:
|
||||
|
||||
- do not start a new heartbeat
|
||||
- do not let the agent pick up additional work
|
||||
- present a clear reason in API and UI
|
||||
|
||||
### Active Run Behavior
|
||||
|
||||
When a hard-stop is triggered while a run is already active:
|
||||
|
||||
- mark scope paused immediately for future work
|
||||
- request graceful cancellation of the current run
|
||||
- allow normal cancellation timeout behavior
|
||||
- write activity explaining that pause came from budget enforcement
|
||||
|
||||
This mirrors the general pause semantics already expected by the product.
|
||||
|
||||
## Approval Model
|
||||
|
||||
Budget hard-stops should create a first-class approval.
|
||||
|
||||
### New Approval Type
|
||||
|
||||
Add approval type:
|
||||
|
||||
- `budget_override_required`
|
||||
|
||||
Payload should include:
|
||||
|
||||
- `scopeType`
|
||||
- `scopeId`
|
||||
- `scopeName`
|
||||
- `metric`
|
||||
- `windowKind`
|
||||
- `thresholdType`
|
||||
- `budgetAmount`
|
||||
- `observedAmount`
|
||||
- `windowStart`
|
||||
- `windowEnd`
|
||||
- `topDrivers`
|
||||
- `paused`
|
||||
|
||||
### Resolution Actions
|
||||
|
||||
The approval UI should support:
|
||||
|
||||
- raise budget and resume
|
||||
- resume once without changing policy
|
||||
- keep paused
|
||||
|
||||
Optional later action:
|
||||
|
||||
- disable budget policy
|
||||
|
||||
### Soft Alerts Do Not Need Approval
|
||||
|
||||
Soft alerts should create:
|
||||
|
||||
- activity event
|
||||
- dashboard alert
|
||||
- inbox notification or similar board-visible signal
|
||||
|
||||
They should not create an approval by default.
|
||||
|
||||
## Notification And Activity Model
|
||||
|
||||
Budget events need obvious operator visibility.
|
||||
|
||||
Required outputs:
|
||||
|
||||
- activity log entry on threshold crossings
|
||||
- dashboard surface for active budget incidents
|
||||
- detail page banner on paused agent or project
|
||||
- `/costs` summary of active incidents and policy health
|
||||
|
||||
Later channels:
|
||||
|
||||
- email
|
||||
- webhook
|
||||
- Slack or other integrations
|
||||
|
||||
## API Plan
|
||||
|
||||
### Policy Management
|
||||
|
||||
Add routes for:
|
||||
|
||||
- list budget policies for company
|
||||
- create budget policy
|
||||
- update budget policy
|
||||
- archive or disable budget policy
|
||||
|
||||
### Incident Surfaces
|
||||
|
||||
Add routes for:
|
||||
|
||||
- list active budget incidents
|
||||
- list incident history
|
||||
- get incident detail for a scope
|
||||
|
||||
### Approval Resolution
|
||||
|
||||
Budget approvals should use the existing approval system once the new approval type is added.
|
||||
|
||||
Expected flows:
|
||||
|
||||
- create approval on hard-stop
|
||||
- resolve approval by changing policy and resuming
|
||||
- resolve approval by resuming once
|
||||
|
||||
### Execution Errors
|
||||
|
||||
When work is blocked by budget, the API should return explicit errors.
|
||||
|
||||
Examples:
|
||||
|
||||
- agent invocation blocked because agent budget is paused
|
||||
- issue execution blocked because project budget is paused
|
||||
|
||||
Do not silently no-op.
|
||||
|
||||
## UI Plan
|
||||
|
||||
Budgeting should be visible in the places where operators make decisions.
|
||||
|
||||
### `/costs`
|
||||
|
||||
Add a budget section that includes:
|
||||
|
||||
- active budget incidents
|
||||
- policy list with scope, window, metric, and threshold state
|
||||
- progress bars for current period or total
|
||||
- clear distinction between:
|
||||
- spend budget
|
||||
- subscription quota
|
||||
- quick actions:
|
||||
- raise budget
|
||||
- open approval
|
||||
- resume scope if permitted
|
||||
|
||||
The page should make this visual distinction obvious:
|
||||
|
||||
- Budget
|
||||
- enforceable spend policy
|
||||
- Quota
|
||||
- provider or subscription usage window
|
||||
|
||||
### Agent Detail
|
||||
|
||||
Add an agent budget card:
|
||||
|
||||
- monthly budget amount
|
||||
- current month spend
|
||||
- remaining spend
|
||||
- status
|
||||
- warning or paused banner
|
||||
- link to approval if blocked
|
||||
|
||||
### Project Detail
|
||||
|
||||
Add a project budget card:
|
||||
|
||||
- total budget amount
|
||||
- total spend to date
|
||||
- remaining spend
|
||||
- pause status
|
||||
- approval link
|
||||
|
||||
Project detail should also show if issue execution is blocked because the project is budget-paused.
|
||||
|
||||
### Dashboard
|
||||
|
||||
Add a high-signal budget section:
|
||||
|
||||
- active budget breaches
|
||||
- upcoming soft alerts
|
||||
- counts of paused agents and paused projects due to budget
|
||||
|
||||
The operator should not have to visit `/costs` to learn that work has stopped.
|
||||
|
||||
## Budget Math
|
||||
|
||||
### What Counts Toward Budget
|
||||
|
||||
For V1.5 enforcement, include:
|
||||
|
||||
- `metered_api` cost events
|
||||
- `subscription_overage` cost events
|
||||
- any future request-scoped cost event with non-zero billed cents
|
||||
|
||||
Do not include:
|
||||
|
||||
- `subscription_included` cost events with zero billed cents
|
||||
- advisory quota rows
|
||||
- account-level finance events unless and until company-level financial budgets are added explicitly
|
||||
|
||||
### Why Not Tokens First
|
||||
|
||||
Token budgets should not be the first hard-stop because:
|
||||
|
||||
- providers count tokens differently
|
||||
- cached tokens complicate simple totals
|
||||
- some future charges are not token-based
|
||||
- subscription tokens do not necessarily imply spend
|
||||
- money remains the cleanest cross-provider enforcement metric
|
||||
|
||||
### Future Budget Metrics
|
||||
|
||||
Future policy metrics can include:
|
||||
|
||||
- `total_tokens`
|
||||
- `input_tokens`
|
||||
- `output_tokens`
|
||||
- `requests`
|
||||
- `finance_amount_cents`
|
||||
|
||||
But they should enter only after the money-budget path is stable.
|
||||
|
||||
## Migration Plan
|
||||
|
||||
### Phase 1: Foundation
|
||||
|
||||
- add `budget_policies`
|
||||
- add `budget_incidents`
|
||||
- add new approval type
|
||||
- add project pause metadata
|
||||
|
||||
### Phase 2: Compatibility
|
||||
|
||||
- backfill policies from existing company and agent monthly budget columns
|
||||
- keep legacy columns readable during migration
|
||||
|
||||
### Phase 3: Enforcement
|
||||
|
||||
- move budget logic into dedicated service
|
||||
- add hard-stop incident creation
|
||||
- add activity and approval creation
|
||||
- add execution guards on heartbeat and invoke paths
|
||||
|
||||
### Phase 4: UI
|
||||
|
||||
- `/costs` budget section
|
||||
- agent detail budget card
|
||||
- project detail budget card
|
||||
- dashboard incident summary
|
||||
|
||||
### Phase 5: Cleanup
|
||||
|
||||
- move all reads/writes to `budget_policies`
|
||||
- reduce legacy column reliance
|
||||
- decide whether to remove old budget columns
|
||||
|
||||
## Tests
|
||||
|
||||
Required coverage:
|
||||
|
||||
- agent monthly budget soft alert at 80%
|
||||
- agent monthly budget hard-stop at 100%
|
||||
- project lifetime budget soft alert
|
||||
- project lifetime budget hard-stop
|
||||
- `subscription_included` usage does not consume money budget
|
||||
- `subscription_overage` does consume money budget
|
||||
- hard-stop creates one incident per threshold per window
|
||||
- hard-stop creates approval and pauses correct scope
|
||||
- paused project blocks new issue execution
|
||||
- paused agent blocks new heartbeat dispatch
|
||||
- policy update and resume clears or resolves active incident correctly
|
||||
- dashboard and `/costs` surface active incidents
|
||||
|
||||
## Open Questions
|
||||
|
||||
These should be explicitly deferred unless they block implementation:
|
||||
|
||||
- Should project budgets also support monthly mode, or is lifetime enough for the first release?
|
||||
- Should company-level budgets eventually include `finance_events` such as OpenRouter top-up fees and Bedrock provisioned charges?
|
||||
- Should delegated budget editing be limited by org hierarchy in V1, or remain board-only in the UI even if the data model can support delegation later?
|
||||
- Do we need "resume once" immediately, or can first approval resolution be "raise budget and resume" plus "keep paused"?
|
||||
|
||||
## Recommendation
|
||||
|
||||
Implement the first coherent budgeting system with these rules:
|
||||
|
||||
- Agent budget = monthly billed dollars
|
||||
- Project budget = lifetime billed dollars
|
||||
- Hard-stop = auto-pause + approval
|
||||
- Soft alert = visible warning, no approval
|
||||
- Subscription usage = visible quota and token reporting, not money-budget enforcement
|
||||
|
||||
This solves the real operator problem without mixing together spend control, provider quota windows, and token accounting.
|
||||
Reference in New Issue
Block a user