Files
paperclip/doc/plans/2026-03-14-billing-ledger-and-reporting.md

469 lines
14 KiB
Markdown

# Billing Ledger and Reporting
## Context
Paperclip currently stores model spend in `cost_events` and operational run state in `heartbeat_runs`.
That split is fine, but the current reporting code tries to infer billing semantics by mixing both tables:
- `cost_events` knows provider, model, tokens, and dollars
- `heartbeat_runs.usage_json` knows some per-run billing metadata
- `heartbeat_runs.usage_json` does **not** currently carry enough normalized billing dimensions to support honest provider-level reporting
This becomes incorrect as soon as a company uses more than one provider, more than one billing channel, or more than one billing mode.
Examples:
- direct OpenAI API usage
- Claude subscription usage with zero marginal dollars
- subscription overage with dollars and tokens
- OpenRouter billing where the biller is OpenRouter but the upstream provider is Anthropic or OpenAI
The system needs to support:
- dollar reporting
- token reporting
- subscription-included usage
- subscription overage
- direct metered API usage
- future aggregator billing such as OpenRouter
## Product Decision
`cost_events` becomes the canonical billing and usage ledger for reporting.
`heartbeat_runs` remains an operational execution log. It may keep mirrored billing metadata for debugging and transcripts, but reporting must not reconstruct billing semantics from `heartbeat_runs.usage_json`.
## Decision: One Ledger Or Two
We do **not** need two tables to solve the current PR's problem.
For request-level inference reporting, `cost_events` is enough if it carries the right dimensions:
- upstream provider
- biller
- billing type
- model
- token fields
- billed amount
That is why the first implementation pass extends `cost_events` instead of introducing a second table immediately.
However, if Paperclip needs to account for the full billing surface of aggregators and managed AI platforms, then `cost_events` alone is not enough.
Some charges are not cleanly representable as a single model inference event:
- account top-ups and credit purchases
- platform fees charged at purchase time
- BYOK platform fees that are account-level or threshold-based
- prepaid credit expirations, refunds, and adjustments
- provisioned throughput commitments
- fine-tuning, training, model import, and storage charges
- gateway logging or other platform overhead that is not attributable to one prompt/response pair
So the decision is:
- near term: keep `cost_events` as the inference and usage ledger
- next phase: add `finance_events` for non-inference financial events
This is a deliberate split between:
- usage and inference accounting
- account-level and platform-level financial accounting
That separation keeps request reporting honest without forcing us to fake invoice semantics onto rows that were never request-scoped.
## External Motivation And Sources
The need for this model is not theoretical.
It follows directly from the billing systems of providers and aggregators Paperclip needs to support.
### OpenRouter
Source URLs:
- https://openrouter.ai/docs/faq#credit-and-billing-systems
- https://openrouter.ai/pricing
Relevant billing behavior as of March 14, 2026:
- OpenRouter passes through underlying inference pricing and deducts request cost from purchased credits.
- OpenRouter charges a 5.5% fee with a $0.80 minimum when purchasing credits.
- Crypto payments are charged a 5% fee.
- BYOK has its own fee model after a free request threshold.
- OpenRouter billing is aggregated at the OpenRouter account level even when the upstream provider is Anthropic, OpenAI, Google, or another provider.
Implication for Paperclip:
- request usage belongs in `cost_events`
- credit purchases, purchase fees, BYOK fees, refunds, and expirations belong in `finance_events`
- `biller=openrouter` must remain distinct from `provider=anthropic|openai|google|...`
### Cloudflare AI Gateway Unified Billing
Source URL:
- https://developers.cloudflare.com/ai-gateway/features/unified-billing/
Relevant billing behavior as of March 14, 2026:
- Unified Billing lets users call multiple upstream providers while receiving a single Cloudflare bill.
- Usage is paid from Cloudflare-loaded credits.
- Cloudflare supports manual top-ups and auto top-up thresholds.
- Spend limits can stop request processing on daily, weekly, or monthly boundaries.
- Unified Billing traffic can use Cloudflare-managed credentials rather than the user's direct provider key.
Implication for Paperclip:
- request usage needs `biller=cloudflare`
- upstream provider still needs to be preserved separately
- Cloudflare credit loads and related account-level events are not inference rows and should not be forced into `cost_events`
- quota and limits reporting must support biller-level controls, not just upstream provider limits
### Amazon Bedrock
Source URL:
- https://aws.amazon.com/bedrock/pricing/
Relevant billing behavior as of March 14, 2026:
- Bedrock supports on-demand and batch pricing.
- Bedrock pricing varies by region.
- some pricing tiers add premiums or discounts relative to standard pricing
- provisioned throughput is commitment-based rather than request-based
- custom model import uses Custom Model Units billed per minute, with monthly storage charges
- imported model copies are billed in 5-minute windows once active
- customization and fine-tuning introduce training and hosted-model charges beyond normal inference
Implication for Paperclip:
- normal tokenized inference fits in `cost_events`
- provisioned throughput, custom model unit charges, training, and storage charges require `finance_events`
- region and pricing tier need to be first-class dimensions in the financial model
## Ledger Boundary
To keep the system coherent, the table boundary should be explicit.
### `cost_events`
Use `cost_events` for request-scoped usage and inference charges:
- one row per billable or usage-bearing run event
- provider/model/biller/billingType/tokens/cost
- optionally tied to `heartbeat_run_id`
- supports direct APIs, subscriptions, overage, OpenRouter-routed inference, Cloudflare-routed inference, and Bedrock on-demand inference
### `finance_events`
Use `finance_events` for account-scoped or platform-scoped financial events:
- credit purchase
- top-up
- refund
- fee
- expiry
- provisioned capacity
- training
- model import
- storage
- invoice adjustment
These rows may or may not have a related model, provider, or run id.
Trying to force them into `cost_events` would either create fake request rows or create null-heavy rows that mean something fundamentally different from inference usage.
## Canonical Billing Dimensions
Every persisted billing event should model four separate axes:
1. Usage provider
The upstream provider whose model performed the work.
Examples: `openai`, `anthropic`, `google`.
2. Biller
The system that charged for the usage.
Examples: `openai`, `anthropic`, `openrouter`, `cursor`, `chatgpt`.
3. Billing type
The pricing mode applied to the event.
Initial canonical values:
- `metered_api`
- `subscription_included`
- `subscription_overage`
- `credits`
- `fixed`
- `unknown`
4. Measures
Usage and billing must both be storable:
- `input_tokens`
- `output_tokens`
- `cached_input_tokens`
- `cost_cents`
These dimensions are independent.
For example, an event may be:
- provider: `anthropic`
- biller: `openrouter`
- billing type: `metered_api`
- tokens: non-zero
- cost cents: non-zero
Or:
- provider: `anthropic`
- biller: `anthropic`
- billing type: `subscription_included`
- tokens: non-zero
- cost cents: `0`
## Schema Changes
Extend `cost_events` with:
- `heartbeat_run_id uuid null references heartbeat_runs.id`
- `biller text not null default 'unknown'`
- `billing_type text not null default 'unknown'`
- `cached_input_tokens int not null default 0`
Keep `provider` as the upstream usage provider.
Do not overload `provider` to mean biller.
Add a future `finance_events` table for account-level financial events with fields along these lines:
- `company_id`
- `occurred_at`
- `event_kind`
- `direction`
- `biller`
- `provider nullable`
- `execution_adapter_type nullable`
- `pricing_tier nullable`
- `region nullable`
- `model nullable`
- `quantity nullable`
- `unit nullable`
- `amount_cents`
- `currency`
- `estimated`
- `related_cost_event_id nullable`
- `related_heartbeat_run_id nullable`
- `external_invoice_id nullable`
- `metadata_json nullable`
Add indexes:
- `(company_id, biller, occurred_at)`
- `(company_id, provider, occurred_at)`
- `(company_id, heartbeat_run_id)` if distinct-run reporting remains common
## Shared Contract Changes
### Shared types
Add a shared billing type union and enrich cost types with:
- `heartbeatRunId`
- `biller`
- `billingType`
- `cachedInputTokens`
Update reporting response types so the provider breakdown reflects the ledger directly rather than inferred run metadata.
### Validators
Extend `createCostEventSchema` to accept:
- `heartbeatRunId`
- `biller`
- `billingType`
- `cachedInputTokens`
Defaults:
- `biller` defaults to `provider`
- `billingType` defaults to `unknown`
- `cachedInputTokens` defaults to `0`
## Adapter Contract Changes
Extend adapter execution results so they can report:
- `biller`
- richer billing type values
Backwards compatibility:
- existing adapter values `api` and `subscription` are treated as legacy aliases
- map `api -> metered_api`
- map `subscription -> subscription_included`
Future adapters may emit the canonical values directly.
OpenRouter support will use:
- `provider` = upstream provider when known
- `biller` = `openrouter`
- `billingType` = `metered_api` unless OpenRouter later exposes another billing mode
Cloudflare Unified Billing support will use:
- `provider` = upstream provider when known
- `biller` = `cloudflare`
- `billingType` = `credits` or `metered_api` depending on the normalized request billing contract
Bedrock support will use:
- `provider` = upstream provider or `aws_bedrock` depending on adapter shape
- `biller` = `aws_bedrock`
- `billingType` = request-scoped mode for inference rows
- `finance_events` for provisioned, training, import, and storage charges
## Write Path Changes
### Heartbeat-created events
When a heartbeat run produces usage or spend:
1. normalize adapter billing metadata
2. write a ledger row to `cost_events`
3. attach `heartbeat_run_id`
4. set `provider`, `biller`, `billing_type`, token fields, and `cost_cents`
The write path should no longer depend on later inference from `heartbeat_runs`.
### Manual API-created events
Manual cost event creation remains supported.
These events may have `heartbeatRunId = null`.
Rules:
- `provider` remains required
- `biller` defaults to `provider`
- `billingType` defaults to `unknown`
## Reporting Changes
### Server
Refactor reporting queries to use `cost_events` only.
#### `summary`
- sum `cost_cents`
#### `by-agent`
- sum costs and token fields from `cost_events`
- use `count(distinct heartbeat_run_id)` filtered by billing type for run counts
- use token sums filtered by billing type for subscription usage
#### `by-provider`
- group by `provider`, `model`
- sum costs and token fields directly from the ledger
- derive billing-type slices from `cost_events.billing_type`
- never pro-rate from unrelated `heartbeat_runs`
#### future `by-biller`
- group by `biller`
- this is the right view for invoice and subscription accountability
#### `window-spend`
- continue to use `cost_events`
#### project attribution
Keep current project attribution logic for now, but prefer `cost_events.heartbeat_run_id` as the join anchor whenever possible.
## UI Changes
### Principles
- Spend, usage, and quota are related but distinct
- a missing quota fetch is not the same as “no quota”
- provider and biller are different dimensions
### Immediate UI changes
1. Keep the current costs page structure.
2. Make the provider cards accurate by reading only ledger-backed values.
3. Show provider quota fetch errors explicitly instead of dropping them.
### Follow-up UI direction
The long-term board UI should expose:
- Spend
Dollars by biller, provider, model, agent, project
- Usage
Tokens by provider, model, agent, project
- Quotas
Live provider or biller limits, credits, and reset windows
- Financial events
Credit purchases, top-ups, fees, refunds, commitments, storage, and other non-inference charges
## Migration Plan
Migration behavior:
- add new non-destructive columns with defaults
- backfill existing rows:
- `biller = provider`
- `billing_type = 'unknown'`
- `cached_input_tokens = 0`
- `heartbeat_run_id = null`
Do **not** attempt to backfill historical provider-level subscription attribution from `heartbeat_runs`.
That data was never stored with the required dimensions.
## Testing Plan
Add or update tests for:
1. heartbeat-created ledger rows persist `heartbeatRunId`, `biller`, `billingType`, and cached tokens
2. legacy adapter billing values map correctly
3. provider reporting uses ledger data only
4. mixed-provider companies do not cross-attribute subscription usage
5. zero-dollar subscription usage still appears in token reporting
6. quota fetch failures render explicit UI state
7. manual cost events still validate and write correctly
8. biller reporting keeps upstream provider breakdowns separate
9. OpenRouter-style rows can show `biller=openrouter` with non-OpenRouter upstream providers
10. Cloudflare-style rows can show `biller=cloudflare` with preserved upstream provider identity
11. future `finance_events` aggregation handles non-request charges without requiring a model or run id
## Delivery Plan
### Step 1
- land the ledger contract and query rewrite
- make the current costs page correct
### Step 2
- add biller-oriented reporting endpoints and UI
### Step 3
- wire OpenRouter and any future aggregator adapters to the same contract
### Step 4
- add `executionAdapterType` to persisted cost reporting if adapter-level grouping becomes a product requirement
### Step 5
- introduce `finance_events`
- add non-inference accounting endpoints
- add UI for platform/account charges alongside inference spend and usage
## Non-Goals For This Change
- multi-currency support
- invoice reconciliation
- provider-specific cost estimation beyond persisted billed cost
- replacing `heartbeat_runs` as the operational run record