Budget Policies and Enforcement
Context
Paperclip already treats budgets as a core control-plane responsibility:
- doc/SPEC.md gives the Board authority to set budgets, pause agents, pause work, and override any budget.
- doc/SPEC-implementation.md says V1 must support monthly UTC budget windows, soft alerts, and hard auto-pause.
- the current code only partially implements that intent.
Today the system has narrow money-budget behavior:
- companies track budgetMonthlyCents and spentMonthlyCents
- agents track budgetMonthlyCents and spentMonthlyCents
- cost_events ingestion increments those counters
- when an agent exceeds its monthly budget, the agent is paused
That leaves major product gaps:
- no project budget model
- no approval generated when budget is hit
- no generic budget policy system
- no project pause semantics tied to budget
- no durable incident tracking to prevent duplicate alerts
- no separation between enforceable spend budgets and advisory usage quotas
This plan defines the concrete budgeting model Paperclip should implement next.
Product Goals
Paperclip should let operators:
- Set budgets on agents and projects.
- Understand whether a budget is based on money or usage.
- Be warned before a budget is exhausted.
- Automatically pause work when a hard budget is hit.
- Approve, raise, or resume from a budget stop using obvious UI.
- See budget state on the dashboard, /costs, and scope detail pages.
The system should make one thing very clear:
- budgets are policy controls
- quotas are usage visibility
They are related, but they are not the same concept.
Product Decisions
V1 Budget Defaults
For the next implementation pass, Paperclip should enforce these defaults:
- agent budgets are recurring monthly budgets
- project budgets are lifetime total budgets
- hard-stop enforcement uses billed dollars, not tokens
- monthly windows use UTC calendar months
- project total budgets do not reset automatically
This gives a clean mental model:
- agents are ongoing workers, so monthly recurring budget is natural
- projects are bounded workstreams, so lifetime cap is natural
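The two window kinds can be sketched as a small helper. This is only an illustration: the names BudgetWindow, WindowKind, and resolveWindow are hypothetical, and representing a lifetime window as starting at the Unix epoch with no end is an assumption, not something the plan prescribes.

```typescript
// Hypothetical helper: names and return shape are illustrative assumptions.
interface BudgetWindow {
  start: Date;      // inclusive
  end: Date | null; // exclusive; null means the window never closes
}

type WindowKind = "calendar_month_utc" | "lifetime";

function resolveWindow(kind: WindowKind, now: Date): BudgetWindow {
  if (kind === "lifetime") {
    // Assumption: a lifetime window covers all recorded spend and never resets.
    return { start: new Date(0), end: null };
  }
  const year = now.getUTCFullYear();
  const month = now.getUTCMonth();
  return {
    start: new Date(Date.UTC(year, month, 1)),
    // Date.UTC rolls month 12 over into January of the next year.
    end: new Date(Date.UTC(year, month + 1, 1)),
  };
}
```

Computing both bounds with Date.UTC keeps the window independent of the server's local time zone, which is the point of using UTC calendar months.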
Metric To Enforce First
The first enforceable metric should be billed_cents.
Reasoning:
- it works across providers, billers, and models
- it maps directly to real financial risk
- it handles overage and metered usage consistently
- it avoids cross-provider token normalization problems
- it applies cleanly even when future finance events are not token-based
Token budgets should not be the first hard-stop policy. They should come later as advisory usage controls once the money-based system is solid.
Subscription Usage Decision
Paperclip should separate subscription-included usage from billed spend:
- subscription_included
  - visible in reporting
  - visible in usage summaries
  - does not count against money budget
- subscription_overage
  - visible in reporting
  - counts against money budget
- metered_api
  - visible in reporting
  - counts against money budget
This keeps the budget system honest:
- users should not see "spend" rise for usage that did not incur marginal billed cost
- users should still see the token usage and provider quota state
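A minimal sketch of that rule, assuming each cost event carries a billing type and a billed amount in cents. The field names billingType and billedCents and the function budgetCents are hypothetical and may not match the real cost_events shape.

```typescript
// Assumption: these three values mirror the billing categories named above.
type BillingType = "subscription_included" | "subscription_overage" | "metered_api";

// Hypothetical event shape; the real cost_events columns may differ.
interface CostEventLike {
  billingType: BillingType;
  billedCents: number;
}

// Returns the number of cents this event contributes to a money budget.
function budgetCents(costEvent: CostEventLike): number {
  if (costEvent.billingType === "subscription_included") {
    return 0; // still reported as usage, but never counted as spend
  }
  return Math.max(0, costEvent.billedCents);
}
```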
Soft Alert Versus Hard Stop
Paperclip should have two threshold classes:
- soft alert
  - creates visible notification state
  - does not create an approval
  - does not pause work
- hard stop
  - pauses the affected scope automatically
  - creates an approval requiring human resolution
  - prevents additional heartbeats or task pickup in that scope
Default thresholds:
- soft alert at 80%
- hard stop at 100%
These should be configurable per policy later, but they are good defaults now.
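As a sketch of how the two defaults might be evaluated, the function below classifies an observed amount against a limit. The name classifyThreshold and the treatment of a non-positive limit as "no budget set" are assumptions made for illustration.

```typescript
type ThresholdState = "ok" | "soft" | "hard";

// Classifies observed spend against a limit, both in cents.
function classifyThreshold(
  observedCents: number,
  limitCents: number,
  warnPercent: number = 80,
): ThresholdState {
  // Assumption: a non-positive limit means no budget is configured.
  if (limitCents <= 0) return "ok";
  if (observedCents >= limitCents) return "hard";
  // Integer comparison avoids floating-point rounding at the boundary.
  if (observedCents * 100 >= limitCents * warnPercent) return "soft";
  return "ok";
}
```

With a 10000-cent budget, 7999 cents is still "ok", 8000 cents reaches the soft alert, and 10000 cents reaches the hard stop.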
Scope Model
Supported Scope Types
Budget policies should support:
- company
- agent
- project
This plan focuses on finishing agent and project first while preserving the existing company budget behavior.
Recommended V1.5 Policy Presets
- Company
  - metric: billed_cents
  - window: calendar_month_utc
- Agent
  - metric: billed_cents
  - window: calendar_month_utc
- Project
  - metric: billed_cents
  - window: lifetime
Future extensions can add:
- token advisory policies
- daily or weekly spend windows
- provider- or biller-scoped budgets
- inherited delegated budgets down the org tree
Current Implementation Baseline
The current codebase is not starting from zero, but the existing shape is too ad hoc to extend safely.
What Exists Today
- company and agent monthly cents counters
- cost ingestion that updates those counters
- agent hard-stop pause on monthly budget overrun
What Is Missing
- project budgets
- generic budget policy persistence
- generic threshold crossing detection
- incident deduplication per scope/window
- approval creation on hard-stop
- project execution blocking
- budget timeline and incident UI
- distinction between advisory quota and enforceable budget
Proposed Data Model
1. budget_policies
Create a new table for canonical budget definitions.
Suggested fields:
- id
- company_id
- scope_type
- scope_id
- metric
- window_kind
- amount
- warn_percent
- hard_stop_enabled
- notify_enabled
- is_active
- created_by_user_id
- updated_by_user_id
- created_at
- updated_at
Notes:
- scope_type is one of company | agent | project
- scope_id is nullable only for company-level policy if company is implied; otherwise keep it explicit
- metric should start with billed_cents
- window_kind starts with calendar_month_utc | lifetime
- amount is stored in the natural unit of the metric
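One possible typing of a budget_policies row, using the suggested field names. The TypeScript types (string IDs, Date timestamps, nullable user IDs) and the name BudgetPolicyRow are assumptions; the plan only lists the fields.

```typescript
type ScopeType = "company" | "agent" | "project";
type BudgetMetric = "billed_cents";
type BudgetWindowKind = "calendar_month_utc" | "lifetime";

// Hypothetical row type; column types are assumptions, names follow the plan.
interface BudgetPolicyRow {
  id: string;
  company_id: string;
  scope_type: ScopeType;
  scope_id: string | null; // null only for an implied company-level policy
  metric: BudgetMetric;
  window_kind: BudgetWindowKind;
  amount: number; // natural unit of the metric: cents for billed_cents
  warn_percent: number;
  hard_stop_enabled: boolean;
  notify_enabled: boolean;
  is_active: boolean;
  created_by_user_id: string | null;
  updated_by_user_id: string | null;
  created_at: Date;
  updated_at: Date;
}

// Example: a $50.00 monthly agent budget with the default 80% warning.
const exampleAgentPolicy: BudgetPolicyRow = {
  id: "policy-1",
  company_id: "company-1",
  scope_type: "agent",
  scope_id: "agent-1",
  metric: "billed_cents",
  window_kind: "calendar_month_utc",
  amount: 5000,
  warn_percent: 80,
  hard_stop_enabled: true,
  notify_enabled: true,
  is_active: true,
  created_by_user_id: null,
  updated_by_user_id: null,
  created_at: new Date(0),
  updated_at: new Date(0),
};
```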
2. budget_incidents
Create a durable record of threshold crossings.
Suggested fields:
- id
- company_id
- policy_id
- scope_type
- scope_id
- metric
- window_kind
- window_start
- window_end
- threshold_type
- amount_limit
- amount_observed
- status
- approval_id (nullable)
- activity_id (nullable)
- resolved_at (nullable)
- created_at
- updated_at
Notes:
- threshold_type: soft | hard
- status: open | acknowledged | resolved | dismissed
- one open incident per policy per threshold per window prevents duplicate approvals and alert spam
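The deduplication rule can be illustrated with an in-memory sketch keyed on policy, threshold, and window start. The names incidentKey and InMemoryIncidentStore are hypothetical; in a real database the same guarantee would more likely come from a uniqueness constraint over open incidents, which this plan does not specify.

```typescript
type ThresholdType = "soft" | "hard";

// Builds the identity of "one incident per policy per threshold per window".
function incidentKey(policyId: string, thresholdType: ThresholdType, windowStart: Date): string {
  return `${policyId}:${thresholdType}:${windowStart.toISOString()}`;
}

// Illustrative in-memory stand-in for the budget_incidents table.
class InMemoryIncidentStore {
  private openKeys = new Set<string>();

  // Returns true if a new incident was opened, false if one already existed.
  openIfAbsent(key: string): boolean {
    if (this.openKeys.has(key)) return false;
    this.openKeys.add(key);
    return true;
  }
}
```

Repeated cost events in the same window produce the same key, so only the first crossing opens an incident, while a new window start yields a fresh key.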
3. Project Pause State
Projects need explicit pause semantics.
Recommended approach:
- extend project status or add a pause field so a project can be blocked by budget
- preserve whether the project is paused due to budget versus manually paused
Preferred shape:
- keep project workflow status as-is
- add execution-state fields:
  - execution_status: active | paused | archived
  - pause_reason: manual | budget | system | null
If that is too large for the immediate pass, a smaller version is:
- add paused_at
- add pause_reason
The key requirement is behavioral, not cosmetic: Paperclip must know that a project is budget-paused and enforce it.
4. Compatibility With Existing Budget Columns
Existing company and agent monthly budget columns should remain temporarily for compatibility.
Migration plan:
- keep reading existing columns during transition
- create equivalent budget_policies rows
- switch enforcement and UI to policies
- later remove or deprecate legacy columns
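A rough sketch of the backfill step for agents, mapping the legacy budgetMonthlyCents column to a policy draft. The names LegacyAgentRow, PolicyDraft, and backfillAgentPolicy are hypothetical, and treating a non-positive legacy value as "no budget configured" is an assumption that should be checked against current behavior.

```typescript
// Hypothetical view of the legacy agent columns relevant to budgeting.
interface LegacyAgentRow {
  id: string;
  companyId: string;
  budgetMonthlyCents: number;
}

// Subset of budget_policies fields needed to create the equivalent row.
interface PolicyDraft {
  company_id: string;
  scope_type: "agent";
  scope_id: string;
  metric: "billed_cents";
  window_kind: "calendar_month_utc";
  amount: number;
  warn_percent: number;
  hard_stop_enabled: boolean;
}

function backfillAgentPolicy(agent: LegacyAgentRow): PolicyDraft | null {
  // Assumption: zero or negative means the agent never had a budget set.
  if (agent.budgetMonthlyCents <= 0) return null;
  return {
    company_id: agent.companyId,
    scope_type: "agent",
    scope_id: agent.id,
    metric: "billed_cents",
    window_kind: "calendar_month_utc",
    amount: agent.budgetMonthlyCents,
    warn_percent: 80,        // default soft alert threshold
    hard_stop_enabled: true, // matches the existing auto-pause behavior
  };
}
```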
Budget Engine
Budget enforcement should move into a dedicated service.
Current logic is buried inside cost ingestion. That is too narrow because budget checks must apply at more than one execution boundary.
Responsibilities
New service: budgetService
Responsibilities:
- resolve applicable policies for a cost event
- compute current window totals
- detect threshold crossings
- create incidents, activities, and approvals
- pause affected scopes on hard-stop
- provide preflight enforcement checks for execution entry points
Canonical Evaluation Flow
When a new cost_event is written:
- persist the cost_event
- identify affected scopes
  - company
  - agent
  - project
- fetch active policies for those scopes
- compute current observed amount for each policy window
- compare to thresholds
- create soft incident if soft threshold crossed for first time in window
- create hard incident if hard threshold crossed for first time in window
- if hard incident:
  - pause the scope
  - create approval
  - create activity event
  - emit notification state
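The comparison and incident steps of this flow can be sketched as a pure function that returns the side effects to perform for one policy. Everything here is illustrative: PolicyState, BudgetAction, and evaluatePolicy are hypothetical names, persistence is left out, and emitting an activity and notification for soft incidents follows the soft-alert outputs described elsewhere in this plan.

```typescript
type BudgetAction =
  | "create_soft_incident"
  | "create_hard_incident"
  | "pause_scope"
  | "create_approval"
  | "create_activity"
  | "emit_notification";

// Snapshot of one policy after the new cost_event has been added to its window total.
interface PolicyState {
  limitCents: number;   // assumed to be greater than zero
  warnPercent: number;
  observedCents: number;
  softOpen: boolean;    // a soft incident already exists for this window
  hardOpen: boolean;    // a hard incident already exists for this window
}

function evaluatePolicy(state: PolicyState): BudgetAction[] {
  const actions: BudgetAction[] = [];
  const softHit = state.observedCents * 100 >= state.limitCents * state.warnPercent;
  const hardHit = state.observedCents >= state.limitCents;

  if (softHit && !state.softOpen) {
    actions.push("create_soft_incident");
  }
  if (hardHit && !state.hardOpen) {
    actions.push("create_hard_incident", "pause_scope", "create_approval");
  }
  // Any newly created incident is also surfaced to operators.
  if (actions.length > 0) {
    actions.push("create_activity", "emit_notification");
  }
  return actions;
}
```

Returning a list of actions instead of performing them keeps the decision logic testable without a database, and the softOpen and hardOpen flags are what make a repeated evaluation in the same window a no-op.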
Preflight Enforcement Checks
Budget enforcement cannot rely only on post-hoc cost ingestion.
Paperclip must also block execution before new work starts.
Add budget checks to:
- scheduler heartbeat dispatch
- manual invoke endpoints
- assignment-driven wakeups
- queued run promotion
- issue checkout or pickup paths where applicable
If a scope is budget-paused:
- do not start a new heartbeat
- do not let the agent pick up additional work
- present a clear reason in API and UI
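A preflight check along these lines might look like the sketch below, which each execution entry point would call before starting work. ScopeState, PreflightResult, preflightBudgetCheck, and the budget_paused code are hypothetical names chosen for illustration.

```typescript
type PauseReason = "manual" | "budget" | "system" | null;

// Hypothetical view of a scope's pause state at dispatch time.
interface ScopeState {
  scopeType: "company" | "agent" | "project";
  scopeId: string;
  pauseReason: PauseReason;
}

type PreflightResult =
  | { allowed: true }
  | { allowed: false; code: "budget_paused"; message: string };

// Checks every scope the work would run under; the first budget pause blocks it.
function preflightBudgetCheck(scopes: ScopeState[]): PreflightResult {
  const blocked = scopes.find((scope) => scope.pauseReason === "budget");
  if (!blocked) return { allowed: true };
  return {
    allowed: false,
    code: "budget_paused",
    message: `${blocked.scopeType} ${blocked.scopeId} is paused by budget enforcement`,
  };
}
```

Returning a structured result rather than a boolean gives the API and UI an explicit reason to show, so a blocked invocation is never a silent no-op.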
Active Run Behavior
When a hard-stop is triggered while a run is already active:
- mark scope paused immediately for future work
- request graceful cancellation of the current run
- allow normal cancellation timeout behavior
- write activity explaining that pause came from budget enforcement
This mirrors the general pause semantics already expected by the product.
Approval Model
Budget hard-stops should create a first-class approval.
New Approval Type
Add approval type:
budget_override_required
Payload should include:
- scopeType
- scopeId
- scopeName
- metric
- windowKind
- thresholdType
- budgetAmount
- observedAmount
- windowStart
- windowEnd
- topDrivers
- paused
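A possible typing for this payload is sketched below. The field names come from the list above, but the types are assumptions: in particular, ISO-8601 strings for the window bounds, a null windowEnd for lifetime windows, and the shape of topDrivers are not defined by this plan.

```typescript
// Hypothetical payload type for the budget_override_required approval.
interface BudgetOverridePayload {
  scopeType: "company" | "agent" | "project";
  scopeId: string;
  scopeName: string;
  metric: "billed_cents";
  windowKind: "calendar_month_utc" | "lifetime";
  thresholdType: "soft" | "hard";
  budgetAmount: number;   // cents
  observedAmount: number; // cents
  windowStart: string;    // assumption: ISO-8601 timestamp
  windowEnd: string | null; // assumption: null for lifetime windows
  // Assumption: the largest contributors to spend in this window.
  topDrivers: Array<{ label: string; amount: number }>;
  paused: boolean;
}

// Example: a project that exceeded a $200.00 lifetime budget.
const exampleOverridePayload: BudgetOverridePayload = {
  scopeType: "project",
  scopeId: "project-1",
  scopeName: "Example project",
  metric: "billed_cents",
  windowKind: "lifetime",
  thresholdType: "hard",
  budgetAmount: 20000,
  observedAmount: 20450,
  windowStart: "1970-01-01T00:00:00.000Z",
  windowEnd: null,
  topDrivers: [{ label: "agent-1", amount: 15000 }],
  paused: true,
};
```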
Resolution Actions
The approval UI should support:
- raise budget and resume
- resume once without changing policy
- keep paused
Optional later action:
- disable budget policy
Soft Alerts Do Not Need Approval
Soft alerts should create:
- activity event
- dashboard alert
- inbox notification or similar board-visible signal
They should not create an approval by default.
Notification And Activity Model
Budget events need obvious operator visibility.
Required outputs:
- activity log entry on threshold crossings
- dashboard surface for active budget incidents
- detail page banner on paused agent or project
- /costs summary of active incidents and policy health
Later channels:
- webhook
- Slack or other integrations
API Plan
Policy Management
Add routes for:
- list budget policies for company
- create budget policy
- update budget policy
- archive or disable budget policy
Incident Surfaces
Add routes for:
- list active budget incidents
- list incident history
- get incident detail for a scope
Approval Resolution
Budget approvals should use the existing approval system once the new approval type is added.
Expected flows:
- create approval on hard-stop
- resolve approval by changing policy and resuming
- resolve approval by resuming once
Execution Errors
When work is blocked by budget, the API should return explicit errors.
Examples:
- agent invocation blocked because agent budget is paused
- issue execution blocked because project budget is paused
Do not silently no-op.
UI Plan
Budgeting should be visible in the places where operators make decisions.
/costs
Add a budget section that includes:
- active budget incidents
- policy list with scope, window, metric, and threshold state
- progress bars for current period or total
- clear distinction between:
  - spend budget
  - subscription quota
- quick actions:
  - raise budget
  - open approval
  - resume scope if permitted
The page should make this visual distinction obvious:
- Budget
  - enforceable spend policy
- Quota
  - provider or subscription usage window
Agent Detail
Add an agent budget card:
- monthly budget amount
- current month spend
- remaining spend
- status
- warning or paused banner
- link to approval if blocked
Project Detail
Add a project budget card:
- total budget amount
- total spend to date
- remaining spend
- pause status
- approval link
Project detail should also show if issue execution is blocked because the project is budget-paused.
Dashboard
Add a high-signal budget section:
- active budget breaches
- upcoming soft alerts
- counts of paused agents and paused projects due to budget
The operator should not have to visit /costs to learn that work has stopped.
Budget Math
What Counts Toward Budget
For V1.5 enforcement, include:
- metered_api cost events
- subscription_overage cost events
- any future request-scoped cost event with non-zero billed cents
Do not include:
- subscription_included cost events with zero billed cents
- advisory quota rows
- account-level finance events unless and until company-level financial budgets are added explicitly
Why Not Tokens First
Token budgets should not be the first hard-stop because:
- providers count tokens differently
- cached tokens complicate simple totals
- some future charges are not token-based
- subscription tokens do not necessarily imply spend
- money remains the cleanest cross-provider enforcement metric
Future Budget Metrics
Future policy metrics can include:
- total_tokens
- input_tokens
- output_tokens
- requests
- finance_amount_cents
But they should enter only after the money-budget path is stable.
Migration Plan
Phase 1: Foundation
- add budget_policies
- add budget_incidents
- add new approval type
- add project pause metadata
Phase 2: Compatibility
- backfill policies from existing company and agent monthly budget columns
- keep legacy columns readable during migration
Phase 3: Enforcement
- move budget logic into dedicated service
- add hard-stop incident creation
- add activity and approval creation
- add execution guards on heartbeat and invoke paths
Phase 4: UI
- /costs budget section
- agent detail budget card
- project detail budget card
- dashboard incident summary
Phase 5: Cleanup
- move all reads/writes to budget_policies
- reduce legacy column reliance
- decide whether to remove old budget columns
Tests
Required coverage:
- agent monthly budget soft alert at 80%
- agent monthly budget hard-stop at 100%
- project lifetime budget soft alert
- project lifetime budget hard-stop
- subscription_included usage does not consume money budget
- subscription_overage does consume money budget
- hard-stop creates one incident per threshold per window
- hard-stop creates approval and pauses correct scope
- paused project blocks new issue execution
- paused agent blocks new heartbeat dispatch
- policy update and resume clears or resolves active incident correctly
- dashboard and /costs surface active incidents
Open Questions
These should be explicitly deferred unless they block implementation:
- Should project budgets also support monthly mode, or is lifetime enough for the first release?
- Should company-level budgets eventually include finance_events such as OpenRouter top-up fees and Bedrock provisioned charges?
- Should delegated budget editing be limited by org hierarchy in V1, or remain board-only in the UI even if the data model can support delegation later?
- Do we need "resume once" immediately, or can first approval resolution be "raise budget and resume" plus "keep paused"?
Recommendation
Implement the first coherent budgeting system with these rules:
- Agent budget = monthly billed dollars
- Project budget = lifetime billed dollars
- Hard-stop = auto-pause + approval
- Soft alert = visible warning, no approval
- Subscription usage = visible quota and token reporting, not money-budget enforcement
This solves the real operator problem without mixing together spend control, provider quota windows, and token accounting.