Agent Runs Subsystem Spec
Status: Draft
Date: 2026-02-17
Audience: Product + Engineering
Scope: Agent execution runtime, adapter protocol, wakeup orchestration, and live status delivery
1. Document Role
This spec defines how Paperclip actually runs agents while staying runtime-agnostic.
- doc/SPEC-implementation.md remains the V1 baseline contract.
- This document adds concrete subsystem detail for agent execution, including local CLI adapters, runtime state persistence, wakeup scheduling, and browser live updates.
- If this doc conflicts with current runtime behavior in code, this doc is the target behavior for upcoming implementation.
2. Captured Intent (From Request)
The following intentions are explicitly preserved in this spec:
- Paperclip is adapter-agnostic. The key is a protocol, not a specific runtime.
- We still need default built-ins to make the system useful immediately.
- First two built-ins are claude-local and codex-local.
- Those adapters run local CLIs directly on the host machine, unsandboxed.
- Agent config includes working directory and initial/default prompt.
- Heartbeats run the configured adapter process, Paperclip manages lifecycle, and on exit Paperclip parses JSON output and updates state.
- Session IDs and token usage must be persisted so later heartbeats can resume.
- Adapters should support status updates (short message + color) and optional streaming logs.
- UI should support prompt template "pills" for variable insertion.
- CLI errors must be visible in full (or as much as possible) in the UI.
- Status changes must live-update across task and agent views via server push.
- Wakeup triggers should be centralized by a heartbeat/wakeup service with at least:
  - timer interval
  - wake on task assignment
  - explicit ping/request
3. Goals and Non-Goals
3.1 Goals
- Define a stable adapter protocol that supports multiple runtimes.
- Ship production-usable local adapters for Claude CLI and Codex CLI.
- Persist adapter runtime state (session IDs, token/cost usage, last errors).
- Centralize wakeup decisions and queueing in one service.
- Provide realtime run/task/agent updates to the browser.
- Support deployment-specific full-log storage without bloating Postgres.
- Preserve company scoping and existing governance invariants.
3.2 Non-Goals (for this subsystem phase)
- Distributed execution workers across multiple hosts.
- Third-party adapter marketplace/plugin SDK.
- Perfect cost accounting for providers that do not emit cost.
- Long-term log archival strategy beyond basic retention.
4. Baseline and Gaps (As of 2026-02-17)
Current code already has:
- agents with adapterType + adapterConfig.
- heartbeat_runs with basic status tracking.
- in-process heartbeatService that invokes process and http.
- cancellation endpoints for active runs.
Current gaps this spec addresses:
- No persistent per-agent runtime state for session resume.
- No queue/wakeup abstraction (invoke is immediate).
- No assignment-triggered or timer-triggered centralized wakeups.
- No websocket/SSE push path to browser.
- No persisted run event timeline or external full-log storage contract.
- No typed local adapter contracts for Claude/Codex session and usage extraction.
- No prompt-template variable/pill system in agent setup.
- No deployment-aware adapter for full run log storage (disk/object store/etc).
5. Architecture Overview
The subsystem introduces six cooperating components:
- Adapter Registry
  - Maps adapter_type to implementation.
  - Exposes capability metadata and config validation.
- Wakeup Coordinator
  - Single entrypoint for all wakeups (timer, assignment, on_demand, automation).
  - Applies dedupe/coalescing and queue rules.
- Run Executor
  - Claims queued wakeups.
  - Creates heartbeat_runs.
  - Spawns/monitors child processes for local adapters.
  - Handles timeout/cancel/graceful kill.
- Runtime State Store
  - Persists resumable adapter state per agent.
  - Persists run usage summaries and lightweight run-event timeline.
- Run Log Store
  - Persists full stdout/stderr streams via pluggable storage adapter.
  - Returns stable logRef for retrieval (local path, object key, or DB reference).
- Realtime Event Hub
  - Publishes run/agent/task updates over websocket.
  - Supports selective subscription by company.
Control flow (happy path):
- Trigger arrives (timer, assignment, on_demand, or automation).
- Wakeup coordinator enqueues/merges wake request.
- Executor claims request, creates run row, marks agent running.
- Adapter executes, emits status/log/usage events.
- Full logs stream to RunLogStore; metadata/events are persisted to DB and pushed to websocket subscribers.
- Process exits, output parser updates run result + runtime state.
- Agent returns to idle or error; UI updates in real time.
6. Agent Run Protocol (Version agent-run/v1)
This protocol is runtime-agnostic and implemented by all adapters.
type RunOutcome = "succeeded" | "failed" | "cancelled" | "timed_out";
type StatusColor = "neutral" | "blue" | "green" | "yellow" | "red";

interface TokenUsage {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number;
  cachedOutputTokens?: number;
}

interface AdapterInvokeInput {
  protocolVersion: "agent-run/v1";
  companyId: string;
  agentId: string;
  runId: string;
  wakeupSource: "timer" | "assignment" | "on_demand" | "automation";
  triggerDetail?: "manual" | "ping" | "callback" | "system";
  cwd: string;
  prompt: string;
  adapterConfig: Record<string, unknown>;
  runtimeState: Record<string, unknown>;
  env: Record<string, string>;
  timeoutSec: number;
}

interface AdapterHooks {
  status?: (update: { message: string; color?: StatusColor }) => Promise<void>;
  log?: (event: { stream: "stdout" | "stderr" | "system"; chunk: string }) => Promise<void>;
  usage?: (usage: TokenUsage) => Promise<void>;
  event?: (eventType: string, payload: Record<string, unknown>) => Promise<void>;
}

interface AdapterInvokeResult {
  outcome: RunOutcome;
  exitCode: number | null;
  errorMessage?: string | null;
  summary?: string | null;
  sessionId?: string | null;
  usage?: TokenUsage | null;
  provider?: string | null;
  model?: string | null;
  costUsd?: number | null;
  runtimeStatePatch?: Record<string, unknown>;
  rawResult?: Record<string, unknown> | null;
}

interface AgentRunAdapter {
  type: string;
  protocolVersion: "agent-run/v1";
  capabilities: {
    resumableSession: boolean;
    statusUpdates: boolean;
    logStreaming: boolean;
    tokenUsage: boolean;
  };
  validateConfig(config: unknown): { ok: true } | { ok: false; errors: string[] };
  invoke(input: AdapterInvokeInput, hooks: AdapterHooks, signal: AbortSignal): Promise<AdapterInvokeResult>;
}
6.1 Required Behavior
- validateConfig runs before saving or invoking.
- invoke must be deterministic for a given config + runtime state + prompt.
- Adapter must not mutate DB directly; it returns data via result/events only.
- Adapter must emit enough context for errors to be debuggable.
- If invoke throws, executor records run as failed with captured error text.
6.2 Optional Behavior
Adapters may omit status/log hooks. If omitted, runtime still emits system lifecycle statuses (queued, running, finished).
6.3 Run log storage protocol
Full run logs are managed by a separate pluggable store (not by the agent adapter).
type RunLogStoreType = "local_file" | "object_store" | "postgres";

interface RunLogHandle {
  store: RunLogStoreType;
  logRef: string; // opaque provider reference (path, key, uri, row id)
}

interface RunLogStore {
  begin(input: { companyId: string; agentId: string; runId: string }): Promise<RunLogHandle>;
  append(
    handle: RunLogHandle,
    event: { stream: "stdout" | "stderr" | "system"; chunk: string; ts: string },
  ): Promise<void>;
  finalize(
    handle: RunLogHandle,
    summary: { bytes: number; sha256?: string; compressed: boolean },
  ): Promise<void>;
  read(
    handle: RunLogHandle,
    opts?: { offset?: number; limitBytes?: number },
  ): Promise<{ content: string; nextOffset?: number }>;
  delete?(handle: RunLogHandle): Promise<void>;
}
V1 deployment defaults:
- Dev/local default: local_file (write to data/run-logs/...).
- Cloud/serverless default: object_store (S3/R2/GCS compatible).
- Optional fallback: postgres with strict size caps.
6.4 Adapter identity and compatibility
For V1 rollout, adapter identity is explicit:
- claude_local
- codex_local
- process (generic existing behavior)
- http (generic existing behavior)
claude_local and codex_local are not wrappers around arbitrary process; they are typed adapters with known parser/resume semantics.
7. Built-in Adapters (Phase 1)
7.1 claude-local
Runs local claude CLI directly.
Config
{
  "cwd": "/absolute/or/relative/path",
  "promptTemplate": "You are agent {{agent.id}} ...",
  "bootstrapPromptTemplate": "Initial setup instructions (optional)",
  "model": "optional-model-id",
  "maxTurnsPerRun": 80,
  "dangerouslySkipPermissions": true,
  "env": {"KEY": "VALUE"},
  "extraArgs": [],
  "timeoutSec": 1800,
  "graceSec": 20
}
Invocation
- Base command: claude --print <prompt> --output-format json
- Resume: add --resume <sessionId> when runtime state has session ID
- Unsandboxed mode: add --dangerously-skip-permissions when enabled
Output parsing
- Parse stdout JSON object.
- Extract session_id for resume.
- Extract usage fields:
  - usage.input_tokens
  - usage.cache_read_input_tokens (if present)
  - usage.output_tokens
- Extract total_cost_usd when present.
- On non-zero exit: still attempt parse; if parse succeeds keep extracted state and mark run failed unless adapter explicitly reports success.
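The parsing rules above can be sketched as a small pure function. This is a minimal illustration, not the adapter's actual implementation: parseClaudeResult and ParsedClaudeResult are hypothetical names, and only the JSON field names (session_id, usage.*, total_cost_usd) come from this spec. A null return is what a caller would presumably map to output_parse_error.

```typescript
// Hypothetical result shape; only the source JSON field names come from the spec.
interface ParsedClaudeResult {
  sessionId: string | null;
  usage: { inputTokens: number; outputTokens: number; cachedInputTokens?: number } | null;
  costUsd: number | null;
}

function asNumber(value: unknown): number | undefined {
  return typeof value === "number" && Number.isFinite(value) ? value : undefined;
}

// Returns null when stdout is not a single JSON object.
function parseClaudeResult(stdout: string): ParsedClaudeResult | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdout.trim());
  } catch {
    return null;
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) return null;
  const obj = parsed as Record<string, unknown>;

  let usage: ParsedClaudeResult["usage"] = null;
  const rawUsage = obj["usage"];
  if (typeof rawUsage === "object" && rawUsage !== null) {
    const u = rawUsage as Record<string, unknown>;
    const extracted: { inputTokens: number; outputTokens: number; cachedInputTokens?: number } = {
      inputTokens: asNumber(u["input_tokens"]) ?? 0,
      outputTokens: asNumber(u["output_tokens"]) ?? 0,
    };
    // cache_read_input_tokens is optional, so only set it when present.
    const cached = asNumber(u["cache_read_input_tokens"]);
    if (cached !== undefined) extracted.cachedInputTokens = cached;
    usage = extracted;
  }

  const sessionId = obj["session_id"];
  return {
    sessionId: typeof sessionId === "string" ? sessionId : null,
    usage,
    costUsd: asNumber(obj["total_cost_usd"]) ?? null,
  };
}
```

Because the function does not look at the exit code, the same parse can run on a non-zero exit and the caller can still keep the extracted session ID while marking the run failed.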
7.2 codex-local
Runs local codex CLI directly.
Config
{
  "cwd": "/absolute/or/relative/path",
  "promptTemplate": "You are agent {{agent.id}} ...",
  "bootstrapPromptTemplate": "Initial setup instructions (optional)",
  "model": "optional-model-id",
  "search": false,
  "dangerouslyBypassApprovalsAndSandbox": true,
  "env": {"KEY": "VALUE"},
  "extraArgs": [],
  "timeoutSec": 1800,
  "graceSec": 20
}
Invocation
- Base command: codex exec --json <prompt>
- Resume form: codex exec --json resume <sessionId> <prompt>
- Unsandboxed mode: add --dangerously-bypass-approvals-and-sandbox when enabled
- Optional search mode: add --search
Output parsing
Codex emits JSONL events. Parse line-by-line and extract:
- thread.started.thread_id -> session ID
- item.completed where item type is agent_message -> output text
- turn.completed.usage:
  - input_tokens
  - cached_input_tokens
  - output_tokens
Codex JSONL currently may not include cost; store token usage and leave cost null/unknown unless available.
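A line-by-line extraction along these lines might look as follows. It is a sketch under assumptions: the event envelope (a top-level type field, item.text for the message body) is assumed here rather than taken from Codex documentation, parseCodexJsonl is a hypothetical name, and the sketch keeps the last usage object it sees instead of summing across turns.

```typescript
// Assumed event envelope; only the event names and usage field names come from the spec.
interface CodexEvent {
  type?: string;
  thread_id?: string;
  item?: { type?: string; text?: string };
  usage?: { input_tokens?: number; cached_input_tokens?: number; output_tokens?: number };
}

interface ParsedCodexOutput {
  sessionId: string | null;
  messages: string[];
  usage: { inputTokens: number; cachedInputTokens: number; outputTokens: number } | null;
}

function parseCodexJsonl(stdout: string): ParsedCodexOutput {
  const result: ParsedCodexOutput = { sessionId: null, messages: [], usage: null };
  for (const line of stdout.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let event: CodexEvent;
    try {
      const raw: unknown = JSON.parse(trimmed);
      if (typeof raw !== "object" || raw === null) continue;
      event = raw as CodexEvent;
    } catch {
      continue; // tolerate non-JSON noise between events
    }
    if (event.type === "thread.started" && typeof event.thread_id === "string") {
      result.sessionId = event.thread_id;
    } else if (event.type === "item.completed") {
      const item = event.item;
      if (item && item.type === "agent_message" && typeof item.text === "string") {
        result.messages.push(item.text);
      }
    } else if (event.type === "turn.completed" && event.usage) {
      // Assumption: keep the most recent usage object rather than accumulating.
      result.usage = {
        inputTokens: event.usage.input_tokens ?? 0,
        cachedInputTokens: event.usage.cached_input_tokens ?? 0,
        outputTokens: event.usage.output_tokens ?? 0,
      };
    }
  }
  return result;
}
```

Cost is deliberately absent from the result, matching the note above that cost stays null/unknown unless the CLI reports it.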
7.3 Common local adapter process handling
Both local adapters must:
- Use spawn(command, args, { shell: false, stdio: "pipe" }).
- Capture stdout/stderr in stream chunks and forward to RunLogStore.
- Maintain rolling stdout/stderr tail excerpts in memory for DB diagnostic fields.
- Emit live log events to websocket subscribers (optional to throttle/chunk).
- Support graceful cancel: SIGTERM, then SIGKILL after graceSec.
- Enforce timeout using adapter timeoutSec.
- Return exit code + parsed result + diagnostic stderr.
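The rolling tail excerpt from the list above can be kept with a small bounded buffer. This sketch covers only that one item (not spawn, cancel, or timeout handling); TailBuffer and the "[truncated]" marker are hypothetical, and the cap is counted in UTF-16 characters rather than bytes for simplicity.

```typescript
// Keeps only the last maxChars characters of a stream and remembers whether
// anything was dropped, so the excerpt can mark truncation explicitly.
class TailBuffer {
  private readonly maxChars: number;
  private text = "";
  private dropped = false;

  constructor(maxChars: number) {
    this.maxChars = maxChars;
  }

  append(chunk: string): void {
    this.text += chunk;
    if (this.text.length > this.maxChars) {
      this.text = this.text.slice(this.text.length - this.maxChars);
      this.dropped = true;
    }
  }

  // Value that would be stored in stdout_excerpt / stderr_excerpt.
  excerpt(): string {
    return this.dropped ? "[truncated]\n" + this.text : this.text;
  }
}
```

An adapter would hold one buffer per stream and call append from the child process data handlers, while the full chunks still go to RunLogStore.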
8. Heartbeat and Wakeup Coordinator
8.1 Wakeup sources
Supported sources:
- timer: periodic heartbeat per agent.
- assignment: issue assigned/reassigned to agent.
- on_demand: explicit wake request path (board/manual click or API ping).
- automation: non-interactive wake path (external callback or internal system automation).
8.2 Central API
All sources call one internal service:
enqueueWakeup({
  companyId,
  agentId,
  source,
  triggerDetail, // optional: manual|ping|callback|system
  reason,
  payload,
  requestedBy,
  idempotencyKey, // optional
})
No source invokes adapters directly.
8.3 Queue semantics
- Max active run per agent remains 1.
- If agent already has queued/running run:
  - coalesce duplicate wakeups
  - increment coalescedCount
  - preserve latest reason/source metadata
- Queue is DB-backed for restart safety.
- Coordinator uses FIFO by requested_at, with optional priority:
  - on_demand > assignment > timer/automation
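The coalescing and ordering rules can be illustrated with an in-memory stand-in for the DB-backed queue. This is a sketch under assumptions: WakeupQueue and its method names are hypothetical, it tracks only queued requests (a real coordinator would also check for a running run), and it treats the optional priority as always on.

```typescript
type WakeupSource = "timer" | "assignment" | "on_demand" | "automation";

interface WakeupRequest {
  agentId: string;
  source: WakeupSource;
  reason: string;
  requestedAt: number;
  coalescedCount: number;
}

// Lower number is claimed first: on_demand > assignment > timer/automation.
const PRIORITY: Record<WakeupSource, number> = { on_demand: 0, assignment: 1, timer: 2, automation: 2 };

class WakeupQueue {
  // At most one pending request per agent, which enforces coalescing.
  private pending = new Map<string, WakeupRequest>();

  enqueue(agentId: string, source: WakeupSource, reason: string, now: number): WakeupRequest {
    const existing = this.pending.get(agentId);
    if (existing) {
      existing.coalescedCount += 1;
      existing.source = source; // preserve latest source/reason metadata
      existing.reason = reason;
      return existing;
    }
    const request: WakeupRequest = { agentId, source, reason, requestedAt: now, coalescedCount: 0 };
    this.pending.set(agentId, request);
    return request;
  }

  // Priority first, then FIFO by requestedAt.
  claimNext(): WakeupRequest | undefined {
    const sorted = Array.from(this.pending.values()).sort(
      (a, b) => PRIORITY[a.source] - PRIORITY[b.source] || a.requestedAt - b.requestedAt,
    );
    const next = sorted[0];
    if (next) this.pending.delete(next.agentId);
    return next;
  }
}
```

Note that a coalesced request keeps its original requestedAt, so a merged wakeup does not lose its place in FIFO order within its priority band.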
8.4 Agent heartbeat policy fields
Agent-level control-plane settings (not adapter-specific):
{
  "heartbeat": {
    "enabled": true,
    "intervalSec": 300,
    "wakeOnAssignment": true,
    "wakeOnOnDemand": true,
    "wakeOnAutomation": true,
    "cooldownSec": 10
  }
}
Defaults:
- enabled: true
- intervalSec: null (no timer until explicitly set) or product default 300 if desired globally
- wakeOnAssignment: true
- wakeOnOnDemand: true
- wakeOnAutomation: true
8.5 Trigger integration rules
- Timer checks run on server worker interval and enqueue due agents.
- Issue assignment mutation enqueues wakeup when assignee changes and target agent has wakeOnAssignment=true.
- On-demand endpoint enqueues wakeup with source=on_demand and triggerDetail=manual|ping when wakeOnOnDemand=true.
- Callback/system automations enqueue wakeup with source=automation and triggerDetail=callback|system when wakeOnAutomation=true.
- Paused/terminated agents do not receive new wakeups.
- Hard budget-stopped agents do not receive new wakeups.
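These rules reduce to a single gate evaluated before enqueueing. The sketch below is illustrative only: shouldEnqueueWakeup, the AgentStatus values, and the budgetStopped flag are hypothetical names, and it assumes that heartbeat.enabled gates only the timer source, which this spec does not state explicitly.

```typescript
type WakeupSource = "timer" | "assignment" | "on_demand" | "automation";
type AgentStatus = "idle" | "running" | "error" | "paused" | "terminated";

interface HeartbeatPolicy {
  enabled: boolean;
  intervalSec: number | null;
  wakeOnAssignment: boolean;
  wakeOnOnDemand: boolean;
  wakeOnAutomation: boolean;
}

interface AgentGateInput {
  status: AgentStatus;
  budgetStopped: boolean; // hypothetical flag for a hard budget stop
  policy: HeartbeatPolicy;
}

function shouldEnqueueWakeup(agent: AgentGateInput, source: WakeupSource): boolean {
  // Paused, terminated, and hard budget-stopped agents never receive new wakeups.
  if (agent.status === "paused" || agent.status === "terminated") return false;
  if (agent.budgetStopped) return false;
  switch (source) {
    case "timer":
      // Assumption: the timer needs both the enabled flag and an interval.
      return agent.policy.enabled && agent.policy.intervalSec !== null;
    case "assignment":
      return agent.policy.wakeOnAssignment;
    case "on_demand":
      return agent.policy.wakeOnOnDemand;
    case "automation":
      return agent.policy.wakeOnAutomation;
  }
}
```

Keeping the gate as one pure function makes it straightforward for every trigger path to share the same checks before calling the central enqueue API.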
9. Persistence Model
All tables remain company-scoped.
9.0 Changes to agents
- Extend adapter_type domain to include claude_local and codex_local (alongside existing process, http).
- Keep adapter_config as adapter-owned config (CLI flags, cwd, prompt templates, env overrides).
- Add runtime_config jsonb for control-plane scheduling policy:
  - heartbeat enable/interval
  - wake-on-assignment
  - wake-on-on-demand
  - wake-on-automation
  - cooldown
This separation keeps adapter config runtime-agnostic while allowing the heartbeat service to apply consistent scheduling logic.
9.1 New table: agent_runtime_state
One row per agent for aggregate runtime counters and legacy compatibility.
- agent_id uuid pk fk agents.id
- company_id uuid fk not null
- adapter_type text not null
- session_id text null
- state_json jsonb not null default {}
- last_run_id uuid fk heartbeat_runs.id null
- last_run_status text null
- total_input_tokens bigint not null default 0
- total_output_tokens bigint not null default 0
- total_cached_input_tokens bigint not null default 0
- total_cost_cents bigint not null default 0
- last_error text null
- updated_at timestamptz not null
Invariant: exactly one runtime state row per agent.
9.1.1 New table: agent_task_sessions
One row per (company_id, agent_id, adapter_type, task_key) for resumable session state.
- id uuid pk
- company_id uuid fk not null
- agent_id uuid fk not null
- adapter_type text not null
- task_key text not null
- session_params_json jsonb null (adapter-defined shape)
- session_display_id text null (for UI/debug)
- last_run_id uuid fk heartbeat_runs.id null
- last_error text null
- created_at timestamptz not null
- updated_at timestamptz not null
Invariant: unique (company_id, agent_id, adapter_type, task_key).
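The uniqueness invariant, together with the reset-session behavior described in section 13.1, can be modelled with an in-memory map keyed on the four-column tuple. This is only a sketch of the semantics: TaskSessionStore and its methods are hypothetical names, and a real implementation would rely on a unique index and an upsert in Postgres.

```typescript
interface TaskSessionRow {
  companyId: string;
  agentId: string;
  adapterType: string;
  taskKey: string;
  sessionParams: Record<string, unknown> | null; // adapter-defined shape
}

type TaskSessionScope = Pick<TaskSessionRow, "companyId" | "agentId" | "adapterType" | "taskKey">;

class TaskSessionStore {
  private rows = new Map<string, TaskSessionRow>();

  // Composite key mirrors unique (company_id, agent_id, adapter_type, task_key).
  private key(scope: TaskSessionScope): string {
    return JSON.stringify([scope.companyId, scope.agentId, scope.adapterType, scope.taskKey]);
  }

  upsert(row: TaskSessionRow): void {
    this.rows.set(this.key(row), row);
  }

  find(scope: TaskSessionScope): TaskSessionRow | undefined {
    return this.rows.get(this.key(scope));
  }

  // Clears every session for the agent, or only one task's sessions when
  // taskKey is given. Returns the number of rows removed.
  reset(companyId: string, agentId: string, taskKey?: string): number {
    let removed = 0;
    for (const [key, row] of Array.from(this.rows.entries())) {
      if (row.companyId !== companyId || row.agentId !== agentId) continue;
      if (taskKey !== undefined && row.taskKey !== taskKey) continue;
      this.rows.delete(key);
      removed += 1;
    }
    return removed;
  }

  get size(): number {
    return this.rows.size;
  }
}
```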
9.2 New table: agent_wakeup_requests
Queue + audit for wakeups.
- id uuid pk
- company_id uuid fk not null
- agent_id uuid fk not null
- source text not null (timer|assignment|on_demand|automation)
- trigger_detail text null (manual|ping|callback|system)
- reason text null
- payload jsonb null
- status text not null (queued|claimed|coalesced|skipped|completed|failed|cancelled)
- coalesced_count int not null default 0
- requested_by_actor_type text null (user|agent|system)
- requested_by_actor_id text null
- idempotency_key text null
- run_id uuid fk heartbeat_runs.id null
- requested_at timestamptz not null
- claimed_at timestamptz null
- finished_at timestamptz null
- error text null
9.3 New table: heartbeat_run_events
Append-only per-run lightweight event timeline (no full raw log chunks).
- id bigserial pk
- company_id uuid fk not null
- run_id uuid fk heartbeat_runs.id not null
- agent_id uuid fk agents.id not null
- seq int not null
- event_type text not null (lifecycle|status|usage|error|structured)
- stream text null (system|stdout|stderr) (summarized events only, not full stream chunks)
- level text null (info|warn|error)
- color text null
- message text null
- payload jsonb null
- created_at timestamptz not null
9.4 Changes to heartbeat_runs
Add fields required for result and diagnostics:
- wakeup_request_id uuid fk agent_wakeup_requests.id null
- exit_code int null
- signal text null
- usage_json jsonb null
- result_json jsonb null
- session_id_before text null
- session_id_after text null
- log_store text null (local_file|object_store|postgres)
- log_ref text null (opaque provider reference; path/key/uri/row id)
- log_bytes bigint null
- log_sha256 text null
- log_compressed boolean not null default false
- stderr_excerpt text null
- stdout_excerpt text null
- error_code text null
This keeps per-run diagnostics queryable without storing full logs in Postgres.
9.5 Log storage adapter configuration
Runtime log storage is deployment-configured (not per-agent by default).
{
  "runLogStore": {
    "type": "local_file | object_store | postgres",
    "basePath": "./data/run-logs",
    "bucket": "paperclip-run-logs",
    "prefix": "runs/",
    "compress": true,
    "maxInlineExcerptBytes": 32768
  }
}
Rules:
- log_ref must be opaque and provider-neutral at API boundaries.
- UI/API must not assume local filesystem semantics.
- Provider-specific secrets/credentials stay in server config, never in agent config.
10. Prompt Template and Pill System
10.1 Template format
- Mustache-style placeholders: {{path.to.value}}
- No arbitrary code execution.
- Unknown variable on save = validation error.
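Save-time validation and rendering for this format need nothing more than a regular expression over the placeholders. The following is a minimal sketch, not the actual implementation: findUnknownVariables and renderTemplate are hypothetical names, values are passed as a flat map keyed by dotted path, and only plain {{path}} placeholders are handled (no sections, partials, or escaping).

```typescript
// Matches {{ path.to.value }} with optional surrounding whitespace.
const PLACEHOLDER_SOURCE = "\\{\\{\\s*([A-Za-z_][\\w.]*)\\s*\\}\\}";

// Returns placeholder paths that are not in the known catalog (save-time check).
function findUnknownVariables(template: string, known: ReadonlySet<string>): string[] {
  const re = new RegExp(PLACEHOLDER_SOURCE, "g");
  const unknown: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = re.exec(template)) !== null) {
    const path = match[1];
    if (path !== undefined && !known.has(path) && !unknown.includes(path)) unknown.push(path);
  }
  return unknown;
}

// Substitutes known values; placeholders without a value are left untouched.
function renderTemplate(template: string, values: Record<string, string>): string {
  return template.replace(new RegExp(PLACEHOLDER_SOURCE, "g"), (whole: string, path: string) => {
    const value = Object.prototype.hasOwnProperty.call(values, path) ? values[path] : undefined;
    return value !== undefined ? value : whole;
  });
}
```

Because substitution is plain string replacement, a template can never trigger code execution, which is the property the format rules ask for.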
10.2 Initial variable catalog
- company.id
- company.name
- agent.id
- agent.name
- agent.role
- agent.title
- run.id
- run.source
- run.startedAt
- heartbeat.reason
- paperclip.skill (shared Paperclip skill text block)
- credentials.apiBaseUrl
- credentials.apiKey (optional, sensitive)
10.3 Prompt fields
- bootstrapPromptTemplate
  - Used when no session exists.
- promptTemplate
  - Used on every wakeup.
  - Can include run source/reason pills.
If bootstrapPromptTemplate is omitted, promptTemplate is used for first run.
10.4 UI requirements
- Agent setup/edit form includes prompt editors with pill insertion.
- Variables are shown as clickable pills for fast insertion.
- Save-time validation indicates unknown/missing variables.
- Sensitive pills (credentials.*) show explicit warning badge.
10.5 Security notes for credentials
- Credentials in prompt are allowed for initial simplicity but discouraged.
- Preferred transport is env vars (PAPERCLIP_*) injected at runtime.
- Prompt preview and logs must redact sensitive values.
11. Realtime Status Delivery
11.1 Transport
Primary transport: websocket channel per company.
- Endpoint: GET /api/companies/:companyId/events/ws
- Auth: board session or agent API key (company-bound)
11.2 Event envelope
{
  "eventId": "uuid-or-monotonic-id",
  "companyId": "uuid",
  "type": "heartbeat.run.status",
  "entityType": "heartbeat_run",
  "entityId": "uuid",
  "occurredAt": "2026-02-17T12:00:00Z",
  "payload": {}
}
11.3 Required event types
- agent.status.changed
- heartbeat.run.queued
- heartbeat.run.started
- heartbeat.run.status (short color+message updates)
- heartbeat.run.log (optional live chunk stream; full persistence handled by RunLogStore)
- heartbeat.run.finished
- issue.updated
- issue.comment.created
- activity.appended
11.4 UI behavior
- Agent detail view updates run timeline live.
- Task board reflects assignment/status/comment changes from agent activity without refresh.
- Org/agent list reflects status changes live.
- If websocket disconnects, client falls back to short polling until reconnect.
12. Error Handling and Diagnostics
12.1 Error classes
- adapter_not_installed
- invalid_working_directory
- spawn_failed
- timeout
- cancelled
- nonzero_exit
- output_parse_error
- resume_session_invalid
- budget_blocked
12.2 Logging requirements
- Persist full stdout/stderr stream to configured RunLogStore.
- Persist only lightweight run metadata/events in Postgres (heartbeat_runs, heartbeat_run_events).
- Persist bounded stdout_excerpt and stderr_excerpt in Postgres for quick diagnostics.
- Mark truncation explicitly when excerpts are capped.
- Redact secrets from logs, excerpts, and websocket payloads.
12.3 Log retention and lifecycle
- RunLogStore retention is configurable by deployment (for example 7/30/90 days).
- Postgres run metadata can outlive full log objects.
- Deletion/pruning jobs must handle orphaned metadata/log-object references safely.
- If full log object is gone, APIs still return metadata and excerpts with log_unavailable status.
12.4 Restart recovery
On server startup:
- Find stale queued/running runs.
- Mark as failed with error_code=control_plane_restart.
- Set affected non-paused/non-terminated agents to error (or idle based on policy).
- Emit recovery events to websocket and activity log.
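The recovery sweep can be sketched as a pure function over in-memory rows. Everything here is illustrative: recoverAfterRestart and the row shapes are hypothetical, the sketch always chooses error (not idle) for affected agents, and it returns event names as strings instead of publishing them.

```typescript
type RunStatus = "queued" | "running" | "succeeded" | "failed" | "cancelled" | "timed_out";
type AgentStatus = "idle" | "running" | "error" | "paused" | "terminated";

interface RunRow {
  id: string;
  agentId: string;
  status: RunStatus;
  errorCode: string | null;
}

interface AgentRow {
  id: string;
  status: AgentStatus;
}

// Mutates the rows in place and returns the events a caller would publish.
function recoverAfterRestart(runs: RunRow[], agents: AgentRow[]): string[] {
  const affectedAgents = new Set<string>();
  const events: string[] = [];

  for (const run of runs) {
    if (run.status !== "queued" && run.status !== "running") continue;
    run.status = "failed";
    run.errorCode = "control_plane_restart";
    affectedAgents.add(run.agentId);
    events.push("heartbeat.run.finished:" + run.id);
  }

  for (const agent of agents) {
    if (!affectedAgents.has(agent.id)) continue;
    // Paused and terminated agents keep their status.
    if (agent.status === "paused" || agent.status === "terminated") continue;
    agent.status = "error";
    events.push("agent.status.changed:" + agent.id);
  }

  return events;
}
```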
13. API Surface Changes
13.1 New/updated endpoints
- POST /agents/:agentId/wakeup
  - enqueue wakeup with source/reason
- POST /agents/:agentId/heartbeat/invoke
  - backward-compatible alias to wakeup API
- GET /agents/:agentId/runtime-state
  - board-only debug view
- GET /agents/:agentId/task-sessions
  - board-only list of task-scoped adapter sessions
- POST /agents/:agentId/runtime-state/reset-session
  - clears all task sessions for the agent, or one when taskKey is provided
- GET /heartbeat-runs/:runId/events?afterSeq=:n
  - fetch persisted lightweight timeline
- GET /heartbeat-runs/:runId/log
  - reads full log stream via RunLogStore (or redirects/presigned URL for object store)
- GET /api/companies/:companyId/events/ws
  - websocket stream
13.2 Mutation logging
All wakeup/run state mutations must create activity_log entries:
- wakeup.requested
- wakeup.coalesced
- heartbeat.started
- heartbeat.finished
- heartbeat.failed
- heartbeat.cancelled
- runtime_state.updated
14. Heartbeat Service Implementation Plan
Phase 1: Contracts and schema
- Add new DB tables/columns (agent_runtime_state, agent_wakeup_requests, heartbeat_run_events, heartbeat_runs.log_* fields).
- Add RunLogStore interface and configuration wiring.
- Add shared types/constants/validators.
- Keep existing routes functional during migration.
Phase 2: Wakeup coordinator
- Implement DB-backed wakeup queue.
- Convert invoke/wake routes to enqueue with source=on_demand and appropriate triggerDetail.
- Add worker loop to claim and execute queued wakeups.
Phase 3: Local adapters
- Implement claude-local adapter.
- Implement codex-local adapter.
- Parse and persist session IDs and token usage.
- Wire cancel/timeout/grace behavior.
Phase 4: Realtime push
- Implement company websocket hub.
- Publish run/agent/issue events.
- Update UI pages to subscribe and invalidate/update relevant data.
Phase 5: Prompt pills and config UX
- Add adapter-specific config editor with prompt templates.
- Add pill insertion and variable validation.
- Add sensitive-variable warnings and redaction.
Phase 6: Hardening
- Add failure/restart recovery sweeps.
- Add metadata/full-log retention policies and pruning jobs.
- Add integration/e2e coverage for wakeup triggers and live updates.
15. Acceptance Criteria
- Agent with claude-local or codex-local can run, exit, and persist run result.
- Session parameters are persisted per task scope and reused automatically for same-task resumes.
- Token usage is persisted per run and accumulated per agent runtime state.
- Timer, assignment, on-demand, and automation wakeups all enqueue through one coordinator.
- Pause/terminate interrupts running local process and prevents new wakeups.
- Browser receives live websocket updates for run status/logs and task/agent changes.
- Failed runs expose rich CLI diagnostics in UI with excerpts immediately available and full log retrievable via RunLogStore.
- All actions remain company-scoped and auditable.
16. Open Questions
- Should timer default be null (off until enabled) or 300 seconds by default?
- What should the default retention policy be for full log objects vs Postgres metadata?
- Should agent API credentials be allowed in prompt templates by default, or require explicit opt-in toggle?
- Should websocket be the only realtime channel, or should we also expose SSE for simpler clients?