diff --git a/PRODUCT.md b/PRODUCT.md index 0c6189a9..5c265aa1 100644 --- a/PRODUCT.md +++ b/PRODUCT.md @@ -20,14 +20,13 @@ A company has: Every employee is an agent. When you create a company, you start by defining the CEO, then build out from there. Each employee has: -- **SOUL.md** — who they are, their ultimate objective, their personality, their operating principles. This is the agent's identity and mission. -- **HEARTBEAT.md** — what they do on their main loop. Every time the agent "wakes up," this defines what they review, what they check, what actions they take. -- **Agent configuration** — how this specific agent runs (see Agent Execution below) +- **Adapter type + config** — how this agent runs and what defines its identity/behavior. This is adapter-specific (e.g., an OpenClaw agent might use SOUL.md and HEARTBEAT.md files; a Claude Code agent might use CLAUDE.md; a bare script might use CLI args). Paperclip doesn't prescribe the format — the adapter does. - **Role & reporting** — their title, who they report to, who reports to them +- **Capabilities description** — a short paragraph on what this agent does and when they're relevant (helps other agents discover who can help with what) -Example: A CEO's SOUL.md says "achieve $1M MRR with an AI note-taking app in 3 months." Their HEARTBEAT.md says "every loop: review what your executives are doing, check company metrics, reprioritize if needed, assign new strategic initiatives." +Example: A CEO agent's adapter config tells it to "review what your executives are doing, check company metrics, reprioritize if needed, assign new strategic initiatives" on each heartbeat. An engineer's config tells it to "check assigned tasks, pick the highest priority, and work it." -Then you define who reports to the CEO: a CTO managing programmers, a CMO managing the marketing team, and so on. Every agent in the tree gets their own SOUL.md and HEARTBEAT.md. +Then you define who reports to the CEO: a CTO managing programmers, a CMO managing the marketing team, and so on. Every agent in the tree gets their own adapter configuration. ### Agent Execution @@ -61,7 +60,7 @@ More detailed task structure TBD. 2. **Company is the unit of organization.** Everything lives under a company. One Paperclip instance, many companies. -3. **Soul + Heartbeat define an agent.** Every agent has an identity (SOUL.md) and a loop (HEARTBEAT.md). This is the minimum contract. +3. **Adapter config defines the agent.** Every agent has an adapter type and configuration that controls its identity and behavior. The minimum contract is just "be callable." 4. **All work traces to the goal.** Hierarchical task management means nothing exists in isolation. If you can't explain why a task matters to the company goal, it shouldn't exist. @@ -72,18 +71,15 @@ More detailed task structure TBD. 1. Open Paperclip, create a new company 2. Define the company's goal: "Create the #1 AI note-taking app, $1M MRR in 3 months" 3. Create the CEO - - Write their SOUL.md (objective, personality, operating style) - - Write their HEARTBEAT.md (the loop — review executives, check metrics, reprioritize) - - Configure agent execution (e.g., Claude Code session, OpenClaw bot, custom script) + - Choose an adapter (e.g., process adapter for Claude Code, HTTP adapter for OpenClaw) + - Configure the adapter (agent identity, loop behavior, execution settings) + - CEO proposes strategic breakdown → board approves 4. Define the CEO's reports: CTO, CMO, CFO, etc. - - Each gets their own SOUL.md, HEARTBEAT.md, and agent config + - Each gets their own adapter config and role definition 5. Define their reports: engineers under CTO, marketers under CMO, etc. 6. Set budgets, define initial strategic tasks 7. Hit go — agents start their heartbeats and the company runs -## Open Questions +## Further Detail -- Agent configuration interface — what exactly is the schema? What are the levels of integration (basic script vs. fully integrated with cost tracking)? -- Task management detail — what fields does a task have? Assignment, status workflow, dependencies beyond parentage? -- Communication protocol — how do agents talk to each other? Through tasks only, or direct messaging? -- Revenue/expense tracking — how does this integrate? Manual entry, API connections, agent-reported? +See [SPEC.md](./SPEC.md) for the full technical specification and [doc/TASKS.md](./doc/TASKS.md) for the task management data model. diff --git a/SPEC.md b/SPEC.md index aa31c2a2..e4eb0507 100644 --- a/SPEC.md +++ b/SPEC.md @@ -4,7 +4,7 @@ Target specification for the Paperclip control plane. Living document — update --- -## 1. Company Model [NEEDS DETAIL] +## 1. Company Model [DRAFT] A company is a first-order object. One Paperclip instance runs multiple companies. @@ -22,27 +22,40 @@ A company is a first-order object. One Paperclip instance runs multiple companie Every company has a **board** that governs high-impact decisions. The board is the human oversight layer. -**V1: Single human board.** One human operator approves: +**V1: Single human board.** One human operator. + +#### Board Approval Gates (V1) + - New agent hires (creating new agents) -- [TBD: other governance-gated actions] +- CEO's initial strategic breakdown (CEO proposes, board approves before execution begins) +- [TBD: other governance-gated actions — budget changes, goal changes, firing agents?] + +#### Board Powers (Always Available) + +The board has **unrestricted access** to the entire system at all times: + +- **Pause/resume any agent** — stop an agent's heartbeat immediately +- **Pause/resume any work item** — pause a task, project, subtask tree, milestone. Paused items are not picked up by agents. +- **Full project management access** — create, edit, comment on, modify, delete, reassign any task/project/milestone through the UI +- **Override any agent decision** — reassign tasks, change priorities, modify descriptions + +The board is not just an approval gate — it's a live control surface. The human can intervene at any level at any time. **Future governance models** (not V1): - Hiring budgets (auto-approve hires within $X/month) - Multi-member boards - Delegated authority (CEO can hire within limits) -The board is the boundary between "the company runs autonomously" and "humans retain control." The default is conservative — human approval for structural changes. - ### Open Questions -- Revenue/expense tracking — how does financial data enter the system? +- Revenue/expense tracking — plugin for later, not V1 - Company-level settings and configuration? - Company lifecycle (pause, archive, delete)? -- What other actions require board approval beyond hiring? (budget changes, company goal changes, firing agents?) +- What governance-gated actions exist beyond hiring and CEO strategy approval? --- -## 2. Agent Model [NEEDS DETAIL] +## 2. Agent Model [DRAFT] Every employee is an agent. Agents are the workforce. @@ -103,25 +116,61 @@ Beyond the minimum, Paperclip provides progressively richer integration: Paperclip ships **default agents** that demonstrate full integration: progress tracking, cost instrumentation, and a **Paperclip skill** (a Claude Code skill for interacting with the Paperclip API) for task management. These serve as both useful defaults and reference implementations for adapter authors. -### Open Questions +#### Export Formats -- What is the adapter interface? What must an adapter implement? -- How does an agent authenticate to the control plane? -- Agent lifecycle (create, pause, terminate, restart)? -- What does the Paperclip skill provide? (task CRUD, status updates, reading company context?) -- Export format for org configs — JSON? YAML? Directory of files? +Two export modes: + +1. **Template export** (default) — structure only: agent definitions, org chart, adapter configs, role descriptions. Optionally includes a few seed tasks to help get started. This is the blueprint for spinning up a new company. +2. **Snapshot export** — full state: structure + current tasks, progress, agent status. A complete picture you could restore or fork. + +The usual workflow: export a template, create a new company from it, add a couple initial tasks, go. --- -## 3. Org Structure [NEEDS DETAIL] +## 3. Org Structure [DRAFT] Hierarchical reporting structure. CEO at top, reports cascade down. +### Agent Visibility + +**Full visibility across the org.** Every agent can see the entire org chart, all tasks, all agents. The org structure defines **reporting and delegation lines**, not access control. + +Each agent publishes a short description of their responsibilities and capabilities — almost like skills ("when I'm relevant"). This lets other agents discover who can help with what. + +### Cross-Team Work + +Agents can create tasks and assign them to agents outside their reporting line. This is the mechanism for cross-team collaboration. + +#### Task Acceptance Rules + +When an agent receives a task from outside their team: + +1. **Agrees it's appropriate + can do it** → complete it directly +2. **Agrees it's appropriate + can't do it** → mark as blocked +3. **Questions whether it's worth doing** → **cannot cancel it themselves.** Must reassign to their own manager, explain the situation. Manager decides whether to accept, reassign, or escalate. + +#### Manager Escalation Protocol + +It's any manager's responsibility to understand why their subordinates are blocked and resolve it: + +0. **Decide** — as a manager, is this work worth doing? +1. **Delegate down** — ask someone under them to help unblock +2. **Escalate up** — ask the manager above them for help + +#### Request Depth Tracking + +When a task originates from a cross-team request, track the **depth** as an integer — how many delegation hops from the original requester. This provides visibility into how far work cascades through the org. + +#### Billing Codes + +Tasks carry a **billing code** so that token spend during execution can be attributed upstream to the requesting task/agent. When Agent A asks Agent B to do work, the cost of B's work is tracked against A's request. This enables cost attribution across the org. + ### Open Questions - Is this a strict tree or can agents report to multiple managers? -- Can org structure change at runtime? -- Do agents inherit configuration from their manager? +- Can org structure change at runtime? (agents reassigned, teams restructured) +- Do agents inherit any configuration from their manager? +- Billing code format — simple string? Hierarchical? --- @@ -138,7 +187,19 @@ Agent configuration includes an **adapter** that defines how Paperclip invokes t | `process` | Execute a child process | `python run_agent.py --agent-id {id}` | | `http` | Send an HTTP request | `POST https://openclaw.example.com/hook/{id}` | -More adapters can be added. The adapter interface is simple: "given this agent's config, initiate their cycle." +More adapters can be added. + +### Adapter Interface + +Every adapter implements three methods: + +``` +invoke(agentConfig, context?) → void // Start the agent's cycle +status(agentConfig) → AgentStatus // Is it running? finished? errored? +cancel(agentConfig) → void // Graceful stop signal (for pause/resume) +``` + +This is the full adapter contract. `invoke` starts the agent, `status` lets Paperclip check on it, `cancel` enables the board's pause functionality. Everything else (cost reporting, task updates) is optional and flows through the Paperclip REST API. ### What Paperclip Controls @@ -152,12 +213,24 @@ More adapters can be added. The adapter interface is simple: "given this agent's - What the agent does during its cycle - Whether the agent is task-scoped, time-windowed, or continuous +### Pause Behavior + +When the board (or system) pauses an agent: + +1. **Signal the current execution** — send a graceful termination signal to the running process/session +2. **Grace period** — give the agent time to wrap up, save state, report final status +3. **Force-kill after timeout** — if the agent doesn't stop within the grace period, terminate +4. **Stop future heartbeats** — no new heartbeat cycles will fire until the agent is resumed + +This is "graceful signal + stop future heartbeats." The current run gets a chance to land cleanly. + ### Open Questions - Heartbeat frequency — who controls it? Fixed? Per-agent? Cron-like? - What happens when a heartbeat invocation fails? (process crashes, HTTP 500) - Health monitoring — how does Paperclip distinguish "stuck" from "working on a long task"? - Can agents self-trigger their next heartbeat? ("I'm done, wake me again in 5 min") +- Grace period duration — fixed? configurable per agent? --- @@ -181,26 +254,264 @@ There is no separate messaging or chat system. Tasks are the communication chann - Discussion happens in task comments, not a side channel - If an agent needs to escalate, they comment on the parent task or reassign ---- +### Task Hierarchy Mapping -## 6. Cost Tracking [NEEDS DETAIL] +The company goal is **not a special entity** — it's a regular Initiative (as defined in doc/TASKS.md). When a user creates a company, the UX creates the first Initiative for them (the company's goal). But structurally it's the same as any other initiative. -Token budgets, spend tracking, burn rate. - -### Open Questions - -- How does cost data enter the system? -- Budget enforcement — hard limits vs. alerts? -- Granularity — per-agent, per-task, per-company? +Full hierarchy: **Initiative** (company goal) → Projects → Milestones → Issues → Sub-issues. Everything traces back to an initiative, and the "company goal" is just the first/primary initiative. --- -## 7. Knowledge Base [NEEDS DETAIL] +## 6. Cost Tracking [DRAFT] -Shared organizational memory. +### Cost Reporting + +Fully-instrumented agents report token/API usage back to Paperclip. Costs are tracked at every level: + +- **Per agent** — how much is this employee costing? +- **Per task** — how much did this unit of work cost? +- **Per project** — how much is this deliverable costing? +- **Per company** — total burn rate + +Costs can be denominated in **tokens or dollars** (configurable). + +Billing codes on tasks (see Org Structure) enable cost attribution across teams — when Agent A requests work from Agent B, B's costs roll up to A's request. + +### Budget Controls + +Three tiers: + +1. **Visibility** — dashboards showing spend at every level (agent, task, project, company) +2. **Soft alerts** — configurable thresholds (e.g. warn at 80% of budget) +3. **Hard ceiling** — auto-pause the agent when budget is hit. Board notified. Board can override/raise the limit. + +Budgets can be set to **unlimited** (no ceiling). ### Open Questions -- What form does company knowledge take? -- How do agents read/write to it? -- Scoping — company-wide, team-level, agent-level? +- Cost reporting API — what's the schema for an agent to report costs? +- Dashboard design — what metrics matter most at each level? +- Budget period — per-day? per-week? per-month? rolling? + +--- + +## 7. Default Agents & Bootstrap Flow [DRAFT] + +### Bootstrap Sequence + +How a company goes from "created" to "running": + +1. Human creates a company with a goal +2. Human defines initial top-level tasks/initiatives +3. Human creates the CEO agent (using the default CEO template or custom) +4. CEO's first heartbeat: reviews the goal and tasks, proposes a strategic breakdown (org structure, sub-tasks, hiring plan) +5. **Board approves** the CEO's strategic plan +6. CEO begins execution — creating tasks, proposing hires (board-approved), delegating + +### Default Agents + +Paperclip ships default agent templates: + +- **Default agent** — a basic Claude Code or Codex loop. Knows the **Paperclip skill** (SKILL.md) so it can interact with the task system, read company context, report status. +- **Default CEO** — extends the default agent with CEO-specific behavior: strategic planning, delegation to reports, progress review, board communication. + +These are starting points. Users can customize or replace them entirely. + +### Default Agent Behavior + +The default agent's loop is **config-driven**. The adapter config contains the instructions that define what the agent does on each heartbeat cycle. There is no hardcoded standard loop — each agent's config determines its behavior. + +This means the default CEO config tells the CEO to review strategy, check on reports, etc. The default engineer config tells the engineer to check assigned tasks, pick the highest priority, and work it. But these are config choices, not protocol requirements. + +### Paperclip Skill (SKILL.md) + +A skill definition that teaches agents how to interact with Paperclip. Provides: + +- Task CRUD (create, read, update, complete tasks) +- Status reporting (check in, report progress) +- Company context (read goal, org chart, current state) +- Cost reporting (log token/API usage) + +This skill is adapter-agnostic — it can be loaded into Claude Code, injected into prompts, or used as API documentation for custom agents. + +--- + +## 8. Architecture & Deployment [DRAFT] + +### Deployment Model + +**Single-tenant, self-hostable.** Not a SaaS. One instance = one operator's companies. + +#### Development Path (Progressive Deployment) + +1. **Local dev** — One command to install and run. Embedded Postgres. Everything on your machine. Agents run locally. +2. **Hosted** — Deploy to Vercel/Supabase/AWS/anywhere. Remote agents connect to your server with a shared database. The UI is accessible via the web. +3. **Open company** — Optionally make parts public (e.g. a job board visible to the public for open companies). + +The key constraint: it must be trivial to go from "I'm trying this on my machine" to "my agents are running on remote servers talking to my Paperclip instance." + +#### Agent Authentication + +Agents need to register and authenticate with the Paperclip server to get an API key that identifies them. + +Flow: +1. Agent "signs up" — requests access to the Paperclip instance +2. Human board member approves/onboards the agent +3. Agent receives credentials (API key) and can now interact with the control plane + +This is the same pattern as agent hiring — an agent can't just show up, it needs board approval. + +### Tech Stack + +| Layer | Technology | +| --- | --- | +| Frontend | React + Vite | +| Backend | TypeScript + Hono (REST API, not tRPC — need non-TS clients) | +| Database | PostgreSQL (embedded for dev, hosted for production) | +| Auth | Standard React auth library (not Supabase-dependent) | + +### Concurrency Model: Atomic Task Checkout + +Tasks use **single assignment** (one agent per task) with **atomic checkout**: + +1. Agent attempts to set a task to `in_progress` (claiming it) +2. The API/database enforces this atomically — if another agent already claimed it, the request fails with an error identifying which agent has it +3. If the task is already assigned to the requesting agent from a previous session, they can resume + +No optimistic locking or CRDTs needed. The single-assignment model + atomic checkout prevents conflicts at the design level. + +### Human in the Loop + +Agents can create tasks assigned to humans. The board member (or any human with access) can complete these tasks through the UI. + +When a human completes a task, if the requesting agent's adapter supports **pingbacks** (e.g. OpenClaw hooks), Paperclip sends a notification to wake that agent. This keeps humans rare but possible participants in the workflow. + +### API Design + +**Single unified REST API.** The same API serves both the frontend UI and agents. Authentication determines permissions — board auth has full access, agent API keys have scoped access (their own tasks, cost reporting, company context). + +No separate "agent API" vs. "board API." Same endpoints, different authorization levels. + +### Work Artifacts + +Paperclip does **not** manage work artifacts (code repos, file systems, deployments, documents). That's entirely the agent's domain. Paperclip tracks tasks and costs. Where and how work gets done is outside scope. + +### Open Questions + +- Embedded Postgres solution — pglite? embedded-postgres? +- Auth library choice — better-auth? next-auth? clerk? +- Real-time updates to the UI — WebSocket? SSE? Polling? +- Monorepo structure — pnpm workspaces with shared packages? +- Agent API key scoping — what exactly can an agent access? Only their own tasks? Their team's? The whole company? + +### Crash Recovery: Manual, Not Automatic + +When an agent crashes or disappears mid-task, Paperclip does **not** auto-reassign or auto-release the task. Instead: + +- Paperclip surfaces stale tasks (tasks in `in_progress` with no recent activity) through dashboards and reporting +- Paperclip does not fail silently — the auditing and visibility tools make problems obvious +- Recovery is handled by humans or by emergent processes (e.g. a project manager agent whose job is to monitor for stale work and surface it) + +**Principle: Paperclip reports problems, it doesn't silently fix them.** Automatic recovery hides failures. Good visibility lets the right entity (human or agent) decide what to do. + +### Plugin / Extension Architecture + +The core Paperclip system must be extensible. Features like knowledge bases, revenue tracking, and specialized adapters should be addable as **plugins** without modifying core. This means: + +- Well-defined API boundaries that plugins can hook into +- Event system or hooks for reacting to task/agent lifecycle events +- Plugin-registrable UI components (future) + +This isn't a V1 deliverable (we're not building a plugin framework upfront), but the architecture should not paint us into a corner. Keep boundaries clean so extensions are possible. + +--- + +## 9. Frontend / UI [DRAFT] + +### Primary Views + +Each is a distinct page/route: + +1. **Org Chart** — the org tree with live status indicators (running/idle/paused/error) per agent. Real-time activity feed of what agents are doing. +2. **Task Board** — Linear-style task management. Kanban and list views. Filter by team, agent, project, status. +3. **Dashboard** — high-level metrics: agent count, active tasks, costs, goal progress, burn rate. The "glance" view from GOAL.md. +4. **Agent Detail** — deep dive on a single agent: their tasks, activity, costs, configuration, status history. +5. **Project/Initiative Views** — progress tracking against milestones and goals. +6. **Cost Dashboard** — spend visualization at every level (agent, task, project, company). + +### Board Controls (Available Everywhere) + +- Pause/resume agents (any view) +- Pause/resume tasks/projects (any view) +- Approve/reject pending actions (hiring, strategy proposals) +- Direct task creation, editing, commenting + +--- + +## 10. V1 Scope (MVP) [DRAFT] + +**Full loop with one adapter.** V1 must demonstrate the complete Paperclip cycle end-to-end, even if narrow. + +### Must Have (V1) + +- [ ] **Company CRUD** — create a company with a goal +- [ ] **Agent CRUD** — create/edit/pause/resume agents with adapter config +- [ ] **Org chart** — define reporting structure, visualize it +- [ ] **Process adapter** — invoke(), status(), cancel() for local child processes +- [ ] **Task management** — full lifecycle with hierarchy (tasks trace to company goal) +- [ ] **Atomic task checkout** — single assignment, in_progress locking +- [ ] **Board governance** — human approves hires, pauses agents, full PM access +- [ ] **Cost tracking** — agents report token usage, per-agent/task/company visibility +- [ ] **Budget controls** — soft alerts + hard ceiling with auto-pause +- [ ] **Default agent** — basic Claude Code/Codex loop with Paperclip skill +- [ ] **Default CEO** — strategic planning, delegation, board communication +- [ ] **Paperclip skill (SKILL.md)** — teaches agents to interact with the API +- [ ] **REST API** — full API for agent interaction (Hono) +- [ ] **Web UI** — React/Vite: org chart, task board, dashboard, cost views +- [ ] **Agent auth** — registration, board approval, API keys +- [ ] **One-command dev setup** — embedded Postgres, everything local + +### Not V1 + +- Multiple adapter types (HTTP adapter, OpenClaw adapter — V2) +- Template export/import +- Knowledge base +- Advanced governance models (hiring budgets, multi-member boards) +- Revenue/expense tracking beyond token costs +- Public job board / open company features + +--- + +## 11. Knowledge Base + +**Anti-goal for core.** The knowledge base is not part of the Paperclip core — it will be a plugin. The task system + comments + agent descriptions provide sufficient shared context. + +The architecture must support adding a knowledge base plugin later (clean API boundaries, hookable lifecycle events) but the core system explicitly does not include one. + +--- + +## 12. Anti-Requirements + +Things Paperclip explicitly does **not** do: + +- **Not an agent runtime** — Paperclip orchestrates, agents run elsewhere +- **Not a knowledge base** — core has no wiki/docs/vector-DB (plugin territory) +- **Not a SaaS** — single-tenant, self-hosted +- **Not opinionated about agent implementation** — any language, any framework, any runtime +- **Not automatically self-healing** — surfaces problems, doesn't silently fix them +- **Does not manage work artifacts** — no repo management, no deployment, no file systems +- **Does not auto-reassign work** — stale tasks are surfaced, not silently redistributed + +--- + +## 13. Principles (Consolidated) + +1. **Unopinionated about how you run your agents.** Any language, any framework, any runtime. Paperclip is the control plane, not the execution plane. +2. **Company is the unit of organization.** Everything lives under a company. +3. **Tasks are the communication channel.** All agent communication flows through tasks + comments. No side channels. +4. **All work traces to the goal.** Hierarchical task management — nothing exists in isolation. +5. **Board governs.** Humans retain control through the board. Conservative defaults (human approval required). +6. **Surface problems, don't hide them.** Good auditing and visibility. No silent auto-recovery. +7. **Atomic ownership.** Single assignee per task. Atomic checkout prevents conflicts. +8. **Progressive deployment.** Trivial to start local, straightforward to scale to hosted. +9. **Extensible core.** Clean boundaries so plugins can add capabilities without modifying core.