Merge pull request #1166 from paperclipai/fix/canary-version-after-partial-publish

fix: advance canary after partial publish
2026-03-17 16:37:58 -05:00 · 2026-03-17 16:31:38 -05:00 · 2026-03-17 16:02:36 -05:00 · 2026-03-17 16:01:48 -05:00 · 2026-03-17 15:40:13 -05:00 · 2026-03-17 15:39:50 -05:00
457 changed files with 151220 additions and 4155 deletions
--- a/.agents/skills/create-agent-adapter/SKILL.md
+++ b/.agents/skills/create-agent-adapter/SKILL.md
--- a/.agents/skills/doc-maintenance/SKILL.md
+++ b/.agents/skills/doc-maintenance/SKILL.md
@@ -0,0 +1,201 @@
+---
+name: doc-maintenance
+description: >
+  Audit top-level documentation (README, SPEC, PRODUCT) against recent git
+  history to find drift — shipped features missing from docs or features
+  listed as upcoming that already landed. Proposes minimal edits, creates
+  a branch, and opens a PR. Use when asked to review docs for accuracy,
+  after major feature merges, or on a periodic schedule.
+---
+
+# Doc Maintenance Skill
+
+Detect documentation drift and fix it via PR — no rewrites, no churn.
+
+## When to Use
+
+- Periodic doc review (e.g. weekly or after releases)
+- After major feature merges
+- When asked "are our docs up to date?"
+- When asked to audit README / SPEC / PRODUCT accuracy
+
+## Target Documents
+
+| Document | Path | What matters |
+|----------|------|-------------|
+| README | `README.md` | Features table, roadmap, quickstart, "what is" accuracy, "works with" table |
+| SPEC | `doc/SPEC.md` | No false "not supported" claims, major model/schema accuracy |
+| PRODUCT | `doc/PRODUCT.md` | Core concepts, feature list, principles accuracy |
+
+Out of scope: DEVELOPING.md, DATABASE.md, CLI.md, doc/plans/, skill files,
+release notes. These are dev-facing or ephemeral — lower risk of user-facing
+confusion.
+
+## Workflow
+
+### Step 1 — Detect what changed
+
+Find the last review cursor:
+
+```bash
+# Read the last-reviewed commit SHA
+CURSOR_FILE=".doc-review-cursor"
+if [ -f "$CURSOR_FILE" ]; then
+  LAST_SHA=$(head -n 1 "$CURSOR_FILE")
+else
+  # First run: look back 60 days
+  LAST_SHA=$(git log --format="%H" --after="60 days ago" --reverse | head -1)
+fi
+```
+
+Then gather commits since the cursor:
+
+```bash
+git log "$LAST_SHA"..HEAD --oneline --no-merges
+```
+
+### Step 2 — Classify changes
+
+Scan commit messages and changed files. Categorize into:
+
+- **Feature** — new capabilities (keywords: `feat`, `add`, `implement`, `support`)
+- **Breaking** — removed/renamed things (keywords: `remove`, `breaking`, `drop`, `rename`)
+- **Structural** — new directories, config changes, new adapters, new CLI commands
+
+**Ignore:** refactors, test-only changes, CI config, dependency bumps, doc-only
+changes, style/formatting commits. These don't affect doc accuracy.
+
+For borderline cases, check the actual diff — a commit titled "refactor: X"
+that adds a new public API is a feature.
+
+### Step 3 — Build a change summary
+
+Produce a concise list like:
+
+```
+Since last review (<sha>, <date>):
+- FEATURE: Plugin system merged (runtime, SDK, CLI, slots, event bridge)
+- FEATURE: Project archiving added
+- BREAKING: Removed legacy webhook adapter
+- STRUCTURAL: New .agents/skills/ directory convention
+```
+
+If there are no notable changes, skip to Step 7 (update cursor and exit).
+
+### Step 4 — Audit each target doc
+
+For each target document, read it fully and cross-reference against the change
+summary. Check for:
+
+1. **False negatives** — major shipped features not mentioned at all
+2. **False positives** — features listed as "coming soon" / "roadmap" / "planned"
+   / "not supported" / "TBD" that already shipped
+3. **Quickstart accuracy** — install commands, prereqs, and startup instructions
+   still correct (README only)
+4. **Feature table accuracy** — does the features section reflect current
+   capabilities? (README only)
+5. **Works-with accuracy** — are supported adapters/integrations listed correctly?
+
+Use `references/audit-checklist.md` as the structured checklist.
+Use `references/section-map.md` to know where to look for each feature area.
+
+### Step 5 — Create branch and apply minimal edits
+
+```bash
+# Create a branch for the doc updates
+BRANCH="docs/maintenance-$(date +%Y%m%d)"
+git checkout -b "$BRANCH"
+```
+
+Apply **only** the edits needed to fix drift. Rules:
+
+- **Minimal patches only.** Fix inaccuracies, don't rewrite sections.
+- **Preserve voice and style.** Match the existing tone of each document.
+- **No cosmetic changes.** Don't fix typos, reformat tables, or reorganize
+  sections unless they're part of a factual fix.
+- **No new sections.** If a feature needs a whole new section, note it in the
+  PR description as a follow-up — don't add it in a maintenance pass.
+- **Roadmap items:** Move shipped features out of Roadmap. Add a brief mention
+  in the appropriate existing section if there isn't one already. Don't add
+  long descriptions.
+
+### Step 6 — Open a PR
+
+Commit the changes and open a PR:
+
+```bash
+git add README.md doc/SPEC.md doc/PRODUCT.md .doc-review-cursor
+git commit -m "docs: update documentation for accuracy
+
+- [list each fix briefly]
+
+Co-Authored-By: Paperclip <noreply@paperclip.ing>"
+
+git push -u origin "$BRANCH"
+
+gh pr create \
+  --title "docs: periodic documentation accuracy update" \
+  --body "$(cat <<'EOF'
+## Summary
+Automated doc maintenance pass. Fixes documentation drift detected since
+last review.
+
+### Changes
+- [list each fix]
+
+### Change summary (since last review)
+- [list notable code changes that triggered doc updates]
+
+## Review notes
+- Only factual accuracy fixes — no style/cosmetic changes
+- Preserves existing voice and structure
+- Larger doc additions (new sections, tutorials) noted as follow-ups
+
+🤖 Generated by doc-maintenance skill
+EOF
+)"
+```
+
+### Step 7 — Update the cursor
+
+After a successful audit (whether or not edits were needed), update the cursor:
+
+```bash
+git rev-parse HEAD > .doc-review-cursor
+```
+
+If edits were made, this is already committed in the PR branch. If no edits
+were needed, commit the cursor update to the current branch.
+
+## Change Classification Rules
+
+| Signal | Category | Doc update needed? |
+|--------|----------|-------------------|
+| `feat:`, `add`, `implement`, `support` in message | Feature | Yes if user-facing |
+| `remove`, `drop`, `breaking`, `!:` in message | Breaking | Yes |
+| New top-level directory or config file | Structural | Maybe |
+| `fix:`, `bugfix` | Fix | No (unless it changes behavior described in docs) |
+| `refactor:`, `chore:`, `ci:`, `test:` | Maintenance | No |
+| `docs:` | Doc change | No (already handled) |
+| Dependency bumps only | Maintenance | No |
+
+## Patch Style Guide
+
+- Fix the fact, not the prose
+- If removing a roadmap item, don't leave a gap — remove the bullet cleanly
+- If adding a feature mention, match the format of surrounding entries
+  (e.g. if features are in a table, add a table row)
+- Keep README changes especially minimal — it shouldn't churn often
+- For SPEC/PRODUCT, prefer updating existing statements over adding new ones
+  (e.g. change "not supported in V1" to "supported via X" rather than adding
+  a new section)
+
+## Output
+
+When the skill completes, report:
+
+- How many commits were scanned
+- How many notable changes were found
+- How many doc edits were made (and to which files)
+- PR link (if edits were made)
+- Any follow-up items that need larger doc work
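Reviewer sketch: the "Change Classification Rules" table in the skill above could be approximated with a small shell helper. This is a minimal sketch, assuming commit subjects are matched case-insensitively and that breaking signals take precedence over feature keywords. The function name `classify_commit` and the `Unclassified` label are hypothetical and do not appear in the skill.

```bash
# Hypothetical helper: classify one commit subject line using the signals
# listed in the skill's classification table. Labels mirror the table's
# "Category" column; "Unclassified" is an illustrative fallback.
classify_commit() {
  lower="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"

  # Conventional-commit prefixes that the table maps to "no doc update".
  case "$lower" in
    docs:*)                         echo "Doc change";  return ;;
    refactor:*|chore:*|ci:*|test:*) echo "Maintenance"; return ;;
    fix:*|bugfix*)                  echo "Fix";         return ;;
  esac

  # Assumption: breaking signals win when a subject matches both groups.
  if printf '%s' "$lower" | grep -Eq '(^|[^a-z])(remove|drop|breaking)|!:'; then
    echo "Breaking"
  elif printf '%s' "$lower" | grep -Eq '^feat|(^|[^a-z])(add|implement|support)'; then
    echo "Feature"
  else
    echo "Unclassified"
  fi
}
```

A keyword match is only a first pass. As the skill notes, borderline commits still need a look at the actual diff.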
--- a/.agents/skills/doc-maintenance/references/audit-checklist.md
+++ b/.agents/skills/doc-maintenance/references/audit-checklist.md
@@ -0,0 +1,85 @@
+# Doc Maintenance Audit Checklist
+
+Use this checklist when auditing each target document. For each item, compare
+against the change summary from git history.
+
+## README.md
+
+### Features table
+- [ ] Each feature card reflects a shipped capability
+- [ ] No feature cards for things that don't exist yet
+- [ ] No major shipped features missing from the table
+
+### Roadmap
+- [ ] Nothing listed as "planned" or "coming soon" that already shipped
+- [ ] No removed/cancelled items still listed
+- [ ] Items reflect current priorities (cross-check with recent PRs)
+
+### Quickstart
+- [ ] `npx paperclipai onboard` command is correct
+- [ ] Manual install steps are accurate (clone URL, commands)
+- [ ] Prerequisites (Node version, pnpm version) are current
+- [ ] Server URL and port are correct
+
+### "What is Paperclip" section
+- [ ] High-level description is accurate
+- [ ] Step table (Define goal / Hire team / Approve and run) is correct
+
+### "Works with" table
+- [ ] All supported adapters/runtimes are listed
+- [ ] No removed adapters still listed
+- [ ] Logos and labels match current adapter names
+
+### "Paperclip is right for you if"
+- [ ] Use cases are still accurate
+- [ ] No claims about capabilities that don't exist
+
+### "Why Paperclip is special"
+- [ ] Technical claims are accurate (atomic execution, governance, etc.)
+- [ ] No features listed that were removed or significantly changed
+
+### FAQ
+- [ ] Answers are still correct
+- [ ] No references to removed features or outdated behavior
+
+### Development section
+- [ ] Commands are accurate (`pnpm dev`, `pnpm build`, etc.)
+- [ ] Link to DEVELOPING.md is correct
+
+## doc/SPEC.md
+
+### Company Model
+- [ ] Fields match current schema
+- [ ] Governance model description is accurate
+
+### Agent Model
+- [ ] Adapter types match what's actually supported
+- [ ] Agent configuration description is accurate
+- [ ] No features described as "not supported" or "not V1" that shipped
+
+### Task Model
+- [ ] Task hierarchy description is accurate
+- [ ] Status values match current implementation
+
+### Extensions / Plugins
+- [ ] If plugins are shipped, no "not in V1" or "future" language
+- [ ] Plugin model description matches implementation
+
+### Open Questions
+- [ ] Resolved questions removed or updated
+- [ ] No "TBD" items that have been decided
+
+## doc/PRODUCT.md
+
+### Core Concepts
+- [ ] Company, Employees, Task Management descriptions accurate
+- [ ] Agent Execution modes described correctly
+- [ ] No missing major concepts
+
+### Principles
+- [ ] Principles haven't been contradicted by shipped features
+- [ ] No principles referencing removed capabilities
+
+### User Flow
+- [ ] Dream scenario still reflects actual onboarding
+- [ ] Steps are achievable with current features
--- a/.agents/skills/doc-maintenance/references/section-map.md
+++ b/.agents/skills/doc-maintenance/references/section-map.md
@@ -0,0 +1,22 @@
+# Section Map
+
+Maps feature areas to specific document sections so the skill knows where to
+look when a feature ships or changes.
+
+| Feature Area | README Section | SPEC Section | PRODUCT Section |
+|-------------|---------------|-------------|----------------|
+| Plugins / Extensions | Features table, Roadmap | Extensions, Agent Model | Core Concepts |
+| Adapters (new runtimes) | "Works with" table, FAQ | Agent Model, Agent Configuration | Employees & Agents, Agent Execution |
+| Governance / Approvals | Features table, "Why special" | Board Governance, Board Approval Gates | Principles |
+| Budget / Cost Control | Features table, "Why special" | Budget Delegation | Company (revenue & expenses) |
+| Task Management | Features table | Task Model | Task Management |
+| Org Chart / Hierarchy | Features table | Agent Model (reporting) | Employees & Agents |
+| Multi-Company | Features table, FAQ | Company Model | Company |
+| Heartbeats | Features table, FAQ | Agent Execution | Agent Execution |
+| CLI Commands | Development section | — | — |
+| Onboarding / Quickstart | Quickstart, FAQ | — | User Flow |
+| Skills / Skill Injection | "Why special" | — | — |
+| Company Templates | "Why special", Roadmap (ClipMart) | — | — |
+| Mobile / UI | Features table | — | — |
+| Project Archiving | — | — | — |
+| OpenClaw Integration | "Works with" table, FAQ | Agent Model | Agent Execution |
--- a/.changeset/README.md
+++ b/.changeset/README.md
@@ -1,8 +0,0 @@
-# Changesets
-
-Hello and welcome! This folder has been automatically generated by `@changesets/cli`, a build tool that works
-with multi-package repos, or single-package repos to help you version and publish your code. You can
-find the full documentation for it [in our repository](https://github.com/changesets/changesets).
-
-We have a quick list of common questions to get you started engaging with this project in
-[our documentation](https://github.com/changesets/changesets/blob/main/docs/common-questions.md).
--- a/.changeset/config.json
+++ b/.changeset/config.json
@@ -1,11 +0,0 @@
-{
-  "$schema": "https://unpkg.com/@changesets/config@3.1.3/schema.json",
-  "changelog": "@changesets/cli/changelog",
-  "commit": false,
-  "fixed": [["@paperclipai/*", "paperclipai"]],
-  "linked": [],
-  "access": "public",
-  "baseBranch": "master",
-  "updateInternalDependencies": "patch",
-  "ignore": ["@paperclipai/ui"]
-}
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -0,0 +1,10 @@
+# Replace @cryppadotta if a different maintainer or team should own release infrastructure.
+
+.github/** @cryppadotta @devinfoley
+scripts/release*.sh @cryppadotta @devinfoley
+scripts/release-*.mjs @cryppadotta @devinfoley
+scripts/create-github-release.sh @cryppadotta @devinfoley
+scripts/rollback-latest.sh @cryppadotta @devinfoley
+doc/RELEASING.md @cryppadotta @devinfoley
+doc/PUBLISHING.md @cryppadotta @devinfoley
+doc/RELEASE-AUTOMATION-SETUP.md @cryppadotta @devinfoley
--- a/.github/workflows/pr-policy.yml
+++ b/.github/workflows/pr-policy.yml
@@ -13,8 +13,6 @@ jobs:
  policy:
    runs-on: ubuntu-latest
    timeout-minutes: 10
-    permissions:
-      pull-requests: read

    steps:
      - name: Checkout repository
@@ -33,38 +31,19 @@ jobs:
        with:
          node-version: 20

-      - name: Enforce lockfile policy when manifests change
-        env:
-          GH_TOKEN: ${{ github.token }}
+      - name: Block manual lockfile edits
+        if: github.head_ref != 'chore/refresh-lockfile'
        run: |
-          changed="$(gh api "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" --paginate --jq '.[].filename')"
-          manifest_pattern='(^|/)package\.json$|^pnpm-workspace\.yaml$|^\.npmrc$|^pnpmfile\.(cjs|js|mjs)$'
-
-          manifest_changed=false
-          lockfile_changed=false
-
-          if printf '%s\n' "$changed" | grep -Eq "$manifest_pattern"; then
-            manifest_changed=true
-          fi
-
+          changed="$(git diff --name-only "${{ github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha }}")"
          if printf '%s\n' "$changed" | grep -qx 'pnpm-lock.yaml'; then
-            lockfile_changed=true
-          fi
-
-          if [ "$lockfile_changed" = true ] && [ "$manifest_changed" != true ]; then
-            echo "pnpm-lock.yaml changed without a dependency manifest change." >&2
+            echo "Do not commit pnpm-lock.yaml in pull requests. CI owns lockfile updates." >&2
            exit 1
          fi

-          if [ "$manifest_changed" = true ]; then
+      - name: Validate dependency resolution when manifests change
+        run: |
+          changed="$(git diff --name-only "${{ github.event.pull_request.base.sha }}" "${{ github.event.pull_request.head.sha }}")"
+          manifest_pattern='(^|/)package\.json$|^pnpm-workspace\.yaml$|^\.npmrc$|^pnpmfile\.(cjs|js|mjs)$'
+          if printf '%s\n' "$changed" | grep -Eq "$manifest_pattern"; then
            pnpm install --lockfile-only --ignore-scripts --no-frozen-lockfile
-
-            if ! git diff --quiet -- pnpm-lock.yaml; then
-              if [ "${{ github.event.pull_request.head.repo.full_name }}" = "${{ github.repository }}" ]; then
-                echo "pnpm-lock.yaml is stale for this PR. Wait for the Refresh Lockfile workflow to push the bot commit, then rerun checks." >&2
-              else
-                echo "pnpm-lock.yaml is stale for this fork PR. Run pnpm install --lockfile-only --ignore-scripts --no-frozen-lockfile and commit pnpm-lock.yaml." >&2
-              fi
-              exit 1
-            fi
          fi
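Reviewer sketch: both checks in `pr-policy.yml` above reduce to pattern matches over a newline-separated list of changed paths. The snippet below isolates that matching so it can be tried locally. It reuses the workflow's `manifest_pattern` verbatim; the helper names `manifests_changed` and `lockfile_changed` are hypothetical.

```bash
# Pattern copied from the workflow: package.json at any depth, plus
# root-level pnpm-workspace.yaml, .npmrc, and pnpmfile.{cjs,js,mjs}.
manifest_pattern='(^|/)package\.json$|^pnpm-workspace\.yaml$|^\.npmrc$|^pnpmfile\.(cjs|js|mjs)$'

# Hypothetical helper: succeeds if any path in $1 (one per line) is a
# dependency manifest according to the pattern above.
manifests_changed() {
  printf '%s\n' "$1" | grep -Eq "$manifest_pattern"
}

# Hypothetical helper: succeeds only if the root-level lockfile is listed.
# -x requires a whole-line match, so nested paths do not count.
lockfile_changed() {
  printf '%s\n' "$1" | grep -qx 'pnpm-lock.yaml'
}
```

Note that `package.json` matches at any directory depth, while the other manifest names are anchored to the repository root.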
--- a/.github/workflows/pr-verify.yml
+++ b/.github/workflows/pr-verify.yml
@@ -26,11 +26,11 @@ jobs:
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
-          node-version: 20
+          node-version: 24
          cache: pnpm

      - name: Install dependencies
-        run: pnpm install --frozen-lockfile
+        run: pnpm install --no-frozen-lockfile

      - name: Typecheck
        run: pnpm -r typecheck
@@ -40,3 +40,9 @@ jobs:

      - name: Build
        run: pnpm build
+
+      - name: Release canary dry run
+        run: |
+          git checkout -B master HEAD
+          git checkout -- pnpm-lock.yaml
+          ./scripts/release.sh canary --skip-verify --dry-run
--- a/.github/workflows/refresh-lockfile-pr.yml
+++ b/.github/workflows/refresh-lockfile-pr.yml
@@ -1,111 +0,0 @@
-name: Refresh Lockfile
-
-on:
-  pull_request:
-    branches:
-      - master
-    types:
-      - opened
-      - synchronize
-      - reopened
-      - ready_for_review
-
-concurrency:
-  group: refresh-lockfile-pr-${{ github.event.pull_request.number }}
-  cancel-in-progress: true
-
-jobs:
-  refresh:
-    runs-on: ubuntu-latest
-    timeout-minutes: 10
-    permissions:
-      contents: write
-      pull-requests: read
-
-    steps:
-      - name: Detect dependency manifest changes
-        id: changes
-        env:
-          GH_TOKEN: ${{ github.token }}
-        run: |
-          changed="$(gh api "repos/${{ github.repository }}/pulls/${{ github.event.pull_request.number }}/files" --paginate --jq '.[].filename')"
-          manifest_pattern='(^|/)package\.json$|^pnpm-workspace\.yaml$|^\.npmrc$|^pnpmfile\.(cjs|js|mjs)$'
-
-          if printf '%s\n' "$changed" | grep -Eq "$manifest_pattern"; then
-            echo "manifest_changed=true" >> "$GITHUB_OUTPUT"
-          else
-            echo "manifest_changed=false" >> "$GITHUB_OUTPUT"
-          fi
-
-          if [ "${{ github.event.pull_request.head.repo.full_name }}" = "${{ github.repository }}" ]; then
-            echo "same_repo=true" >> "$GITHUB_OUTPUT"
-          else
-            echo "same_repo=false" >> "$GITHUB_OUTPUT"
-          fi
-
-      - name: Checkout pull request head
-        if: steps.changes.outputs.manifest_changed == 'true'
-        uses: actions/checkout@v4
-        with:
-          repository: ${{ github.event.pull_request.head.repo.full_name }}
-          ref: ${{ github.event.pull_request.head.ref }}
-          fetch-depth: 0
-
-      - name: Setup pnpm
-        if: steps.changes.outputs.manifest_changed == 'true'
-        uses: pnpm/action-setup@v4
-        with:
-          version: 9.15.4
-          run_install: false
-
-      - name: Setup Node.js
-        if: steps.changes.outputs.manifest_changed == 'true'
-        uses: actions/setup-node@v4
-        with:
-          node-version: 20
-          cache: pnpm
-
-      - name: Refresh pnpm lockfile
-        if: steps.changes.outputs.manifest_changed == 'true'
-        run: pnpm install --lockfile-only --ignore-scripts --no-frozen-lockfile
-
-      - name: Fail on unexpected file changes
-        if: steps.changes.outputs.manifest_changed == 'true'
-        run: |
-          changed="$(git status --porcelain)"
-          if [ -z "$changed" ]; then
-            echo "Lockfile is already up to date."
-            exit 0
-          fi
-          if printf '%s\n' "$changed" | grep -Fvq ' pnpm-lock.yaml'; then
-            echo "Unexpected files changed during lockfile refresh:"
-            echo "$changed"
-            exit 1
-          fi
-
-      - name: Commit refreshed lockfile to same-repo PR branch
-        if: steps.changes.outputs.manifest_changed == 'true' && steps.changes.outputs.same_repo == 'true'
-        run: |
-          if git diff --quiet -- pnpm-lock.yaml; then
-            echo "Lockfile unchanged, nothing to do."
-            exit 0
-          fi
-
-          git config user.name "lockfile-bot"
-          git config user.email "lockfile-bot@users.noreply.github.com"
-          git add pnpm-lock.yaml
-          git commit -m "chore(lockfile): refresh pnpm-lock.yaml"
-          git push origin "HEAD:${{ github.event.pull_request.head.ref }}"
-
-      - name: Fail fork PRs that need a lockfile refresh
-        if: steps.changes.outputs.manifest_changed == 'true' && steps.changes.outputs.same_repo != 'true'
-        run: |
-          if git diff --quiet -- pnpm-lock.yaml; then
-            echo "Lockfile unchanged, nothing to do."
-            exit 0
-          fi
-
-          echo "This fork PR changes dependency manifests and requires a refreshed pnpm-lock.yaml." >&2
-          echo "Run: pnpm install --lockfile-only --ignore-scripts --no-frozen-lockfile" >&2
-          echo "Then commit pnpm-lock.yaml to the PR branch." >&2
-          exit 1
--- a/.github/workflows/refresh-lockfile.yml
+++ b/.github/workflows/refresh-lockfile.yml
@@ -0,0 +1,93 @@
+name: Refresh Lockfile
+
+on:
+  push:
+    branches:
+      - master
+  workflow_dispatch:
+
+concurrency:
+  group: refresh-lockfile-master
+  cancel-in-progress: false
+
+jobs:
+  refresh:
+    runs-on: ubuntu-latest
+    timeout-minutes: 10
+    permissions:
+      contents: write
+      pull-requests: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+          run_install: false
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 20
+          cache: pnpm
+
+      - name: Refresh pnpm lockfile
+        run: pnpm install --lockfile-only --ignore-scripts --no-frozen-lockfile
+
+      - name: Fail on unexpected file changes
+        run: |
+          changed="$(git status --porcelain)"
+          if [ -z "$changed" ]; then
+            echo "Lockfile is already up to date."
+            exit 0
+          fi
+          if printf '%s\n' "$changed" | grep -Fvq ' pnpm-lock.yaml'; then
+            echo "Unexpected files changed during lockfile refresh:"
+            echo "$changed"
+            exit 1
+          fi
+
+      - name: Create or update pull request
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          if git diff --quiet -- pnpm-lock.yaml; then
+            echo "Lockfile unchanged, nothing to do."
+            exit 0
+          fi
+
+          BRANCH="chore/refresh-lockfile"
+          git config user.name "lockfile-bot"
+          git config user.email "lockfile-bot@users.noreply.github.com"
+
+          git checkout -B "$BRANCH"
+          git add pnpm-lock.yaml
+          git commit -m "chore(lockfile): refresh pnpm-lock.yaml"
+          git push --force origin "$BRANCH"
+
+          # Create PR if one doesn't already exist
+          existing=$(gh pr list --head "$BRANCH" --json number --jq '.[0].number')
+          if [ -z "$existing" ]; then
+            gh pr create \
+              --head "$BRANCH" \
+              --title "chore(lockfile): refresh pnpm-lock.yaml" \
+              --body "Auto-generated lockfile refresh after dependencies changed on master. This PR only updates pnpm-lock.yaml."
+            echo "Created new PR."
+          else
+            echo "PR #$existing already exists, branch updated via force push."
+          fi
+
+      - name: Enable auto-merge for lockfile PR
+        env:
+          GH_TOKEN: ${{ github.token }}
+        run: |
+          pr_url="$(gh pr list --head chore/refresh-lockfile --json url --jq '.[0].url')"
+          if [ -z "$pr_url" ]; then
+            echo "Error: lockfile PR was not found." >&2
+            exit 1
+          fi
+
+          gh pr merge --auto --squash --delete-branch "$pr_url"
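Reviewer sketch: the "Fail on unexpected file changes" step in `refresh-lockfile.yml` above relies on `grep -Fv` to detect any status line that does not mention the lockfile. A minimal way to exercise that logic without a repository is shown below. The helper name `only_lockfile_changed` is hypothetical, and its argument stands in for `git status --porcelain` output.

```bash
# Hypothetical helper: given porcelain-style status text in $1, succeed when
# nothing changed or when every line mentions " pnpm-lock.yaml".
only_lockfile_changed() {
  [ -z "$1" ] && return 0
  # grep -Fv succeeds if at least one line lacks the fixed string,
  # so its result is negated here.
  ! printf '%s\n' "$1" | grep -Fvq ' pnpm-lock.yaml'
}
```

Because `-F` performs a substring match, a status line for a path such as `pnpm-lock.yaml.bak` would also pass this check. Whether that matters depends on what the install step can write.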
--- a/.github/workflows/release.yml
+++ b/.github/workflows/release.yml
@@ -1,38 +1,33 @@
 name: Release

 on:
+  push:
+    branches:
+      - master
  workflow_dispatch:
    inputs:
-      channel:
-        description: Release channel
+      source_ref:
+        description: Commit SHA, branch, or tag to publish as stable
        required: true
-        type: choice
-        default: canary
-        options:
-          - canary
-          - stable
-      bump:
-        description: Semantic version bump
-        required: true
-        type: choice
-        default: patch
-        options:
-          - patch
-          - minor
-          - major
+        type: string
+        default: master
+      stable_date:
+        description: Stable release date in UTC (YYYY-MM-DD). Defaults to today.
+        required: false
+        type: string
      dry_run:
-        description: Preview the release without publishing
+        description: Preview the stable release without publishing
        required: true
        type: boolean
-        default: true
+        default: false

 concurrency:
-  group: release-${{ github.ref }}
+  group: release-${{ github.event_name }}-${{ github.ref }}
  cancel-in-progress: false

 jobs:
-  verify:
-    if: startsWith(github.ref, 'refs/heads/release/')
+  verify_canary:
+    if: github.event_name == 'push'
    runs-on: ubuntu-latest
    timeout-minutes: 30
    permissions:
@@ -56,7 +51,7 @@ jobs:
          cache: pnpm

      - name: Install dependencies
-        run: pnpm install --frozen-lockfile
+        run: pnpm install --no-frozen-lockfile

      - name: Typecheck
        run: pnpm -r typecheck
@@ -67,12 +62,12 @@ jobs:
      - name: Build
        run: pnpm build

-  publish:
-    if: startsWith(github.ref, 'refs/heads/release/')
-    needs: verify
+  publish_canary:
+    if: github.event_name == 'push'
+    needs: verify_canary
    runs-on: ubuntu-latest
    timeout-minutes: 45
-    environment: npm-release
+    environment: npm-canary
    permissions:
      contents: write
      id-token: write
@@ -95,32 +90,165 @@ jobs:
          cache: pnpm

      - name: Install dependencies
-        run: pnpm install --frozen-lockfile
+        run: pnpm install --no-frozen-lockfile
+
+      - name: Restore tracked install-time changes
+        run: git checkout -- pnpm-lock.yaml

      - name: Configure git author
        run: |
          git config user.name "github-actions[bot]"
          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"

-      - name: Run release script
+      - name: Publish canary
+        env:
+          GITHUB_ACTIONS: "true"
+        run: ./scripts/release.sh canary --skip-verify
+
+      - name: Push canary tag
+        run: |
+          tag="$(git tag --points-at HEAD | grep '^canary/v' | head -1)"
+          if [ -z "$tag" ]; then
+            echo "Error: no canary tag points at HEAD after release." >&2
+            exit 1
+          fi
+          git push origin "refs/tags/${tag}"
+
+  verify_stable:
+    if: github.event_name == 'workflow_dispatch'
+    runs-on: ubuntu-latest
+    timeout-minutes: 30
+    permissions:
+      contents: read
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          ref: ${{ inputs.source_ref }}
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 24
+          cache: pnpm
+
+      - name: Install dependencies
+        run: pnpm install --no-frozen-lockfile
+
+      - name: Typecheck
+        run: pnpm -r typecheck
+
+      - name: Run tests
+        run: pnpm test:run
+
+      - name: Build
+        run: pnpm build
+
+  preview_stable:
+    if: github.event_name == 'workflow_dispatch' && inputs.dry_run
+    needs: verify_stable
+    runs-on: ubuntu-latest
+    timeout-minutes: 45
+    permissions:
+      contents: read
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          ref: ${{ inputs.source_ref }}
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 24
+          cache: pnpm
+
+      - name: Install dependencies
+        run: pnpm install --no-frozen-lockfile
+
+      - name: Dry-run stable release
        env:
          GITHUB_ACTIONS: "true"
        run: |
-          args=("${{ inputs.bump }}")
-          if [ "${{ inputs.channel }}" = "canary" ]; then
-            args+=("--canary")
-          fi
-          if [ "${{ inputs.dry_run }}" = "true" ]; then
-            args+=("--dry-run")
+          args=(stable --skip-verify --dry-run)
+          if [ -n "${{ inputs.stable_date }}" ]; then
+            args+=(--date "${{ inputs.stable_date }}")
          fi
          ./scripts/release.sh "${args[@]}"

-      - name: Push stable release branch commit and tag
-        if: inputs.channel == 'stable' && !inputs.dry_run
-        run: git push origin "HEAD:${GITHUB_REF_NAME}" --follow-tags
+  publish_stable:
+    if: github.event_name == 'workflow_dispatch' && !inputs.dry_run
+    needs: verify_stable
+    runs-on: ubuntu-latest
+    timeout-minutes: 45
+    environment: npm-stable
+    permissions:
+      contents: write
+      id-token: write
+
+    steps:
+      - name: Checkout repository
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          ref: ${{ inputs.source_ref }}
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+        with:
+          version: 9.15.4
+
+      - name: Setup Node.js
+        uses: actions/setup-node@v4
+        with:
+          node-version: 24
+          cache: pnpm
+
+      - name: Install dependencies
+        run: pnpm install --no-frozen-lockfile
+
+      - name: Restore tracked install-time changes
+        run: git checkout -- pnpm-lock.yaml
+
+      - name: Configure git author
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "41898282+github-actions[bot]@users.noreply.github.com"
+
+      - name: Publish stable
+        env:
+          GITHUB_ACTIONS: "true"
+        run: |
+          args=(stable --skip-verify)
+          if [ -n "${{ inputs.stable_date }}" ]; then
+            args+=(--date "${{ inputs.stable_date }}")
+          fi
+          ./scripts/release.sh "${args[@]}"
+
+      - name: Push stable tag
+        run: |
+          tag="$(git tag --points-at HEAD | grep '^v' | head -1)"
+          if [ -z "$tag" ]; then
+            echo "Error: no stable tag points at HEAD after release." >&2
+            exit 1
+          fi
+          git push origin "refs/tags/${tag}"

      - name: Create GitHub Release
-        if: inputs.channel == 'stable' && !inputs.dry_run
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
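Reviewer sketch: the "Push canary tag" and "Push stable tag" steps in `release.yml` above each pick the first tag at `HEAD` that starts with a given prefix. The selection can be tried in isolation as below. The helper name `first_tag_with_prefix` is hypothetical, and the tag names in the usage line are made up for illustration, not taken from the release script.

```bash
# Hypothetical helper: read tag names on stdin (one per line) and print the
# first one that starts with the prefix in $1. Prints nothing if none match.
first_tag_with_prefix() {
  grep -- "^$1" | head -n 1
}

# Usage with made-up tag names:
printf 'canary/v0.0.0-example\nv0.0.0-example\n' | first_tag_with_prefix 'canary/v'
```

Since the match is anchored at the start of the name, the stable prefix `v` does not pick up tags under `canary/`.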
--- a/.gitignore
+++ b/.gitignore
@@ -37,7 +37,14 @@ tmp/
 .vscode/
 .claude/settings.local.json
 .paperclip-local/
+/.idea/
+/.agents/
+
+# Doc maintenance cursor
+.doc-review-cursor

 # Playwright
 tests/e2e/test-results/
-tests/e2e/playwright-report/
+tests/e2e/playwright-report/
+.superset/
+.claude/worktrees/
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -78,6 +78,9 @@ If you change schema/API behavior, update all impacted layers:
 4. Do not replace strategic docs wholesale unless asked.
 Prefer additive updates. Keep `doc/SPEC.md` and `doc/SPEC-implementation.md` aligned.

+5. Keep plan docs dated and centralized.
+New plan documents belong in `doc/plans/` and should use `YYYY-MM-DD-slug.md` filenames.
+
 ## 6. Database Change Workflow

 When changing data model:
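Reviewer sketch: the plan-doc naming rule added to `AGENTS.md` above (`doc/plans/` plus `YYYY-MM-DD-slug.md`) could be checked mechanically. The snippet below is a minimal sketch, assuming slugs are lowercase alphanumerics separated by single hyphens; that slug format and the helper name `is_plan_path` are assumptions, since the rule itself only fixes the date prefix and directory.

```bash
# Hypothetical helper: succeeds if $1 looks like doc/plans/YYYY-MM-DD-slug.md.
# Only the shape of the date is checked, not whether it is a real calendar date.
is_plan_path() {
  printf '%s\n' "$1" |
    grep -Eq '^doc/plans/[0-9]{4}-[0-9]{2}-[0-9]{2}-[a-z0-9]+(-[a-z0-9]+)*\.md$'
}
```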
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -7,6 +7,7 @@ We really appreciate both small fixes and thoughtful larger changes.
 ## Two Paths to Get Your Pull Request Accepted

 ### Path 1: Small, Focused Changes (Fastest way to get merged)
+
 - Pick **one** clear thing to fix/improve
 - Touch the **smallest possible number of files**
 - Make sure the change is very targeted and easy to review
@@ -16,6 +17,7 @@ We really appreciate both small fixes and thoughtful larger changes.
 These almost always get merged quickly when they're clean.

 ### Path 2: Bigger or Impactful Changes
+
 - **First** talk about it in Discord → #dev channel  
  → Describe what you're trying to solve  
  → Share rough ideas / approach
@@ -30,12 +32,43 @@ These almost always get merged quickly when they're clean.
 PRs that follow this path are **much** more likely to be accepted, even when they're large.

 ## General Rules (both paths)
+
 - Write clear commit messages
 - Keep PR title + description meaningful
 - One PR = one logical change (unless it's a small related group)
 - Run tests locally first
 - Be kind in discussions 😄

+## Writing a Good PR Message
+
+Please include a "thinking path" at the top of your PR message that walks from the project's overall purpose down to the specific thing you fixed. For example:
+
+### Thinking Path Example 1:
+
+> - Paperclip orchestrates AI agents for zero-human companies
+> - There are many types of adapters, one for each LLM provider
+> - But LLMs have a context limit and not all agents can automatically compact their context
+> - So we need to have an adapter-specific configuration for which adapters can and cannot automatically compact their context
+> - This pull request adds per-adapter configuration of compaction, either auto or Paperclip-managed
+> - That way we can get optimal performance from any adapter/provider in Paperclip
+
+### Thinking Path Example 2:
+
+> - Paperclip orchestrates AI agents for zero-human companies
+> - But humans want to watch the agents and oversee their work
+> - Human users also operate in teams and so they need their own logins, profiles, views etc.
+> - So we have a multi-user system for humans
+> - But humans want to be able to update their own profile picture and avatar
+> - But the avatar upload form wasn't saving the avatar to the file storage system
+> - So this PR fixes the avatar upload form to use the file storage service
+> - The benefit is we don't have a one-off file storage for just one aspect of the system, which would cause confusion and extra configuration
+
+Then have the rest of your normal PR message after the Thinking Path.
+
+This should include details about what you did, why you did it, why it matters & the benefits, how we can verify it works, and any risks.
+
+If your change is visible in the UI, please include screenshots, ideally both before and after. Something like the [agent-browser skill](https://github.com/vercel-labs/agent-browser/blob/main/skills/agent-browser/SKILL.md) or a similar tool can take them for you.
+
 Questions? Just ask in #dev — we're happy to help.

 Happy hacking!
--- a/README.md
+++ b/README.md
@@ -239,7 +239,7 @@ See [doc/DEVELOPING.md](doc/DEVELOPING.md) for the full development guide.
 - ⚪ ClipMart - buy and sell entire agent companies
 - ⚪ Easy agent configurations / easier to understand
 - ⚪ Better support for harness engineering
-- ⚪ Plugin system (e.g. if you want to add a knowledgebase, custom tracing, queues, etc)
+- 🟢 Plugin system (e.g. if you want to add a knowledgebase, custom tracing, queues, etc)
 - ⚪ Better docs

 <br/>
--- a/cli/CHANGELOG.md
+++ b/cli/CHANGELOG.md
@@ -1,5 +1,23 @@
 # paperclipai

+## 0.3.1
+
+### Patch Changes
+
+- Stable release preparation for 0.3.1
+- Updated dependencies
+  - @paperclipai/adapter-utils@0.3.1
+  - @paperclipai/adapter-claude-local@0.3.1
+  - @paperclipai/adapter-codex-local@0.3.1
+  - @paperclipai/adapter-cursor-local@0.3.1
+  - @paperclipai/adapter-gemini-local@0.3.1
+  - @paperclipai/adapter-openclaw-gateway@0.3.1
+  - @paperclipai/adapter-opencode-local@0.3.1
+  - @paperclipai/adapter-pi-local@0.3.1
+  - @paperclipai/db@0.3.1
+  - @paperclipai/shared@0.3.1
+  - @paperclipai/server@0.3.1
+
 ## 0.3.0

 ### Minor Changes
--- a/cli/package.json
+++ b/cli/package.json
@@ -1,6 +1,6 @@
 {
  "name": "paperclipai",
-  "version": "0.3.0",
+  "version": "0.3.1",
  "description": "Paperclip CLI — orchestrate AI agent teams to run a business",
  "type": "module",
  "bin": {
@@ -16,10 +16,13 @@
  "license": "MIT",
  "repository": {
    "type": "git",
-    "url": "https://github.com/paperclipai/paperclip.git",
+    "url": "https://github.com/paperclipai/paperclip",
    "directory": "cli"
  },
  "homepage": "https://github.com/paperclipai/paperclip",
+  "bugs": {
+    "url": "https://github.com/paperclipai/paperclip/issues"
+  },
  "files": [
    "dist"
  ],
--- a/cli/src/tests/agent-jwt-env.test.ts
+++ b/cli/src/tests/agent-jwt-env.test.ts
@@ -4,7 +4,9 @@ import path from "node:path";
 import { afterEach, beforeEach, describe, expect, it } from "vitest";
 import {
  ensureAgentJwtSecret,
+  mergePaperclipEnvEntries,
  readAgentJwtSecretFromEnv,
+  readPaperclipEnvEntries,
  resolveAgentJwtEnvFile,
 } from "../config/env.js";
 import { agentJwtSecretCheck } from "../checks/agent-jwt-secret-check.js";
@@ -58,4 +60,20 @@ describe("agent jwt env helpers", () => {
    const result = agentJwtSecretCheck(configPath);
    expect(result.status).toBe("pass");
  });
+
+  it("quotes hash-prefixed env values so dotenv round-trips them", () => {
+    const configPath = tempConfigPath();
+    const envPath = resolveAgentJwtEnvFile(configPath);
+
+    mergePaperclipEnvEntries(
+      {
+        PAPERCLIP_WORKTREE_COLOR: "#439edb",
+      },
+      envPath,
+    );
+
+    const contents = fs.readFileSync(envPath, "utf-8");
+    expect(contents).toContain('PAPERCLIP_WORKTREE_COLOR="#439edb"');
+    expect(readPaperclipEnvEntries(envPath).PAPERCLIP_WORKTREE_COLOR).toBe("#439edb");
+  });
 });
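The new test above asserts that a value beginning with `#` is written in double quotes so that it survives a write/read round trip. The reason is that many dotenv-style parsers treat an unquoted `#` as the start of a comment. A minimal sketch of that behaviour follows; `formatEnvValue` and `parseEnvLine` are hypothetical helpers written for illustration and are not the functions exported from `cli/src/config/env.ts`.

```typescript
// Hypothetical helpers: a simplified dotenv-style writer and reader.
// They only illustrate why "#"-prefixed values need quoting.

/** Write a value bare when it is unambiguous, otherwise double-quote it. */
function formatEnvValue(value: string): string {
  // Letters, digits and a few safe punctuation characters can stay bare.
  if (/^[A-Za-z0-9_\-.\/:@]*$/.test(value)) return value;
  // Escape backslashes first, then double quotes, and wrap in quotes.
  const escaped = value.replace(/\\/g, "\\\\").replace(/"/g, '\\"');
  return `"${escaped}"`;
}

/** Parse one KEY=VALUE line; an unquoted "#" starts a trailing comment. */
function parseEnvLine(line: string): [string, string] | null {
  const match = /^\s*([A-Za-z_][A-Za-z0-9_]*)\s*=\s*(.*)$/.exec(line);
  if (!match) return null;
  const key = match[1];
  const raw = match[2];

  if (raw.startsWith('"')) {
    // Quoted value: read up to the closing quote, honouring backslash escapes.
    let out = "";
    for (let i = 1; i < raw.length; i++) {
      const ch = raw[i];
      if (ch === "\\" && i + 1 < raw.length) {
        out += raw[i + 1];
        i++;
        continue;
      }
      if (ch === '"') return [key, out];
      out += ch;
    }
    return [key, out]; // Unterminated quote: keep what was read.
  }

  // Bare value: everything from the first "#" onwards is a comment.
  const hash = raw.indexOf("#");
  return [key, (hash === -1 ? raw : raw.slice(0, hash)).trim()];
}
```

Under these assumptions, writing `PAPERCLIP_WORKTREE_COLOR=#439edb` bare would read back as an empty string, while the quoted form reads back unchanged.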
--- a/cli/src/tests/company-delete.test.ts
+++ b/cli/src/tests/company-delete.test.ts
@@ -8,12 +8,16 @@ function makeCompany(overrides: Partial<Company>): Company {
    name: "Alpha",
    description: null,
    status: "active",
+    pauseReason: null,
+    pausedAt: null,
    issuePrefix: "ALP",
    issueCounter: 1,
    budgetMonthlyCents: 0,
    spentMonthlyCents: 0,
    requireBoardApprovalForNewAgents: false,
    brandColor: null,
+    logoAssetId: null,
+    logoUrl: null,
    createdAt: new Date(),
    updatedAt: new Date(),
    ...overrides,
--- a/cli/src/tests/worktree.test.ts
+++ b/cli/src/tests/worktree.test.ts
@@ -7,6 +7,7 @@ import {
  copyGitHooksToWorktreeGitDir,
  copySeededSecretsKey,
  rebindWorkspaceCwd,
+  resolveSourceConfigPath,
  resolveGitWorktreeAddArgs,
  resolveWorktreeMakeTargetPath,
  worktreeInitCommand,
@@ -16,6 +17,7 @@ import {
  buildWorktreeConfig,
  buildWorktreeEnvEntries,
  formatShellExports,
+  generateWorktreeColor,
  resolveWorktreeSeedPlan,
  resolveWorktreeLocalPaths,
  rewriteLocalUrlPort,
@@ -181,13 +183,22 @@ describe("worktree helpers", () => {
      path.resolve("/tmp/paperclip-worktrees", "instances", "feature-worktree-support", "data", "storage"),
    );

-    const env = buildWorktreeEnvEntries(paths);
+    const env = buildWorktreeEnvEntries(paths, {
+      name: "feature-worktree-support",
+      color: "#3abf7a",
+    });
    expect(env.PAPERCLIP_HOME).toBe(path.resolve("/tmp/paperclip-worktrees"));
    expect(env.PAPERCLIP_INSTANCE_ID).toBe("feature-worktree-support");
    expect(env.PAPERCLIP_IN_WORKTREE).toBe("true");
+    expect(env.PAPERCLIP_WORKTREE_NAME).toBe("feature-worktree-support");
+    expect(env.PAPERCLIP_WORKTREE_COLOR).toBe("#3abf7a");
    expect(formatShellExports(env)).toContain("export PAPERCLIP_INSTANCE_ID='feature-worktree-support'");
  });

+  it("generates vivid worktree colors as hex", () => {
+    expect(generateWorktreeColor()).toMatch(/^#[0-9a-f]{6}$/);
+  });
+
  it("uses minimal seed mode to keep app state but drop heavy runtime history", () => {
    const minimal = resolveWorktreeSeedPlan("minimal");
    const full = resolveWorktreeSeedPlan("full");
@@ -280,7 +291,10 @@ describe("worktree helpers", () => {
      });

      const envPath = path.join(repoRoot, ".paperclip", ".env");
-      expect(fs.readFileSync(envPath, "utf8")).toContain("PAPERCLIP_AGENT_JWT_SECRET=worktree-shared-secret");
+      const envContents = fs.readFileSync(envPath, "utf8");
+      expect(envContents).toContain("PAPERCLIP_AGENT_JWT_SECRET=worktree-shared-secret");
+      expect(envContents).toContain("PAPERCLIP_WORKTREE_NAME=repo");
+      expect(envContents).toMatch(/PAPERCLIP_WORKTREE_COLOR=\"#[0-9a-f]{6}\"/);
    } finally {
      process.chdir(originalCwd);
      if (originalJwtSecret === undefined) {
@@ -292,6 +306,59 @@ describe("worktree helpers", () => {
    }
  });

+  it("defaults the seed source config to the current repo-local Paperclip config", () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-source-config-"));
+    const repoRoot = path.join(tempRoot, "repo");
+    const localConfigPath = path.join(repoRoot, ".paperclip", "config.json");
+    const originalCwd = process.cwd();
+    const originalPaperclipConfig = process.env.PAPERCLIP_CONFIG;
+
+    try {
+      fs.mkdirSync(path.dirname(localConfigPath), { recursive: true });
+      fs.writeFileSync(localConfigPath, JSON.stringify(buildSourceConfig()), "utf8");
+      delete process.env.PAPERCLIP_CONFIG;
+      process.chdir(repoRoot);
+
+      expect(fs.realpathSync(resolveSourceConfigPath({}))).toBe(fs.realpathSync(localConfigPath));
+    } finally {
+      process.chdir(originalCwd);
+      if (originalPaperclipConfig === undefined) {
+        delete process.env.PAPERCLIP_CONFIG;
+      } else {
+        process.env.PAPERCLIP_CONFIG = originalPaperclipConfig;
+      }
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
+  it("preserves the source config path across worktree:make cwd changes", () => {
+    const tempRoot = fs.mkdtempSync(path.join(os.tmpdir(), "paperclip-worktree-source-override-"));
+    const sourceConfigPath = path.join(tempRoot, "source", "config.json");
+    const targetRoot = path.join(tempRoot, "target");
+    const originalCwd = process.cwd();
+    const originalPaperclipConfig = process.env.PAPERCLIP_CONFIG;
+
+    try {
+      fs.mkdirSync(path.dirname(sourceConfigPath), { recursive: true });
+      fs.mkdirSync(targetRoot, { recursive: true });
+      fs.writeFileSync(sourceConfigPath, JSON.stringify(buildSourceConfig()), "utf8");
+      delete process.env.PAPERCLIP_CONFIG;
+      process.chdir(targetRoot);
+
+      expect(resolveSourceConfigPath({ sourceConfigPathOverride: sourceConfigPath })).toBe(
+        path.resolve(sourceConfigPath),
+      );
+    } finally {
+      process.chdir(originalCwd);
+      if (originalPaperclipConfig === undefined) {
+        delete process.env.PAPERCLIP_CONFIG;
+      } else {
+        process.env.PAPERCLIP_CONFIG = originalPaperclipConfig;
+      }
+      fs.rmSync(tempRoot, { recursive: true, force: true });
+    }
+  });
+
  it("rebinds same-repo workspace paths onto the current worktree root", () => {
    expect(
      rebindWorkspaceCwd({
--- a/cli/src/commands/client/agent.ts
+++ b/cli/src/commands/client/agent.ts
@@ -1,5 +1,9 @@
 import { Command } from "commander";
 import type { Agent } from "@paperclipai/shared";
+import {
+  removeMaintainerOnlySkillSymlinks,
+  resolvePaperclipSkillsDir,
+} from "@paperclipai/adapter-utils/server-utils";
 import fs from "node:fs/promises";
 import os from "node:os";
 import path from "node:path";
@@ -34,15 +38,12 @@ interface SkillsInstallSummary {
  tool: "codex" | "claude";
  target: string;
  linked: string[];
+  removed: string[];
  skipped: string[];
  failed: Array<{ name: string; error: string }>;
 }

 const __moduleDir = path.dirname(fileURLToPath(import.meta.url));
-const PAPERCLIP_SKILLS_CANDIDATES = [
-  path.resolve(__moduleDir, "../../../../../skills"), // dev: cli/src/commands/client -> repo root/skills
-  path.resolve(process.cwd(), "skills"),
-];

 function codexSkillsHome(): string {
  const fromEnv = process.env.CODEX_HOME?.trim();
@@ -56,14 +57,6 @@ function claudeSkillsHome(): string {
  return path.join(base, "skills");
 }

-async function resolvePaperclipSkillsDir(): Promise<string | null> {
-  for (const candidate of PAPERCLIP_SKILLS_CANDIDATES) {
-    const isDir = await fs.stat(candidate).then((s) => s.isDirectory()).catch(() => false);
-    if (isDir) return candidate;
-  }
-  return null;
-}
-
 async function installSkillsForTarget(
  sourceSkillsDir: string,
  targetSkillsDir: string,
@@ -73,20 +66,65 @@ async function installSkillsForTarget(
    tool,
    target: targetSkillsDir,
    linked: [],
+    removed: [],
    skipped: [],
    failed: [],
  };

  await fs.mkdir(targetSkillsDir, { recursive: true });
  const entries = await fs.readdir(sourceSkillsDir, { withFileTypes: true });
+  summary.removed = await removeMaintainerOnlySkillSymlinks(
+    targetSkillsDir,
+    entries.filter((entry) => entry.isDirectory()).map((entry) => entry.name),
+  );
  for (const entry of entries) {
    if (!entry.isDirectory()) continue;
    const source = path.join(sourceSkillsDir, entry.name);
    const target = path.join(targetSkillsDir, entry.name);
    const existing = await fs.lstat(target).catch(() => null);
    if (existing) {
-      summary.skipped.push(entry.name);
-      continue;
+      if (existing.isSymbolicLink()) {
+        let linkedPath: string | null = null;
+        try {
+          linkedPath = await fs.readlink(target);
+        } catch (err) {
+          await fs.unlink(target);
+          try {
+            await fs.symlink(source, target);
+            summary.linked.push(entry.name);
+            continue;
+          } catch (linkErr) {
+            summary.failed.push({
+              name: entry.name,
+              error:
+                err instanceof Error && linkErr instanceof Error
+                  ? `${err.message}; then ${linkErr.message}`
+                  : err instanceof Error
+                    ? err.message
+                    : `Failed to recover broken symlink: ${String(err)}`,
+            });
+            continue;
+          }
+        }
+
+        const resolvedLinkedPath = path.isAbsolute(linkedPath)
+          ? linkedPath
+          : path.resolve(path.dirname(target), linkedPath);
+        const linkedTargetExists = await fs
+          .stat(resolvedLinkedPath)
+          .then(() => true)
+          .catch(() => false);
+
+        if (!linkedTargetExists) {
+          await fs.unlink(target);
+        } else {
+          summary.skipped.push(entry.name);
+          continue;
+        }
+      } else {
+        summary.skipped.push(entry.name);
+        continue;
+      }
    }

    try {
@@ -210,7 +248,7 @@ export function registerAgentCommands(program: Command): void {

          const installSummaries: SkillsInstallSummary[] = [];
          if (opts.installSkills !== false) {
-            const skillsDir = await resolvePaperclipSkillsDir();
+            const skillsDir = await resolvePaperclipSkillsDir(__moduleDir, [path.resolve(process.cwd(), "skills")]);
            if (!skillsDir) {
              throw new Error(
                "Could not locate local Paperclip skills directory. Expected ./skills in the repo checkout.",
@@ -258,7 +296,7 @@ export function registerAgentCommands(program: Command): void {
          if (installSummaries.length > 0) {
            for (const summary of installSummaries) {
              console.log(
-                `${summary.tool}: linked=${summary.linked.length} skipped=${summary.skipped.length} failed=${summary.failed.length} target=${summary.target}`,
+                `${summary.tool}: linked=${summary.linked.length} removed=${summary.removed.length} skipped=${summary.skipped.length} failed=${summary.failed.length} target=${summary.target}`,
              );
              for (const failed of summary.failed) {
                console.log(`  failed ${failed.name}: ${failed.error}`);
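The `installSkillsForTarget` change above no longer skips every existing entry: a symlink whose destination has disappeared is unlinked and recreated, while live symlinks and regular files or directories are still skipped. The decision can be sketched in isolation as below. `ensureSymlink` and the `LinkOutcome` type are hypothetical names used for illustration only, and the sketch leaves out the error aggregation that the real function performs.

```typescript
import { lstat, readlink, stat, symlink, unlink } from "node:fs/promises";
import * as path from "node:path";

// Hypothetical outcome type for this sketch.
type LinkOutcome = "linked" | "relinked" | "skipped";

/** Create `target` as a symlink to `source`, repairing it if it dangles. */
async function ensureSymlink(source: string, target: string): Promise<LinkOutcome> {
  // lstat inspects the link itself rather than following it.
  const existing = await lstat(target).catch(() => null);
  if (!existing) {
    await symlink(source, target);
    return "linked";
  }

  // Never touch real files or directories that the user placed there.
  if (!existing.isSymbolicLink()) return "skipped";

  // Relative link targets are resolved against the link's own directory.
  const linkedPath = await readlink(target);
  const resolved = path.isAbsolute(linkedPath)
    ? linkedPath
    : path.resolve(path.dirname(target), linkedPath);

  // stat follows the link target; a failure means the link is dangling.
  const alive = await stat(resolved)
    .then(() => true)
    .catch(() => false);
  if (alive) return "skipped";

  await unlink(target);
  await symlink(source, target);
  return "relinked";
}
```

Using `lstat` for the first check and `stat` for the second is what distinguishes "a link exists" from "the link still points at something".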
--- a/cli/src/commands/client/plugin.ts
+++ b/cli/src/commands/client/plugin.ts
@@ -0,0 +1,374 @@
+import path from "node:path";
+import { Command } from "commander";
+import pc from "picocolors";
+import {
+  addCommonClientOptions,
+  handleCommandError,
+  printOutput,
+  resolveCommandContext,
+  type BaseClientOptions,
+} from "./common.js";
+
+// ---------------------------------------------------------------------------
+// Types mirroring server-side shapes
+// ---------------------------------------------------------------------------
+
+interface PluginRecord {
+  id: string;
+  pluginKey: string;
+  packageName: string;
+  version: string;
+  status: string;
+  displayName?: string;
+  lastError?: string | null;
+  installedAt: string;
+  updatedAt: string;
+}
+
+
+// ---------------------------------------------------------------------------
+// Option types
+// ---------------------------------------------------------------------------
+
+interface PluginListOptions extends BaseClientOptions {
+  status?: string;
+}
+
+interface PluginInstallOptions extends BaseClientOptions {
+  local?: boolean;
+  version?: string;
+}
+
+interface PluginUninstallOptions extends BaseClientOptions {
+  force?: boolean;
+}
+
+// ---------------------------------------------------------------------------
+// Helpers
+// ---------------------------------------------------------------------------
+
+/**
+ * Resolve a local path argument to an absolute path so the server can find the
+ * plugin on disk regardless of where the user ran the CLI.
+ */
+function resolvePackageArg(packageArg: string, isLocal: boolean): string {
+  if (!isLocal) return packageArg;
+  // Already absolute
+  if (path.isAbsolute(packageArg)) return packageArg;
+  // Expand leading ~ to home directory
+  if (packageArg.startsWith("~")) {
+    const home = process.env.HOME ?? process.env.USERPROFILE ?? "";
+    return path.resolve(home, packageArg.slice(1).replace(/^[\\/]/, ""));
+  }
+  return path.resolve(process.cwd(), packageArg);
+}
+
+function formatPlugin(p: PluginRecord): string {
+  const statusColor =
+    p.status === "ready"
+      ? pc.green(p.status)
+      : p.status === "error"
+        ? pc.red(p.status)
+        : p.status === "disabled"
+          ? pc.dim(p.status)
+          : pc.yellow(p.status);
+
+  const parts = [
+    `key=${pc.bold(p.pluginKey)}`,
+    `status=${statusColor}`,
+    `version=${p.version}`,
+    `id=${pc.dim(p.id)}`,
+  ];
+
+  if (p.lastError) {
+    parts.push(`error=${pc.red(p.lastError.slice(0, 80))}`);
+  }
+
+  return parts.join("  ");
+}
+
+// ---------------------------------------------------------------------------
+// Command registration
+// ---------------------------------------------------------------------------
+
+export function registerPluginCommands(program: Command): void {
+  const plugin = program.command("plugin").description("Plugin lifecycle management");
+
+  // -------------------------------------------------------------------------
+  // plugin list
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("list")
+      .description("List installed plugins")
+      .option("--status <status>", "Filter by status (ready, error, disabled, installed, upgrade_pending)")
+      .action(async (opts: PluginListOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const qs = opts.status ? `?status=${encodeURIComponent(opts.status)}` : "";
+          const plugins = await ctx.api.get<PluginRecord[]>(`/api/plugins${qs}`);
+
+          if (ctx.json) {
+            printOutput(plugins, { json: true });
+            return;
+          }
+
+          const rows = plugins ?? [];
+          if (rows.length === 0) {
+            console.log(pc.dim("No plugins installed."));
+            return;
+          }
+
+          for (const p of rows) {
+            console.log(formatPlugin(p));
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin install <package-or-path>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("install <package>")
+      .description(
+        "Install a plugin from a local path or npm package.\n" +
+          "  Examples:\n" +
+          "    paperclipai plugin install ./my-plugin              # local path\n" +
+          "    paperclipai plugin install @acme/plugin-linear      # npm package\n" +
+          "    paperclipai plugin install @acme/plugin-linear@1.2  # pinned version",
+      )
+      .option("-l, --local", "Treat <package> as a local filesystem path", false)
+      .option("--version <version>", "Specific npm version to install (npm packages only)")
+      .action(async (packageArg: string, opts: PluginInstallOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+
+          // Auto-detect local paths: the argument starts with ./, ../, /, or ~
+          const isLocal =
+            opts.local ||
+            packageArg.startsWith("./") ||
+            packageArg.startsWith("../") ||
+            packageArg.startsWith("/") ||
+            packageArg.startsWith("~");
+
+          const resolvedPackage = resolvePackageArg(packageArg, isLocal);
+
+          if (!ctx.json) {
+            console.log(
+              pc.dim(
+                isLocal
+                  ? `Installing plugin from local path: ${resolvedPackage}`
+                  : `Installing plugin: ${resolvedPackage}${opts.version ? `@${opts.version}` : ""}`,
+              ),
+            );
+          }
+
+          const installedPlugin = await ctx.api.post<PluginRecord>("/api/plugins/install", {
+            packageName: resolvedPackage,
+            version: opts.version,
+            isLocalPath: isLocal,
+          });
+
+          if (ctx.json) {
+            printOutput(installedPlugin, { json: true });
+            return;
+          }
+
+          if (!installedPlugin) {
+            console.log(pc.dim("Install returned no plugin record."));
+            return;
+          }
+
+          console.log(
+            pc.green(
+              `✓ Installed ${pc.bold(installedPlugin.pluginKey)} v${installedPlugin.version} (${installedPlugin.status})`,
+            ),
+          );
+
+          if (installedPlugin.lastError) {
+            console.log(pc.red(`  Warning: ${installedPlugin.lastError}`));
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin uninstall <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("uninstall <pluginKey>")
+      .description(
+        "Uninstall a plugin by its plugin key or database ID.\n" +
+          "  Use --force to hard-purge all state and config.",
+      )
+      .option("--force", "Purge all plugin state and config (hard delete)", false)
+      .action(async (pluginKey: string, opts: PluginUninstallOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const purge = opts.force === true;
+          const qs = purge ? "?purge=true" : "";
+
+          if (!ctx.json) {
+            console.log(
+              pc.dim(
+                purge
+                  ? `Uninstalling and purging plugin: ${pluginKey}`
+                  : `Uninstalling plugin: ${pluginKey}`,
+              ),
+            );
+          }
+
+          const result = await ctx.api.delete<PluginRecord | null>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}${qs}`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          console.log(pc.green(`✓ Uninstalled ${pc.bold(pluginKey)}${purge ? " (purged)" : ""}`));
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin enable <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("enable <pluginKey>")
+      .description("Enable a disabled or errored plugin")
+      .action(async (pluginKey: string, opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const result = await ctx.api.post<PluginRecord>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}/enable`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          console.log(pc.green(`✓ Enabled ${pc.bold(pluginKey)} — status: ${result?.status ?? "unknown"}`));
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin disable <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("disable <pluginKey>")
+      .description("Disable a running plugin without uninstalling it")
+      .action(async (pluginKey: string, opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const result = await ctx.api.post<PluginRecord>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}/disable`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          console.log(pc.dim(`Disabled ${pc.bold(pluginKey)} — status: ${result?.status ?? "unknown"}`));
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin inspect <plugin-key-or-id>
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("inspect <pluginKey>")
+      .description("Show full details for an installed plugin")
+      .action(async (pluginKey: string, opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const result = await ctx.api.get<PluginRecord>(
+            `/api/plugins/${encodeURIComponent(pluginKey)}`,
+          );
+
+          if (ctx.json) {
+            printOutput(result, { json: true });
+            return;
+          }
+
+          if (!result) {
+            console.log(pc.red(`Plugin not found: ${pluginKey}`));
+            process.exit(1);
+          }
+
+          console.log(formatPlugin(result));
+          if (result.lastError) {
+            console.log(`\n${pc.red("Last error:")}\n${result.lastError}`);
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+
+  // -------------------------------------------------------------------------
+  // plugin examples
+  // -------------------------------------------------------------------------
+  addCommonClientOptions(
+    plugin
+      .command("examples")
+      .description("List bundled example plugins available for local install")
+      .action(async (opts: BaseClientOptions) => {
+        try {
+          const ctx = resolveCommandContext(opts);
+          const examples = await ctx.api.get<
+            Array<{
+              packageName: string;
+              pluginKey: string;
+              displayName: string;
+              description: string;
+              localPath: string;
+              tag: string;
+            }>
+          >("/api/plugins/examples");
+
+          if (ctx.json) {
+            printOutput(examples, { json: true });
+            return;
+          }
+
+          const rows = examples ?? [];
+          if (rows.length === 0) {
+            console.log(pc.dim("No bundled examples available."));
+            return;
+          }
+
+          for (const ex of rows) {
+            console.log(
+              `${pc.bold(ex.displayName)}  ${pc.dim(ex.pluginKey)}\n` +
+                `  ${ex.description}\n` +
+                `  ${pc.cyan(`paperclipai plugin install ${ex.localPath}`)}`,
+            );
+          }
+        } catch (err) {
+          handleCommandError(err);
+        }
+      }),
+  );
+}
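The `plugin install` command above decides whether its argument is a local path by prefix and then resolves it to an absolute path before sending it to the server. A small sketch of that decision as pure functions, with the working directory and home directory passed in so the logic can be tested without touching `process`. `looksLikeLocalPath` and `resolveLocalPath` are hypothetical names, and this sketch deliberately expands only `~` and `~/...`, which is narrower than the `startsWith("~")` check in the diff.

```typescript
import * as path from "node:path";

/** Hypothetical helper: true when the argument is written like a filesystem path. */
function looksLikeLocalPath(arg: string): boolean {
  return (
    arg.startsWith("./") ||
    arg.startsWith("../") ||
    arg.startsWith("/") ||
    arg === "~" ||
    arg.startsWith("~/")
  );
}

/** Hypothetical helper: resolve a local path argument to an absolute path. */
function resolveLocalPath(arg: string, cwd: string, home: string): string {
  // Absolute paths are returned untouched.
  if (path.isAbsolute(arg)) return arg;
  // A bare "~" or a "~/" prefix is expanded against the home directory.
  if (arg === "~") return path.resolve(home);
  if (arg.startsWith("~/")) return path.resolve(home, arg.slice(2));
  // Everything else is relative to the directory the CLI was run from.
  return path.resolve(cwd, arg);
}
```

A scoped npm name such as `@acme/plugin-linear` matches none of the prefixes, so it would be passed through as a package name.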
--- a/cli/src/commands/worktree-lib.ts
+++ b/cli/src/commands/worktree-lib.ts
@@ -1,3 +1,4 @@
+import { randomInt } from "node:crypto";
 import path from "node:path";
 import type { PaperclipConfig } from "../config/schema.js";
 import { expandHomePrefix } from "../config/home.js";
@@ -44,6 +45,11 @@ export type WorktreeLocalPaths = {
  storageDir: string;
 };

+export type WorktreeUiBranding = {
+  name: string;
+  color: string;
+};
+
 export function isWorktreeSeedMode(value: string): value is WorktreeSeedMode {
  return (WORKTREE_SEED_MODES as readonly string[]).includes(value);
 }
@@ -87,6 +93,51 @@ export function resolveSuggestedWorktreeName(cwd: string, explicitName?: string)
  return nonEmpty(explicitName) ?? path.basename(path.resolve(cwd));
 }

+function hslComponentToHex(n: number): string {
+  return Math.round(Math.max(0, Math.min(255, n)))
+    .toString(16)
+    .padStart(2, "0");
+}
+
+function hslToHex(hue: number, saturation: number, lightness: number): string {
+  const s = Math.max(0, Math.min(100, saturation)) / 100;
+  const l = Math.max(0, Math.min(100, lightness)) / 100;
+  const c = (1 - Math.abs((2 * l) - 1)) * s;
+  const h = ((hue % 360) + 360) % 360;
+  const x = c * (1 - Math.abs(((h / 60) % 2) - 1));
+  const m = l - (c / 2);
+
+  let r = 0;
+  let g = 0;
+  let b = 0;
+
+  if (h < 60) {
+    r = c;
+    g = x;
+  } else if (h < 120) {
+    r = x;
+    g = c;
+  } else if (h < 180) {
+    g = c;
+    b = x;
+  } else if (h < 240) {
+    g = x;
+    b = c;
+  } else if (h < 300) {
+    r = x;
+    b = c;
+  } else {
+    r = c;
+    b = x;
+  }
+
+  return `#${hslComponentToHex((r + m) * 255)}${hslComponentToHex((g + m) * 255)}${hslComponentToHex((b + m) * 255)}`;
+}
+
+export function generateWorktreeColor(): string {
+  return hslToHex(randomInt(0, 360), 68, 56);
+}
+
 export function resolveWorktreeLocalPaths(opts: {
  cwd: string;
  homeDir?: string;
@@ -196,13 +247,18 @@ export function buildWorktreeConfig(input: {
  };
 }

-export function buildWorktreeEnvEntries(paths: WorktreeLocalPaths): Record<string, string> {
+export function buildWorktreeEnvEntries(
+  paths: WorktreeLocalPaths,
+  branding?: WorktreeUiBranding,
+): Record<string, string> {
  return {
    PAPERCLIP_HOME: paths.homeDir,
    PAPERCLIP_INSTANCE_ID: paths.instanceId,
    PAPERCLIP_CONFIG: paths.configPath,
    PAPERCLIP_CONTEXT: paths.contextPath,
    PAPERCLIP_IN_WORKTREE: "true",
+    ...(branding?.name ? { PAPERCLIP_WORKTREE_NAME: branding.name } : {}),
+    ...(branding?.color ? { PAPERCLIP_WORKTREE_COLOR: branding.color } : {}),
  };
 }
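The `hslToHex` helper added earlier in this file follows the standard HSL-to-RGB conversion: compute the chroma, pick one of six 60-degree hue sectors, then add the lightness offset to each channel. The sketch below is a standalone rewrite for illustration, not the code from the diff. It replaces the `if`/`else` chain with a lookup table, and `channelToHex` is a hypothetical name.

```typescript
/** Clamp a 0-255 channel value and format it as two lowercase hex digits. */
function channelToHex(value: number): string {
  return Math.round(Math.max(0, Math.min(255, value)))
    .toString(16)
    .padStart(2, "0");
}

/** Convert HSL (hue in degrees, saturation and lightness in percent) to #rrggbb. */
function hslToHex(hue: number, saturation: number, lightness: number): string {
  const s = Math.max(0, Math.min(100, saturation)) / 100;
  const l = Math.max(0, Math.min(100, lightness)) / 100;
  const h = ((hue % 360) + 360) % 360; // Normalise negative or large hues.

  const c = (1 - Math.abs(2 * l - 1)) * s; // Chroma.
  const x = c * (1 - Math.abs(((h / 60) % 2) - 1)); // Second-largest component.
  const m = l - c / 2; // Offset that restores the requested lightness.

  // One [r, g, b] triple per 60-degree sector of the hue circle.
  const sectors: Array<[number, number, number]> = [
    [c, x, 0],
    [x, c, 0],
    [0, c, x],
    [0, x, c],
    [x, 0, c],
    [c, 0, x],
  ];
  const [r, g, b] = sectors[Math.floor(h / 60)] ?? [0, 0, 0];

  return `#${channelToHex((r + m) * 255)}${channelToHex((g + m) * 255)}${channelToHex((b + m) * 255)}`;
}
```

Fixing saturation and lightness, as `generateWorktreeColor` does with 68 and 56, and varying only the hue keeps every generated color at a similar vividness and brightness.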

--- a/cli/src/commands/worktree.ts
+++ b/cli/src/commands/worktree.ts
@@ -39,6 +39,7 @@ import {
  buildWorktreeEnvEntries,
  DEFAULT_WORKTREE_HOME,
  formatShellExports,
+  generateWorktreeColor,
  isWorktreeSeedMode,
  resolveSuggestedWorktreeName,
  resolveWorktreeSeedPlan,
@@ -55,6 +56,7 @@ type WorktreeInitOptions = {
  fromConfig?: string;
  fromDataDir?: string;
  fromInstance?: string;
+  sourceConfigPathOverride?: string;
  serverPort?: number;
  dbPort?: number;
  seed?: boolean;
@@ -83,6 +85,7 @@ type EmbeddedPostgresCtor = new (opts: {
  password: string;
  port: number;
  persistent: boolean;
+  initdbFlags?: string[];
  onLog?: (message: unknown) => void;
  onError?: (message: unknown) => void;
 }) => EmbeddedPostgresInstance;
@@ -127,6 +130,8 @@ function isCurrentSourceConfigPath(sourceConfigPath: string): boolean {
  return path.resolve(currentConfigPath) === path.resolve(sourceConfigPath);
 }

+const WORKTREE_NAME_PREFIX = "paperclip-";
+
 function resolveWorktreeMakeName(name: string): string {
  const value = nonEmpty(name);
  if (!value) {
@@ -137,7 +142,15 @@ function resolveWorktreeMakeName(name: string): string {
      "Worktree name must contain only letters, numbers, dots, underscores, or dashes.",
    );
  }
-  return value;
+  return value.startsWith(WORKTREE_NAME_PREFIX) ? value : `${WORKTREE_NAME_PREFIX}${value}`;
+}
+
+function resolveWorktreeHome(explicit?: string): string {
+  return explicit ?? process.env.PAPERCLIP_WORKTREES_DIR ?? DEFAULT_WORKTREE_HOME;
+}
+
+function resolveWorktreeStartPoint(explicit?: string): string | undefined {
+  return explicit ?? nonEmpty(process.env.PAPERCLIP_WORKTREE_START_POINT) ?? undefined;
 }

 export function resolveWorktreeMakeTargetPath(name: string): string {
@@ -414,8 +427,12 @@ async function rebindSeededProjectWorkspaces(input: {
  }
 }

-function resolveSourceConfigPath(opts: WorktreeInitOptions): string {
+export function resolveSourceConfigPath(opts: WorktreeInitOptions): string {
+  if (opts.sourceConfigPathOverride) return path.resolve(opts.sourceConfigPathOverride);
  if (opts.fromConfig) return path.resolve(opts.fromConfig);
+  if (!opts.fromDataDir && !opts.fromInstance) {
+    return resolveConfigPath();
+  }
  const sourceHome = path.resolve(expandHomePrefix(opts.fromDataDir ?? "~/.paperclip"));
  const sourceInstanceId = sanitizeWorktreeInstanceId(opts.fromInstance ?? "default");
  return path.resolve(sourceHome, "instances", sourceInstanceId, "config.json");
@@ -514,6 +531,7 @@ async function ensureEmbeddedPostgres(dataDir: string, preferredPort: number): P
    password: "paperclip",
    port,
    persistent: true,
+    initdbFlags: ["--encoding=UTF8", "--locale=C"],
    onLog: () => {},
    onError: () => {},
  });
@@ -611,7 +629,7 @@ async function seedWorktreeDatabase(input: {

 async function runWorktreeInit(opts: WorktreeInitOptions): Promise<void> {
  const cwd = process.cwd();
-  const name = resolveSuggestedWorktreeName(
+  const worktreeName = resolveSuggestedWorktreeName(
    cwd,
    opts.name ?? detectGitBranchName(cwd) ?? undefined,
  );
@@ -619,12 +637,16 @@ async function runWorktreeInit(opts: WorktreeInitOptions): Promise<void> {
  if (!isWorktreeSeedMode(seedMode)) {
    throw new Error(`Unsupported seed mode "${seedMode}". Expected one of: minimal, full.`);
  }
-  const instanceId = sanitizeWorktreeInstanceId(opts.instance ?? name);
+  const instanceId = sanitizeWorktreeInstanceId(opts.instance ?? worktreeName);
  const paths = resolveWorktreeLocalPaths({
    cwd,
-    homeDir: opts.home ?? DEFAULT_WORKTREE_HOME,
+    homeDir: resolveWorktreeHome(opts.home),
    instanceId,
  });
+  const branding = {
+    name: worktreeName,
+    color: generateWorktreeColor(),
+  };
  const sourceConfigPath = resolveSourceConfigPath(opts);
  const sourceConfig = existsSync(sourceConfigPath) ? readConfig(sourceConfigPath) : null;

@@ -657,7 +679,7 @@ async function runWorktreeInit(opts: WorktreeInitOptions): Promise<void> {
    nonEmpty(process.env.PAPERCLIP_AGENT_JWT_SECRET);
  mergePaperclipEnvEntries(
    {
-      ...buildWorktreeEnvEntries(paths),
+      ...buildWorktreeEnvEntries(paths, branding),
      ...(existingAgentJwtSecret ? { PAPERCLIP_AGENT_JWT_SECRET: existingAgentJwtSecret } : {}),
    },
    paths.envPath,
@@ -698,6 +720,7 @@ async function runWorktreeInit(opts: WorktreeInitOptions): Promise<void> {
  p.log.message(pc.dim(`Repo env: ${paths.envPath}`));
  p.log.message(pc.dim(`Isolated home: ${paths.homeDir}`));
  p.log.message(pc.dim(`Instance: ${paths.instanceId}`));
+  p.log.message(pc.dim(`Worktree badge: ${branding.name} (${branding.color})`));
  p.log.message(pc.dim(`Server port: ${serverPort} | DB port: ${databasePort}`));
  if (copiedGitHooks?.copied) {
    p.log.message(
@@ -731,15 +754,17 @@ export async function worktreeMakeCommand(nameArg: string, opts: WorktreeMakeOpt
  p.intro(pc.bgCyan(pc.black(" paperclipai worktree:make ")));

  const name = resolveWorktreeMakeName(nameArg);
+  const startPoint = resolveWorktreeStartPoint(opts.startPoint);
  const sourceCwd = process.cwd();
+  const sourceConfigPath = resolveSourceConfigPath(opts);
  const targetPath = resolveWorktreeMakeTargetPath(name);
  if (existsSync(targetPath)) {
    throw new Error(`Target path already exists: ${targetPath}`);
  }

  mkdirSync(path.dirname(targetPath), { recursive: true });
-  if (opts.startPoint) {
-    const [remote] = opts.startPoint.split("/", 1);
+  if (startPoint) {
+    const [remote] = startPoint.split("/", 1);
    try {
      execFileSync("git", ["fetch", remote], {
        cwd: sourceCwd,
@@ -755,8 +780,8 @@ export async function worktreeMakeCommand(nameArg: string, opts: WorktreeMakeOpt
  const worktreeArgs = resolveGitWorktreeAddArgs({
    branchName: name,
    targetPath,
-    branchExists: !opts.startPoint && localBranchExists(sourceCwd, name),
-    startPoint: opts.startPoint,
+    branchExists: !startPoint && localBranchExists(sourceCwd, name),
+    startPoint,
  });

  const spinner = p.spinner();
@@ -791,6 +816,7 @@ export async function worktreeMakeCommand(nameArg: string, opts: WorktreeMakeOpt
    await runWorktreeInit({
      ...opts,
      name,
+      sourceConfigPathOverride: sourceConfigPath,
    });
  } catch (error) {
    throw error;
@@ -799,6 +825,232 @@ export async function worktreeMakeCommand(nameArg: string, opts: WorktreeMakeOpt
  }
 }

+type WorktreeCleanupOptions = {
+  instance?: string;
+  home?: string;
+  force?: boolean;
+};
+
+type GitWorktreeListEntry = {
+  worktree: string;
+  branch: string | null;
+  bare: boolean;
+  detached: boolean;
+};
+
+function parseGitWorktreeList(cwd: string): GitWorktreeListEntry[] {
+  const raw = execFileSync("git", ["worktree", "list", "--porcelain"], {
+    cwd,
+    encoding: "utf8",
+    stdio: ["ignore", "pipe", "pipe"],
+  });
+  const entries: GitWorktreeListEntry[] = [];
+  let current: Partial<GitWorktreeListEntry> = {};
+  for (const line of raw.split("\n")) {
+    if (line.startsWith("worktree ")) {
+      current = { worktree: line.slice("worktree ".length) };
+    } else if (line.startsWith("branch ")) {
+      current.branch = line.slice("branch ".length);
+    } else if (line === "bare") {
+      current.bare = true;
+    } else if (line === "detached") {
+      current.detached = true;
+    } else if (line === "" && current.worktree) {
+      entries.push({
+        worktree: current.worktree,
+        branch: current.branch ?? null,
+        bare: current.bare ?? false,
+        detached: current.detached ?? false,
+      });
+      current = {};
+    }
+  }
+  if (current.worktree) {
+    entries.push({
+      worktree: current.worktree,
+      branch: current.branch ?? null,
+      bare: current.bare ?? false,
+      detached: current.detached ?? false,
+    });
+  }
+  return entries;
+}
+
+function branchHasUniqueCommits(cwd: string, branchName: string): boolean {
+  try {
+    const output = execFileSync(
+      "git",
+      ["log", "--oneline", branchName, "--not", "--remotes", `--exclude=${branchName}`, "--branches"],
+      { cwd, encoding: "utf8", stdio: ["ignore", "pipe", "pipe"] },
+    ).trim();
+    return output.length > 0;
+  } catch {
+    return false;
+  }
+}
+
+function branchExistsOnAnyRemote(cwd: string, branchName: string): boolean {
+  try {
+    const output = execFileSync(
+      "git",
+      ["branch", "-r", "--list", `*/${branchName}`],
+      { cwd, encoding: "utf8", stdio: ["ignore", "pipe", "pipe"] },
+    ).trim();
+    return output.length > 0;
+  } catch {
+    return false;
+  }
+}
+
+function worktreePathHasUncommittedChanges(worktreePath: string): boolean {
+  try {
+    const output = execFileSync(
+      "git",
+      ["status", "--porcelain"],
+      { cwd: worktreePath, encoding: "utf8", stdio: ["ignore", "pipe", "pipe"] },
+    ).trim();
+    return output.length > 0;
+  } catch {
+    return false;
+  }
+}
+
+export async function worktreeCleanupCommand(nameArg: string, opts: WorktreeCleanupOptions): Promise<void> {
+  printPaperclipCliBanner();
+  p.intro(pc.bgCyan(pc.black(" paperclipai worktree:cleanup ")));
+
+  const name = resolveWorktreeMakeName(nameArg);
+  const sourceCwd = process.cwd();
+  const targetPath = resolveWorktreeMakeTargetPath(name);
+  const instanceId = sanitizeWorktreeInstanceId(opts.instance ?? name);
+  const homeDir = path.resolve(expandHomePrefix(resolveWorktreeHome(opts.home)));
+  const instanceRoot = path.resolve(homeDir, "instances", instanceId);
+
+  // ── 1. Assess current state ──────────────────────────────────────────
+
+  const hasBranch = localBranchExists(sourceCwd, name);
+  const hasTargetDir = existsSync(targetPath);
+  const hasInstanceData = existsSync(instanceRoot);
+
+  const worktrees = parseGitWorktreeList(sourceCwd);
+  const linkedWorktree = worktrees.find(
+    (wt) => wt.branch === `refs/heads/${name}` || path.resolve(wt.worktree) === path.resolve(targetPath),
+  );
+
+  if (!hasBranch && !hasTargetDir && !hasInstanceData && !linkedWorktree) {
+    p.log.info("Nothing to clean up — no branch, worktree directory, or instance data found.");
+    p.outro(pc.green("Already clean."));
+    return;
+  }
+
+  // ── 2. Safety checks ────────────────────────────────────────────────
+
+  const problems: string[] = [];
+
+  if (hasBranch && branchHasUniqueCommits(sourceCwd, name)) {
+    const onRemote = branchExistsOnAnyRemote(sourceCwd, name);
+    if (onRemote) {
+      p.log.info(
+        `Branch "${name}" has unique local commits, but the branch also exists on a remote — safe to delete locally.`,
+      );
+    } else {
+      problems.push(
+        `Branch "${name}" has commits not found on any other branch or remote. ` +
+          `Deleting it will lose work. Push it first, or use --force.`,
+      );
+    }
+  }
+
+  if (hasTargetDir && worktreePathHasUncommittedChanges(targetPath)) {
+    problems.push(
+      `Worktree directory ${targetPath} has uncommitted changes. Commit or stash first, or use --force.`,
+    );
+  }
+
+  if (problems.length > 0 && !opts.force) {
+    for (const problem of problems) {
+      p.log.error(problem);
+    }
+    throw new Error("Safety checks failed. Resolve the issues above or re-run with --force.");
+  }
+  if (problems.length > 0 && opts.force) {
+    for (const problem of problems) {
+      p.log.warning(`Overridden by --force: ${problem}`);
+    }
+  }
+
+  // ── 3. Clean up (idempotent steps) ──────────────────────────────────
+
+  // 3a. Remove the git worktree registration
+  if (linkedWorktree) {
+    const worktreeDirExists = existsSync(linkedWorktree.worktree);
+    const spinner = p.spinner();
+    if (worktreeDirExists) {
+      spinner.start(`Removing git worktree at ${linkedWorktree.worktree}...`);
+      try {
+        const removeArgs = ["worktree", "remove", linkedWorktree.worktree];
+        if (opts.force) removeArgs.push("--force");
+        execFileSync("git", removeArgs, {
+          cwd: sourceCwd,
+          stdio: ["ignore", "pipe", "pipe"],
+        });
+        spinner.stop(`Removed git worktree at ${linkedWorktree.worktree}.`);
+      } catch (error) {
+        spinner.stop(pc.yellow(`Could not remove worktree cleanly, will prune instead.`));
+        p.log.warning(extractExecSyncErrorMessage(error) ?? String(error));
+      }
+    } else {
+      spinner.start("Pruning stale worktree entry...");
+      execFileSync("git", ["worktree", "prune"], {
+        cwd: sourceCwd,
+        stdio: ["ignore", "pipe", "pipe"],
+      });
+      spinner.stop("Pruned stale worktree entry.");
+    }
+  } else {
+    // Even without a linked worktree, prune to clean up any orphaned entries
+    execFileSync("git", ["worktree", "prune"], {
+      cwd: sourceCwd,
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+  }
+
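+  // 3b. Remove the worktree directory if it still exists (e.g. partial creation
+  // or a failed `git worktree remove`), then prune the now-stale registration
+  if (existsSync(targetPath)) {
+    const spinner = p.spinner();
+    spinner.start(`Removing worktree directory ${targetPath}...`);
+    rmSync(targetPath, { recursive: true, force: true });
+    execFileSync("git", ["worktree", "prune"], {
+      cwd: sourceCwd,
+      stdio: ["ignore", "pipe", "pipe"],
+    });
+    spinner.stop(`Removed worktree directory ${targetPath}.`);
+  }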
+
+  // 3c. Delete the local branch (now safe — worktree is gone)
+  if (localBranchExists(sourceCwd, name)) {
+    const spinner = p.spinner();
+    spinner.start(`Deleting local branch "${name}"...`);
+    try {
+      const deleteFlag = opts.force ? "-D" : "-d";
+      execFileSync("git", ["branch", deleteFlag, name], {
+        cwd: sourceCwd,
+        stdio: ["ignore", "pipe", "pipe"],
+      });
+      spinner.stop(`Deleted local branch "${name}".`);
+    } catch (error) {
+      spinner.stop(pc.yellow(`Could not delete branch "${name}".`));
+      p.log.warning(extractExecSyncErrorMessage(error) ?? String(error));
+    }
+  }
+
+  // 3d. Remove instance data
+  if (existsSync(instanceRoot)) {
+    const spinner = p.spinner();
+    spinner.start(`Removing instance data at ${instanceRoot}...`);
+    rmSync(instanceRoot, { recursive: true, force: true });
+    spinner.stop(`Removed instance data at ${instanceRoot}.`);
+  }
+
+  p.outro(pc.green("Cleanup complete."));
+}
+
 export async function worktreeEnvCommand(opts: WorktreeEnvOptions): Promise<void> {
  const configPath = resolveConfigPath(opts.config);
  const envPath = resolvePaperclipEnvFile(configPath);
@@ -825,10 +1077,10 @@ export function registerWorktreeCommands(program: Command): void {
  program
    .command("worktree:make")
    .description("Create ~/NAME as a git worktree, then initialize an isolated Paperclip instance inside it")
-    .argument("<name>", "Worktree directory and branch name (created at ~/NAME)")
-    .option("--start-point <ref>", "Remote ref to base the new branch on (e.g. origin/main)")
+    .argument("<name>", "Worktree name — auto-prefixed with paperclip- if needed (created at ~/paperclip-NAME)")
+    .option("--start-point <ref>", "Remote ref to base the new branch on (env: PAPERCLIP_WORKTREE_START_POINT)")
    .option("--instance <id>", "Explicit isolated instance id")
-    .option("--home <path>", `Home root for worktree instances (default: ${DEFAULT_WORKTREE_HOME})`)
+    .option("--home <path>", `Home root for worktree instances (env: PAPERCLIP_WORKTREES_DIR, default: ${DEFAULT_WORKTREE_HOME})`)
    .option("--from-config <path>", "Source config.json to seed from")
    .option("--from-data-dir <path>", "Source PAPERCLIP_HOME used when deriving the source config")
    .option("--from-instance <id>", "Source instance id when deriving the source config", "default")
@@ -844,7 +1096,7 @@ export function registerWorktreeCommands(program: Command): void {
    .description("Create repo-local config/env and an isolated instance for this worktree")
    .option("--name <name>", "Display name used to derive the instance id")
    .option("--instance <id>", "Explicit isolated instance id")
-    .option("--home <path>", `Home root for worktree instances (default: ${DEFAULT_WORKTREE_HOME})`)
+    .option("--home <path>", `Home root for worktree instances (env: PAPERCLIP_WORKTREES_DIR, default: ${DEFAULT_WORKTREE_HOME})`)
    .option("--from-config <path>", "Source config.json to seed from")
    .option("--from-data-dir <path>", "Source PAPERCLIP_HOME used when deriving the source config")
    .option("--from-instance <id>", "Source instance id when deriving the source config", "default")
@@ -861,4 +1113,13 @@ export function registerWorktreeCommands(program: Command): void {
    .option("-c, --config <path>", "Path to config file")
    .option("--json", "Print JSON instead of shell exports")
    .action(worktreeEnvCommand);
+
+  program
+    .command("worktree:cleanup")
+    .description("Safely remove a worktree, its branch, and its isolated instance data")
+    .argument("<name>", "Worktree name — auto-prefixed with paperclip- if needed")
+    .option("--instance <id>", "Explicit instance id (if different from the worktree name)")
+    .option("--home <path>", `Home root for worktree instances (env: PAPERCLIP_WORKTREES_DIR, default: ${DEFAULT_WORKTREE_HOME})`)
+    .option("--force", "Bypass safety checks (uncommitted changes, unique commits)", false)
+    .action(worktreeCleanupCommand);
 }
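Reviewer note: `parseGitWorktreeList` above shells out to git and parses the result in one step. The sketch below is a hypothetical pure variant (the name `parseWorktreePorcelain` and the sample paths are assumptions, not part of the PR) that takes the `git worktree list --porcelain` text as a string, which shows the record format the cleanup command relies on and is easier to unit test.

```typescript
type WorktreeEntry = {
  worktree: string;
  branch: string | null;
  bare: boolean;
  detached: boolean;
};

// Hypothetical pure parser: accepts porcelain text instead of running git.
function parseWorktreePorcelain(raw: string): WorktreeEntry[] {
  const entries: WorktreeEntry[] = [];
  // Records are separated by one or more blank lines.
  for (const record of raw.split(/\n{2,}/)) {
    let worktree: string | null = null;
    let branch: string | null = null;
    let bare = false;
    let detached = false;
    for (const line of record.split("\n")) {
      if (line.startsWith("worktree ")) {
        worktree = line.slice("worktree ".length);
      } else if (line.startsWith("branch ")) {
        branch = line.slice("branch ".length);
      } else if (line === "bare") {
        bare = true;
      } else if (line === "detached") {
        detached = true;
      }
      // Other attributes (HEAD, locked, prunable) are ignored in this sketch.
    }
    if (worktree !== null) {
      entries.push({ worktree, branch, bare, detached });
    }
  }
  return entries;
}

// Illustrative input only; the paths and SHAs are made up.
const sample = [
  "worktree /repo",
  "HEAD 1111111111111111111111111111111111111111",
  "branch refs/heads/master",
  "",
  "worktree /home/dev/paperclip-demo",
  "HEAD 2222222222222222222222222222222222222222",
  "detached",
  "",
].join("\n");

console.log(parseWorktreePorcelain(sample));
```

Because `branch` is reported as a full ref such as `refs/heads/master`, the cleanup command compares against `refs/heads/${name}` rather than the bare branch name.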
--- a/cli/src/config/env.ts
+++ b/cli/src/config/env.ts
@@ -22,11 +22,18 @@ function parseEnvFile(contents: string) {
  }
 }

+function formatEnvValue(value: string): string {
+  if (/^[A-Za-z0-9_./:@-]+$/.test(value)) {
+    return value;
+  }
+  return JSON.stringify(value);
+}
+
 function renderEnvFile(entries: Record<string, string>) {
  const lines = [
    "# Paperclip environment variables",
    "# Generated by Paperclip CLI commands",
-    ...Object.entries(entries).map(([key, value]) => `${key}=${value}`),
+    ...Object.entries(entries).map(([key, value]) => `${key}=${formatEnvValue(value)}`),
    "",
  ];
  return lines.join("\n");
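Reviewer note: the quoting rule added in `formatEnvValue` matters for the new `PAPERCLIP_WORKTREE_COLOR` entry, because a hex color starts with `#`, which many dotenv-style parsers treat as the start of a comment when it is unquoted. The sketch below copies the function from the diff and pairs it with a hypothetical `renderEnvLines` helper (an assumption made for illustration) to show which values get quoted.

```typescript
// Copied from the diff: safe characters pass through, anything else is JSON-quoted.
function formatEnvValue(value: string): string {
  if (/^[A-Za-z0-9_./:@-]+$/.test(value)) {
    return value;
  }
  return JSON.stringify(value);
}

// Hypothetical helper for this example: renders KEY=value lines only.
function renderEnvLines(entries: Record<string, string>): string[] {
  return Object.entries(entries).map(([key, value]) => `${key}=${formatEnvValue(value)}`);
}

const renderedLines = renderEnvLines({
  PAPERCLIP_IN_WORKTREE: "true",
  PAPERCLIP_WORKTREE_NAME: "paperclip-demo",
  PAPERCLIP_WORKTREE_COLOR: "#4fd1c5",
});

console.log(renderedLines.join("\n"));
// PAPERCLIP_IN_WORKTREE=true
// PAPERCLIP_WORKTREE_NAME=paperclip-demo
// PAPERCLIP_WORKTREE_COLOR="#4fd1c5"
```

Whether the quoted value round-trips depends on the reader stripping the quotes again; the `parseEnvFile` body is not shown in this hunk, so that behavior is assumed rather than verified here.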
--- a/cli/src/index.ts
+++ b/cli/src/index.ts
@@ -18,6 +18,7 @@ import { registerDashboardCommands } from "./commands/client/dashboard.js";
 import { applyDataDirOverride, type DataDirOptionLike } from "./config/data-dir.js";
 import { loadPaperclipEnvFile } from "./config/env.js";
 import { registerWorktreeCommands } from "./commands/worktree.js";
+import { registerPluginCommands } from "./commands/client/plugin.js";

 const program = new Command();
 const DATA_DIR_OPTION_HELP =
@@ -136,6 +137,7 @@ registerApprovalCommands(program);
 registerActivityCommands(program);
 registerDashboardCommands(program);
 registerWorktreeCommands(program);
+registerPluginCommands(program);

 const auth = program.command("auth").description("Authentication and bootstrap utilities");

--- a/doc/DEVELOPING.md
+++ b/doc/DEVELOPING.md
@@ -19,9 +19,9 @@ Current implementation status:

 GitHub Actions owns `pnpm-lock.yaml`.

- Same-repo pull requests that change dependency manifests are auto-refreshed by GitHub Actions before merge.
- Fork pull requests that change dependency manifests must include the refreshed `pnpm-lock.yaml`.
- Pull request CI validates lockfile freshness when manifests change and verifies with `--frozen-lockfile`.
+- Do not commit `pnpm-lock.yaml` in pull requests.
+- Pull request CI validates dependency resolution when manifests change.
+- Pushes to `master` regenerate `pnpm-lock.yaml` with `pnpm install --lockfile-only --no-frozen-lockfile`, commit it back if needed, and then run verification with `--frozen-lockfile`.

 ## Start Dev

@@ -89,6 +89,10 @@ docker compose -f docker-compose.quickstart.yml up --build

 See `doc/DOCKER.md` for API key wiring (`OPENAI_API_KEY` / `ANTHROPIC_API_KEY`) and persistence details.

+## Docker For Untrusted PR Review
+
+For a separate review-oriented container that keeps `codex`/`claude` login state in Docker volumes and checks out PRs into an isolated scratch workspace, see `doc/UNTRUSTED-PR-REVIEW.md`.
+
 ## Database in Dev (Auto-Handled)

 For local development, leave `DATABASE_URL` unset.
@@ -142,7 +146,7 @@ This command:
 - creates an isolated instance under `~/.paperclip-worktrees/instances/<worktree-id>/`
 - when run inside a linked git worktree, mirrors the effective git hooks into that worktree's private git dir
 - picks a free app port and embedded PostgreSQL port
- by default seeds the isolated DB in `minimal` mode from your main instance via a logical SQL snapshot
+- by default seeds the isolated DB in `minimal` mode from the current effective Paperclip instance/config (repo-local worktree config when present, otherwise the default instance) via a logical SQL snapshot

 Seed modes:

@@ -152,7 +156,13 @@ Seed modes:

 After `worktree init`, both the server and the CLI auto-load the repo-local `.paperclip/.env` when run inside that worktree, so normal commands like `pnpm dev`, `paperclipai doctor`, and `paperclipai db:backup` stay scoped to the worktree instance.

-That repo-local env also sets `PAPERCLIP_IN_WORKTREE=true`, which the server can use for worktree-specific UI behavior such as an alternate favicon.
+That repo-local env also sets:
+
+- `PAPERCLIP_IN_WORKTREE=true`
+- `PAPERCLIP_WORKTREE_NAME=<worktree-name>`
+- `PAPERCLIP_WORKTREE_COLOR=<hex-color>`
+
+The server/UI use those values for worktree-specific branding such as the top banner and dynamically colored favicon.

 Print shell exports explicitly when needed:

--- a/doc/DOCKER.md
+++ b/doc/DOCKER.md
@@ -93,6 +93,12 @@ Notes:
 - Without API keys, the app still runs normally.
 - Adapter environment checks in Paperclip will surface missing auth/CLI prerequisites.

+## Untrusted PR Review Container
+
+If you want a separate Docker environment for reviewing untrusted pull requests with `codex` or `claude`, use the dedicated review workflow in `doc/UNTRUSTED-PR-REVIEW.md`.
+
+That setup keeps CLI auth state in Docker volumes instead of your host home directory and uses a separate scratch workspace for PR checkouts and preview runs.
+
 ## Onboard Smoke Test (Ubuntu + npm only)

 Use this when you want to mimic a fresh machine that only has Ubuntu + npm and verify:
--- a/doc/PRODUCT.md
+++ b/doc/PRODUCT.md
@@ -94,3 +94,53 @@ Canonical mode design and command expectations live in `doc/DEPLOYMENT-MODES.md`
 ## Further Detail

 See [SPEC.md](./SPEC.md) for the full technical specification and [TASKS.md](./TASKS.md) for the task management data model.
+
+---
+
+Paperclip’s core identity is a **control plane for autonomous AI companies**, centered on **companies, org charts, goals, issues/comments, heartbeats, budgets, approvals, and board governance**. The public docs are also explicit about the current boundaries: **tasks/comments are the built-in communication model**, Paperclip is **not a chatbot**, and it is **not a code review tool**. The roadmap already points toward **easier onboarding, cloud agents, easier agent configuration, plugins, better docs, and ClipMart/ClipHub-style reusable companies/templates**.
+
+## What Paperclip should do vs. not do
+
+**Do**
+
+- Stay **board-level and company-level**. Users should manage goals, orgs, budgets, approvals, and outputs.
+- Make the first five minutes feel magical: install, answer a few questions, see a CEO do something real.
+- Keep work anchored to **issues/comments/projects/goals**, even if the surface feels conversational.
+- Treat **agency / internal team / startup** as the same underlying abstraction with different templates and labels.
+- Make outputs first-class: files, docs, reports, previews, links, screenshots.
+- Provide **hooks into engineering workflows**: worktrees, preview servers, PR links, external review tools.
+- Use **plugins** for edge cases like rich chat, knowledge bases, doc editors, custom tracing.
+
+**Do not**
+
+- Do not make the core product a general chat app. The current product definition is explicitly task/comment-centric and “not a chatbot,” and that boundary is valuable.
+- Do not build a complete Jira/GitHub replacement. The repo/docs already position Paperclip as organization orchestration, not as a pull-request review tool.
+- Do not build enterprise-grade RBAC first. The current V1 spec still treats multi-board governance and fine-grained human permissions as out of scope, so the first multi-user version should be coarse and company-scoped.
+- Do not lead with raw bash logs and transcripts. Default view should be human-readable intent/progress, with raw detail beneath.
+- Do not force users to understand provider/API-key plumbing unless absolutely necessary. Onboarding and auth issues are already open, so the friction here is real.
+
+## Specific design goals
+
+1. **Time-to-first-success under 5 minutes**
+   A fresh user should go from install to “my CEO completed a first task” in one sitting.
+
+2. **Board-level abstraction always wins**
+   The default UI should answer: what is the company doing, who is doing it, why does it matter, what did it cost, and what needs my approval.
+
+3. **Conversation stays attached to work objects**
+   “Chat with CEO” should still resolve to strategy threads, decisions, tasks, or approvals.
+
+4. **Progressive disclosure**
+   Top layer: human-readable summary. Middle layer: checklist/steps/artifacts. Bottom layer: raw logs/tool calls/transcript.
+
+5. **Output-first**
+   Work is not done until the user can see the result: file, document, preview link, screenshot, plan, or PR.
+
+6. **Local-first, cloud-ready**
+   The mental model should not change between local solo use and shared/private or public/cloud deployment.
+
+7. **Safe autonomy**
+   Auto mode is allowed; hidden token burn is not.
+
+8. **Thin core, rich edges**
+   Put optional chat, knowledge, and special surfaces into plugins/extensions rather than bloating the control plane.
--- a/doc/PUBLISHING.md
+++ b/doc/PUBLISHING.md
@@ -1,18 +1,19 @@
 # Publishing to npm

-Low-level reference for how Paperclip packages are built for npm.
+Low-level reference for how Paperclip packages are prepared and published to npm.

-For the maintainer release workflow, use [doc/RELEASING.md](RELEASING.md). This document is only about packaging internals and the scripts that produce publishable artifacts.
+For the maintainer workflow, use [doc/RELEASING.md](RELEASING.md). This document focuses on packaging internals.

 ## Current Release Entry Points

-Use these scripts instead of older one-off publish commands:
+Use these scripts:

- [`scripts/release-start.sh`](../scripts/release-start.sh) to create or resume `release/X.Y.Z`
- [`scripts/release-preflight.sh`](../scripts/release-preflight.sh) before any canary or stable release
- [`scripts/release.sh`](../scripts/release.sh) for canary and stable npm publishes
- [`scripts/rollback-latest.sh`](../scripts/rollback-latest.sh) to repoint `latest` during rollback
- [`scripts/create-github-release.sh`](../scripts/create-github-release.sh) after pushing the stable branch tag
+- [`scripts/release.sh`](../scripts/release.sh) for canary and stable publish flows
+- [`scripts/create-github-release.sh`](../scripts/create-github-release.sh) after pushing a stable tag
+- [`scripts/rollback-latest.sh`](../scripts/rollback-latest.sh) to repoint `latest`
+- [`scripts/build-npm.sh`](../scripts/build-npm.sh) for the CLI packaging build
+
+Paperclip no longer uses release branches or Changesets for publishing.

 ## Why the CLI needs special packaging

@@ -23,7 +24,7 @@ The CLI package, `paperclipai`, imports code from workspace packages such as:
 - `@paperclipai/shared`
 - adapter packages under `packages/adapters/`

-Those workspace references use `workspace:*` during development. npm cannot install those references directly for end users, so the release build has to transform the CLI into a publishable standalone package.
+Those workspace references are valid in development but not in a publishable npm package. The release flow rewrites versions temporarily, then builds a publishable CLI bundle.

 ## `build-npm.sh`

@@ -33,89 +34,107 @@ Run:
 ./scripts/build-npm.sh
 ```

-This script does six things:
+This script:

-1. Runs the forbidden token check unless `--skip-checks` is supplied
-2. Runs `pnpm -r typecheck`
-3. Bundles the CLI entrypoint with esbuild into `cli/dist/index.js`
-4. Verifies the bundled entrypoint with `node --check`
-5. Rewrites `cli/package.json` into a publishable npm manifest and stores the dev copy as `cli/package.dev.json`
-6. Copies the repo `README.md` into `cli/README.md` for npm package metadata
+1. runs the forbidden token check unless `--skip-checks` is supplied
+2. runs `pnpm -r typecheck`
+3. bundles the CLI entrypoint with esbuild into `cli/dist/index.js`
+4. verifies the bundled entrypoint with `node --check`
+5. rewrites `cli/package.json` into a publishable npm manifest and stores the dev copy as `cli/package.dev.json`
+6. copies the repo `README.md` into `cli/README.md` for npm metadata

-`build-npm.sh` is used by the release script so that npm users install a real package rather than unresolved workspace dependencies.
+After the release script exits, the dev manifest and temporary files are restored automatically.

-## Publishable CLI layout
+## Package discovery and versioning

-During development, [`cli/package.json`](../cli/package.json) contains workspace references.
-
-During release preparation:
-
- `cli/package.json` becomes a publishable manifest with external npm dependency ranges
- `cli/package.dev.json` stores the development manifest temporarily
- `cli/dist/index.js` contains the bundled CLI entrypoint
- `cli/README.md` is copied in for npm metadata
-
-After release finalization, the release script restores the development manifest and removes the temporary README copy.
-
-## Package discovery
-
-The release tooling scans the workspace for public packages under:
+Public packages are discovered from:

 - `packages/`
 - `server/`
 - `cli/`

-`ui/` remains ignored for npm publishing because it is private.
+`ui/` is ignored because it is private.

-This matters because all public packages are versioned and published together as one release unit.
+The version rewrite step now uses [`scripts/release-package-map.mjs`](../scripts/release-package-map.mjs), which:

-## Canary packaging model
+- finds all public packages
+- sorts them topologically by internal dependencies
+- rewrites each package version to the target release version
+- rewrites internal `workspace:*` dependency references to the exact target version
+- updates the CLI's displayed version string

-Canaries are published as semver prereleases such as:
+Those rewrites are temporary. The working tree is restored after publish or dry-run.

- `1.2.3-canary.0`
- `1.2.3-canary.1`
+## Version formats

-They are published under the npm dist-tag `canary`.
+Paperclip uses calendar versions:

-This means:
+- stable: `YYYY.M.D`
+- canary: `YYYY.M.D-canary.N`

- `npx paperclipai@canary onboard` can install them explicitly
- `npx paperclipai onboard` continues to resolve `latest`
- the stable changelog can stay at `releases/v1.2.3.md`
+Examples:

-## Stable packaging model
+- stable: `2026.3.17`
+- canary: `2026.3.17-canary.2`

-Stable releases publish normal semver versions such as `1.2.3` under the npm dist-tag `latest`.
+## Publish model

-The stable publish flow also creates the local release commit and git tag on `release/X.Y.Z`. Pushing that branch commit/tag, creating the GitHub Release, and merging the release branch back to `master` happen afterward as separate maintainer steps.
+### Canary
+
+Canaries publish under the npm dist-tag `canary`.
+
+Example:
+
+- `paperclipai@2026.3.17-canary.2`
+
+This keeps the default install path unchanged while allowing explicit installs with:
+
+```bash
+npx paperclipai@canary onboard
+```
+
+### Stable
+
+Stable publishes use the npm dist-tag `latest`.
+
+Example:
+
+- `paperclipai@2026.3.17`
+
+Stable publishes do not create a release commit. Instead:
+
+- package versions are rewritten temporarily
+- packages are published from the chosen source commit
+- git tag `vYYYY.M.D` points at that original commit
+
+## Trusted publishing
+
+The intended CI model is npm trusted publishing through GitHub OIDC.
+
+That means:
+
+- no long-lived `NPM_TOKEN` in repository secrets
+- GitHub Actions obtains short-lived publish credentials
+- trusted publisher rules are configured per workflow file
+
+See [doc/RELEASE-AUTOMATION-SETUP.md](RELEASE-AUTOMATION-SETUP.md) for the GitHub/npm setup steps.

 ## Rollback model

-Rollback does not unpublish packages.
+Rollback does not unpublish anything.

-Instead, the maintainer should move the `latest` dist-tag back to the previous good stable version with:
+It repoints the `latest` dist-tag to a prior stable version:

 ```bash
-./scripts/rollback-latest.sh <stable-version>
+./scripts/rollback-latest.sh 2026.3.16
 ```

-That keeps history intact while restoring the default install path quickly.
-
-## Notes for CI
-
-The repo includes a manual GitHub Actions release workflow at [`.github/workflows/release.yml`](../.github/workflows/release.yml).
-
-Recommended CI release setup:
-
- use npm trusted publishing via GitHub OIDC
- require approval through the `npm-release` environment
- run releases from `release/X.Y.Z`
- use canary first, then stable
+This is the fastest way to restore the default install path if a stable release is bad.

 ## Related Files

 - [`scripts/build-npm.sh`](../scripts/build-npm.sh)
 - [`scripts/generate-npm-package-json.mjs`](../scripts/generate-npm-package-json.mjs)
+- [`scripts/release-package-map.mjs`](../scripts/release-package-map.mjs)
 - [`cli/esbuild.config.mjs`](../cli/esbuild.config.mjs)
 - [`doc/RELEASING.md`](RELEASING.md)
--- a/doc/RELEASE-AUTOMATION-SETUP.md
+++ b/doc/RELEASE-AUTOMATION-SETUP.md
@@ -0,0 +1,274 @@
+# Release Automation Setup
+
+This document covers the GitHub and npm setup required for the current Paperclip release model:
+
+- automatic canaries from `master`
+- manual stable promotion from a chosen source ref
+- npm trusted publishing via GitHub OIDC
+- protected release infrastructure in a public repository
+
+Repo-side files that depend on this setup:
+
+- `.github/workflows/release.yml`
+- `.github/CODEOWNERS`
+
+Note:
+
+- the release workflow intentionally uses `pnpm install --no-frozen-lockfile`
+- this matches the repo's current policy where `pnpm-lock.yaml` is refreshed by GitHub automation after manifest changes land on `master`
+- the publish jobs then restore `pnpm-lock.yaml` before running `scripts/release.sh`, so the release script still sees a clean worktree
+
+## 1. Merge the Repo Changes First
+
+Before touching GitHub or npm settings, merge the release automation code so the workflow file that npm will reference already exists on the default branch.
+
+Required files:
+
+- `.github/workflows/release.yml`
+- `.github/CODEOWNERS`
+
+## 2. Configure npm Trusted Publishing
+
+Do this for every public package that Paperclip publishes.
+
+At minimum that includes:
+
+- `paperclipai`
+- `@paperclipai/server`
+- public packages under `packages/`
+
+### 2.1. In npm, open each package settings page
+
+For each package:
+
+1. open npm as an owner of the package
+2. go to the package settings / publishing access area
+3. add a trusted publisher for the GitHub repository `paperclipai/paperclip`
+
+### 2.2. Add one trusted publisher entry per package
+
+npm currently allows one trusted publisher configuration per package.
+
+Configure:
+
+- workflow: `.github/workflows/release.yml`
+
+Repository:
+
+- `paperclipai/paperclip`
+
+Environment name:
+
+- leave the npm trusted-publisher environment field blank
+
+Why:
+
+- the single `release.yml` workflow handles both canary and stable publishing
+- GitHub environments `npm-canary` and `npm-stable` still enforce different approval rules on the GitHub side
+
+### 2.3. Verify trusted publishing before removing old auth
+
+After the workflow is live:
+
+1. run a canary publish
+2. confirm npm publish succeeds without any `NPM_TOKEN`
+3. run a stable dry-run
+4. run one real stable publish
+
+Only after that should you remove old token-based access.
+
+## 3. Remove Legacy npm Tokens
+
+After trusted publishing works:
+
+1. revoke any repository or organization `NPM_TOKEN` secrets used for publish
+2. revoke any personal automation token that used to publish Paperclip
+3. if npm offers a package-level setting to restrict publishing to trusted publishers, enable it
+
+Goal:
+
+- no long-lived npm publishing token should remain in GitHub Actions
+
+## 4. Create GitHub Environments
+
+Create two environments in the GitHub repository:
+
+- `npm-canary`
+- `npm-stable`
+
+Path:
+
+1. GitHub repository
+2. `Settings`
+3. `Environments`
+4. `New environment`
+
+## 5. Configure `npm-canary`
+
+Recommended settings for `npm-canary`:
+
+- environment name: `npm-canary`
+- required reviewers: none
+- wait timer: none
+- deployment branches and tags:
+  - selected branches only
+  - allow `master`
+
+Reasoning:
+
+- every push to `master` should be able to publish a canary automatically
+- no human approval should be required for canaries
+
+## 6. Configure `npm-stable`
+
+Recommended settings for `npm-stable`:
+
+- environment name: `npm-stable`
+- required reviewers: at least one maintainer other than the person triggering the workflow when possible
+- prevent self-review: enabled
+- admin bypass: disabled if your team can tolerate it
+- wait timer: optional
+- deployment branches and tags:
+  - selected branches only
+  - allow `master`
+
+Reasoning:
+
+- stable publishes should require an explicit human approval gate
+- the workflow is manual, but the environment should still be the real control point
+
+## 7. Protect `master`
+
+Open the branch protection settings for `master`.
+
+Recommended rules:
+
+1. require pull requests before merging
+2. require status checks to pass before merging
+3. require review from code owners
+4. dismiss stale approvals when new commits are pushed
+5. restrict who can push directly to `master`
+
+At minimum, make sure workflow and release script changes cannot land without review.
+
+## 8. Enforce CODEOWNERS Review
+
+This repo now includes `.github/CODEOWNERS`, but GitHub only enforces it if branch protection requires code owner reviews.
+
+In branch protection for `master`, enable:
+
+- `Require review from Code Owners`
+
+Then verify the owner entries are correct for your actual maintainer set.
+
+Current file:
+
+- `.github/CODEOWNERS`
+
+If `@cryppadotta` is not the right reviewer identity in the public repo, change it before enabling enforcement.
+
+## 9. Protect Release Infrastructure Specifically
+
+These files should always trigger code owner review:
+
+- `.github/workflows/release.yml`
+- `scripts/release.sh`
+- `scripts/release-lib.sh`
+- `scripts/release-package-map.mjs`
+- `scripts/create-github-release.sh`
+- `scripts/rollback-latest.sh`
+- `doc/RELEASING.md`
+- `doc/PUBLISHING.md`
+
+If you want stronger controls, add a repository ruleset that explicitly blocks direct pushes to:
+
+- `.github/workflows/**`
+- `scripts/release*`
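A minimal sketch of how the file list above could be turned into a local check, assuming a hypothetical helper named `filter_release_infra_paths` (it is not part of the repo). It reads changed paths on stdin and echoes only the ones listed above as release infrastructure. It does not replace CODEOWNERS or a ruleset; it only illustrates which paths are meant.

```bash
# Hypothetical helper, not part of the Paperclip repo.
# Reads file paths on stdin, one per line, and prints the ones that
# appear in the release-infrastructure list from this section.
filter_release_infra_paths() {
  local path
  while IFS= read -r path; do
    case "$path" in
      .github/workflows/release.yml|\
      scripts/release.sh|\
      scripts/release-lib.sh|\
      scripts/release-package-map.mjs|\
      scripts/create-github-release.sh|\
      scripts/rollback-latest.sh|\
      doc/RELEASING.md|\
      doc/PUBLISHING.md)
        echo "$path"
        ;;
    esac
  done
}

# Example input; in practice the paths could come from `git diff --name-only`.
printf '%s\n' README.md scripts/release.sh doc/RELEASING.md | filter_release_infra_paths
```

Any non-empty output means the change touches a file that should get code owner review.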
+
+## 10. Do Not Store a Claude Token in GitHub Actions
+
+Do not add a personal Claude or Anthropic token for automatic changelog generation.
+
+Recommended policy:
+
+- stable changelog generation happens locally from a trusted maintainer machine
+- canaries never generate changelogs
+
+This keeps LLM spending intentional and avoids a high-value token sitting in Actions.
+
+## 11. Verify the Canary Workflow
+
+After setup:
+
+1. merge a harmless commit to `master`
+2. open the `Release` workflow run triggered by that push
+3. confirm it passes verification
+4. confirm publish succeeds under the `npm-canary` environment
+5. confirm npm now shows a new `canary` release
+6. confirm a git tag named `canary/vYYYY.M.D-canary.N` was pushed
+
+Install-path check:
+
+```bash
+npx paperclipai@canary onboard
+```
+
+## 12. Verify the Stable Workflow
+
+After at least one good canary exists:
+
+1. prepare `releases/vYYYY.M.D.md` on the source commit you want to promote
+2. open `Actions` -> `Release`
+3. run it with:
+   - `source_ref`: the tested commit SHA, or the commit that a canary tag points at
+   - `stable_date`: leave blank or set the intended UTC date
+   - `dry_run`: `true`
+4. confirm the dry-run succeeds
+5. rerun with `dry_run: false`
+6. approve the `npm-stable` environment when prompted
+7. confirm npm `latest` points to the new stable version
+8. confirm git tag `vYYYY.M.D` exists
+9. confirm the GitHub Release was created
+
+## 13. Suggested Maintainer Policy
+
+Use this policy going forward:
+
+- canaries are automatic and cheap
+- stables are manual and approved
+- only stables get public notes and announcements
+- release notes are committed before stable publish
+- rollback uses `npm dist-tag`, not unpublish
+
+## 14. Troubleshooting
+
+### Trusted publishing fails with an auth error
+
+Check:
+
+1. the workflow filename on GitHub exactly matches the filename configured in npm
+2. the package has the trusted publisher entry for the correct repository
+3. the job has `id-token: write`
+4. the job is running from the expected repository, not a fork
+
+### Stable workflow runs but never asks for approval
+
+Check:
+
+1. the `publish` job uses environment `npm-stable`
+2. the environment actually has required reviewers configured
+3. the workflow is running in the canonical repository, not a fork
+
+### CODEOWNERS does not trigger
+
+Check:
+
+1. `.github/CODEOWNERS` is on the default branch
+2. branch protection on `master` requires code owner review
+3. the owner identities in the file are valid reviewers with repository access
+
+## Related Docs
+
+- [doc/RELEASING.md](RELEASING.md)
+- [doc/PUBLISHING.md](PUBLISHING.md)
+- [doc/plans/2026-03-17-release-automation-and-versioning.md](plans/2026-03-17-release-automation-and-versioning.md)
--- a/doc/RELEASING.md
+++ b/doc/RELEASING.md
@@ -1,74 +1,66 @@
 # Releasing Paperclip

-Maintainer runbook for shipping a full Paperclip release across npm, GitHub, and the website-facing changelog surface.
+Maintainer runbook for shipping Paperclip across npm, GitHub, and the website-facing changelog surface.

-The release model is branch-driven:
+The release model is now commit-driven:

-1. Start a release train on `release/X.Y.Z`
-2. Draft the stable changelog on that branch
-3. Publish one or more canaries from that branch
-4. Publish stable from that same branch head
-5. Push the branch commit and tag
-6. Create the GitHub Release
-7. Merge `release/X.Y.Z` back to `master` without squash or rebase
+1. Every push to `master` publishes a canary automatically.
+2. Stable releases are manually promoted from a chosen tested commit or canary tag.
+3. Stable release notes live in `releases/vYYYY.M.D.md`.
+4. Only stable releases get GitHub Releases.
+
+## Versioning Model
+
+Paperclip uses calendar versions that still fit semver syntax:
+
+- stable: `YYYY.M.D`
+- canary: `YYYY.M.D-canary.N`
+
+Examples:
+
+- stable on March 17, 2026: `2026.3.17`
+- fourth canary on March 17, 2026: `2026.3.17-canary.3`
+
+Important constraints:
+
+- do not use leading zeroes such as `2026.03.17`
+- do not use four numeric segments such as `2026.3.17.1`
+- the semver-safe canary form is `2026.3.17-canary.1`
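The constraints above can be sketched as two small shell helpers. Both names (`stable_version_from_date`, `is_valid_release_version`) are hypothetical and not taken from `scripts/release.sh`. The regular expression is one possible encoding of "no leading zeroes, three numeric segments, optional `-canary.N`", and it assumes a four-digit year.

```bash
# Hypothetical helpers, not taken from the repo's release scripts.

# Turn a YYYY-MM-DD date into a stable version, dropping leading zeroes.
stable_version_from_date() {
  local year rest month day
  year="${1%%-*}"      # text before the first "-"
  rest="${1#*-}"       # text after the first "-"
  month="${rest%%-*}"
  day="${rest#*-}"
  # ${var#0} removes a single leading zero, so "03" becomes "3".
  printf '%s.%s.%s\n' "$year" "${month#0}" "${day#0}"
}

# Succeed only for YYYY.M.D or YYYY.M.D-canary.N without leading zeroes.
is_valid_release_version() {
  printf '%s\n' "$1" | grep -Eq \
    '^[1-9][0-9]{3}\.([1-9]|1[0-2])\.([1-9]|[12][0-9]|3[01])(-canary\.(0|[1-9][0-9]*))?$'
}

stable_version_from_date 2026-03-17
for candidate in 2026.3.17 2026.3.17-canary.1 2026.03.17 2026.3.17.1; do
  if is_valid_release_version "$candidate"; then
    echo "ok:       $candidate"
  else
    echo "rejected: $candidate"
  fi
done
```

The two rejected candidates correspond to the leading-zero and four-segment cases called out above.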

 ## Release Surfaces

-Every release has four separate surfaces:
+Every stable release has four separate surfaces:

 1. **Verification** — the exact git SHA passes typecheck, tests, and build
 2. **npm** — `paperclipai` and public workspace packages are published
 3. **GitHub** — the stable release gets a git tag and GitHub Release
 4. **Website / announcements** — the stable changelog is published externally and announced

-A release is done only when all four surfaces are handled.
+A stable release is done only when all four surfaces are handled.
+
+Canaries only cover the first two surfaces plus an internal traceability tag.

 ## Core Invariants

- Canary and stable for `X.Y.Z` must come from the same `release/X.Y.Z` branch.
- The release scripts must run from the matching `release/X.Y.Z` branch.
- Once `vX.Y.Z` exists locally, on GitHub, or on npm, that release train is frozen.
- Do not squash-merge or rebase-merge a release branch PR back to `master`.
- The stable changelog is always `releases/vX.Y.Z.md`. Never create canary changelog files.
-
-The reason for the merge rule is simple: the tag must keep pointing at the exact published commit. Squash or rebase breaks that property.
+- canaries publish from `master`
+- stables publish from an explicitly chosen source ref
+- tags point at the original source commit, not a generated release commit
+- stable notes are always `releases/vYYYY.M.D.md`
+- canaries never create GitHub Releases
+- canaries never require changelog generation

 ## TL;DR

-### 1. Start the release train
+### Canary

-Use this to compute the next version, create or resume the branch, create or resume a dedicated worktree, and push the branch to GitHub.
+Every push to `master` runs the canary path inside [`.github/workflows/release.yml`](../.github/workflows/release.yml).

-```bash
-./scripts/release-start.sh patch
-```
+It:

-That script:
-
- fetches the release remote and tags
- computes the next stable version from the latest `v*` tag
- creates or resumes `release/X.Y.Z`
- creates or resumes a dedicated worktree
- pushes the branch to the remote by default
- refuses to reuse a frozen release train
-
-### 2. Draft the stable changelog
-
-From the release worktree:
-
-```bash
-VERSION=X.Y.Z
-claude --print --output-format stream-json --verbose --dangerously-skip-permissions --model claude-opus-4-6 "Use the release-changelog skill to draft or update releases/v${VERSION}.md for Paperclip. Read doc/RELEASING.md and .agents/skills/release-changelog/SKILL.md, then generate the stable changelog for v${VERSION} from commits since the last stable tag. Do not create a canary changelog."
-```
-
-### 3. Verify and publish a canary
-
-```bash
-./scripts/release-preflight.sh canary patch
-./scripts/release.sh patch --canary --dry-run
-./scripts/release.sh patch --canary
-PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
-```
+- verifies the pushed commit
+- computes the canary version for the current UTC date
+- publishes under npm dist-tag `canary`
+- creates a git tag `canary/vYYYY.M.D-canary.N`
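A rough sketch of the "computes the canary version" step, under stated assumptions: the helper name `next_canary_version` is hypothetical, the list of existing versions is passed on stdin instead of being queried from npm or git, and numbering starts at `0` (inferred from the "fourth canary is `-canary.3`" example). Taking the highest existing `N` plus one, instead of counting entries, keeps the number advancing even when an earlier `N` is missing. Whether the real script works this way is an assumption.

```bash
# Hypothetical helper, not taken from scripts/release.sh.
# Usage: <list of existing versions> | next_canary_version YYYY.M.D
next_canary_version() {
  local base="$1" max=-1 line n
  while IFS= read -r line; do
    # Only consider canaries of the requested base version.
    case "$line" in
      "$base"-canary.*) n="${line#"$base"-canary.}" ;;
      *) continue ;;
    esac
    # Skip suffixes that are not plain integers without leading zeroes.
    case "$n" in
      ''|*[!0-9]*|0?*) continue ;;
    esac
    if [ "$n" -gt "$max" ]; then
      max="$n"
    fi
  done
  echo "$base-canary.$((max + 1))"
}

# Example: canary.1 is missing, yet the next number still moves past canary.2.
printf '%s\n' 2026.3.17-canary.0 2026.3.17-canary.2 2026.3.16-canary.5 \
  | next_canary_version 2026.3.17
```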

 Users install canaries with:

@@ -76,145 +68,91 @@ Users install canaries with:
 npx paperclipai@canary onboard
 ```

-### 4. Publish stable
+### Stable
+
+Use [`.github/workflows/release.yml`](../.github/workflows/release.yml) from the Actions tab with the manual `workflow_dispatch` inputs.
+
+Inputs:
+
+- `source_ref`
+  - commit SHA, branch, or tag
+- `stable_date`
+  - optional UTC date override in `YYYY-MM-DD`
+- `dry_run`
+  - preview only when true
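As an illustration of the optional `stable_date` input, here is a minimal sketch that assumes a hypothetical helper named `resolve_stable_date`. The real handling lives in the workflow and release script and may differ. An empty value falls back to the current UTC date, and anything that does not look like `YYYY-MM-DD` is rejected.

```bash
# Hypothetical helper, not taken from the workflow or scripts/release.sh.
# Prints the date to release under, or fails on a malformed override.
resolve_stable_date() {
  case "${1:-}" in
    '')
      # No override given: use today's date in UTC.
      date -u +%Y-%m-%d
      ;;
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9])
      echo "$1"
      ;;
    *)
      echo "stable_date must look like YYYY-MM-DD, got: $1" >&2
      return 1
      ;;
  esac
}

resolve_stable_date 2026-03-17
```

The pattern only checks the shape of the value, not whether it is a real calendar date.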
+
+Before running stable:
+
+1. pick the canary commit or tag you trust
+2. create or update `releases/vYYYY.M.D.md` on that source ref
+3. run the stable workflow from that source ref
+
+The workflow:
+
+- re-verifies the exact source ref
+- publishes `YYYY.M.D` under npm dist-tag `latest`
+- creates git tag `vYYYY.M.D`
+- creates or updates the GitHub Release from `releases/vYYYY.M.D.md`
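The canary and stable paths above differ only in where things land. A small sketch of that mapping, assuming a hypothetical helper named `describe_release`. The dist-tags, git tag forms, and notes path are the ones stated in this document, and everything else is illustrative.

```bash
# Hypothetical helper, not part of the repo.
# Prints where a given version lands: npm dist-tag, git tag, and notes file.
describe_release() {
  local version="$1"
  case "$version" in
    *-canary.*)
      echo "dist-tag: canary"
      echo "git tag: canary/v$version"
      echo "notes: none"
      ;;
    *)
      echo "dist-tag: latest"
      echo "git tag: v$version"
      echo "notes: releases/v$version.md"
      ;;
  esac
}

describe_release 2026.3.17-canary.2
describe_release 2026.3.17
```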
+
+## Local Commands
+
+### Preview a canary locally

 ```bash
-./scripts/release-preflight.sh stable patch
-./scripts/release.sh patch --dry-run
-./scripts/release.sh patch
-git push public-gh HEAD --follow-tags
-./scripts/create-github-release.sh X.Y.Z
+./scripts/release.sh canary --dry-run
 ```

-Then open a PR from `release/X.Y.Z` to `master` and merge without squash or rebase.
-
-## Release Branches
-
-Paperclip uses one release branch per target stable version:
-
- `release/0.3.0`
- `release/0.3.1`
- `release/1.0.0`
-
-Do not create separate per-canary branches like `canary/0.3.0-1`. A canary is just a prerelease snapshot of the same stable train.
-
-## Script Entry Points
-
- [`scripts/release-start.sh`](../scripts/release-start.sh) — create or resume the release train branch/worktree
- [`scripts/release-preflight.sh`](../scripts/release-preflight.sh) — validate branch, version plan, git/npm state, and verification gate
- [`scripts/release.sh`](../scripts/release.sh) — publish canary or stable from the release branch
- [`scripts/create-github-release.sh`](../scripts/create-github-release.sh) — create or update the GitHub Release after pushing the tag
- [`scripts/rollback-latest.sh`](../scripts/rollback-latest.sh) — repoint `latest` to the last good stable version
-
-## Detailed Workflow
-
-### 1. Start or resume the release train
-
-Run:
+### Preview a stable locally

 ```bash
-./scripts/release-start.sh <patch|minor|major>
+./scripts/release.sh stable --dry-run
 ```

-Useful options:
+### Publish a stable locally
+
+This is mainly for emergency/manual use. The normal path is the GitHub workflow.

 ```bash
-./scripts/release-start.sh patch --dry-run
-./scripts/release-start.sh minor --worktree-dir ../paperclip-release-0.4.0
-./scripts/release-start.sh patch --no-push
+./scripts/release.sh stable
+git push public-gh refs/tags/vYYYY.M.D
+./scripts/create-github-release.sh YYYY.M.D
 ```

-The script is intentionally idempotent:
+## Stable Changelog Workflow

- if `release/X.Y.Z` already exists locally, it reuses it
- if the branch already exists on the remote, it resumes it locally
- if the branch is already checked out in another worktree, it points you there
- if `vX.Y.Z` already exists locally, remotely, or on npm, it refuses to reuse that train
+Stable changelog files live at:

-### 2. Write the stable changelog early
+- `releases/vYYYY.M.D.md`

-Create or update:
+Canaries do not get changelog files.

- `releases/vX.Y.Z.md`
-
-That file is for the eventual stable release. It should not include `-canary` in the filename or heading.
-
-Recommended structure:
-
- `Breaking Changes` when needed
- `Highlights`
- `Improvements`
- `Fixes`
- `Upgrade Guide` when needed
- `Contributors` — @-mention every contributor by GitHub username (no emails)
-
-Package-level `CHANGELOG.md` files are generated as part of the release mechanics. They are not the main release narrative.
-
-### 3. Run release preflight
-
-From the `release/X.Y.Z` worktree:
+Recommended local generation flow:

 ```bash
-./scripts/release-preflight.sh canary <patch|minor|major>
-# or
-./scripts/release-preflight.sh stable <patch|minor|major>
+VERSION=2026.3.17
+claude --print --output-format stream-json --verbose --dangerously-skip-permissions --model claude-opus-4-6 "Use the release-changelog skill to draft or update releases/v${VERSION}.md for Paperclip. Read doc/RELEASING.md and .agents/skills/release-changelog/SKILL.md, then generate the stable changelog for v${VERSION} from commits since the last stable tag. Do not create a canary changelog."
 ```

-The preflight script now checks all of the following before it runs the verification gate:
+The repo intentionally does not run this through GitHub Actions because:

- the worktree is clean, including untracked files
- the current branch matches the computed `release/X.Y.Z`
- the release train is not frozen
- the target version is still free on npm
- the target tag does not already exist locally or remotely
- whether the remote release branch already exists
- whether `releases/vX.Y.Z.md` is present
+- canaries are too frequent
+- stable notes are the only public narrative surface that needs LLM help
+- maintainer LLM tokens should not live in Actions

-Then it runs:
+## Smoke Testing

-```bash
-pnpm -r typecheck
-pnpm test:run
-pnpm build
-```
-
-### 4. Publish one or more canaries
-
-Run:
-
-```bash
-./scripts/release.sh <patch|minor|major> --canary --dry-run
-./scripts/release.sh <patch|minor|major> --canary
-```
-
-Result:
-
- npm gets a prerelease such as `1.2.3-canary.0` under dist-tag `canary`
- `latest` is unchanged
- no git tag is created
- no GitHub Release is created
- the worktree returns to clean after the script finishes
-
-Guardrails:
-
- the script refuses to run from the wrong branch
- the script refuses to publish from a frozen train
- the canary is always derived from the next stable version
- if the stable notes file is missing, the script warns before you forget it
-
-Concrete example:
-
- if the latest stable is `0.2.7`, a patch canary targets `0.2.8-canary.0`
- `0.2.7-canary.N` is invalid because `0.2.7` is already stable
-
-### 5. Smoke test the canary
-
-Run the actual install path in Docker:
+For a canary:

 ```bash
 PAPERCLIPAI_VERSION=canary ./scripts/docker-onboard-smoke.sh
 ```

+For the current stable:
+
+```bash
+PAPERCLIPAI_VERSION=latest ./scripts/docker-onboard-smoke.sh
+```
+
 Useful isolated variants:

 ```bash
@@ -222,14 +160,6 @@ HOST_PORT=3232 DATA_DIR=./data/release-smoke-canary PAPERCLIPAI_VERSION=canary .
 HOST_PORT=3233 DATA_DIR=./data/release-smoke-stable PAPERCLIPAI_VERSION=latest ./scripts/docker-onboard-smoke.sh
 ```

-If you want to exercise onboarding from the current committed ref instead of npm, use:
-
-```bash
-./scripts/clean-onboard-ref.sh
-PAPERCLIP_PORT=3234 ./scripts/clean-onboard-ref.sh
-./scripts/clean-onboard-ref.sh HEAD
-```
-
 Minimum checks:

 - `npx paperclipai@canary onboard` installs
@@ -238,185 +168,59 @@ Minimum checks:
 - the UI loads
 - basic company creation and dashboard load work

-If smoke testing fails:
+## Rollback

-1. stop the stable release
-2. fix the issue on the same `release/X.Y.Z` branch
-3. publish another canary
-4. rerun smoke testing
+Rollback does not unpublish versions.

-### 6. Publish stable from the same release branch
-
-Once the branch head is vetted, run:
+It only moves the `latest` dist-tag back to a previous stable:

 ```bash
-./scripts/release.sh <patch|minor|major> --dry-run
-./scripts/release.sh <patch|minor|major>
+./scripts/rollback-latest.sh 2026.3.16 --dry-run
+./scripts/rollback-latest.sh 2026.3.16
 ```

-Stable publish:
-
- publishes `X.Y.Z` to npm under `latest`
- creates the local release commit
- creates the local tag `vX.Y.Z`
-
-Stable publish refuses to proceed if:
-
- the current branch is not `release/X.Y.Z`
- the remote release branch does not exist yet
- the stable notes file is missing
- the target tag already exists locally or remotely
- the stable version already exists on npm
-
-Those checks intentionally freeze the train after stable publish.
-
-### 7. Push the stable branch commit and tag
-
-After stable publish succeeds:
-
-```bash
-git push public-gh HEAD --follow-tags
-./scripts/create-github-release.sh X.Y.Z
-```
-
-The GitHub Release notes come from:
-
- `releases/vX.Y.Z.md`
-
-### 8. Merge the release branch back to `master`
-
-Open a PR:
-
- base: `master`
- head: `release/X.Y.Z`
-
-Merge rule:
-
- allowed: merge commit or fast-forward
- forbidden: squash merge
- forbidden: rebase merge
-
-Post-merge verification:
-
-```bash
-git fetch public-gh --tags
-git merge-base --is-ancestor "vX.Y.Z" "public-gh/master"
-```
-
-That command must succeed. If it fails, the published tagged commit is not reachable from `master`, which means the merge strategy was wrong.
-
-### 9. Finish the external surfaces
-
-After GitHub is correct:
-
- publish the changelog on the website
- write and send the announcement copy
- ensure public docs and install guidance point to the stable version
-
-## GitHub Actions Release
-
-There is also a manual workflow at [`.github/workflows/release.yml`](../.github/workflows/release.yml).
-
-Use it from the Actions tab on the relevant `release/X.Y.Z` branch:
-
-1. Choose `Release`
-2. Choose `channel`: `canary` or `stable`
-3. Choose `bump`: `patch`, `minor`, or `major`
-4. Choose whether this is a `dry_run`
-5. Run it from the release branch, not from `master`
-
-The workflow:
-
- reruns `typecheck`, `test:run`, and `build`
- gates publish behind the `npm-release` environment
- can publish canaries without touching `latest`
- can publish stable, push the stable branch commit and tag, and create the GitHub Release
-
-It does not merge the release branch back to `master` for you.
-
-## Release Checklist
-
-### Before any publish
-
- [ ] The release train exists on `release/X.Y.Z`
- [ ] The working tree is clean, including untracked files
- [ ] If package manifests changed, the CI-owned `pnpm-lock.yaml` refresh is already merged on `master` before the train is cut
- [ ] The required verification gate passed on the exact branch head you want to publish
- [ ] The bump type is correct for the user-visible impact
- [ ] The stable changelog file exists or is ready at `releases/vX.Y.Z.md`
- [ ] You know which previous stable version you would roll back to if needed
-
-### Before a stable
-
- [ ] The candidate has already passed smoke testing
- [ ] The remote `release/X.Y.Z` branch exists
- [ ] You are ready to push the stable branch commit and tag immediately after npm publish
- [ ] You are ready to create the GitHub Release immediately after the push
- [ ] You are ready to open the PR back to `master`
-
-### After a stable
-
- [ ] `npm view paperclipai@latest version` matches the new stable version
- [ ] The git tag exists on GitHub
- [ ] The GitHub Release exists and uses `releases/vX.Y.Z.md`
- [ ] `vX.Y.Z` is reachable from `master`
- [ ] The website changelog is updated
- [ ] Announcement copy matches the stable release, not the canary
+Then fix forward with a new stable release, published under a new date-based version.
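A minimal sketch of what repointing `latest` amounts to, assuming a hypothetical function `rollback_latest` and an illustrative two-package list. The real logic lives in `scripts/rollback-latest.sh` and presumably covers every published package. `npm dist-tag add <pkg>@<version> <tag>` is the standard npm command for moving a dist-tag.

```bash
# Hypothetical sketch, not the contents of scripts/rollback-latest.sh.
rollback_latest() {
  local version="$1" dry_run="${2:-}" pkg
  # Illustrative package list; the real set of packages is an assumption here.
  for pkg in paperclipai @paperclipai/server; do
    if [ "$dry_run" = "--dry-run" ]; then
      echo "would run: npm dist-tag add ${pkg}@${version} latest"
    else
      npm dist-tag add "${pkg}@${version}" latest
    fi
  done
}

# Preview only; nothing is changed on npm.
rollback_latest 2026.3.16 --dry-run
```

Because only the dist-tag moves, the bad version stays published and can still be installed by its exact version number.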

 ## Failure Playbooks

-### If the canary publishes but the smoke test fails
+### If the canary publishes but smoke testing fails

-Do not publish stable.
+Do not run stable.

 Instead:

-1. fix the issue on `release/X.Y.Z`
-2. publish another canary
-3. rerun smoke testing
+1. fix the issue on `master`
+2. merge the fix
+3. wait for the next automatic canary
+4. rerun smoke testing

-### If stable npm publish succeeds but push or GitHub release creation fails
+### If stable npm publish succeeds but tag push or GitHub release creation fails

 This is a partial release. npm is already live.

 Do this immediately:

-1. fix the git or GitHub issue from the same checkout
-2. push the stable branch commit and tag
-3. create the GitHub Release
+1. push the missing tag
+2. rerun `./scripts/create-github-release.sh YYYY.M.D`
+3. verify the GitHub Release notes point at `releases/vYYYY.M.D.md`

 Do not republish the same version.

 ### If `latest` is broken after stable publish

-Preview:
+Roll back the `latest` dist-tag to the last good stable version:

 ```bash
-./scripts/rollback-latest.sh X.Y.Z --dry-run
+./scripts/rollback-latest.sh YYYY.M.D
 ```

-Roll back:
+Then fix forward with a new stable release.

-```bash
-./scripts/rollback-latest.sh X.Y.Z
-```
+## Related Files

-This does not unpublish anything. It only moves the `latest` dist-tag back to the last good stable release.
-
-Then fix forward with a new patch release.
-
-### If the GitHub Release notes are wrong
-
-Re-run:
-
-```bash
-./scripts/create-github-release.sh X.Y.Z
-```
-
-If the release already exists, the script updates it.
-
-## Related Docs
-
- [doc/PUBLISHING.md](PUBLISHING.md) — low-level npm build and packaging internals
- [.agents/skills/release/SKILL.md](../.agents/skills/release/SKILL.md) — maintainer release coordination workflow
- [.agents/skills/release-changelog/SKILL.md](../.agents/skills/release-changelog/SKILL.md) — stable changelog drafting workflow
+- [`scripts/release.sh`](../scripts/release.sh)
+- [`scripts/release-package-map.mjs`](../scripts/release-package-map.mjs)
+- [`scripts/create-github-release.sh`](../scripts/create-github-release.sh)
+- [`scripts/rollback-latest.sh`](../scripts/rollback-latest.sh)
+- [`doc/PUBLISHING.md`](PUBLISHING.md)
+- [`doc/RELEASE-AUTOMATION-SETUP.md`](RELEASE-AUTOMATION-SETUP.md)
--- a/doc/SPEC-implementation.md
+++ b/doc/SPEC-implementation.md
@@ -330,6 +330,34 @@ Operational policy:
  - `asset_id` uuid fk not null
  - `issue_comment_id` uuid fk null

+## 7.15 `documents` + `document_revisions` + `issue_documents`
+
+- `documents` stores editable text-first documents:
+  - `id` uuid pk
+  - `company_id` uuid fk not null
+  - `title` text null
+  - `format` text not null (`markdown`)
+  - `latest_body` text not null
+  - `latest_revision_id` uuid null
+  - `latest_revision_number` int not null
+  - `created_by_agent_id` uuid fk null
+  - `created_by_user_id` uuid/text fk null
+  - `updated_by_agent_id` uuid fk null
+  - `updated_by_user_id` uuid/text fk null
+- `document_revisions` stores append-only history:
+  - `id` uuid pk
+  - `company_id` uuid fk not null
+  - `document_id` uuid fk not null
+  - `revision_number` int not null
+  - `body` text not null
+  - `change_summary` text null
+- `issue_documents` links documents to issues with a stable workflow key:
+  - `id` uuid pk
+  - `company_id` uuid fk not null
+  - `issue_id` uuid fk not null
+  - `document_id` uuid fk not null
+  - `key` text not null (`plan`, `design`, `notes`, etc.)
+
 ## 8. State Machines

 ## 8.1 Agent Status
@@ -441,6 +469,11 @@ All endpoints are under `/api` and return JSON.
 - `POST /companies/:companyId/issues`
 - `GET /issues/:issueId`
 - `PATCH /issues/:issueId`
+- `GET /issues/:issueId/documents`
+- `GET /issues/:issueId/documents/:key`
+- `PUT /issues/:issueId/documents/:key`
+- `GET /issues/:issueId/documents/:key/revisions`
+- `DELETE /issues/:issueId/documents/:key`
 - `POST /issues/:issueId/checkout`
 - `POST /issues/:issueId/release`
 - `POST /issues/:issueId/comments`
--- a/doc/SPEC.md
+++ b/doc/SPEC.md
@@ -188,12 +188,15 @@ The heartbeat is a protocol, not a runtime. Paperclip defines how to initiate an

 Agent configuration includes an **adapter** that defines how Paperclip invokes the agent. Initial adapters:

-| Adapter   | Mechanism               | Example                                       |
-| --------- | ----------------------- | --------------------------------------------- |
-| `process` | Execute a child process | `python run_agent.py --agent-id {id}`         |
-| `http`    | Send an HTTP request    | `POST https://openclaw.example.com/hook/{id}` |
+| Adapter              | Mechanism               | Example                                       |
+| -------------------- | ----------------------- | --------------------------------------------- |
+| `process`            | Execute a child process | `python run_agent.py --agent-id {id}`         |
+| `http`               | Send an HTTP request    | `POST https://openclaw.example.com/hook/{id}` |
+| `openclaw_gateway`   | OpenClaw gateway API    | Managed OpenClaw agent via gateway            |
+| `gemini_local`       | Gemini CLI process      | Local Gemini CLI with sandbox and approval    |
+| `hermes_local`       | Hermes agent process    | Local Hermes agent                            |

-The `process` and `http` adapters ship as defaults. Additional adapters can be added via the plugin system (see Plugin / Extension Architecture).
+The `process` and `http` adapters ship as defaults. Additional adapters have been added for specific agent runtimes (see the table above), and new adapter types can be registered via the plugin system (see Plugin / Extension Architecture).

 ### Adapter Interface

@@ -429,7 +432,7 @@ The core Paperclip system must be extensible. Features like knowledge bases, ext
 - **Agent Adapter plugins** — new Adapter types can be registered via the plugin system
 - Plugin-registrable UI components (future)

-This isn't a V1 deliverable (we're not building a plugin framework upfront), but the architecture should not paint us into a corner. Keep boundaries clean so extensions are possible.
+The plugin framework has shipped. Plugins can register new adapter types, hook into lifecycle events, and contribute UI components (e.g. global toolbar buttons). A plugin SDK and CLI commands (`paperclipai plugin`) are available for authoring and installing plugins.

 ---

--- a/doc/UNTRUSTED-PR-REVIEW.md
+++ b/doc/UNTRUSTED-PR-REVIEW.md
@@ -0,0 +1,135 @@
+# Untrusted PR Review In Docker
+
+Use this workflow when you want Codex or Claude to inspect a pull request that you do not want touching your host machine directly.
+
+This is intentionally separate from the normal Paperclip dev image.
+
+## What this container isolates
+
+- `codex` auth/session state in a Docker volume, not your host `~/.codex`
+- `claude` auth/session state in a Docker volume, not your host `~/.claude`
+- `gh` auth state in the same container-local home volume
+- review clones, worktrees, dependency installs, and local databases in a writable scratch volume under `/work`
+
+By default this workflow does **not** mount your host repo checkout, your host home directory, or your SSH agent.
+
+## Files
+
+- `docker/untrusted-review/Dockerfile`
+- `docker-compose.untrusted-review.yml`
+- `review-checkout-pr` inside the container
+
+## Build and start a shell
+
+```sh
+docker compose -f docker-compose.untrusted-review.yml build
+docker compose -f docker-compose.untrusted-review.yml run --rm --service-ports review
+```
+
+That opens an interactive shell in the review container with:
+
+- Node + Corepack/pnpm
+- `codex`
+- `claude`
+- `gh`
+- `git`, `rg`, `fd`, `jq`
+
+## First-time login inside the container
+
+Run these once. The resulting login state persists in the `review-home` Docker volume.
+
+```sh
+gh auth login
+codex login
+claude login
+```
+
+If you prefer API-key auth instead of CLI login, pass keys through Compose env:
+
+```sh
+OPENAI_API_KEY=... ANTHROPIC_API_KEY=... docker compose -f docker-compose.untrusted-review.yml run --rm review
+```
+
+## Check out a PR safely
+
+Inside the container:
+
+```sh
+review-checkout-pr paperclipai/paperclip 432
+cd /work/checkouts/paperclipai-paperclip/pr-432
+```
+
+What this does:
+
+1. Creates or reuses a repo clone under `/work/repos/...`
+2. Fetches `pull/<pr>/head` from GitHub
+3. Creates a detached git worktree under `/work/checkouts/...`
+
+The checkout lives entirely inside the container volume.
+
+## Ask Codex or Claude to review it
+
+Inside the PR checkout:
+
+```sh
+codex
+```
+
+Then give it a prompt like:
+
+```text
+Review this PR as hostile input. Focus on security issues, data exfiltration paths, sandbox escapes, dangerous install/runtime scripts, auth changes, and subtle behavioral regressions. Do not modify files. Produce findings ordered by severity with file references.
+```
+
+Or with Claude:
+
+```sh
+claude
+```
+
+## Preview the Paperclip app from the PR
+
+Only do this when you intentionally want to execute the PR's code inside the container.
+
+Inside the PR checkout:
+
+```sh
+pnpm install
+HOST=0.0.0.0 pnpm dev
+```
+
+Open from the host:
+
+- `http://localhost:3100`
+
+The Compose file also exposes Vite's default port:
+
+- `http://localhost:5173`
+
+Notes:
+
+- `pnpm install` can run untrusted lifecycle scripts from the PR. That is why this happens inside the isolated container instead of on your host.
+- If you only want static inspection, do not run install/dev commands.
+- Paperclip's embedded PostgreSQL and local storage stay inside the container home volume via `PAPERCLIP_HOME=/home/reviewer/.paperclip-review`.
+
+## Reset state
+
+Remove the review container volumes when you want a clean environment:
+
+```sh
+docker compose -f docker-compose.untrusted-review.yml down -v
+```
+
+That deletes:
+
+- Codex/Claude/GitHub login state stored in `review-home`
+- cloned repos, worktrees, installs, and scratch data stored in `review-work`
+
+## Security limits
+
+This is a useful isolation boundary, but it is still Docker, not a full VM.
+
+- A reviewed PR can still access the container's network unless you disable it.
+- Any secrets you pass into the container are available to code you execute inside it.
+- Do not mount your host repo, host home, `.ssh`, or Docker socket unless you are intentionally weakening the boundary.
+- If you need a stronger boundary than this, use a disposable VM instead of Docker.
--- a/doc/memory-landscape.md
+++ b/doc/memory-landscape.md
@@ -0,0 +1,172 @@
+# Memory Landscape
+
+Date: 2026-03-17
+
+This document summarizes the memory systems referenced in task `PAP-530` and extracts the design patterns that matter for Paperclip.
+
+## What Paperclip Needs From This Survey
+
+Paperclip is not trying to become a single opinionated memory engine. The more useful target is a control-plane memory surface that:
+
+- stays company-scoped
+- lets each company choose a default memory provider
+- lets specific agents override that default
+- keeps provenance back to Paperclip runs, issues, comments, and documents
+- records memory-related cost and latency the same way the rest of the control plane records work
+- works with plugin-provided providers, not only built-ins
+
+The question is not "which memory project wins?" The question is "what is the smallest Paperclip contract that can sit above several very different memory systems without flattening away the useful differences?"
+
+## Quick Grouping
+
+### Hosted memory APIs
+
+- `mem0`
+- `supermemory`
+- `Memori`
+
+These optimize for a simple application integration story: send conversation/content plus an identity, then query for relevant memory or user context later.
+
+### Agent-centric memory frameworks / memory OSes
+
+- `MemOS`
+- `memU`
+- `EverMemOS`
+- `OpenViking`
+
+These treat memory as an agent runtime subsystem, not only as a search index. They usually add task memory, profiles, filesystem-style organization, async ingestion, or skill/resource management.
+
+### Local-first memory stores / indexes
+
+- `nuggets`
+- `memsearch`
+
+These emphasize local persistence, inspectability, and low operational overhead. They are useful because Paperclip is local-first today and needs at least one zero-config path.
+
+## Per-Project Notes
+
+| Project | Shape | Notable API / model | Strong fit for Paperclip | Main mismatch |
+|---|---|---|---|---|
+| [nuggets](https://github.com/NeoVertex1/nuggets) | local memory engine + messaging gateway | topic-scoped HRR memory with `remember`, `recall`, `forget`, fact promotion into `MEMORY.md` | good example of lightweight local memory and automatic promotion | very specific architecture; not a general multi-tenant service |
+| [mem0](https://github.com/mem0ai/mem0) | hosted + OSS SDK | `add`, `search`, `getAll`, `get`, `update`, `delete`, `deleteAll`; entity partitioning via `user_id`, `agent_id`, `run_id`, `app_id` | closest to a clean provider API with identities and metadata filters | provider owns extraction heavily; Paperclip should not assume every backend behaves like mem0 |
+| [MemOS](https://github.com/MemTensor/MemOS) | memory OS / framework | unified add-retrieve-edit-delete, memory cubes, multimodal memory, tool memory, async scheduler, feedback/correction | strong source for optional capabilities beyond plain search | much broader than the minimal contract Paperclip should standardize first |
+| [supermemory](https://github.com/supermemoryai/supermemory) | hosted memory + context API | `add`, `profile`, `search.memories`, `search.documents`, document upload, settings; automatic profile building and forgetting | strong example of "context bundle" rather than raw search results | heavily productized around its own ontology and hosted flow |
+| [memU](https://github.com/NevaMind-AI/memU) | proactive agent memory framework | file-system metaphor, proactive loop, intent prediction, always-on companion model | good source for when memory should trigger agent behavior, not just retrieval | proactive assistant framing is broader than Paperclip's task-centric control plane |
+| [Memori](https://github.com/MemoriLabs/Memori) | hosted memory fabric + SDK wrappers | registers against LLM SDKs, attribution via `entity_id` + `process_id`, sessions, cloud + BYODB | strong example of automatic capture around model clients | wrapper-centric design does not map 1:1 to Paperclip's run / issue / comment lifecycle |
+| [EverMemOS](https://github.com/EverMind-AI/EverMemOS) | conversational long-term memory system | MemCell extraction, structured narratives, user profiles, hybrid retrieval / reranking | useful model for provenance-rich structured memories and evolving profiles | focused on conversational memory rather than generalized control-plane events |
+| [memsearch](https://github.com/zilliztech/memsearch) | markdown-first local memory index | markdown as source of truth, `index`, `search`, `watch`, transcript parsing, plugin hooks | excellent baseline for a local built-in provider and inspectable provenance | intentionally simple; no hosted service semantics or rich correction workflow |
+| [OpenViking](https://github.com/volcengine/OpenViking) | context database | filesystem-style organization of memories/resources/skills, tiered loading, visualized retrieval trajectories | strong source for browse/inspect UX and context provenance | treats "context database" as a larger product surface than Paperclip should own |
+
+## Common Primitives Across The Landscape
+
+Even though the systems disagree on architecture, they converge on a few primitives:
+
+- `ingest`: add memory from text, messages, documents, or transcripts
+- `query`: search or retrieve memory given a task, question, or scope
+- `scope`: partition memory by user, agent, project, process, or session
+- `provenance`: carry enough metadata to explain where a memory came from
+- `maintenance`: update, forget, dedupe, compact, or correct memories over time
+- `context assembly`: turn raw memories into a prompt-ready bundle for the agent
+
+If Paperclip does not expose these, it will not adapt well to the systems above.
+
+## Where The Systems Differ
+
+These differences are exactly why Paperclip needs a layered contract instead of a single hard-coded engine.
+
+### 1. Who owns extraction?
+
+- `mem0`, `supermemory`, and `Memori` expect the provider to infer memories from conversations.
+- `memsearch` expects the host to decide what markdown to write, then indexes it.
+- `MemOS`, `memU`, `EverMemOS`, and `OpenViking` sit somewhere in between and often expose richer memory construction pipelines.
+
+Paperclip should support both:
+
+- provider-managed extraction
+- Paperclip-managed extraction with provider-managed storage / retrieval
+
+### 2. What is the source of truth?
+
+- `memsearch` and `nuggets` make the source inspectable on disk.
+- hosted APIs often make the provider store canonical.
+- filesystem-style systems like `OpenViking` and `memU` treat hierarchy itself as part of the memory model.
+
+Paperclip should not require a single storage shape. It should require normalized references back to Paperclip entities.
+
+### 3. Is memory just search, or also profile and planning state?
+
+- `mem0` and `memsearch` center search and CRUD.
+- `supermemory` adds user profiles as a first-class output.
+- `MemOS`, `memU`, `EverMemOS`, and `OpenViking` expand into tool traces, task memory, resources, and skills.
+
+Paperclip should make plain search the minimum contract and richer outputs optional capabilities.
+
+### 4. Is memory synchronous or asynchronous?
+
+- local tools often work synchronously in-process.
+- larger systems add schedulers, background indexing, compaction, or sync jobs.
+
+Paperclip needs both direct request/response operations and background maintenance hooks.
+
+## Paperclip-Specific Takeaways
+
+### Paperclip should own these concerns
+
+- binding a provider to a company and optionally overriding it per agent
+- mapping Paperclip entities into provider scopes
+- provenance back to issue comments, documents, runs, and activity
+- cost / token / latency reporting for memory work
+- browse and inspect surfaces in the Paperclip UI
+- governance on destructive operations
+
+### Providers should own these concerns
+
+- extraction heuristics
+- embedding / indexing strategy
+- ranking and reranking
+- profile synthesis
+- contradiction resolution and forgetting logic
+- storage engine details
+
+### The control-plane contract should stay small
+
+Paperclip does not need to standardize every feature from every provider. It needs:
+
+- a required portable core
+- optional capability flags for richer providers
+- a way to record provider-native ids and metadata without pretending all providers are equivalent internally
+
+## Recommended Direction
+
+Paperclip should adopt a two-layer memory model:
+
+1. `Memory binding + control plane layer`
+   Paperclip decides which provider key is in effect for a company, agent, or project, and it logs every memory operation with provenance and usage.
+
+2. `Provider adapter layer`
+   A built-in or plugin-supplied adapter turns Paperclip memory requests into provider-specific calls.
+
+The portable core should cover:
+
+- ingest / write
+- search / recall
+- browse / inspect
+- get by provider record handle
+- forget / correction
+- usage reporting
+
+Optional capabilities can cover:
+
+- profile synthesis
+- async ingestion
+- multimodal content
+- tool / resource / skill memory
+- provider-native graph browsing
+
+That is enough to support:
+
+- a local markdown-first baseline similar to `memsearch`
+- hosted services similar to `mem0`, `supermemory`, or `Memori`
+- richer agent-memory systems like `MemOS` or `OpenViking`
+
+without forcing Paperclip itself to become a monolithic memory engine.
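+
+As a rough illustration of the two layers above, the portable core could be sketched as a TypeScript interface. Every name here (`MemoryProvider`, `MemoryScope`, `MemoryRecord`, `resolveProviderKey`, the capability strings) is hypothetical and chosen for this sketch only; it is not an existing Paperclip API.
+
+```typescript
+// Hypothetical sketch of the portable core; none of these names exist in Paperclip today.
+type MemoryCapability =
+  | "profile_synthesis"
+  | "async_ingestion"
+  | "multimodal"
+  | "tool_resource_skill_memory"
+  | "graph_browsing";
+
+// Scope and provenance back to Paperclip entities.
+interface MemoryScope {
+  companyId: string;
+  agentId?: string;
+  issueId?: string;
+  runId?: string;
+}
+
+interface MemoryRecord {
+  handle: string; // provider record handle
+  text: string;
+  scope: MemoryScope;
+  providerMetadata?: Record<string, unknown>; // provider-native ids and metadata
+}
+
+interface MemoryProvider {
+  key: string;
+  capabilities: ReadonlySet<MemoryCapability>;
+  ingest(text: string, scope: MemoryScope): Promise<MemoryRecord>;
+  search(query: string, scope: MemoryScope, limit?: number): Promise<MemoryRecord[]>;
+  browse(scope: MemoryScope, cursor?: string): Promise<MemoryRecord[]>;
+  get(handle: string): Promise<MemoryRecord | null>;
+  forget(handle: string): Promise<void>;
+}
+
+// Binding layer: an agent-level override wins over the company default.
+function resolveProviderKey(companyDefault: string, agentOverride?: string): string {
+  return agentOverride ?? companyDefault;
+}
+```
+
+Usage reporting is left out of the sketch; it would presumably be recorded by the control-plane layer around each call rather than by the provider itself.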
--- a/doc/plans/2026-02-16-module-system.md
+++ b/doc/plans/2026-02-16-module-system.md
--- a/doc/plans/2026-02-18-agent-authentication-implementation.md
+++ b/doc/plans/2026-02-18-agent-authentication-implementation.md
--- a/doc/plans/2026-02-18-agent-authentication.md
+++ b/doc/plans/2026-02-18-agent-authentication.md
--- a/doc/plans/2026-02-19-agent-mgmt-followup-plan.md
+++ b/doc/plans/2026-02-19-agent-mgmt-followup-plan.md
--- a/doc/plans/2026-02-19-ceo-agent-creation-and-hiring.md
+++ b/doc/plans/2026-02-19-ceo-agent-creation-and-hiring.md
--- a/doc/plans/2026-02-20-issue-run-orchestration-plan.md
+++ b/doc/plans/2026-02-20-issue-run-orchestration-plan.md
--- a/doc/plans/2026-02-20-storage-system-implementation.md
+++ b/doc/plans/2026-02-20-storage-system-implementation.md
--- a/doc/plans/2026-02-21-humans-and-permissions-implementation.md
+++ b/doc/plans/2026-02-21-humans-and-permissions-implementation.md
--- a/doc/plans/2026-02-21-humans-and-permissions.md
+++ b/doc/plans/2026-02-21-humans-and-permissions.md
--- a/doc/plans/2026-02-23-cursor-cloud-adapter.md
+++ b/doc/plans/2026-02-23-cursor-cloud-adapter.md
--- a/doc/plans/2026-02-23-deployment-auth-mode-consolidation.md
+++ b/doc/plans/2026-02-23-deployment-auth-mode-consolidation.md
--- a/doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md
+++ b/doc/plans/2026-03-10-workspace-strategy-and-git-worktrees.md
--- a/doc/plans/2026-03-11-agent-chat-ui-and-issue-backed-conversations.md
+++ b/doc/plans/2026-03-11-agent-chat-ui-and-issue-backed-conversations.md
--- a/doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md
+++ b/doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md
@@ -0,0 +1,397 @@
+# Token Optimization Plan
+
+Date: 2026-03-13  
+Related discussion: https://github.com/paperclipai/paperclip/discussions/449
+
+## Goal
+
+Reduce token consumption materially without reducing agent capability, control-plane visibility, or task completion quality.
+
+This plan is based on:
+
+- the current V1 control-plane design
+- the current adapter and heartbeat implementation
+- the linked user discussion
+- local runtime data from the default Paperclip instance on 2026-03-13
+
+## Executive Summary
+
+The discussion is directionally right about two things:
+
+1. We should preserve session and prompt-cache locality more aggressively.
+2. We should separate stable startup instructions from per-heartbeat dynamic context.
+
+But that is not enough on its own.
+
+After reviewing the code and local run data, the token problem appears to have four distinct causes:
+
+1. **Measurement inflation on sessioned adapters.** Some token counters, especially for `codex_local`, appear to be recorded as cumulative session totals instead of per-heartbeat deltas.
+2. **Avoidable session resets.** Task sessions are intentionally reset on timer wakes and manual wakes, which destroys cache locality for common heartbeat paths.
+3. **Repeated context reacquisition.** The `paperclip` skill tells agents to re-fetch assignments, issue details, ancestors, and full comment threads on every heartbeat. The API does not currently offer efficient delta-oriented alternatives.
+4. **Large static instruction surfaces.** Agent instruction files and globally injected skills are reintroduced at startup even when most of that content is unchanged and not needed for the current task.
+
+The correct approach is:
+
+1. fix telemetry so we can trust the numbers
+2. preserve reuse where it is safe
+3. make context retrieval incremental
+4. add session compaction/rotation so long-lived sessions do not become progressively more expensive
+
+## Validated Findings
+
+### 1. Token telemetry is at least partly overstated today
+
+Observed from the local default instance:
+
+- `heartbeat_runs`: 11,360 runs between 2026-02-18 and 2026-03-13
+- summed `usage_json.inputTokens`: `2,272,142,368,952`
+- summed `usage_json.cachedInputTokens`: `2,217,501,559,420`
+
+Those totals average roughly 200 million input tokens per run, which is not credible as true per-heartbeat usage for the observed prompt sizes.
+
+Supporting evidence:
+
+- `adapter.invoke.payload.prompt` averages were small:
+  - `codex_local`: ~193 chars average, 6,067 chars max
+  - `claude_local`: ~160 chars average, 1,160 chars max
+- despite that, many `codex_local` runs report millions of input tokens
+- one reused Codex session in local data spans 3,607 runs and recorded `inputTokens` growing up to `1,155,283,166`
+
+Interpretation:
+
+- for sessioned adapters, especially Codex, we are likely storing usage reported by the runtime as a **session total**, not a **per-run delta**
+- this undermines trend reporting, optimization work, and customer trust
+
+This does **not** mean there is no real token problem. It means we need a trustworthy baseline before we can judge optimization impact.
+
+### 2. Timer wakes currently throw away reusable task sessions
+
+In `server/src/services/heartbeat.ts`, `shouldResetTaskSessionForWake(...)` returns `true` for:
+
+- `wakeReason === "issue_assigned"`
+- `wakeSource === "timer"`
+- manual on-demand wakes
+
+That means many normal heartbeats skip saved task-session resume even when the workspace is stable.
+
+Local data supports the impact:
+
+- `timer/system` runs: 6,587 total
+- only 976 had a previous session
+- only 963 ended with the same session
+
+So timer wakes are the largest heartbeat path, and about 85% of them (5,611 of 6,587) start without a previous session to resume.
+
+### 3. We repeatedly ask agents to reload the same task context
+
+The `paperclip` skill currently tells agents to do this on essentially every heartbeat:
+
+- fetch assignments
+- fetch issue details
+- fetch ancestor chain
+- fetch full issue comments
+
+Current API shape reinforces that pattern:
+
+- `GET /api/issues/:id/comments` returns the full thread
+- there is no `since`, cursor, digest, or summary endpoint for heartbeat consumption
+- `GET /api/issues/:id` returns full enriched issue context, not a minimal delta payload
+
+This is safe but expensive. It forces the model to repeatedly consume unchanged information.
+
+### 4. Static instruction payloads are not separated cleanly from dynamic heartbeat prompts
+
+The user discussion suggested a bootstrap prompt. That is the right direction.
+
+Current state:
+
+- the UI exposes `bootstrapPromptTemplate`
+- adapter execution paths do not currently use it
+- several adapters prepend `instructionsFilePath` content directly into the per-run prompt or system prompt
+
+Result:
+
+- stable instructions are re-sent or re-applied in the same path as dynamic heartbeat content
+- we are not deliberately optimizing for provider prompt caching
+
+### 5. We inject more skill surface than most agents need
+
+Local adapters inject repo skills into runtime skill directories.
+
+Important `codex_local` nuance:
+
+- Codex does not read skills directly from the active worktree.
+- Paperclip discovers repo skills from the current checkout, then symlinks them into `$CODEX_HOME/skills` or `~/.codex/skills`.
+- If an existing Paperclip skill symlink already points at another live checkout, the current implementation skips it instead of repointing it.
+- This can leave Codex using stale skill content from a different worktree even after Paperclip-side skill changes land.
+- That is both a correctness risk and a token-analysis risk, because runtime behavior may not reflect the instructions in the checkout being tested.
+
+Current repo skill sizes:
+
+- `skills/paperclip/SKILL.md`: 17,441 bytes
+- `.agents/skills/create-agent-adapter/SKILL.md`: 31,832 bytes
+- `skills/paperclip-create-agent/SKILL.md`: 4,718 bytes
+- `skills/para-memory-files/SKILL.md`: 3,978 bytes
+
+That is nearly 58 KB of skill markdown before any company-specific instructions.
+
+Not all of that is necessarily loaded into model context every run, but it increases startup surface area and should be treated as a token budget concern.
+
+## Principles
+
+We should optimize tokens under these rules:
+
+1. **Do not lose functionality.** Agents must still be able to resume work safely, understand why tasks exist, and act within governance rules.
+2. **Prefer stable context over repeated context.** Unchanged instructions should not be resent through the most expensive path.
+3. **Prefer deltas over full reloads.** Heartbeats should consume only what changed since the last useful run.
+4. **Measure normalized deltas, not raw adapter claims.** Especially for sessioned CLIs.
+5. **Keep escape hatches.** Board/manual runs may still want a forced fresh session.
+
+## Plan
+
+## Phase 1: Make token telemetry trustworthy
+
+This should happen first.
+
+### Changes
+
+- Store both:
+  - raw adapter-reported usage
+  - Paperclip-normalized per-run usage
+- For sessioned adapters, compute normalized deltas against prior usage for the same persisted session.
+- Add explicit fields for:
+  - `sessionReused`
+  - `taskSessionReused`
+  - `promptChars`
+  - `instructionsChars`
+  - `hasInstructionsFile`
+  - `skillSetHash` or skill count
+  - `contextFetchMode` (`full`, `delta`, `summary`)
+- Add per-adapter parser tests that distinguish cumulative-session counters from per-run counters.
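+
+A minimal sketch of the delta computation, assuming the adapter is known to report cumulative session totals. The `RawUsage` shape, the `UsageMode` flag, and `normalizeUsage` are illustrative names, not existing code; `outputTokens` is an assumed field.
+
+```typescript
+// Hypothetical shapes; `inputTokens` and `cachedInputTokens` mirror `usage_json`.
+interface RawUsage {
+  inputTokens: number;
+  cachedInputTokens: number;
+  outputTokens: number;
+}
+
+type UsageMode = "per_run" | "cumulative_session";
+
+// Convert adapter-reported usage into per-run usage.
+// `previous` is the raw usage stored for the prior run of the same persisted session.
+function normalizeUsage(
+  mode: UsageMode,
+  current: RawUsage,
+  previous: RawUsage | null,
+): RawUsage {
+  if (mode === "per_run" || previous === null) return { ...current };
+  // A counter that went down suggests the runtime restarted its count;
+  // fall back to the raw value instead of emitting a negative delta.
+  const delta = (now: number, before: number): number =>
+    now >= before ? now - before : now;
+  return {
+    inputTokens: delta(current.inputTokens, previous.inputTokens),
+    cachedInputTokens: delta(current.cachedInputTokens, previous.cachedInputTokens),
+    outputTokens: delta(current.outputTokens, previous.outputTokens),
+  };
+}
+```
+
+Keeping the raw value next to the normalized one makes it possible to re-derive deltas later if an adapter turns out to be classified wrongly.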
+
+### Why
+
+Without this, we cannot tell whether a reduction came from a real optimization or a reporting artifact.
+
+### Success criteria
+
+- per-run token totals stop exploding on long-lived sessions
+- a resumed session’s usage curve is believable and monotonic at the session level, but not double-counted at the run level
+- cost pages can show both raw and normalized numbers while we migrate
+
+## Phase 2: Preserve safe session reuse by default
+
+This is the highest-leverage behavior change.
+
+### Changes
+
+- Stop resetting task sessions on ordinary timer wakes.
+- Keep resetting on:
+  - explicit manual “fresh run” invocations
+  - assignment changes
+  - workspace mismatch
+  - model mismatch / invalid resume errors
+- Add an explicit wake flag like `forceFreshSession: true` when the board wants a reset.
+- Record why a session was reused or reset in run metadata.
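+
+The proposed policy could look roughly like the sketch below. It is not the current `shouldResetTaskSessionForWake(...)` implementation: the `WakeContext` shape, the `workspaceMatches` / `modelMatches` inputs, and the reason strings are assumptions made for illustration.
+
+```typescript
+// Hypothetical inputs; the real function's signature may differ.
+interface WakeContext {
+  wakeSource: string; // e.g. "timer"
+  wakeReason?: string; // e.g. "issue_assigned"
+  forceFreshSession?: boolean;
+  workspaceMatches: boolean;
+  modelMatches: boolean;
+}
+
+interface SessionDecision {
+  reset: boolean;
+  reason: string; // recorded in run metadata
+}
+
+function decideTaskSession(ctx: WakeContext): SessionDecision {
+  if (ctx.forceFreshSession) return { reset: true, reason: "force_fresh_session" };
+  if (ctx.wakeReason === "issue_assigned") return { reset: true, reason: "assignment_changed" };
+  if (!ctx.workspaceMatches) return { reset: true, reason: "workspace_mismatch" };
+  if (!ctx.modelMatches) return { reset: true, reason: "model_mismatch" };
+  // Ordinary timer wakes land here and keep their saved task session.
+  return { reset: false, reason: "reuse_default" };
+}
+```
+
+Returning the reason alongside the decision keeps the "why was this session reset?" question answerable from run metadata alone.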
+
+### Why
+
+Timer wakes are the dominant heartbeat path. Resetting them destroys both session continuity and prompt cache reuse.
+
+### Success criteria
+
+- timer wakes resume the prior task session in the large majority of stable-workspace cases
+- no increase in stale-session failures
+- lower normalized input tokens per timer heartbeat
+
+## Phase 3: Separate static bootstrap context from per-heartbeat context
+
+This is the right version of the discussion’s bootstrap idea.
+
+### Changes
+
+- Implement `bootstrapPromptTemplate` in adapter execution paths.
+- Use it only when starting a fresh session, not on resumed sessions.
+- Keep `promptTemplate` intentionally small and stable:
+  - who I am
+  - what triggered this wake
+  - which task/comment/approval to prioritize
+- Move long-lived setup text out of recurring per-run prompts where possible.
+- Add UI guidance and warnings when `promptTemplate` contains high-churn or large inline content.
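+
+One way to express the fresh-versus-resumed split is sketched below. `bootstrapPromptTemplate` and `promptTemplate` are the config fields named above; `PromptInputs`, `sessionResumed`, and `buildRunPrompt` are hypothetical, and template rendering is assumed to have happened already.
+
+```typescript
+// Hypothetical helper; assumes both templates are already rendered to plain text.
+interface PromptInputs {
+  bootstrapPromptTemplate?: string;
+  promptTemplate: string;
+  sessionResumed: boolean;
+}
+
+function buildRunPrompt(inputs: PromptInputs): string {
+  const parts: string[] = [];
+  // Stable setup text is sent only when a fresh session starts.
+  if (!inputs.sessionResumed && inputs.bootstrapPromptTemplate) {
+    parts.push(inputs.bootstrapPromptTemplate.trim());
+  }
+  // The small per-heartbeat prompt is always sent.
+  parts.push(inputs.promptTemplate.trim());
+  return parts.join("\n\n");
+}
+```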
+
+### Why
+
+Static instructions and dynamic wake context have different cache behavior and should be modeled separately.
+
+For `codex_local`, this also requires isolating the Codex skill home per worktree or teaching Paperclip to repoint its own skill symlinks when the source checkout changes. Otherwise prompt and skill improvements in the active worktree may not reach the running agent.
+
+### Success criteria
+
+- fresh-session prompts can remain richer without inflating every resumed heartbeat
+- resumed prompts become short and structurally stable
+- cache hit rates improve for session-preserving adapters
+
+## Phase 4: Make issue/task context incremental
+
+This is the biggest product change and likely the biggest real token saver after session reuse.
+
+### Changes
+
+Add heartbeat-oriented endpoints and skill behavior:
+
+- `GET /api/agents/me/inbox-lite`
+  - minimal assignment list
+  - issue id, identifier, status, priority, updatedAt, lastExternalCommentAt
+- `GET /api/issues/:id/heartbeat-context`
+  - compact issue state
+  - parent-chain summary
+  - latest execution summary
+  - change markers
+- `GET /api/issues/:id/comments?after=<cursor>` or `?since=<timestamp>`
+  - return only new comments
+- optional `GET /api/issues/:id/context-digest`
+  - server-generated compact summary for heartbeat use
+
+Update the `paperclip` skill so the default pattern becomes:
+
+1. fetch compact inbox
+2. fetch compact task context
+3. fetch only new comments unless this is the first read, a mention-triggered wake, or a cache miss
+4. fetch full thread only on demand
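+
+The `?since=<timestamp>` variant could be sketched as follows. The endpoint shape comes from the proposal above and does not exist yet; `IssueComment`, `commentsSince`, and `commentsUrl` are illustrative names.
+
+```typescript
+// Hypothetical comment shape; only the fields needed for the sketch.
+interface IssueComment {
+  id: string;
+  createdAt: string; // ISO-8601 timestamp
+  body: string;
+}
+
+// Server-side idea: return only comments strictly newer than `since`.
+// A null `since` means first read, so the full thread is returned.
+function commentsSince(thread: IssueComment[], since: string | null): IssueComment[] {
+  if (since === null) return thread;
+  const cutoff = Date.parse(since);
+  return thread.filter((c) => Date.parse(c.createdAt) > cutoff);
+}
+
+// Client-side idea: build the request path the skill would call.
+function commentsUrl(issueId: string, since: string | null): string {
+  const base = `/api/issues/${encodeURIComponent(issueId)}/comments`;
+  return since === null ? base : `${base}?since=${encodeURIComponent(since)}`;
+}
+```
+
+A timestamp cursor is the simpler option to sketch; an opaque `after=<cursor>` would avoid ambiguity when two comments share the same timestamp.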
+
+### Why
+
+Today we are using full-fidelity board APIs as heartbeat APIs. That is convenient but token-inefficient.
+
+### Success criteria
+
+- after first task acquisition, most heartbeats consume only deltas
+- repeated blocked-task or long-thread work no longer replays the whole comment history
+- mention-triggered wakes still have enough context to respond correctly
+
+## Phase 5: Add session compaction and controlled rotation
+
+This protects against long-lived session bloat.
+
+### Changes
+
+- Add rotation thresholds per adapter/session:
+  - turns
+  - normalized input tokens
+  - age
+  - cache hit degradation
+- Before rotating, produce a structured carry-forward summary:
+  - current objective
+  - work completed
+  - open decisions
+  - blockers
+  - files/artifacts touched
+  - next recommended action
+- Persist that summary in task session state or runtime state.
+- Start the next session with:
+  - bootstrap prompt
+  - compact carry-forward summary
+  - current wake trigger
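+
+A sketch of the threshold check and the carry-forward payload, under the assumption that per-session stats are already tracked. All type, field, and function names below are hypothetical.
+
+```typescript
+// Hypothetical stats and limits; units and defaults are left open.
+interface SessionStats {
+  turns: number;
+  normalizedInputTokens: number;
+  ageMs: number;
+  cacheHitRatio: number; // 0..1
+}
+
+interface RotationThresholds {
+  maxTurns: number;
+  maxNormalizedInputTokens: number;
+  maxAgeMs: number;
+  minCacheHitRatio: number;
+}
+
+interface CarryForwardSummary {
+  currentObjective: string;
+  workCompleted: string[];
+  openDecisions: string[];
+  blockers: string[];
+  artifactsTouched: string[];
+  nextRecommendedAction: string;
+}
+
+// Returns every threshold that was crossed; an empty list means "keep the session".
+function rotationReasons(stats: SessionStats, limits: RotationThresholds): string[] {
+  const reasons: string[] = [];
+  if (stats.turns >= limits.maxTurns) reasons.push("turns");
+  if (stats.normalizedInputTokens >= limits.maxNormalizedInputTokens) {
+    reasons.push("normalized_input_tokens");
+  }
+  if (stats.ageMs >= limits.maxAgeMs) reasons.push("age");
+  if (stats.cacheHitRatio < limits.minCacheHitRatio) reasons.push("cache_hit_degradation");
+  return reasons;
+}
+
+// Opening prompt for the rotated session: bootstrap, then summary, then wake trigger.
+function nextSessionPrompt(
+  bootstrap: string,
+  summary: CarryForwardSummary,
+  wakeTrigger: string,
+): string {
+  return [bootstrap, JSON.stringify(summary, null, 2), wakeTrigger].join("\n\n");
+}
+```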
+
+### Why
+
+Even when reuse is desirable, some sessions become too expensive to keep alive indefinitely.
+
+### Success criteria
+
+- very long sessions stop growing without bound
+- rotating a session does not cause loss of task continuity
+- successful task completion rate stays flat or improves
+
+## Phase 6: Reduce unnecessary skill surface
+
+### Changes
+
+- Move from “inject all repo skills” to an allowlist per agent or per adapter.
+- Default local runtime skill set should likely be:
+  - `paperclip`
+- Add opt-in skills for specialized agents:
+  - `paperclip-create-agent`
+  - `para-memory-files`
+  - `create-agent-adapter`
+- Expose active skill set in agent config and run metadata.
+- For `codex_local`, either:
+  - run with a worktree-specific `CODEX_HOME`, or
+  - treat Paperclip-owned Codex skill symlinks as repairable when they point at a different checkout
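+
+The allowlist idea can be sketched as a simple filter over discovered skills. `DEFAULT_SKILLS` and `resolveSkills` are hypothetical names; the skill names are the ones listed above.
+
+```typescript
+// Hypothetical default: only the core `paperclip` skill is injected.
+const DEFAULT_SKILLS: readonly string[] = ["paperclip"];
+
+// `available` is what was discovered in the checkout; `optIn` comes from agent config.
+function resolveSkills(available: string[], optIn: string[] = []): string[] {
+  const allowed = new Set<string>([...DEFAULT_SKILLS, ...optIn]);
+  // Filtering `available` also drops opt-ins that do not exist in this checkout.
+  return available.filter((name) => allowed.has(name));
+}
+```
+
+The resolved list is also what would be exposed in agent config and run metadata.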
+
+### Why
+
+Most agents do not need adapter-authoring or memory-system skills on every run.
+
+### Success criteria
+
+- smaller startup instruction surface
+- no loss of capability for specialist agents that explicitly need extra skills
+
+## Rollout Order
+
+Recommended order:
+
+1. telemetry normalization
+2. timer-wake session reuse
+3. bootstrap prompt implementation
+4. heartbeat delta APIs + `paperclip` skill rewrite
+5. session compaction/rotation
+6. skill allowlists
+
+## Acceptance Metrics
+
+We should treat this plan as successful only if we improve both efficiency and task outcomes.
+
+Primary metrics:
+
+- normalized input tokens per successful heartbeat
+- normalized input tokens per completed issue
+- cache-hit ratio for sessioned adapters
+- session reuse rate by invocation source
+- fraction of heartbeats that fetch full comment threads
+
+Guardrail metrics:
+
+- task completion rate
+- blocked-task rate
+- stale-session failure rate
+- manual intervention rate
+- issue reopen rate after agent completion
+
+Initial targets:
+
+- 30% to 50% reduction in normalized input tokens per successful resumed heartbeat
+- 80%+ session reuse on stable timer wakes
+- 80%+ reduction in full-thread comment reloads after first task read
+- no statistically meaningful regression in completion rate or failure rate
+
+## Concrete Engineering Tasks
+
+1. Add normalized usage fields and migration support for run analytics.
+2. Patch sessioned adapter accounting to compute deltas from prior session totals.
+3. Change `shouldResetTaskSessionForWake(...)` so timer wakes do not reset by default.
+4. Implement `bootstrapPromptTemplate` end-to-end in adapter execution.
+5. Add compact heartbeat context and incremental comment APIs.
+6. Rewrite `skills/paperclip/SKILL.md` around delta-fetch behavior.
+7. Add session rotation with carry-forward summaries.
+8. Replace global skill injection with explicit allowlists.
+9. Fix `codex_local` skill resolution so worktree-local skill changes reliably reach the runtime.
+
+## Recommendation
+
+Treat this as a two-track effort:
+
+- **Track A: correctness and no-regret wins**
+  - telemetry normalization
+  - timer-wake session reuse
+  - bootstrap prompt implementation
+- **Track B: structural token reduction**
+  - delta APIs
+  - skill rewrite
+  - session compaction
+  - skill allowlists
+
+If we only do Track A, we will improve things, but agents will still re-read too much unchanged task context.
+
+If we only do Track B without fixing telemetry first, we will not be able to prove the gains cleanly.
--- a/doc/plans/2026-03-13-agent-evals-framework.md
+++ b/doc/plans/2026-03-13-agent-evals-framework.md
@@ -0,0 +1,775 @@
+# Agent Evals Framework Plan
+
+Date: 2026-03-13
+
+## Context
+
+We need evals for the thing Paperclip actually ships:
+
+- agent behavior produced by adapter config
+- prompt templates and bootstrap prompts
+- skill sets and skill instructions
+- model choice
+- runtime policy choices that affect outcomes and cost
+
+We do **not** primarily need a fine-tuning pipeline.
+We need a regression framework that can answer:
+
+- if we change prompts or skills, do agents still do the right thing?
+- if we switch models, what got better, worse, or more expensive?
+- if we optimize tokens, did we preserve task outcomes?
+- can we grow the suite over time from real Paperclip usage?
+
+This plan is based on:
+
+- `doc/GOAL.md`
+- `doc/PRODUCT.md`
+- `doc/SPEC-implementation.md`
+- `docs/agents-runtime.md`
+- `doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md`
+- Discussion #449: <https://github.com/paperclipai/paperclip/discussions/449>
+- OpenAI eval best practices: <https://developers.openai.com/api/docs/guides/evaluation-best-practices>
+- Promptfoo docs: <https://www.promptfoo.dev/docs/configuration/test-cases/> and <https://www.promptfoo.dev/docs/providers/custom-api/>
+- LangSmith complex agent eval docs: <https://docs.langchain.com/langsmith/evaluate-complex-agent>
+- Braintrust dataset/scorer docs: <https://www.braintrust.dev/docs/annotate/datasets> and <https://www.braintrust.dev/docs/evaluate/write-scorers>
+
+## Recommendation
+
+Paperclip should take a **two-stage approach**:
+
+1. **Start with Promptfoo now** for narrow, prompt-and-skill behavior evals across models.
+2. **Grow toward a first-party, repo-local eval harness in TypeScript** for full Paperclip scenario evals.
+
+So the recommendation is no longer “skip Promptfoo.” It is:
+
+- use Promptfoo as the fastest bootstrap layer
+- keep eval cases and fixtures in this repo
+- avoid making Promptfoo config the deepest long-term abstraction
+
+More specifically:
+
+1. The canonical eval definitions should live in this repo under a top-level `evals/` directory.
+2. `v0` should use Promptfoo to run focused test cases across models and providers.
+3. The longer-term harness should run **real Paperclip scenarios** against seeded companies/issues/agents, not just raw prompt completions.
+4. The scoring model should combine:
+   - deterministic checks
+   - structured rubric scoring
+   - pairwise candidate-vs-baseline judging
+   - efficiency metrics from normalized usage/cost telemetry
+5. The framework should compare **bundles**, not just models.
+
+A bundle is:
+
+- adapter type
+- model id
+- prompt template(s)
+- bootstrap prompt template
+- skill allowlist / skill content version
+- relevant runtime flags
+
+That is the right unit because that is what actually changes behavior in Paperclip.
+
+## Why This Is The Right Shape
+
+### 1. We need to evaluate system behavior, not only prompt output
+
+Prompt-only tools are useful, but Paperclip’s real failure modes are often:
+
+- wrong issue chosen
+- wrong API call sequence
+- bad delegation
+- failure to respect approval boundaries
+- stale session behavior
+- over-reading context
+- claiming completion without producing artifacts or comments
+
+Those are control-plane behaviors. They require scenario setup, execution, and trace inspection.
+
+### 2. The repo is already TypeScript-first
+
+The existing monorepo already uses:
+
+- `pnpm`
+- `tsx`
+- `vitest`
+- TypeScript across server, UI, shared contracts, and adapters
+
+A TypeScript-first harness will fit the repo and CI better than introducing a Python-first test subsystem as the default path.
+
+Python can stay optional later for specialty scorers or research experiments.
+
+### 3. We need provider/model comparison without vendor lock-in
+
+OpenAI’s guidance is directionally right:
+
+- eval early and often
+- use task-specific evals
+- log everything
+- prefer pairwise/comparison-style judging over open-ended scoring
+
+But OpenAI’s Evals API should not be Paperclip’s primary eval control plane, because our target is explicitly multi-model and multi-provider.
+
+### 4. Hosted eval products are useful, and Promptfoo is the right bootstrap tool
+
+The current tradeoff:
+
+- Promptfoo is very good for local, repo-based prompt/provider matrices and CI integration.
+- LangSmith is strong on trajectory-style agent evals.
+- Braintrust has a clean dataset + scorer + experiment model and strong TypeScript support.
+
+The community suggestion is directionally right:
+
+- Promptfoo lets us start small
+- it supports simple assertions like contains / not-contains / regex / custom JS
+- it can run the same cases across multiple models
+- it supports OpenRouter
+- it can move into CI later
+
+That makes it the best `v0` tool for “did this prompt/skill/model change obviously regress?”
+
+But Paperclip should still avoid making a hosted platform or a third-party config format the core abstraction before we have our own stable eval model.
+
+The right move is:
+
+- start with Promptfoo for quick wins
+- keep the data portable and repo-owned
+- build a thin first-party harness around Paperclip concepts as the system grows
+- optionally export to or integrate with other tools later if useful
+
+## What We Should Evaluate
+
+We should split evals into four layers.
+
+### Layer 1: Deterministic contract evals
+
+These should require no judge model.
+
+Examples:
+
+- agent comments on the assigned issue
+- no mutation outside the agent’s company
+- approval-required actions do not bypass approval flow
+- task transitions are legal
+- output contains required structured fields
+- artifact links exist when the task required an artifact
+- no full-thread refetch on delta-only cases once the API supports it
+
+These are cheap, reliable, and should be the first line of defense.
+
+### Layer 2: Single-step behavior evals
+
+These test narrow behaviors in isolation.
+
+Examples:
+
+- chooses the correct issue from inbox
+- writes a reasonable first status comment
+- decides to ask for approval instead of acting directly
+- delegates to the correct report
+- recognizes blocked state and reports it clearly
+
+These are the closest thing to prompt evals, but still framed in Paperclip terms.
+
+### Layer 3: End-to-end scenario evals
+
+These run a full heartbeat or short sequence of heartbeats against a seeded scenario.
+
+Examples:
+
+- new assignment pickup
+- long-thread continuation
+- mention-triggered clarification
+- approval-gated hire request
+- manager escalation
+- workspace coding task that must leave a meaningful issue update
+
+These should evaluate both final state and trace quality.
+
+### Layer 4: Efficiency and regression evals
+
+These are not “did the answer look good?” evals. They are “did we preserve quality while improving cost/latency?” evals.
+
+Examples:
+
+- normalized input tokens per successful heartbeat
+- normalized tokens per completed issue
+- session reuse rate
+- full-thread reload rate
+- wall-clock duration
+- cost per successful scenario
+
+This layer is especially important for token optimization work.
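+
+A minimal sketch of how a few of these metrics might be derived from per-heartbeat records. The `HeartbeatRecord` shape and its field names are assumptions for illustration, not an existing Paperclip type, and counting failed runs toward spend is one possible convention rather than a decided one.
+
+```ts
+// Hypothetical per-heartbeat record; field names are illustrative only.
+type HeartbeatRecord = {
+  succeeded: boolean;
+  normalizedInputTokens: number;
+  reusedSession: boolean;
+  reloadedFullThread: boolean;
+  costUsd: number;
+};
+
+function efficiencyMetrics(records: HeartbeatRecord[]) {
+  const successes = records.filter((r) => r.succeeded).length;
+  const sum = (xs: number[]) => xs.reduce((a, b) => a + b, 0);
+  // A ratio is null when its denominator is zero.
+  const ratio = (num: number, den: number) => (den === 0 ? null : num / den);
+  return {
+    // Assumption: spend from failed runs still counts against successful ones.
+    inputTokensPerSuccess: ratio(sum(records.map((r) => r.normalizedInputTokens)), successes),
+    costPerSuccess: ratio(sum(records.map((r) => r.costUsd)), successes),
+    sessionReuseRate: ratio(records.filter((r) => r.reusedSession).length, records.length),
+    fullThreadReloadRate: ratio(records.filter((r) => r.reloadedFullThread).length, records.length),
+  };
+}
+```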
+
+## Core Design
+
+### 1. Canonical object: `EvalCase`
+
+Each eval case should define:
+
+- scenario setup
+- target bundle(s)
+- execution mode
+- expected invariants
+- scoring rubric
+- tags/metadata
+
+Suggested shape:
+
+```ts
+type EvalCase = {
+  id: string;
+  description: string;
+  tags: string[];
+  setup: {
+    fixture: string;
+    agentId: string;
+    trigger: "assignment" | "timer" | "on_demand" | "comment" | "approval";
+  };
+  inputs?: Record<string, unknown>;
+  checks: {
+    hard: HardCheck[];
+    rubric?: RubricCheck[];
+    pairwise?: PairwiseCheck[];
+  };
+  metrics: MetricSpec[];
+};
+```
+
+The important part is that the case is about a Paperclip scenario, not a standalone prompt string.
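+
+To make the shape concrete, here is one possible case written against it. The supporting types (`HardCheck`, `RubricCheck`, `PairwiseCheck`, `MetricSpec`) are not defined in this plan, so the definitions below are placeholders, and the fixture path and agent id are invented for the example.
+
+```ts
+// Hypothetical supporting types: the plan names them but does not define them.
+type HardCheck = { id: string; description: string };
+type RubricCheck = { dimension: string; maxScore: 1 | 2 };
+type PairwiseCheck = { criteria: string };
+type MetricSpec = { name: string; unit: "tokens" | "ms" | "usd" };
+
+type EvalCase = {
+  id: string;
+  description: string;
+  tags: string[];
+  setup: {
+    fixture: string;
+    agentId: string;
+    trigger: "assignment" | "timer" | "on_demand" | "comment" | "approval";
+  };
+  inputs?: Record<string, unknown>;
+  checks: { hard: HardCheck[]; rubric?: RubricCheck[]; pairwise?: PairwiseCheck[] };
+  metrics: MetricSpec[];
+};
+
+// Illustrative case; the fixture path and agent id are made up.
+const assignmentPickup: EvalCase = {
+  id: "core.assignment_pickup",
+  description: "Agent picks up a newly assigned issue and leaves a status comment.",
+  tags: ["core", "smoke"],
+  setup: {
+    fixture: "companies/small-startup",
+    agentId: "agent-founding-engineer",
+    trigger: "assignment",
+  },
+  checks: {
+    hard: [
+      { id: "comment_on_assigned_issue", description: "Agent comments on the assigned issue" },
+      { id: "no_cross_company_mutation", description: "No mutation outside the agent's company" },
+    ],
+    rubric: [{ dimension: "useful progress communication", maxScore: 2 }],
+  },
+  metrics: [{ name: "normalized_input_tokens", unit: "tokens" }],
+};
+```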
+
+### 2. Canonical object: `EvalBundle`
+
+Suggested shape:
+
+```ts
+type EvalBundle = {
+  id: string;
+  adapter: string;
+  model: string;
+  promptTemplate: string;
+  bootstrapPromptTemplate?: string;
+  skills: string[];
+  flags?: Record<string, string | number | boolean>;
+};
+```
+
+Every comparison run should say which bundle was tested.
+
+This avoids the common mistake of saying “model X is better” when the real change was model + prompt + skills + runtime behavior.
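+
+One way a comparison report could make that explicit is to list which bundle fields actually differ between baseline and candidate. The sketch below assumes the `EvalBundle` shape above; `diffBundles` and `sortedFlags` are hypothetical helper names.
+
+```ts
+type EvalBundle = {
+  id: string;
+  adapter: string;
+  model: string;
+  promptTemplate: string;
+  bootstrapPromptTemplate?: string;
+  skills: string[];
+  flags?: Record<string, string | number | boolean>;
+};
+
+// Serialize flags in key order so that insertion order does not matter.
+const sortedFlags = (flags?: EvalBundle["flags"]) =>
+  JSON.stringify(Object.entries(flags ?? {}).sort(([a], [b]) => a.localeCompare(b)));
+
+// Returns the names of the fields that differ (ignoring the bundle id).
+function diffBundles(a: EvalBundle, b: EvalBundle): string[] {
+  const changed: string[] = [];
+  if (a.adapter !== b.adapter) changed.push("adapter");
+  if (a.model !== b.model) changed.push("model");
+  if (a.promptTemplate !== b.promptTemplate) changed.push("promptTemplate");
+  if (a.bootstrapPromptTemplate !== b.bootstrapPromptTemplate) changed.push("bootstrapPromptTemplate");
+  // Skills are compared as a set: order is treated as irrelevant here.
+  if ([...a.skills].sort().join("\n") !== [...b.skills].sort().join("\n")) changed.push("skills");
+  if (sortedFlags(a.flags) !== sortedFlags(b.flags)) changed.push("flags");
+  return changed;
+}
+```
+
+A report that prints `model, skills` next to a win rate is harder to misread as a pure model comparison.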
+
+### 3. Canonical output: `EvalTrace`
+
+We should capture a normalized trace for scoring:
+
+- run ids
+- prompts actually sent
+- session reuse metadata
+- issue mutations
+- comments created
+- approvals requested
+- artifacts created
+- token/cost telemetry
+- timing
+- raw outputs
+
+The scorer layer should never need to scrape ad hoc logs.
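+
+The plan gives suggested shapes for `EvalCase` and `EvalBundle` but not for the trace. A possible starting shape, mirroring the list above, might look like this. Every field name is an assumption, and `emptyTrace` is a hypothetical convenience helper.
+
+```ts
+// Hypothetical shape; all field names are assumptions for discussion.
+type EvalTrace = {
+  caseId: string;
+  bundleId: string;
+  runIds: string[];
+  promptsSent: string[];
+  session: { sessionId: string | null; reused: boolean };
+  issueMutations: { issueId: string; companyId: string; field: string }[];
+  commentsCreated: { issueId: string; body: string }[];
+  approvalsRequested: { approvalId: string; kind: string }[];
+  artifactsCreated: { artifactId: string; issueId?: string }[];
+  usage: { inputTokens: number; outputTokens: number; costUsd: number };
+  timing: { startedAt: string; durationMs: number };
+  rawOutputs: string[];
+};
+
+// Starts a trace with nothing recorded yet; the runner fills it in as it goes.
+function emptyTrace(caseId: string, bundleId: string): EvalTrace {
+  return {
+    caseId,
+    bundleId,
+    runIds: [],
+    promptsSent: [],
+    session: { sessionId: null, reused: false },
+    issueMutations: [],
+    commentsCreated: [],
+    approvalsRequested: [],
+    artifactsCreated: [],
+    usage: { inputTokens: 0, outputTokens: 0, costUsd: 0 },
+    timing: { startedAt: new Date(0).toISOString(), durationMs: 0 },
+    rawOutputs: [],
+  };
+}
+```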
+
+## Scoring Framework
+
+### 1. Hard checks first
+
+Every eval should start with pass/fail checks that can invalidate the run immediately.
+
+Examples:
+
+- touched wrong company
+- skipped required approval
+- no issue update produced
+- returned malformed structured output
+- marked task done without required artifact
+
+If a hard check fails, the scenario fails regardless of style or judge score.
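+
+A sketch of how hard checks could be expressed as plain functions over a trace, with no judge model involved. The `CheckTrace` fields and the check ids are hypothetical; only the failure conditions come from the examples above.
+
+```ts
+// Hypothetical, trimmed-down trace: just the fields these checks need.
+type CheckTrace = {
+  agentCompanyId: string;
+  issueMutations: { issueId: string; companyId: string }[];
+  commentsCreated: { issueId: string }[];
+  markedDone: boolean;
+  artifactRequired: boolean;
+  artifactIds: string[];
+};
+
+// A check returns a failure reason, or null when it passes.
+type HardCheckFn = { id: string; run: (t: CheckTrace) => string | null };
+
+const hardChecks: HardCheckFn[] = [
+  {
+    id: "no_cross_company_mutation",
+    run: (t) =>
+      t.issueMutations.some((m) => m.companyId !== t.agentCompanyId) ? "touched wrong company" : null,
+  },
+  {
+    id: "issue_update_produced",
+    run: (t) =>
+      t.issueMutations.length === 0 && t.commentsCreated.length === 0 ? "no issue update produced" : null,
+  },
+  {
+    id: "done_requires_artifact",
+    run: (t) =>
+      t.markedDone && t.artifactRequired && t.artifactIds.length === 0
+        ? "marked task done without required artifact"
+        : null,
+  },
+];
+
+// Any single failure fails the scenario, regardless of rubric or judge scores.
+function runHardChecks(t: CheckTrace, checks: HardCheckFn[] = hardChecks) {
+  const failures = checks
+    .map((c) => ({ id: c.id, reason: c.run(t) }))
+    .filter((r): r is { id: string; reason: string } => r.reason !== null);
+  return { passed: failures.length === 0, failures };
+}
+```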
+
+### 2. Rubric scoring second
+
+Rubric scoring should use narrow criteria, not vague “how good was this?” prompts.
+
+Good rubric dimensions:
+
+- task understanding
+- governance compliance
+- useful progress communication
+- correct delegation
+- evidence of completion
+- concision / unnecessary verbosity
+
+Each rubric dimension should be a small 0-1 or 0-2 decision, not a mushy 1-10 scale.
+
+### 3. Pairwise judging for candidate vs baseline
+
+OpenAI’s eval guidance is right that LLMs are better at discriminating between options than at open-ended generation.
+
+So for non-deterministic quality checks, the default pattern should be:
+
+- run baseline bundle on the case
+- run candidate bundle on the same case
+- ask a judge model which is better on explicit criteria
+- allow `baseline`, `candidate`, or `tie`
+
+This is better than asking a judge for an absolute quality score with no anchor.
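+
+A small sketch of the bookkeeping side of this pattern. The `Verdict` values match the three allowed outcomes above; the `JudgeInput` shape and the `tallyVerdicts` helper are assumptions, and the judge call itself is deliberately left out.
+
+```ts
+type Verdict = "baseline" | "candidate" | "tie";
+
+// Hypothetical input a judge model would receive for one case.
+type JudgeInput = {
+  caseId: string;
+  criteria: string[];
+  baselineOutput: string;
+  candidateOutput: string;
+};
+
+type PairwiseTally = { baseline: number; candidate: number; tie: number };
+
+// Counts verdicts across cases; ties are kept visible rather than dropped.
+function tallyVerdicts(verdicts: Verdict[]): PairwiseTally {
+  const tally: PairwiseTally = { baseline: 0, candidate: 0, tie: 0 };
+  for (const v of verdicts) tally[v] += 1;
+  return tally;
+}
+```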
+
+### 4. Efficiency scoring is separate
+
+Do not bury efficiency inside a single blended quality score.
+
+Record it separately:
+
+- quality score
+- cost score
+- latency score
+
+Then compute a summary decision such as:
+
+- candidate is acceptable only if quality is non-inferior and efficiency is improved
+
+That is much easier to reason about than one magic number.
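+
+As a rough illustration, the summary decision can stay a short boolean expression over separately recorded axes. The `BundleScores` shape, the tolerance parameter, and the exact definition of "efficiency is improved" (no regression on cost or latency, and a strict improvement on at least one) are assumptions, not settled policy.
+
+```ts
+// Hypothetical per-bundle aggregate; quality is higher-is-better.
+type BundleScores = { quality: number; costUsd: number; latencyMs: number };
+
+function isAcceptable(baseline: BundleScores, candidate: BundleScores, qualityTolerance = 0): boolean {
+  // Non-inferior: quality may drop by at most the stated tolerance.
+  const nonInferior = candidate.quality >= baseline.quality - qualityTolerance;
+  const noRegression =
+    candidate.costUsd <= baseline.costUsd && candidate.latencyMs <= baseline.latencyMs;
+  const improved =
+    candidate.costUsd < baseline.costUsd || candidate.latencyMs < baseline.latencyMs;
+  return nonInferior && noRegression && improved;
+}
+```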
+
+## Suggested Decision Rule
+
+For PR gating:
+
+1. No hard-check regressions.
+2. No significant regression on required scenario pass rate.
+3. No significant regression on key rubric dimensions.
+4. If the change is token-optimization-oriented, require efficiency improvement on target scenarios.
+
+For deeper comparison reports, show:
+
+- pass rate
+- pairwise wins/losses/ties
+- median normalized tokens
+- median wall-clock time
+- cost deltas
+
+## Dataset Strategy
+
+We should explicitly build the dataset from three sources.
+
+### 1. Hand-authored seed cases
+
+Start here.
+
+These should cover core product invariants:
+
+- assignment pickup
+- status update
+- blocked reporting
+- delegation
+- approval request
+- cross-company access denial
+- issue comment follow-up
+
+These are small, clear, and stable.
+
+### 2. Production-derived cases
+
+Per OpenAI’s guidance, we should log everything and mine real usage for eval cases.
+
+Paperclip should grow eval coverage by promoting real runs into cases when we see:
+
+- regressions
+- interesting failures
+- edge cases
+- high-value success patterns worth preserving
+
+The initial version can be manual:
+
+- take a real run
+- redact/normalize it
+- convert it into an `EvalCase`
+
+Later we can automate trace-to-case generation.
+
+### 3. Adversarial and guardrail cases
+
+These should intentionally probe failure modes:
+
+- approval bypass attempts
+- wrong-company references
+- stale context traps
+- irrelevant long threads
+- misleading instructions in comments
+- verbosity traps
+
+This is where Promptfoo-style red-team ideas can become useful later, but it is not the first slice.
+
+## Repo Layout
+
+Recommended initial layout:
+
+```text
+evals/
+  README.md
+  promptfoo/
+    promptfooconfig.yaml
+    prompts/
+    cases/
+  cases/
+    core/
+    approvals/
+    delegation/
+    efficiency/
+  fixtures/
+    companies/
+    issues/
+  bundles/
+    baseline/
+    experiments/
+  runners/
+    scenario-runner.ts
+    compare-runner.ts
+  scorers/
+    hard/
+    rubric/
+    pairwise/
+  judges/
+    rubric-judge.ts
+    pairwise-judge.ts
+  lib/
+    types.ts
+    traces.ts
+    metrics.ts
+  reports/
+    .gitignore
+```
+
+Why top-level `evals/`:
+
+- it makes evals feel first-class
+- it avoids hiding them inside `server/` when they actually span adapters and runtime behavior
+- it leaves room for both TS and optional Python helpers later
+- it gives us a clean place for Promptfoo `v0` config plus the later first-party runner
+
+## Execution Model
+
+The harness should support three modes.
+
+### Mode A: Cheap local smoke
+
+Purpose:
+
+- run on PRs
+- keep cost low
+- catch obvious regressions
+
+Characteristics:
+
+- 5 to 20 cases
+- 1 or 2 bundles
+- mostly hard checks and narrow rubrics
+
+### Mode B: Candidate vs baseline compare
+
+Purpose:
+
+- evaluate a prompt/skill/model change before merge
+
+Characteristics:
+
+- paired runs
+- pairwise judging enabled
+- quality + efficiency diff report
+
+### Mode C: Nightly broader matrix
+
+Purpose:
+
+- compare multiple models and bundles
+- grow historical benchmark data
+
+Characteristics:
+
+- larger case set
+- multiple models
+- more expensive rubric/pairwise judging
+
+## CI and Developer Workflow
+
+Suggested commands:
+
+```sh
+pnpm evals:smoke
+pnpm evals:compare --baseline baseline/codex-default --candidate experiments/codex-lean-skillset
+pnpm evals:nightly
+```
+
+PR behavior:
+
+- run `evals:smoke` on prompt/skill/adapter/runtime changes
+- optionally trigger `evals:compare` for labeled PRs or manual runs
+
+Nightly behavior:
+
+- run larger matrix
+- save report artifact
+- surface trend lines on pass rate, pairwise wins, and efficiency
+
+## Framework Comparison
+
+### Promptfoo
+
+Best use for Paperclip:
+
+- prompt-level micro-evals
+- provider/model comparison
+- quick local CI integration
+- custom JS assertions and custom providers
+- bootstrap-layer evals for one skill or one agent workflow
+
+What changed in this recommendation:
+
+- Promptfoo is now the recommended **starting point**
+- especially for “one skill, a handful of cases, compare across models”
+
+Why it still should not be the only long-term system:
+
+- its primary abstraction is still prompt/provider/test-case oriented
+- Paperclip needs scenario setup, control-plane state inspection, and multi-step traces as first-class concepts
+
+Recommendation:
+
+- use Promptfoo first
+- store Promptfoo config and cases in-repo under `evals/promptfoo/`
+- use custom JS/TS assertions and, if needed later, a custom provider that calls Paperclip scenario runners
+- do not make Promptfoo YAML the only canonical Paperclip eval format once we outgrow prompt-level evals
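+
+For a sense of what a custom assertion could look like, here is a sketch of a function that checks a structured status update. It assumes Promptfoo's file-based JavaScript assertions receive the model output as a string and may return a pass/score/reason object; the required field names are hypothetical, and how the function is exported and referenced from `promptfooconfig.yaml` should be confirmed against the Promptfoo docs.
+
+```ts
+// Result shape assumed to match Promptfoo's pass/score/reason grading result.
+type AssertionResult = { pass: boolean; score: number; reason: string };
+
+// Hypothetical required fields for a structured status update.
+const REQUIRED_FIELDS = ["issueId", "status", "comment"];
+
+function assertStructuredUpdate(output: string): AssertionResult {
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(output);
+  } catch {
+    return { pass: false, score: 0, reason: "output is not valid JSON" };
+  }
+  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
+    return { pass: false, score: 0, reason: "output is not a JSON object" };
+  }
+  const obj = parsed;
+  const missing = REQUIRED_FIELDS.filter((field) => !(field in obj));
+  return missing.length === 0
+    ? { pass: true, score: 1, reason: "all required fields present" }
+    : { pass: false, score: 0, reason: `missing fields: ${missing.join(", ")}` };
+}
+```
+
+Keeping the check as a plain function means the same logic can later be reused as a hard check in the first-party harness.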
+
+### LangSmith
+
+What it gets right:
+
+- final response evals
+- trajectory evals
+- single-step evals
+
+Why not the primary system today:
+
+- stronger fit for teams already centered on LangChain/LangGraph
+- introduces hosted/external workflow gravity before our own eval model is stable
+
+Recommendation:
+
+- copy the trajectory/final/single-step taxonomy
+- do not adopt the platform as the default requirement
+
+### Braintrust
+
+What it gets right:
+
+- TypeScript support
+- clean dataset/task/scorer model
+- production logging to datasets
+- experiment comparison over time
+
+Why not the primary system today:
+
+- still externalizes the canonical dataset and review workflow
+- we are not yet at the maturity where hosted experiment management should define the shape of the system
+
+Recommendation:
+
+- borrow its dataset/scorer/experiment mental model
+- revisit once we want hosted review and experiment history at scale
+
+### OpenAI Evals / Evals API
+
+What it gets right:
+
+- strong eval principles
+- emphasis on task-specific evals
+- continuous evaluation mindset
+
+Why not the primary system:
+
+- Paperclip must compare across models/providers
+- we do not want our primary eval runner coupled to one model vendor
+
+Recommendation:
+
+- use the guidance
+- do not use it as the core Paperclip eval runtime
+
+## First Implementation Slice
+
+The first version should be intentionally small.
+
+### Phase 0: Promptfoo bootstrap
+
+Build:
+
+- `evals/promptfoo/promptfooconfig.yaml`
+- 5 to 10 focused cases for one skill or one agent workflow
+- model matrix using the providers we care about most
+- mostly deterministic assertions:
+  - contains
+  - not-contains
+  - regex
+  - custom JS assertions
+
+Target scope:
+
+- one skill, or one narrow workflow such as assignment pickup / first status update
+- compare a small set of bundles across several models
+
+Success criteria:
+
+- we can run one command and compare outputs across models
+- prompt/skill regressions become visible quickly
+- the team gets signal before building heavier infrastructure
+
+### Phase 1: Skeleton and core cases
+
+Build:
+
+- `evals/` scaffold
+- `EvalCase`, `EvalBundle`, `EvalTrace` types
+- scenario runner for seeded local cases
+- 10 hand-authored core cases
+- hard checks only
+
+Target cases:
+
+- assigned issue pickup
+- write progress comment
+- ask for approval when required
+- respect company boundary
+- report blocked state
+- avoid marking done without artifact/comment evidence
+
+Success criteria:
+
+- a developer can run a local smoke suite
+- prompt/skill changes can fail the suite deterministically
+- Promptfoo `v0` cases either migrate into or coexist with this layer cleanly
+
+### Phase 2: Pairwise and rubric layer
+
+Build:
+
+- rubric scorer interface
+- pairwise judge runner
+- candidate vs baseline compare command
+- markdown/html report output
+
+Success criteria:
+
+- model/prompt bundle changes produce a readable diff report
+- we can tell “better”, “worse”, or “same” on curated scenarios
+
+### Phase 3: Efficiency integration
+
+Build:
+
+- normalized token/cost metrics integrated into eval traces
+- cost and latency comparisons
+- efficiency gates for token optimization work
+
+Dependency:
+
+- this should align with the telemetry normalization work in `doc/plans/2026-03-13-TOKEN-OPTIMIZATION-PLAN.md`
+
+Success criteria:
+
+- quality and efficiency can be judged together
+- token-reduction work no longer relies on anecdotal improvements
+
+### Phase 4: Production-case ingestion
+
+Build:
+
+- tooling to promote real runs into new eval cases
+- metadata tagging
+- failure corpus growth process
+
+Success criteria:
+
+- the eval suite grows from real product behavior instead of staying synthetic
+
+## Initial Case Categories
+
+We should start with these categories:
+
+1. `core.assignment_pickup`
+2. `core.progress_update`
+3. `core.blocked_reporting`
+4. `governance.approval_required`
+5. `governance.company_boundary`
+6. `delegation.correct_report`
+7. `threads.long_context_followup`
+8. `efficiency.no_unnecessary_reloads`
+
+That is enough to start catching the classes of regressions we actually care about.
+
+## Important Guardrails
+
+### 1. Do not rely on judge models alone
+
+Every important scenario needs deterministic checks first.
+
+### 2. Do not gate PRs on a single noisy score
+
+Use pass/fail invariants plus a small number of stable rubric or pairwise checks.
+
+### 3. Do not confuse benchmark score with product quality
+
+The suite must keep growing from real runs, otherwise it will become a toy benchmark.
+
+### 4. Do not evaluate only final output
+
+Trajectory matters for agents:
+
+- did they call the right Paperclip APIs?
+- did they ask for approval?
+- did they communicate progress?
+- did they choose the right issue?
+
+### 5. Do not make the framework vendor-shaped
+
+Our eval model should survive changes in:
+
+- judge provider
+- candidate provider
+- adapter implementation
+- hosted tooling choices
+
+## Open Questions
+
+1. Should the first scenario runner invoke the real server over HTTP, or call services directly in-process?
+   My recommendation: start in-process for speed, then add HTTP-mode coverage once the model stabilizes.
+
+2. Should we support Python scorers in v1?
+   My recommendation: no. Keep v1 all-TypeScript.
+
+3. Should we commit baseline outputs?
+   My recommendation: commit case definitions and bundle definitions, but keep run artifacts out of git.
+
+4. Should we add hosted experiment tracking immediately?
+   My recommendation: no. Revisit after the local harness proves useful.
+
+## Final Recommendation
+
+Start with Promptfoo for immediate, narrow model-and-prompt comparisons, then grow into a first-party `evals/` framework in TypeScript that evaluates **Paperclip scenarios and bundles**, not just prompts.
+
+Use this structure:
+
+- Promptfoo for `v0` bootstrap
+- deterministic hard checks as the foundation
+- rubric and pairwise judging for non-deterministic quality
+- normalized efficiency metrics as a separate axis
+- repo-local datasets that grow from real runs
+
+Use external tools selectively:
+
+- Promptfoo as the initial path for narrow prompt/provider tests
+- Braintrust or LangSmith later if we want hosted experiment management
+
+But keep the canonical eval model inside the Paperclip repo and aligned to Paperclip’s actual control-plane behaviors.
--- a/doc/plans/2026-03-13-features.md
+++ b/doc/plans/2026-03-13-features.md
@@ -0,0 +1,780 @@
+# Feature specs
+
+## 1) Guided onboarding + first-job magic
+
+The repo already has `onboard`, `doctor`, `run`, deployment modes, and even agent-oriented onboarding text/skills endpoints, but there are also current onboarding/auth validation issues and an open “onboard failed” report. That means this is not just polish; it is product-critical. ([GitHub][1])
+
+### Product decision
+
+Replace “configuration-first onboarding” with **interview-first onboarding**.
+
+### What we want
+
+- Ask 3–4 questions up front, not 20 settings.
+- Generate the right path automatically: local solo, shared private, or public cloud.
+- Detect what agent/runtime environment already exists.
+- Make it normal to have Claude/OpenClaw/Codex help complete setup.
+- End onboarding with a **real first task**, not a blank dashboard.
+
+### What we do not want
+
+- Provider jargon before value.
+- “Go find an API key” as the default first instruction.
+- A successful install that still leaves users unsure what to do next.
+
+### Proposed UX
+
+On first run, show an interview:
+
+```ts
+type OnboardingProfile = {
+  useCase: "startup" | "agency" | "internal_team";
+  companySource: "new" | "existing";
+  deployMode: "local_solo" | "shared_private" | "shared_public";
+  autonomyMode: "hands_on" | "hybrid" | "full_auto";
+  primaryRuntime: "claude_code" | "codex" | "openclaw" | "other";
+};
+```
+
+Questions:
+
+1. What are you building?
+2. Is this a new company, an existing company, or a service/agency team?
+3. Are you working solo on one machine, sharing privately with a team, or deploying publicly?
+4. Do you want full auto, hybrid, or tight manual control?
+
+Then Paperclip should:
+
+- detect installed CLIs/providers/subscriptions
+- recommend the matching deployment/auth mode
+- generate a local `onboarding.txt` / LLM handoff prompt
+- offer a button: **“Open this in Claude / copy setup prompt”**
+- create starter objects:
+
+  - company
+  - company goal
+  - CEO
+  - founding engineer or equivalent first report
+  - first suggested task
+
+### Backend / API
+
+- Add `GET /api/onboarding/recommendation`
+- Add `GET /api/onboarding/llm-handoff.txt`
+- Reuse existing invite/onboarding/skills patterns for local-first bootstrap
+- Persist onboarding answers into instance config for later defaults
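+
+A sketch of the kind of logic the recommendation endpoint might run. The instance mode names follow the deployment modes described in the repo docs (`local_trusted`, `authenticated/private`, `authenticated/public`); the `Recommendation` shape, the `recommend` function, and the note text are hypothetical.
+
+```ts
+type DeployMode = "local_solo" | "shared_private" | "shared_public";
+type InstanceMode = "local_trusted" | "authenticated/private" | "authenticated/public";
+
+// Hypothetical response body for the recommendation endpoint.
+type Recommendation = { instanceMode: InstanceMode; requiresLogin: boolean; notes: string[] };
+
+const MODE_MAP: Record<DeployMode, InstanceMode> = {
+  local_solo: "local_trusted",
+  shared_private: "authenticated/private",
+  shared_public: "authenticated/public",
+};
+
+function recommend(deployMode: DeployMode, detectedRuntimes: string[]): Recommendation {
+  const notes: string[] = [];
+  if (detectedRuntimes.length === 0) {
+    // Prescriptive rather than generic, in line with the acceptance criteria.
+    notes.push("No supported local runtime detected; install one before creating the first task.");
+  }
+  return {
+    instanceMode: MODE_MAP[deployMode],
+    requiresLogin: deployMode !== "local_solo",
+    notes,
+  };
+}
+```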
+
+### Acceptance criteria
+
+- Fresh install with a supported local runtime completes without manual JSON/env editing.
+- User sees first live agent action before leaving onboarding.
+- A blank dashboard is no longer the default post-install state.
+- If a required dependency is missing, the error is prescriptive and fixable from the UI/CLI.
+
+### Non-goals
+
+- account creation
+- enterprise SSO
+- perfect provider auto-detection for every runtime
+
+---
+
+## 2) Board command surface, not generic chat
+
+There is a real tension here: the transcript says users want “chat with my CEO,” while the public product definition says Paperclip is **not a chatbot** and V1 communication is **tasks + comments only**. At the same time, the repo is already exploring plugin infrastructure and even a chat plugin via plugin SSE streaming. The clean resolution is: **make the core surface conversational, but keep the data model task/thread-centric; reserve full chat as an optional plugin**. ([GitHub][2])
+
+### Product decision
+
+Build a **Command Composer** backed by issues/comments/approvals, not a separate chat subsystem.
+
+### What we want
+
+- “Talk to the CEO” feeling for the user.
+- Every conversation ends up attached to a real company object.
+- Strategy discussion can produce issues, artifacts, and approvals.
+
+### What we do not want
+
+- A blank “chat with AI” home screen disconnected from the org.
+- Yet another agent-chat product.
+
+### Proposed UX
+
+Add a global composer with modes:
+
+```ts
+type ComposerMode = "ask" | "task" | "decision";
+type ThreadScope = "company" | "project" | "issue" | "agent";
+```
+
+Examples:
+
+- On dashboard: “Ask the CEO for a hiring plan” → creates a `strategy` issue/thread scoped to the company.
+- On agent page: “Tell the designer to make this cleaner” → appends an instruction comment to an issue or spawns a new delegated task.
+- On approval page: “Why are you asking to hire?” → appends a board comment to the approval context.
+
+Add issue kinds:
+
+```ts
+type IssueKind = "task" | "strategy" | "question" | "decision";
+```
+
+### Backend / data model
+
+Prefer extending existing `issues` rather than creating `chats`:
+
+- `issues.kind`
+- `issues.scope`
+- optional `issues.target_agent_id`
+- comment metadata: `comment.intent = hint | correction | board_question | board_decision`
+
+### Acceptance criteria
+
+- A user can “ask CEO” from the dashboard and receive a response in a company-scoped thread.
+- From that thread, the user can create/approve tasks with one click.
+- No separate chat database is required for v1 of this feature.
+
+### Non-goals
+
+- consumer chat UX
+- model marketplace
+- general-purpose assistant unrelated to company context
+
+---
+
+## 3) Live org visibility + explainability layer
+
+The core product promise is already visibility and governance, but right now the transcript makes clear that the UI is still too close to raw agent execution. The repo already has org charts, activity, heartbeat runs, costs, and agent detail surfaces; the missing piece is the explanatory layer above them. ([GitHub][1])
+
+### Product decision
+
+Default the UI to **human-readable operational summaries**, with raw logs one layer down.
+
+### What we want
+
+- At company level: “who is active, what are they doing, what is moving between teams”
+- At agent level: “what is the plan, what step is complete, what outputs were produced”
+- At run level: “summary first, transcript second”
+
+### Proposed UX
+
+Company page:
+
+- org chart with live active-state indicators
+- delegation animation between nodes when work moves
+- current open priorities
+- pending approvals
+- burn / budget warning strip
+
+Agent page:
+
+- status card
+- current issue
+- plan checklist
+- latest artifact(s)
+- summary of last run
+- expandable raw trace/logs
+
+Run page:
+
+- **Summary**
+- **Steps**
+- **Raw transcript / tool calls**
+
+### Backend / API
+
+Generate a run view model from current run/activity data:
+
+```ts
+type RunSummary = {
+  runId: string;
+  headline: string;
+  objective: string | null;
+  currentStep: string | null;
+  completedSteps: string[];
+  delegatedTo: { agentId: string; issueId?: string }[];
+  artifactIds: string[];
+  warnings: string[];
+};
+```
+
+Phase 1 can derive this server-side from existing run logs/comments. Persist only if needed later.
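+
+A sketch of that server-side derivation, assuming run logs and comments can first be normalized into a small event stream. The `RunEvent` union and the headline wording are hypothetical; `RunSummary` is the shape given above.
+
+```ts
+type RunSummary = {
+  runId: string;
+  headline: string;
+  objective: string | null;
+  currentStep: string | null;
+  completedSteps: string[];
+  delegatedTo: { agentId: string; issueId?: string }[];
+  artifactIds: string[];
+  warnings: string[];
+};
+
+// Hypothetical normalized events extracted from existing run logs/comments.
+type RunEvent =
+  | { type: "objective"; text: string }
+  | { type: "step_started"; step: string }
+  | { type: "step_completed"; step: string }
+  | { type: "delegated"; agentId: string; issueId?: string }
+  | { type: "artifact"; artifactId: string }
+  | { type: "warning"; message: string };
+
+function summarizeRun(runId: string, events: RunEvent[]): RunSummary {
+  const s: RunSummary = {
+    runId, headline: "", objective: null, currentStep: null,
+    completedSteps: [], delegatedTo: [], artifactIds: [], warnings: [],
+  };
+  for (const e of events) {
+    switch (e.type) {
+      case "objective": s.objective = e.text; break;
+      case "step_started": s.currentStep = e.step; break;
+      case "step_completed":
+        s.completedSteps.push(e.step);
+        // Clear the current step only if it is the one that just finished.
+        if (s.currentStep === e.step) s.currentStep = null;
+        break;
+      case "delegated": s.delegatedTo.push({ agentId: e.agentId, issueId: e.issueId }); break;
+      case "artifact": s.artifactIds.push(e.artifactId); break;
+      case "warning": s.warnings.push(e.message); break;
+    }
+  }
+  s.headline = s.currentStep
+    ? `Working on: ${s.currentStep}`
+    : `Completed ${s.completedSteps.length} step(s)`;
+  return s;
+}
+```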
+
+### Acceptance criteria
+
+- Board can tell what is happening without reading shell commands.
+- Raw logs are still accessible, but not the default surface.
+- First task / first hire / first completion moments are visibly celebrated.
+
+### Non-goals
+
+- overdesigned animation system
+- perfect semantic summarization before core data quality exists
+
+---
+
+## 4) Artifact system: attachments, file browser, previews
+
+This gap is already showing up in the repo. Storage is present, attachment endpoints exist, but current issues show that attachments are still effectively image-centric and comment attachment rendering is incomplete. At the same time, the transcript asks for plans, docs, files, and generated web pages to be surfaced cleanly. ([GitHub][4])
+
+### Product decision
+
+Introduce a first-class **Artifact** model that unifies:
+
+- uploaded/generated files
+- workspace files of interest
+- preview URLs
+- generated docs/reports
+
+### What we want
+
+- Plans, specs, CSVs, markdown, PDFs, logs, JSON, HTML outputs
+- easy discoverability from the issue/run/company pages
+- a lightweight file browser for project workspaces
+- preview links for generated websites/apps
+
+### What we do not want
+
+- forcing agents to paste everything inline into comments
+- HTML stuffed into comment bodies as a workaround
+- a full web IDE
+
+### Phase 1: fix the obvious gaps
+
+- Accept non-image MIME types for issue attachments
+- Attach files to comments correctly
+- Show file metadata + download/open on issue page
+
+### Phase 2: introduce artifacts
+
+```ts
+type ArtifactKind = "attachment" | "workspace_file" | "preview" | "report_link";
+
+interface Artifact {
+  id: string;
+  companyId: string;
+  issueId?: string;
+  runId?: string;
+  agentId?: string;
+  kind: ArtifactKind;
+  title: string;
+  mimeType?: string;
+  filename?: string;
+  sizeBytes?: number;
+  storageKind: "local_disk" | "s3" | "external_url";
+  contentPath?: string;
+  previewUrl?: string;
+  metadata: Record<string, unknown>;
+}
+```
+
+### UX
+
+Issue page gets a **Deliverables** section:
+
+- Files
+- Reports
+- Preview links
+- Latest generated artifact highlighted at top
+
+Project page gets a **Files** tab:
+
+- folder tree
+- recent changes
+- “Open produced files” shortcut
+
+### Preview handling
+
+For HTML/static outputs:
+
+- local deploy → open local preview URL
+- shared/public deploy → host via configured preview service or static storage
+- preview URL is registered back onto the issue as an artifact
+
+### Acceptance criteria
+
+- Agents can attach `.md`, `.txt`, `.json`, `.csv`, `.pdf`, and `.html`.
+- Users can open/download them from the issue page.
+- A generated static site can be opened from an issue without hunting through the filesystem.
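+
+The first criterion could be backed by a simple allowlist. This is only a sketch: the extension list comes from the criterion above, while the function name and the choice to check the filename extension (rather than, or in addition to, the MIME type) are assumptions. Existing image handling is not covered here.
+
+```ts
+// Extensions named in the acceptance criteria.
+const ALLOWED_EXTENSIONS = new Set([".md", ".txt", ".json", ".csv", ".pdf", ".html"]);
+
+function isAllowedAttachment(filename: string): boolean {
+  const dot = filename.lastIndexOf(".");
+  // Reject names with no extension and dotfiles such as ".env".
+  if (dot <= 0) return false;
+  return ALLOWED_EXTENSIONS.has(filename.slice(dot).toLowerCase());
+}
+```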
+
+### Non-goals
+
+- browser IDE
+- collaborative docs editor
+- full object-storage admin UI
+
+---
+
+## 5) Shared/cloud deployment + cloud runtimes
+
+The repo already has a clear deployment story in docs: `local_trusted`, `authenticated/private`, and `authenticated/public`, plus Tailscale guidance. The roadmap explicitly calls out cloud agents like Cursor / e2b. That means the next step is not inventing a deployment model; it is making the shared/cloud path canonical and production-usable. ([GitHub][5])
+
+### Product decision
+
+Make **shared/private deploy** and **public/cloud deploy** first-class supported modes, and add **remote runtime drivers** for cloud-executed agents.
+
+### What we want
+
+- one instance a team can actually share
+- local-first path that upgrades to private/public without a mental model change
+- remote agent execution for non-local runtimes
+
+### Proposed architecture
+
+Separate **control plane** from **execution runtime** more explicitly:
+
+```ts
+type RuntimeDriver = "local_process" | "remote_sandbox" | "webhook";
+
+interface ExecutionHandle {
+  externalRunId: string;
+  status: "queued" | "running" | "completed" | "failed" | "cancelled";
+  previewUrl?: string;
+  logsUrl?: string;
+}
+```
+
+First remote driver: `remote_sandbox` for e2b-style execution.
+
+### Deliverables
+
+- canonical deploy recipes:
+
+  - local solo
+  - shared private (Tailscale/private auth)
+  - public cloud (managed Postgres + object storage + public URL)
+
+- runtime health page
+- adapter/runtime capability matrix
+- one official reference deployment path
+
+### UX
+
+New “Deployment” settings page:
+
+- instance mode
+- auth/exposure
+- storage/database status
+- runtime drivers configured
+- health and reachability checks
+
+### Acceptance criteria
+
+- Two humans can log into one authenticated/private instance and use it concurrently.
+- A public deployment can run agents via at least one remote runtime.
+- `doctor` catches missing public/private config and gives concrete fixes.
+
+### Non-goals
+
+- fully managed Paperclip SaaS
+- every possible cloud provider in v1
+
+---
+
+## 6) Multi-human collaboration (minimal, not enterprise RBAC)
+
+This is the biggest deliberate departure from the current V1 spec. Publicly, V1 still says “single human board operator” and puts role-based human granularity out of scope. But the transcript is right that shared use is necessary if Paperclip is going to be real for teams. The key is to do a **minimal collaboration model**, not a giant permission system. ([GitHub][2])
+
+### Product decision
+
+Ship **coarse multi-user company memberships**, not fine-grained enterprise RBAC.
+
+### Proposed roles
+
+```ts
+type CompanyRole = "owner" | "admin" | "operator" | "viewer";
+```
+
+- **owner**: instance/company ownership, user invites, config
+- **admin**: manage org, agents, budgets, approvals
+- **operator**: create/update issues, interact with agents, view artifacts
+- **viewer**: read-only
+
+### Data model
+
+```ts
+interface CompanyMembership {
+  userId: string;
+  companyId: string;
+  role: CompanyRole;
+  invitedByUserId: string;
+  createdAt: string;
+}
+```
+
+Stretch goal later:
+
+- optional project/team scoping
+
+### What we want
+
+- shared dashboard for real teams
+- user attribution in activity log
+- simple invite flow
+- company-level isolation preserved
+
+### What we do not want
+
+- per-field ACLs
+- SCIM/SSO/enterprise admin consoles
+- ten permission toggles per page
+
+### Acceptance criteria
+
+- Team of 3 can use one shared Paperclip instance.
+- Every user action is attributed correctly in activity.
+- Company membership boundaries are enforced.
+- Viewer cannot mutate; operator/admin can.
+
+### Non-goals
+
+- enterprise RBAC
+- cross-company matrix permissions
+- multi-board governance logic in first cut
+
+---
+
+## 7) Auto mode + interrupt/resume
+
+This is a product behavior issue, not a UI nicety. If agents cannot keep working or accept course correction without restarting, the autonomy model feels fake.
+
+### Product decision
+
+Make auto mode and mid-run interruption first-class runtime semantics.
+
+### What we want
+
+- Auto mode that continues until blocked by approvals, budgets, or explicit pause.
+- Mid-run “you missed this” correction without losing session continuity.
+- Clear state when an agent is waiting, blocked, or paused.
+
+### Proposed state model
+
+```ts
+type RunState =
+  | "queued"
+  | "running"
+  | "waiting_approval"
+  | "waiting_input"
+  | "paused"
+  | "completed"
+  | "failed"
+  | "cancelled";
+```
+
+Add board interjections as resumable input events:
+
+```ts
+interface RunMessage {
+  runId: string;
+  authorUserId: string;
+  mode: "hint" | "correction" | "hard_override";
+  body: string;
+  resumeCurrentSession: boolean;
+}
+```
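+
+As a rough illustration of the "continue" versus "restart" paths, the sketch below applies a `RunMessage` to a session record. `RunSession`, `applyRunMessage`, and the `newSessionId` callback are hypothetical, and keeping the same `runId` on restart is an assumption:
+
+```ts
+interface RunMessage {
+  runId: string;
+  authorUserId: string;
+  mode: "hint" | "correction" | "hard_override";
+  body: string;
+  resumeCurrentSession: boolean;
+}
+
+// Hypothetical session record; not part of the proposed model above.
+interface RunSession {
+  runId: string;
+  sessionId: string;
+  pendingInput: string[];
+}
+
+function applyRunMessage(
+  session: RunSession,
+  msg: RunMessage,
+  newSessionId: () => string,
+): RunSession {
+  if (msg.runId !== session.runId) {
+    throw new Error("message targets a different run");
+  }
+  if (msg.resumeCurrentSession) {
+    // Continue path: same session ID, the message is queued as resumable input.
+    return { ...session, pendingInput: [...session.pendingInput, msg.body] };
+  }
+  // Restart path: fresh session, earlier pending input is dropped.
+  return { runId: session.runId, sessionId: newSessionId(), pendingInput: [msg.body] };
+}
+```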
+
+### UX
+
+Buttons on active run:
+
+- Pause
+- Resume
+- Interrupt
+- Abort
+- Restart from scratch
+
+Interrupt opens a small composer that explicitly says:
+
+- continue current session
+- or restart run
+
+### Acceptance criteria
+
+- A board comment can resume an active session instead of spawning a fresh one.
+- Session ID remains stable for “continue” path.
+- UI clearly distinguishes blocked vs. waiting vs. paused.
+
+### Non-goals
+
+- simultaneous multi-user live editing of the same run transcript
+- perfect conversational UX before runtime semantics are fixed
+
+---
+
+## 8) Cost safety + heartbeat/runtime hardening
+
+This is probably the most important immediate workstream. The transcript says token burn is the biggest pain point, and the repo currently has active issues around budget enforcement evidence, onboarding/auth validation, and circuit-breaker style waste prevention. Public docs already promise hard budgets, and the issue tracker is pointing at the missing operational protections. ([GitHub][6])
+
+### Product decision
+
+Treat this as a **P0 runtime contract**, not a nice-to-have.
+
+### Part A: deterministic wake gating
+
+Do cheap, explicit work detection before invoking an LLM.
+
+```ts
+type WakeReason =
+  | "new_assignment"
+  | "new_comment"
+  | "mention"
+  | "approval_resolved"
+  | "scheduled_scan"
+  | "manual";
+```
+
+Rules:
+
+- if no new actionable input exists, do not call the model
+- scheduled scan should be a cheap policy check first, not a full reasoning pass
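+
+One possible shape for the gate, assuming the heartbeat can count new actionable inputs without calling a model. `WakeContext`, `shouldInvokeModel`, and the `cheapPolicyCheck` callback are hypothetical names:
+
+```ts
+type WakeReason =
+  | "new_assignment"
+  | "new_comment"
+  | "mention"
+  | "approval_resolved"
+  | "scheduled_scan"
+  | "manual";
+
+// Hypothetical input to the gate; the count is assumed to come from a cheap DB query.
+interface WakeContext {
+  reason: WakeReason;
+  newActionableInputs: number;
+}
+
+function shouldInvokeModel(
+  ctx: WakeContext,
+  cheapPolicyCheck: () => boolean,
+): boolean {
+  // Scheduled scans run the cheap policy check first and stop early if it fails.
+  if (ctx.reason === "scheduled_scan" && !cheapPolicyCheck()) {
+    return false;
+  }
+  // No new actionable input means no model call, whatever the wake reason.
+  return ctx.newActionableInputs > 0;
+}
+```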
+
+### Part B: budget contract
+
+Keep the existing public promise, but make it undeniable:
+
+- warning at 80%
+- auto-pause at 100%
+- visible audit trail
+- explicit board override to continue
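+
+The thresholds can be expressed as a small pure function, which also makes the 80/100 behavior easy to cover in the regression suite. This is a sketch only: `evaluateBudget`, `BudgetAction`, and the cent-based parameters are hypothetical, and returning a warning (rather than a clean state) after a board override is an assumption:
+
+```ts
+type BudgetAction = "ok" | "warn" | "pause";
+
+function evaluateBudget(
+  spentCents: number,
+  limitCents: number,
+  boardOverride: boolean = false,
+): BudgetAction {
+  if (limitCents <= 0) {
+    throw new Error("limitCents must be positive");
+  }
+  // At or above 100%: pause unless the board has explicitly overridden.
+  if (spentCents >= limitCents) {
+    return boardOverride ? "warn" : "pause";
+  }
+  // At or above 80%: warning only. Integer comparison avoids float rounding.
+  if (spentCents * 10 >= limitCents * 8) {
+    return "warn";
+  }
+  return "ok";
+}
+```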
+
+### Part C: circuit breaker
+
+Add per-agent runtime guards:
+
+```ts
+interface CircuitBreakerConfig {
+  enabled: boolean;
+  maxConsecutiveNoProgress: number;
+  maxConsecutiveFailures: number;
+  tokenVelocityMultiplier: number;
+}
+```
+
+Trip when:
+
+- no issue/status/comment progress for N runs
+- N failures in a row
+- token spike vs rolling average
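+
+A sketch of the trip evaluation under the config above. `AgentRunStats`, `TripReason`, and `evaluateBreaker` are hypothetical, and interpreting `tokenVelocityMultiplier` as "last run tokens exceed the rolling average times this factor" is an assumption:
+
+```ts
+interface CircuitBreakerConfig {
+  enabled: boolean;
+  maxConsecutiveNoProgress: number;
+  maxConsecutiveFailures: number;
+  tokenVelocityMultiplier: number;
+}
+
+// Hypothetical per-agent counters maintained by the heartbeat service.
+interface AgentRunStats {
+  consecutiveNoProgress: number;
+  consecutiveFailures: number;
+  lastRunTokens: number;
+  rollingAvgTokens: number;
+}
+
+type TripReason = "no_progress" | "failures" | "token_spike";
+
+// Returns the first matching trip reason, or null if the breaker stays closed.
+function evaluateBreaker(
+  cfg: CircuitBreakerConfig,
+  stats: AgentRunStats,
+): TripReason | null {
+  if (!cfg.enabled) return null;
+  if (stats.consecutiveNoProgress >= cfg.maxConsecutiveNoProgress) return "no_progress";
+  if (stats.consecutiveFailures >= cfg.maxConsecutiveFailures) return "failures";
+  // Skip the spike check until a rolling average exists.
+  if (
+    stats.rollingAvgTokens > 0 &&
+    stats.lastRunTokens > stats.rollingAvgTokens * cfg.tokenVelocityMultiplier
+  ) {
+    return "token_spike";
+  }
+  return null;
+}
+```
+
+Returning a reason instead of a boolean lets the pause show up in audit/activity with a concrete cause.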
+
+### Part D: refactor heartbeat service
+
+Split current orchestration into modules:
+
+- wake detector
+- checkout/lock manager
+- adapter runner
+- session manager
+- cost recorder
+- breaker evaluator
+- event streamer
+
+### Part E: regression suite
+
+Mandatory automated proofs for:
+
+- onboarding/auth matrix
+- 80/100 budget behavior
+- no cross-company auth leakage
+- no-spurious-wake idle behavior
+- active-run resume/interruption
+- remote runtime smoke
+
+### Acceptance criteria
+
+- Idle org with no new work does not generate model calls from heartbeat scans.
+- 80% shows warning only.
+- 100% pauses the agent and blocks continued execution until override.
+- Circuit breaker pause is visible in audit/activity.
+- Runtime modules have explicit contracts and are testable independently.
+
+### Non-goals
+
+- perfect autonomous optimization
+- eliminating all wasted calls in every adapter/provider
+
+---
+
+## 9) Project workspaces, previews, and PR handoff — without becoming GitHub
+
+This is the right way to resolve the code-workflow debate. The repo already has worktree-local instances, project `workspaceStrategy.provisionCommand`, and an RFC for adapter-level git worktree isolation. That is the correct architectural direction: **project execution policies and workspace isolation**, not built-in PR review. ([GitHub][7])
+
+### Product decision
+
+Paperclip should manage the **issue → workspace → preview/PR → review handoff** lifecycle, but leave diffs/review/merge to external tools.
+
+### Proposed config
+
+Prefer repo-local project config:
+
+```yaml
+# .paperclip/project.yml
+execution:
+  workspaceStrategy: worktree # one of: shared | worktree | ephemeral_container
+  deliveryMode: pull_request # one of: artifact | preview | pull_request
+  provisionCommand: "pnpm install"
+  teardownCommand: "pnpm clean"
+  preview:
+    command: "pnpm dev --port $PAPERCLIP_PREVIEW_PORT"
+    healthPath: "/"
+    ttlMinutes: 120
+  vcs:
+    provider: github
+    repo: owner/repo
+    prPerIssue: true
+    baseBranch: main
+```
+
+### Rules
+
+- For non-code projects: `deliveryMode=artifact`
+- For UI/app work: `deliveryMode=preview`
+- For git-backed engineering projects: `deliveryMode=pull_request`
+- For git-backed projects with `prPerIssue=true`, one issue maps to one isolated branch/worktree
+
+### UX
+
+Issue page shows:
+
+- workspace link/status
+- preview URL if available
+- PR URL if created
+- “Reopen preview” button with TTL
+- lifecycle:
+
+  - `todo`
+  - `in_progress`
+  - `in_review`
+  - `done`
+
+### What we want
+
+- safe parallel agent work on one repo
+- previewable output
+- external PR review
+- project-defined hooks, not hardcoded assumptions
+
+### What we do not want
+
+- built-in diff viewer
+- merge queue
+- Jira clone
+- mandatory PRs for non-code work
+
+### Acceptance criteria
+
+- Multiple engineer agents can work concurrently without workspace contamination.
+- When a project is in PR mode, the issue contains branch/worktree/preview/PR metadata.
+- Preview can be reopened on demand until TTL expires.
+
+### Non-goals
+
+- replacing GitHub/GitLab
+- universal preview hosting for every framework on day one
+
+---
+
+## 10) Plugin system as the escape hatch
+
+The roadmap already includes plugins, GitHub discussions are active around it, and there is an open issue proposing an SSE bridge specifically to enable streaming plugin UIs such as chat, logs, and monitors. This is exactly the right place for optional surfaces. ([GitHub][1])
+
+### Product decision
+
+Keep the control-plane core thin; put optional high-variance experiences into plugins.
+
+### First-party plugin targets
+
+- Chat
+- Knowledge base / RAG
+- Log tail / live build output
+- Custom tracing or queues
+- Doc editor / proposal builder
+
+### Plugin manifest
+
+```ts
+interface PluginManifest {
+  id: string;
+  version: string;
+  requestedPermissions: (
+    | "read_company"
+    | "read_issue"
+    | "write_issue_comment"
+    | "create_issue"
+    | "stream_ui"
+  )[];
+  surfaces: ("company_home" | "issue_panel" | "agent_panel" | "sidebar")[];
+  workerEntry: string;
+  uiEntry: string;
+}
+```
+
+### Platform requirements
+
+- host ↔ worker action bridge
+- SSE/UI streaming
+- company-scoped auth
+- permission declaration
+- surface slots in UI
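+
+To make "permission declaration" and "company-scoped auth" concrete, here is a sketch of a host-side check that produces an auditable decision. `PermissionDecision` and `checkPermission` are hypothetical, and the sketch assumes every requested permission was granted at install time, which a real host may not do:
+
+```ts
+type PluginPermission =
+  | "read_company"
+  | "read_issue"
+  | "write_issue_comment"
+  | "create_issue"
+  | "stream_ui";
+
+interface PluginManifest {
+  id: string;
+  version: string;
+  requestedPermissions: PluginPermission[];
+  surfaces: ("company_home" | "issue_panel" | "agent_panel" | "sidebar")[];
+  workerEntry: string;
+  uiEntry: string;
+}
+
+// Hypothetical audit record for one permission check.
+interface PermissionDecision {
+  pluginId: string;
+  companyId: string;
+  permission: PluginPermission;
+  allowed: boolean;
+}
+
+// Allows an action only if the manifest declared the permission; always scoped to one company.
+function checkPermission(
+  manifest: PluginManifest,
+  companyId: string,
+  permission: PluginPermission,
+): PermissionDecision {
+  return {
+    pluginId: manifest.id,
+    companyId,
+    permission,
+    allowed: manifest.requestedPermissions.includes(permission),
+  };
+}
+```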
+
+### Acceptance criteria
+
+- A plugin can stream events to UI in real time.
+- A chat plugin can converse without requiring chat to become the core Paperclip product.
+- Plugin permissions are company-scoped and auditable.
+
+### Non-goals
+
+- plugins mutating core schema directly
+- arbitrary privileged code execution without explicit permissions
+
+---
+
+## Priority order I would use
+
+Given the repo state and the transcript, I would sequence it like this:
+
+**P0**
+
+1. Cost safety + heartbeat hardening
+2. Guided onboarding + first-job magic
+3. Shared/cloud deployment foundation
+4. Artifact phase 1: non-image attachments + deliverables surfacing
+
+**P1**
+
+5. Board command surface
+6. Visibility/explainability layer
+7. Auto mode + interrupt/resume
+8. Minimal multi-user collaboration
+
+**P2**
+
+9. Project workspace / preview / PR lifecycle
+10. Plugin system + optional chat plugin
+11. Template/preset expansion for startup vs agency vs internal-team onboarding
+
+Why this order: the current repo is already getting pressure on onboarding failures, auth/onboarding validation, budget enforcement, and wasted token burn. If those are shaky, everything else feels impressive but unsafe. ([GitHub][3])
+
+## Bottom line
+
+The best synthesis is:
+
+- **Keep** Paperclip as the board-level control plane.
+- **Do not** make chat, code review, or workflow-building the core identity.
+- **Do** make the product feel conversational, visible, output-oriented, and shared.
+- **Do** make coding workflows an integration surface via workspaces/previews/PR links.
+- **Use plugins** for richer edges like chat and knowledge.
+
+That keeps the repo’s current product direction intact while solving almost every pain point surfaced in the transcript.
+
+### Key references
+
+- README / positioning / roadmap / product boundaries. ([GitHub][1])
+- Product definition. ([GitHub][8])
+- V1 implementation spec and explicit non-goals. ([GitHub][2])
+- Core concepts and architecture. ([GitHub][9])
+- Deployment modes / Tailscale / local-to-cloud path. ([GitHub][5])
+- Developing guide: worktree-local instances, provision hooks, onboarding endpoints. ([GitHub][7])
+- Current issue pressure: onboarding failure, auth/onboarding validation, budget enforcement, circuit breaker, attachment gaps, plugin chat. ([GitHub][3])
+
+[1]: https://github.com/paperclipai/paperclip "https://github.com/paperclipai/paperclip"
+[2]: https://github.com/paperclipai/paperclip/blob/master/doc/SPEC-implementation.md "https://github.com/paperclipai/paperclip/blob/master/doc/SPEC-implementation.md"
+[3]: https://github.com/paperclipai/paperclip/issues/704 "https://github.com/paperclipai/paperclip/issues/704"
+[4]: https://github.com/paperclipai/paperclip/blob/master/docs/deploy/tailscale-private-access.md "https://github.com/paperclipai/paperclip/blob/master/docs/deploy/tailscale-private-access.md"
+[5]: https://github.com/paperclipai/paperclip/blob/master/docs/deploy/deployment-modes.md "https://github.com/paperclipai/paperclip/blob/master/docs/deploy/deployment-modes.md"
+[6]: https://github.com/paperclipai/paperclip/issues/692 "https://github.com/paperclipai/paperclip/issues/692"
+[7]: https://github.com/paperclipai/paperclip/blob/master/doc/DEVELOPING.md "https://github.com/paperclipai/paperclip/blob/master/doc/DEVELOPING.md"
+[8]: https://github.com/paperclipai/paperclip/blob/master/doc/PRODUCT.md "https://github.com/paperclipai/paperclip/blob/master/doc/PRODUCT.md"
+[9]: https://github.com/paperclipai/paperclip/blob/master/docs/start/core-concepts.md "https://github.com/paperclipai/paperclip/blob/master/docs/start/core-concepts.md"
--- a/doc/plans/2026-03-13-paperclip-skill-tightening-plan.md
+++ b/doc/plans/2026-03-13-paperclip-skill-tightening-plan.md
@@ -0,0 +1,186 @@
+# Paperclip Skill Tightening Plan
+
+## Status
+
+Deferred follow-up. Do not include in the current token-optimization PR beyond documenting the plan.
+
+## Why This Is Deferred
+
+The `paperclip` skill is part of the critical control-plane safety surface. Tightening it may reduce fresh-session token use, but it also carries prompt-regression risk. We do not yet have evals that would let us safely prove behavior preservation across assignment handling, checkout rules, comment etiquette, approval workflows, and escalation paths.
+
+The current PR should ship the lower-risk infrastructure wins first:
+
+- telemetry normalization
+- safe session reuse
+- incremental issue/comment context
+- bootstrap versus heartbeat prompt separation
+- Codex worktree isolation
+
+## Current Problem
+
+Fresh runs still spend substantial input tokens even after the context-path fixes. The remaining large startup cost appears to come from loading the full `paperclip` skill and related instruction surface into context at run start.
+
+The skill currently mixes three kinds of content in one file:
+
+- hot-path heartbeat procedure used on nearly every run
+- critical policy and safety invariants
+- rare workflow/reference material that most runs do not need
+
+That structure is safe but expensive.
+
+## Goals
+
+- reduce first-run instruction tokens without weakening agent safety
+- preserve all current Paperclip control-plane capabilities
+- keep common heartbeat behavior explicit and easy for agents to follow
+- move rare workflows and reference material out of the hot path
+- create a structure that can later be evaluated systematically
+
+## Non-Goals
+
+- changing Paperclip API semantics
+- removing required governance rules
+- deleting rare workflows
+- changing agent defaults in the current PR
+
+## Recommended Direction
+
+### 1. Split Hot Path From Lookup Material
+
+Restructure the skill into:
+
+- an always-loaded core section for the common heartbeat loop
+- on-demand material for infrequent workflows and deep reference
+
+The core should cover only what is needed on nearly every wake:
+
+- auth and required headers
+- inbox-first assignment retrieval
+- mandatory checkout behavior
+- `heartbeat-context` first
+- incremental comment retrieval rules
+- mention/self-assign exception
+- blocked-task dedup
+- status/comment/release expectations before exit
+
+### 2. Normalize The Skill Around One Canonical Procedure
+
+The same rules are currently expressed multiple times across:
+
+- heartbeat steps
+- critical rules
+- endpoint reference
+- workflow examples
+
+Refactor so each operational fact has one primary home:
+
+- procedure
+- invariant list
+- appendix/reference
+
+This reduces prompt weight and lowers the chance of internal instruction drift.
+
+### 3. Compress Prose Into High-Signal Instruction Forms
+
+Rewrite the hot path using compact operational forms:
+
+- short ordered checklist
+- flat invariant list
+- minimal examples only where ambiguity would be risky
+
+Reduce:
+
+- narrative explanation
+- repeated warnings already covered elsewhere
+- large example payloads for common operations
+- long endpoint matrices in the main body
+
+### 4. Move Rare Workflows Behind Explicit Triggers
+
+These workflows should remain available but should not dominate fresh-run context:
+
+- OpenClaw invite flow
+- project setup flow
+- planning `<plan/>` writeback flow
+- instructions-path update flow
+- detailed link-formatting examples
+
+Recommended approach:
+
+- keep a short pointer in the main skill
+- move detailed procedures into sibling skills or referenced docs that agents read only when needed
+
+### 5. Separate Policy From Reference
+
+The skill should distinguish:
+
+- mandatory operating rules
+- endpoint lookup/reference
+- business-process playbooks
+
+That separation makes it easier to evaluate prompt changes later and lets adapters or orchestration choose what must always be loaded.
+
+## Proposed Target Structure
+
+1. Purpose and authentication
+2. Compact heartbeat procedure
+3. Hard invariants
+4. Required comment/update style
+5. Triggered workflow index
+6. Appendix/reference
+
+## Rollout Plan
+
+### Phase 1. Inventory And Measure
+
+- annotate the current skill by section and estimate token weight
+- identify which sections are truly hot-path versus rare
+- capture representative runs to compare before/after prompt size and behavior
+
+### Phase 2. Structural Refactor Without Semantic Changes
+
+- rewrite the main skill into the target structure
+- preserve all existing rules and capabilities
+- move rare workflow details into referenced companion material
+- keep wording changes conservative
+
+### Phase 3. Validate Against Real Scenarios
+
+Run scenario checks for:
+
+- normal assigned heartbeat
+- comment-triggered wake
+- blocked-task dedup behavior
+- approval-resolution wake
+- delegation/subtask creation
+- board handoff back to user
+- plan-request handling
+
+### Phase 4. Decide Default Loading Strategy
+
+After validation, decide whether:
+
+- the entire main skill still loads by default, or
+- only the compact core loads by default and rare sections are fetched on demand
+
+Do not change this loading policy without validation.
+
+## Risks
+
+- prompt degradation on control-plane safety rules
+- agents forgetting rare but important workflows
+- accidental removal of repeated wording that was carrying useful behavior
+- introducing ambiguous instruction precedence between the core skill and companion materials
+
+## Preconditions Before Implementation
+
+- define acceptance scenarios for control-plane correctness
+- add at least lightweight eval or scripted scenario coverage for key Paperclip flows
+- confirm how adapter/bootstrap layering should load skill content versus references
+
+## Success Criteria
+
+- materially lower first-run input tokens for Paperclip-coordinated agents
+- no regression in checkout discipline, issue updates, blocked handling, or delegation
+- no increase in malformed API usage or ownership mistakes
+- agents still complete rare workflows correctly when explicitly asked
--- a/doc/plans/2026-03-13-plugin-kitchen-sink-example.md
+++ b/doc/plans/2026-03-13-plugin-kitchen-sink-example.md
@@ -0,0 +1,699 @@
+# Kitchen Sink Plugin Plan
+
+## Goal
+
+Add a new first-party example plugin, `Kitchen Sink (Example)`, that demonstrates every currently implemented Paperclip plugin API surface in one place.
+
+This plugin is meant to be:
+
+- a living reference implementation for contributors
+- a manual test harness for the plugin runtime
+- a discoverable demo of what plugins can actually do today
+
+It is not meant to be a polished end-user product plugin.
+
+## Why
+
+The current plugin system has a real API surface, but it is spread across:
+
+- SDK docs
+- SDK types
+- plugin spec prose
+- two example plugins that each show only a narrow slice
+
+That makes it hard to answer basic questions like:
+
+- what can plugins render?
+- what can plugin workers actually do?
+- which surfaces are real versus aspirational?
+- how should a new plugin be structured in this repo?
+
+The kitchen-sink plugin should answer those questions by example.
+
+## Success Criteria
+
+The plugin is successful if a contributor can install it and, without reading the SDK first, discover and exercise the current plugin runtime surface area from inside Paperclip.
+
+Concretely:
+
+- it installs from the bundled examples list
+- it exposes at least one demo for every implemented worker API surface
+- it exposes at least one demo for every host-mounted UI surface
+- it clearly labels local-only / trusted-only demos
+- it is safe enough for local development by default
+- it doubles as a regression harness for plugin runtime changes
+
+## Constraints
+
+- Keep it instance-installed, not company-installed.
+- Treat this as a trusted/local example plugin.
+- Do not rely on cloud-safe runtime assumptions.
+- Avoid destructive defaults.
+- Avoid irreversible mutations unless they are clearly labeled and easy to undo.
+
+## Source Of Truth For This Plan
+
+This plan is based on the currently implemented SDK/types/runtime, not only the long-horizon spec.
+
+Primary references:
+
+- `packages/plugins/sdk/README.md`
+- `packages/plugins/sdk/src/types.ts`
+- `packages/plugins/sdk/src/ui/types.ts`
+- `packages/shared/src/constants.ts`
+- `packages/shared/src/types/plugin.ts`
+
+## Current Surface Inventory
+
+### Worker/runtime APIs to demonstrate
+
+These are the concrete `ctx` clients currently exposed by the SDK:
+
+- `ctx.config`
+- `ctx.events`
+- `ctx.jobs`
+- `ctx.launchers`
+- `ctx.http`
+- `ctx.secrets`
+- `ctx.assets`
+- `ctx.activity`
+- `ctx.state`
+- `ctx.entities`
+- `ctx.projects`
+- `ctx.companies`
+- `ctx.issues`
+- `ctx.agents`
+- `ctx.goals`
+- `ctx.data`
+- `ctx.actions`
+- `ctx.streams`
+- `ctx.tools`
+- `ctx.metrics`
+- `ctx.logger`
+
+### UI surfaces to demonstrate
+
+Surfaces defined in the SDK:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `sidebar`
+- `sidebarPanel`
+- `detailTab`
+- `taskDetailView`
+- `projectSidebarItem`
+- `toolbarButton`
+- `contextMenuItem`
+- `commentAnnotation`
+- `commentContextMenuItem`
+
+### Current host confidence
+
+Confirmed or strongly indicated as mounted in the current app:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `detailTab`
+- `projectSidebarItem`
+- comment surfaces
+- launcher infrastructure
+
+Need explicit validation before claiming full demo coverage:
+
+- `sidebar`
+- `sidebarPanel`
+- `taskDetailView`
+- `toolbarButton` as direct slot, distinct from launcher placement
+- `contextMenuItem` as direct slot, distinct from comment menu and launcher placement
+
+The implementation should keep a small validation checklist for these before we call the plugin "complete".
+
+## Plugin Concept
+
+The plugin should be named:
+
+- display name: `Kitchen Sink (Example)`
+- package: `@paperclipai/plugin-kitchen-sink-example`
+- plugin id: `paperclip.kitchen-sink-example` or `paperclip-kitchen-sink-example`
+
+Recommendation: use `paperclip-kitchen-sink-example` to match current in-repo example naming style.
+
+Category mix:
+
+- `ui`
+- `automation`
+- `workspace`
+- `connector`
+
+That is intentionally broad because the point is coverage.
+
+## UX Shape
+
+The plugin should have one main full-page demo console plus smaller satellites on other surfaces.
+
+### 1. Plugin page
+
+Primary route: the plugin `page` surface should be the central dashboard for all demos.
+
+Recommended page sections:
+
+- `Overview`
+  - what this plugin demonstrates
+  - current capabilities granted
+  - current host context
+- `UI Surfaces`
+  - links explaining where each other surface should appear
+- `Data + Actions`
+  - buttons and forms for bridge-driven worker demos
+- `Events + Streams`
+  - emit event
+  - watch event log
+  - stream demo output
+- `Paperclip Domain APIs`
+  - companies
+  - projects/workspaces
+  - issues
+  - goals
+  - agents
+- `Local Workspace + Process`
+  - file listing
+  - file read/write scratch area
+  - child process demo
+- `Jobs + Webhooks + Tools`
+  - job status
+  - webhook URL and recent deliveries
+  - declared tools
+- `State + Entities + Assets`
+  - scoped state editor
+  - plugin entity inspector
+  - upload/generated asset demo
+- `Observability`
+  - metrics written
+  - activity log samples
+  - latest worker logs
+
+### 2. Dashboard widget
+
+A compact widget on the main dashboard should show:
+
+- plugin health
+- count of demos exercised
+- recent event/stream activity
+- shortcut to the full plugin page
+
+### 3. Project sidebar item
+
+Add a `Kitchen Sink` link under each project that deep-links into a project-scoped plugin tab.
+
+### 4. Detail tabs
+
+Use detail tabs to demonstrate entity-context rendering on:
+
+- `project`
+- `issue`
+- `agent`
+- `goal`
+
+Each tab should show:
+
+- the host context it received
+- the relevant entity fetch via worker bridge
+- one small action scoped to that entity
+
+### 5. Comment surfaces
+
+Use issue comment demos to prove comment-specific extension points:
+
+- `commentAnnotation`
+  - render parsed metadata below each comment
+  - show comment id, issue id, and a small derived status
+- `commentContextMenuItem`
+  - add a menu action like `Copy Context To Kitchen Sink`
+  - action writes a plugin entity or state record for later inspection
+
+### 6. Settings page
+
+Custom `settingsPage` should be intentionally simple and operational:
+
+- `About`
+- `Danger / Trust Model`
+- demo toggles
+- local process defaults
+- workspace scratch-path behavior
+- secret reference inputs
+- event/job/webhook sample config
+
+This plugin should also keep the generic plugin settings `Status` tab useful by writing health, logs, and metrics.
+
+## Feature Matrix
+
+Each implemented worker API should have a visible demo.
+
+### `ctx.config`
+
+Demo:
+
+- read live config
+- show config JSON
+- react to config changes without restart where possible
+
+### `ctx.events`
+
+Demos:
+
+- emit a plugin event
+- subscribe to plugin events
+- subscribe to a core Paperclip event such as `issue.created`
+- show recent received events in a timeline
+
+### `ctx.jobs`
+
+Demos:
+
+- one scheduled heartbeat-style demo job
+- one manual run button from the UI if host supports manual job trigger
+- show last run result and timestamps
+
+### `ctx.launchers`
+
+Demos:
+
+- declare launchers in manifest
+- optionally register one runtime launcher from the worker
+- show launcher metadata on the plugin page
+
+### `ctx.http`
+
+Demo:
+
+- make a simple outbound GET request to a safe endpoint
+- show status code, latency, and JSON result
+
+Recommendation: default to a Paperclip-local endpoint or a stable public echo endpoint to avoid flaky demos.
+
+### `ctx.secrets`
+
+Demo:
+
+- operator enters a secret reference in config
+- plugin resolves it on demand
+- UI only shows masked result length / success status, never raw secret
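+
+A minimal sketch of the masking step, independent of how the secret is resolved. `MaskedSecretResult` and `maskSecretResult` are hypothetical names, and the input is assumed to be whatever string (or nothing) the worker got back for the reference:
+
+```ts
+// Hypothetical payload sent to the UI; it never carries the secret value itself.
+interface MaskedSecretResult {
+  resolved: boolean;
+  length: number | null;
+}
+
+function maskSecretResult(value: string | null | undefined): MaskedSecretResult {
+  if (value === null || value === undefined) {
+    return { resolved: false, length: null };
+  }
+  // Only the length leaves the worker, never the raw characters.
+  return { resolved: true, length: value.length };
+}
+```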
+
+### `ctx.assets`
+
+Demos:
+
+- generate a text asset from the UI
+- optionally upload a tiny JSON blob or screenshot-like text file
+- show returned asset URL
+
+### `ctx.activity`
+
+Demo:
+
+- button to write a plugin activity log entry against current company/entity
+
+### `ctx.state`
+
+Demos:
+
+- instance-scoped state
+- company-scoped state
+- project-scoped state
+- issue-scoped state
+- delete/reset controls
+
+Use a small state inspector/editor on the plugin page.
+
+### `ctx.entities`
+
+Demos:
+
+- create plugin-owned sample records
+- list/filter them
+- show one realistic use case such as "copied comments" or "demo sync records"
+
+### `ctx.projects`
+
+Demos:
+
+- list projects
+- list project workspaces
+- resolve primary workspace
+- resolve workspace for issue
+
+### `ctx.companies`
+
+Demo:
+
+- list companies and show current selected company
+
+### `ctx.issues`
+
+Demos:
+
+- list issues in current company
+- create issue
+- update issue status/title
+- list comments
+- create comment
+
+### `ctx.agents`
+
+Demos:
+
+- list agents
+- invoke one agent with a test prompt
+- pause/resume where safe
+
+Agent mutation controls should be behind an explicit warning.
+
+### `ctx.agents.sessions`
+
+Demos:
+
+- create agent chat session
+- send message
+- stream events back to the UI
+- close session
+
+This is a strong candidate for the best "wow" demo on the plugin page.
+
+### `ctx.goals`
+
+Demos:
+
+- list goals
+- create goal
+- update status/title
+
+### `ctx.data`
+
+Use throughout the plugin for all read-side bridge demos.
+
+### `ctx.actions`
+
+Use throughout the plugin for all mutation-side bridge demos.
+
+### `ctx.streams`
+
+Demos:
+
+- live event log stream
+- token-style stream from an agent session relay
+- fake progress stream for a long-running action
+
+### `ctx.tools`
+
+Demos:
+
+- declare 2-3 simple agent tools
+- tool 1: echo/diagnostics
+- tool 2: project/workspace summary
+- tool 3: create issue or write plugin state
+
+The plugin page should list declared tools and show example input payloads.
+
+### `ctx.metrics`
+
+Demo:
+
+- write a sample metric on each major demo action
+- surface a small recent metrics table in the plugin page
+
+### `ctx.logger`
+
+Demo:
+
+- every action logs structured entries
+- plugin settings `Status` tab then doubles as the log viewer
+
+## Local Workspace And Process Demos
+
+The plugin SDK intentionally leaves file/process operations to the plugin itself once it has workspace metadata.
+
+The kitchen-sink plugin should demonstrate that explicitly.
+
+### Workspace demos
+
+- list files from a selected workspace
+- read a file
+- write to a plugin-owned scratch file
+- optionally search files with `rg` if available
+
+### Process demos
+
+- run a short-lived command like `pwd`, `ls`, or `git status`
+- stream stdout/stderr back to UI
+- show exit code and timing
+
+Important safeguards:
+
+- default commands must be read-only
+- no shell interpolation from arbitrary free-form input in v1
+- provide a curated command list or a strongly validated command form
+- clearly label this area as local-only and trusted-only
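+
+The curated-list safeguard could look roughly like this. The keys, `CuratedCommand`, and `resolveCuratedCommand` are hypothetical; the point is that the UI sends only a key, and the worker maps it to a fixed executable plus argument array that is never built from free-form input:
+
+```ts
+interface CuratedCommand {
+  file: string;
+  args: readonly string[];
+}
+
+// Hypothetical allowlist of read-only commands, keyed by what the UI may send.
+const CURATED_COMMANDS: ReadonlyMap<string, CuratedCommand> = new Map<string, CuratedCommand>([
+  ["pwd", { file: "pwd", args: [] }],
+  ["ls", { file: "ls", args: [] }],
+  ["git-status", { file: "git", args: ["status"] }],
+]);
+
+function resolveCuratedCommand(
+  key: string,
+  processDemosEnabled: boolean,
+): CuratedCommand {
+  if (!processDemosEnabled) {
+    throw new Error("process demos are disabled");
+  }
+  const command = CURATED_COMMANDS.get(key);
+  if (command === undefined) {
+    throw new Error(`command not allowed: ${key}`);
+  }
+  return command;
+}
+```
+
+The resolved `file` and `args` are meant to go to a process API that takes an argument array rather than a shell string, so nothing is interpolated by a shell.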
+
+## Proposed Manifest Coverage
+
+The plugin should aim to declare:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `detailTab` for `project`, `issue`, `agent`, `goal`
+- `projectSidebarItem`
+- `commentAnnotation`
+- `commentContextMenuItem`
+
+Then, after host validation, add if supported:
+
+- `sidebar`
+- `sidebarPanel`
+- `taskDetailView`
+- `toolbarButton`
+- `contextMenuItem`
+
+It should also declare one or more `ui.launchers` entries to exercise launcher behavior independently of slot rendering.
+
+## Proposed Package Layout
+
+New package:
+
+- `packages/plugins/examples/plugin-kitchen-sink-example/`
+
+Expected files:
+
+- `package.json`
+- `README.md`
+- `tsconfig.json`
+- `src/index.ts`
+- `src/manifest.ts`
+- `src/worker.ts`
+- `src/ui/index.tsx`
+- `src/ui/components/...`
+- `src/ui/hooks/...`
+- `src/lib/...`
+- optional `scripts/build-ui.mjs` if UI bundling needs esbuild
+
+## Proposed Internal Architecture
+
+### Worker modules
+
+Recommended split:
+
+- `src/worker.ts`
+  - plugin definition and wiring
+- `src/worker/data.ts`
+  - `ctx.data.register(...)`
+- `src/worker/actions.ts`
+  - `ctx.actions.register(...)`
+- `src/worker/events.ts`
+  - event subscriptions and event log buffer
+- `src/worker/jobs.ts`
+  - scheduled job handlers
+- `src/worker/tools.ts`
+  - tool declarations and handlers
+- `src/worker/local-runtime.ts`
+  - file/process demos
+- `src/worker/demo-store.ts`
+  - helpers for state/entities/assets/metrics
+
+### UI modules
+
+Recommended split:
+
+- `src/ui/index.tsx`
+  - exported slot components
+- `src/ui/page/KitchenSinkPage.tsx`
+- `src/ui/settings/KitchenSinkSettingsPage.tsx`
+- `src/ui/widgets/KitchenSinkDashboardWidget.tsx`
+- `src/ui/tabs/ProjectKitchenSinkTab.tsx`
+- `src/ui/tabs/IssueKitchenSinkTab.tsx`
+- `src/ui/tabs/AgentKitchenSinkTab.tsx`
+- `src/ui/tabs/GoalKitchenSinkTab.tsx`
+- `src/ui/comments/KitchenSinkCommentAnnotation.tsx`
+- `src/ui/comments/KitchenSinkCommentMenuItem.tsx`
+- `src/ui/shared/...`
+
+## Configuration Schema
+
+The plugin should have a substantial but understandable `instanceConfigSchema`.
+
+Recommended config fields:
+
+- `enableDangerousDemos`
+- `enableWorkspaceDemos`
+- `enableProcessDemos`
+- `showSidebarEntry`
+- `showSidebarPanel`
+- `showProjectSidebarItem`
+- `showCommentAnnotation`
+- `showCommentContextMenuItem`
+- `showToolbarLauncher`
+- `defaultDemoCompanyId` optional
+- `secretRefExample`
+- `httpDemoUrl`
+- `processAllowedCommands`
+- `workspaceScratchSubdir`
+
+Defaults should keep risky behavior off.
+
+## Safety Defaults
+
+Default posture:
+
+- UI and read-only demos on
+- mutating domain demos on but explicitly labeled
+- process demos off by default
+- no arbitrary shell input by default
+- no raw secret rendering ever
+
+## Phased Build Plan
+
+### Phase 1: Core plugin skeleton
+
+- scaffold package
+- add manifest, worker, UI entrypoints
+- add README
+- make it appear in bundled examples list
+
+### Phase 2: Core, confirmed UI surfaces
+
+- plugin page
+- settings page
+- dashboard widget
+- project sidebar item
+- detail tabs
+
+### Phase 3: Core worker APIs
+
+- config
+- state
+- entities
+- companies/projects/issues/goals
+- data/actions
+- metrics/logger/activity
+
+### Phase 4: Real-time and automation APIs
+
+- streams
+- events
+- jobs
+- webhooks
+- agent sessions
+- tools
+
+### Phase 5: Local trusted runtime demos
+
+- workspace file demos
+- child process demos
+- guarded by config
+
+### Phase 6: Secondary UI surfaces
+
+- comment annotation
+- comment context menu item
+- launchers
+
+### Phase 7: Validation-only surfaces
+
+Validate whether the current host truly mounts:
+
+- `sidebar`
+- `sidebarPanel`
+- `taskDetailView`
+- direct-slot `toolbarButton`
+- direct-slot `contextMenuItem`
+
+If mounted, add demos.
+If not mounted, document them as SDK-defined but host-pending.
+
+## Documentation Deliverables
+
+The plugin should ship with a README that includes:
+
+- what it demonstrates
+- which surfaces are local-only
+- how to install it
+- where each UI surface should appear
+- a mapping from demo card to SDK API
+
+It should also be referenced from plugin docs as the "reference everything plugin".
+
+## Testing And Verification
+
+Minimum verification:
+
+- package typecheck/build
+- install from bundled example list
+- page loads
+- widget appears
+- project tab appears
+- comment surfaces render
+- settings page loads
+- key actions succeed
+
+Recommended manual checklist:
+
+- create issue from plugin
+- create goal from plugin
+- emit and receive plugin event
+- stream action output
+- open agent session and receive streamed reply
+- upload an asset
+- write plugin activity log
+- run a safe local process demo
+
+## Open Questions
+
+1. Should the process demo remain curated-command-only in the first pass?
+   Recommendation: yes.
+
+2. Should the plugin create throwaway "kitchen sink demo" issues/goals automatically?
+   Recommendation: no. Make creation explicit.
+
+3. Should we expose unsupported-but-typed surfaces in the UI even if host mounting is not wired?
+   Recommendation: yes, but label them as `SDK-defined / host validation pending`.
+
+4. Should agent mutation demos include pause/resume by default?
+   Recommendation: probably yes, but behind a warning block.
+
+5. Should this plugin be treated as a supported regression harness in CI later?
+   Recommendation: yes. Long term, this should be the plugin-runtime smoke test package.
+
+## Recommended Next Step
+
+If this plan looks right, the next implementation pass should start by building only:
+
+- package skeleton
+- page
+- settings page
+- dashboard widget
+- one project detail tab
+- one issue detail tab
+- the basic worker/action/data/state/event scaffolding
+
+That is enough to lock the architecture before filling in every demo surface.
--- a/doc/plans/2026-03-13-workspace-product-model-and-work-product.md
+++ b/doc/plans/2026-03-13-workspace-product-model-and-work-product.md
--- a/doc/plans/2026-03-14-billing-ledger-and-reporting.md
+++ b/doc/plans/2026-03-14-billing-ledger-and-reporting.md
@@ -0,0 +1,468 @@
+# Billing Ledger and Reporting
+
+## Context
+
+Paperclip currently stores model spend in `cost_events` and operational run state in `heartbeat_runs`.
+That split is fine, but the current reporting code tries to infer billing semantics by mixing both tables:
+
+- `cost_events` knows provider, model, tokens, and dollars
+- `heartbeat_runs.usage_json` knows some per-run billing metadata
+- `heartbeat_runs.usage_json` does **not** currently carry enough normalized billing dimensions to support honest provider-level reporting
+
+This becomes incorrect as soon as a company uses more than one provider, more than one billing channel, or more than one billing mode.
+
+Examples:
+
+- direct OpenAI API usage
+- Claude subscription usage with zero marginal dollars
+- subscription overage with dollars and tokens
+- OpenRouter billing where the biller is OpenRouter but the upstream provider is Anthropic or OpenAI
+
+The system needs to support:
+
+- dollar reporting
+- token reporting
+- subscription-included usage
+- subscription overage
+- direct metered API usage
+- future aggregator billing such as OpenRouter
+
+## Product Decision
+
+`cost_events` becomes the canonical billing and usage ledger for reporting.
+
+`heartbeat_runs` remains an operational execution log. It may keep mirrored billing metadata for debugging and transcripts, but reporting must not reconstruct billing semantics from `heartbeat_runs.usage_json`.
+
+## Decision: One Ledger Or Two
+
+We do **not** need two tables to solve the current PR's problem.
+For request-level inference reporting, `cost_events` is enough if it carries the right dimensions:
+
+- upstream provider
+- biller
+- billing type
+- model
+- token fields
+- billed amount
+
+That is why the first implementation pass extends `cost_events` instead of introducing a second table immediately.
+
+However, if Paperclip needs to account for the full billing surface of aggregators and managed AI platforms, then `cost_events` alone is not enough.
+Some charges are not cleanly representable as a single model inference event:
+
+- account top-ups and credit purchases
+- platform fees charged at purchase time
+- BYOK platform fees that are account-level or threshold-based
+- prepaid credit expirations, refunds, and adjustments
+- provisioned throughput commitments
+- fine-tuning, training, model import, and storage charges
+- gateway logging or other platform overhead that is not attributable to one prompt/response pair
+
+So the decision is:
+
+- near term: keep `cost_events` as the inference and usage ledger
+- next phase: add `finance_events` for non-inference financial events
+
+This is a deliberate split between:
+
+- usage and inference accounting
+- account-level and platform-level financial accounting
+
+That separation keeps request reporting honest without forcing us to fake invoice semantics onto rows that were never request-scoped.
+
+## External Motivation And Sources
+
+The need for this model is not theoretical.
+It follows directly from the billing systems of providers and aggregators Paperclip needs to support.
+
+### OpenRouter
+
+Source URLs:
+
+- https://openrouter.ai/docs/faq#credit-and-billing-systems
+- https://openrouter.ai/pricing
+
+Relevant billing behavior as of March 14, 2026:
+
+- OpenRouter passes through underlying inference pricing and deducts request cost from purchased credits.
+- OpenRouter charges a 5.5% fee with a $0.80 minimum when purchasing credits.
+- Crypto payments are charged a 5% fee.
+- BYOK has its own fee model after a free request threshold.
+- OpenRouter billing is aggregated at the OpenRouter account level even when the upstream provider is Anthropic, OpenAI, Google, or another provider.
+
+Implication for Paperclip:
+
+- request usage belongs in `cost_events`
+- credit purchases, purchase fees, BYOK fees, refunds, and expirations belong in `finance_events`
+- `biller=openrouter` must remain distinct from `provider=anthropic|openai|google|...`
+
+### Cloudflare AI Gateway Unified Billing
+
+Source URL:
+
+- https://developers.cloudflare.com/ai-gateway/features/unified-billing/
+
+Relevant billing behavior as of March 14, 2026:
+
+- Unified Billing lets users call multiple upstream providers while receiving a single Cloudflare bill.
+- Usage is paid from Cloudflare-loaded credits.
+- Cloudflare supports manual top-ups and auto top-up thresholds.
+- Spend limits can stop request processing on daily, weekly, or monthly boundaries.
+- Unified Billing traffic can use Cloudflare-managed credentials rather than the user's direct provider key.
+
+Implication for Paperclip:
+
+- request usage needs `biller=cloudflare`
+- upstream provider still needs to be preserved separately
+- Cloudflare credit loads and related account-level events are not inference rows and should not be forced into `cost_events`
+- quota and limits reporting must support biller-level controls, not just upstream provider limits
+
+### Amazon Bedrock
+
+Source URL:
+
+- https://aws.amazon.com/bedrock/pricing/
+
+Relevant billing behavior as of March 14, 2026:
+
+- Bedrock supports on-demand and batch pricing.
+- Bedrock pricing varies by region.
+- Some pricing tiers add premiums or discounts relative to standard pricing.
+- Provisioned throughput is commitment-based rather than request-based.
+- Custom model import uses Custom Model Units billed per minute, with monthly storage charges.
+- Imported model copies are billed in 5-minute windows once active.
+- Customization and fine-tuning introduce training and hosted-model charges beyond normal inference.
+
+Implication for Paperclip:
+
+- normal tokenized inference fits in `cost_events`
+- provisioned throughput, custom model unit charges, training, and storage charges require `finance_events`
+- region and pricing tier need to be first-class dimensions in the financial model
+
+## Ledger Boundary
+
+To keep the system coherent, the table boundary should be explicit.
+
+### `cost_events`
+
+Use `cost_events` for request-scoped usage and inference charges:
+
+- one row per billable or usage-bearing run event
+- provider/model/biller/billingType/tokens/cost
+- optionally tied to `heartbeat_run_id`
+- supports direct APIs, subscriptions, overage, OpenRouter-routed inference, Cloudflare-routed inference, and Bedrock on-demand inference
+
+### `finance_events`
+
+Use `finance_events` for account-scoped or platform-scoped financial events:
+
+- credit purchase
+- top-up
+- refund
+- fee
+- expiry
+- provisioned capacity
+- training
+- model import
+- storage
+- invoice adjustment
+
+These rows may or may not have a related model, provider, or run id.
+Trying to force them into `cost_events` would either create fake request rows or create null-heavy rows that mean something fundamentally different from inference usage.
+
+## Canonical Billing Dimensions
+
+Every persisted billing event should model four separate axes:
+
+1. Usage provider
+   The upstream provider whose model performed the work.
+   Examples: `openai`, `anthropic`, `google`.
+
+2. Biller
+   The system that charged for the usage.
+   Examples: `openai`, `anthropic`, `openrouter`, `cursor`, `chatgpt`.
+
+3. Billing type
+   The pricing mode applied to the event.
+   Initial canonical values:
+   - `metered_api`
+   - `subscription_included`
+   - `subscription_overage`
+   - `credits`
+   - `fixed`
+   - `unknown`
+
+4. Measures
+   Usage and billing must both be storable:
+   - `input_tokens`
+   - `output_tokens`
+   - `cached_input_tokens`
+   - `cost_cents`
+
+These dimensions are independent.
+For example, an event may be:
+
+- provider: `anthropic`
+- biller: `openrouter`
+- billing type: `metered_api`
+- tokens: non-zero
+- cost cents: non-zero
+
+Or:
+
+- provider: `anthropic`
+- biller: `anthropic`
+- billing type: `subscription_included`
+- tokens: non-zero
+- cost cents: `0`
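+
+The two examples above can be sketched as typed values. This is an illustration only: the `BillingEvent` interface, the camelCase field names, and the token counts are assumptions, while the billing type values are the canonical ones listed above.
+
+```typescript
+type BillingType =
+  | "metered_api"
+  | "subscription_included"
+  | "subscription_overage"
+  | "credits"
+  | "fixed"
+  | "unknown";
+
+// Hypothetical in-memory shape for one persisted billing event.
+interface BillingEvent {
+  provider: string; // upstream provider whose model did the work
+  biller: string; // system that charged for the usage
+  billingType: BillingType;
+  inputTokens: number;
+  outputTokens: number;
+  cachedInputTokens: number;
+  costCents: number;
+}
+
+// Anthropic model billed through OpenRouter: tokens and dollars are both non-zero.
+const routedThroughAggregator: BillingEvent = {
+  provider: "anthropic",
+  biller: "openrouter",
+  billingType: "metered_api",
+  inputTokens: 1200, // illustrative numbers
+  outputTokens: 300,
+  cachedInputTokens: 0,
+  costCents: 4,
+};
+
+// Subscription-included usage: tokens are recorded, billed cost is zero.
+const includedInSubscription: BillingEvent = {
+  provider: "anthropic",
+  biller: "anthropic",
+  billingType: "subscription_included",
+  inputTokens: 1200,
+  outputTokens: 300,
+  cachedInputTokens: 0,
+  costCents: 0,
+};
+
+// Provider and biller are independent axes, so they can differ.
+function isBilledByThirdParty(event: BillingEvent): boolean {
+  return event.provider !== event.biller;
+}
+```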
+
+## Schema Changes
+
+Extend `cost_events` with:
+
+- `heartbeat_run_id uuid null references heartbeat_runs.id`
+- `biller text not null default 'unknown'`
+- `billing_type text not null default 'unknown'`
+- `cached_input_tokens int not null default 0`
+
+Keep `provider` as the upstream usage provider.
+Do not overload `provider` to mean biller.
+
+Add a future `finance_events` table for account-level financial events with fields along these lines:
+
+- `company_id`
+- `occurred_at`
+- `event_kind`
+- `direction`
+- `biller`
+- `provider nullable`
+- `execution_adapter_type nullable`
+- `pricing_tier nullable`
+- `region nullable`
+- `model nullable`
+- `quantity nullable`
+- `unit nullable`
+- `amount_cents`
+- `currency`
+- `estimated`
+- `related_cost_event_id nullable`
+- `related_heartbeat_run_id nullable`
+- `external_invoice_id nullable`
+- `metadata_json nullable`
+
+Add indexes:
+
+- `(company_id, biller, occurred_at)`
+- `(company_id, provider, occurred_at)`
+- `(company_id, heartbeat_run_id)` if distinct-run reporting remains common
+
+## Shared Contract Changes
+
+### Shared types
+
+Add a shared billing type union and enrich cost types with:
+
+- `heartbeatRunId`
+- `biller`
+- `billingType`
+- `cachedInputTokens`
+
+Update reporting response types so the provider breakdown reflects the ledger directly rather than inferred run metadata.
+
+### Validators
+
+Extend `createCostEventSchema` to accept:
+
+- `heartbeatRunId`
+- `biller`
+- `billingType`
+- `cachedInputTokens`
+
+Defaults:
+
+- `biller` defaults to `provider`
+- `billingType` defaults to `unknown`
+- `cachedInputTokens` defaults to `0`
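+
+A sketch of the defaulting rules in plain TypeScript. The real `createCostEventSchema` may be built with a validation library; `applyCostEventDefaults` and both interfaces below are hypothetical names used only to show the intended behavior.
+
+```typescript
+// Hypothetical input shape: only the fields discussed in this section.
+interface CreateCostEventInput {
+  provider: string;
+  heartbeatRunId?: string | null;
+  biller?: string;
+  billingType?: string;
+  cachedInputTokens?: number;
+}
+
+interface NormalizedCostEvent {
+  provider: string;
+  heartbeatRunId: string | null;
+  biller: string;
+  billingType: string;
+  cachedInputTokens: number;
+}
+
+// Fill in the documented defaults for any omitted field.
+function applyCostEventDefaults(input: CreateCostEventInput): NormalizedCostEvent {
+  return {
+    provider: input.provider,
+    heartbeatRunId: input.heartbeatRunId ?? null,
+    biller: input.biller ?? input.provider, // biller defaults to provider
+    billingType: input.billingType ?? "unknown",
+    cachedInputTokens: input.cachedInputTokens ?? 0,
+  };
+}
+```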
+
+## Adapter Contract Changes
+
+Extend adapter execution results so they can report:
+
+- `biller`
+- richer billing type values
+
+Backwards compatibility:
+
+- existing adapter values `api` and `subscription` are treated as legacy aliases
+- map `api -> metered_api`
+- map `subscription -> subscription_included`
+
+Future adapters may emit the canonical values directly.
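+
+The legacy alias mapping could look roughly like this. `normalizeBillingType` is a hypothetical helper, and falling back to `unknown` for unrecognized values is an assumption rather than documented behavior.
+
+```typescript
+type BillingType =
+  | "metered_api"
+  | "subscription_included"
+  | "subscription_overage"
+  | "credits"
+  | "fixed"
+  | "unknown";
+
+const CANONICAL_BILLING_TYPES: readonly BillingType[] = [
+  "metered_api",
+  "subscription_included",
+  "subscription_overage",
+  "credits",
+  "fixed",
+  "unknown",
+];
+
+// Legacy adapter values and the canonical values they map to.
+const LEGACY_ALIASES = new Map<string, BillingType>([
+  ["api", "metered_api"],
+  ["subscription", "subscription_included"],
+]);
+
+function normalizeBillingType(raw: string | null | undefined): BillingType {
+  if (!raw) return "unknown";
+  const alias = LEGACY_ALIASES.get(raw);
+  if (alias !== undefined) return alias;
+  // Canonical values pass through; anything else becomes "unknown" (assumption).
+  const canonical = CANONICAL_BILLING_TYPES.find((value) => value === raw);
+  return canonical ?? "unknown";
+}
+```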
+
+OpenRouter support will use:
+
+- `provider` = upstream provider when known
+- `biller` = `openrouter`
+- `billingType` = `metered_api` unless OpenRouter later exposes another billing mode
+
+Cloudflare Unified Billing support will use:
+
+- `provider` = upstream provider when known
+- `biller` = `cloudflare`
+- `billingType` = `credits` or `metered_api` depending on the normalized request billing contract
+
+Bedrock support will use:
+
+- `provider` = upstream provider or `aws_bedrock` depending on adapter shape
+- `biller` = `aws_bedrock`
+- `billingType` = request-scoped mode for inference rows
+- `finance_events` for provisioned, training, import, and storage charges
+
+## Write Path Changes
+
+### Heartbeat-created events
+
+When a heartbeat run produces usage or spend:
+
+1. normalize adapter billing metadata
+2. write a ledger row to `cost_events`
+3. attach `heartbeat_run_id`
+4. set `provider`, `biller`, `billing_type`, token fields, and `cost_cents`
+
+The write path should no longer depend on later inference from `heartbeat_runs`.
+
+### Manual API-created events
+
+Manual cost event creation remains supported.
+These events may have `heartbeatRunId = null`.
+
+Rules:
+
+- `provider` remains required
+- `biller` defaults to `provider`
+- `billingType` defaults to `unknown`
+
+## Reporting Changes
+
+### Server
+
+Refactor reporting queries to use `cost_events` only.
+
+#### `summary`
+
+- sum `cost_cents`
+
+#### `by-agent`
+
+- sum costs and token fields from `cost_events`
+- use `count(distinct heartbeat_run_id)` filtered by billing type for run counts
+- use token sums filtered by billing type for subscription usage
+
+#### `by-provider`
+
+- group by `provider`, `model`
+- sum costs and token fields directly from the ledger
+- derive billing-type slices from `cost_events.billing_type`
+- never pro-rate from unrelated `heartbeat_runs`
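+
+As an illustration of the `by-provider` rule, here is an in-memory version of the grouping. The production query would presumably run in SQL against `cost_events`; `LedgerRow`, `ProviderModelTotals`, and `summarizeByProvider` are hypothetical names.
+
+```typescript
+// Hypothetical projection of a cost_events row.
+interface LedgerRow {
+  provider: string;
+  model: string;
+  billingType: string;
+  inputTokens: number;
+  outputTokens: number;
+  costCents: number;
+}
+
+interface ProviderModelTotals {
+  provider: string;
+  model: string;
+  costCents: number;
+  inputTokens: number;
+  outputTokens: number;
+  // Token totals sliced by the billing type stored on each ledger row.
+  tokensByBillingType: Record<string, number>;
+}
+
+// Group by (provider, model) using ledger rows only; no run metadata is consulted.
+function summarizeByProvider(rows: LedgerRow[]): ProviderModelTotals[] {
+  const groups = new Map<string, ProviderModelTotals>();
+  for (const row of rows) {
+    const key = JSON.stringify([row.provider, row.model]);
+    let totals = groups.get(key);
+    if (totals === undefined) {
+      totals = {
+        provider: row.provider,
+        model: row.model,
+        costCents: 0,
+        inputTokens: 0,
+        outputTokens: 0,
+        tokensByBillingType: {},
+      };
+      groups.set(key, totals);
+    }
+    totals.costCents += row.costCents;
+    totals.inputTokens += row.inputTokens;
+    totals.outputTokens += row.outputTokens;
+    const previous = totals.tokensByBillingType[row.billingType] ?? 0;
+    totals.tokensByBillingType[row.billingType] = previous + row.inputTokens + row.outputTokens;
+  }
+  return Array.from(groups.values());
+}
+```
+
+Because every slice is read from the row's own `billingType`, a company that mixes providers cannot have subscription usage attributed to the wrong provider.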
+
+#### future `by-biller`
+
+- group by `biller`
+- this is the right view for invoice and subscription accountability
+
+#### `window-spend`
+
+- continue to use `cost_events`
+
+#### project attribution
+
+Keep current project attribution logic for now, but prefer `cost_events.heartbeat_run_id` as the join anchor whenever possible.
+
+## UI Changes
+
+### Principles
+
+- Spend, usage, and quota are related but distinct
+- a missing quota fetch is not the same as “no quota”
+- provider and biller are different dimensions
+
+### Immediate UI changes
+
+1. Keep the current costs page structure.
+2. Make the provider cards accurate by reading only ledger-backed values.
+3. Show provider quota fetch errors explicitly instead of dropping them.
+
+### Follow-up UI direction
+
+The long-term board UI should expose:
+
+- Spend
+  Dollars by biller, provider, model, agent, project
+- Usage
+  Tokens by provider, model, agent, project
+- Quotas
+  Live provider or biller limits, credits, and reset windows
+- Financial events
+  Credit purchases, top-ups, fees, refunds, commitments, storage, and other non-inference charges
+
+## Migration Plan
+
+Migration behavior:
+
+- add new non-destructive columns with defaults
+- backfill existing rows:
+  - `biller = provider`
+  - `billing_type = 'unknown'`
+  - `cached_input_tokens = 0`
+  - `heartbeat_run_id = null`
+
+Do **not** attempt to backfill historical provider-level subscription attribution from `heartbeat_runs`.
+That data was never stored with the required dimensions.
+
+## Testing Plan
+
+Add or update tests for:
+
+1. heartbeat-created ledger rows persist `heartbeatRunId`, `biller`, `billingType`, and cached tokens
+2. legacy adapter billing values map correctly
+3. provider reporting uses ledger data only
+4. mixed-provider companies do not cross-attribute subscription usage
+5. zero-dollar subscription usage still appears in token reporting
+6. quota fetch failures render explicit UI state
+7. manual cost events still validate and write correctly
+8. biller reporting keeps upstream provider breakdowns separate
+9. OpenRouter-style rows can show `biller=openrouter` with non-OpenRouter upstream providers
+10. Cloudflare-style rows can show `biller=cloudflare` with preserved upstream provider identity
+11. future `finance_events` aggregation handles non-request charges without requiring a model or run id
+
+## Delivery Plan
+
+### Step 1
+
+- land the ledger contract and query rewrite
+- make the current costs page correct
+
+### Step 2
+
+- add biller-oriented reporting endpoints and UI
+
+### Step 3
+
+- wire OpenRouter and any future aggregator adapters to the same contract
+
+### Step 4
+
+- add `executionAdapterType` to persisted cost reporting if adapter-level grouping becomes a product requirement
+
+### Step 5
+
+- introduce `finance_events`
+- add non-inference accounting endpoints
+- add UI for platform/account charges alongside inference spend and usage
+
+## Non-Goals For This Change
+
+- multi-currency support
+- invoice reconciliation
+- provider-specific cost estimation beyond persisted billed cost
+- replacing `heartbeat_runs` as the operational run record
--- a/doc/plans/2026-03-14-budget-policies-and-enforcement.md
+++ b/doc/plans/2026-03-14-budget-policies-and-enforcement.md
@@ -0,0 +1,611 @@
+# Budget Policies and Enforcement
+
+## Context
+
+Paperclip already treats budgets as a core control-plane responsibility:
+
+- `doc/SPEC.md` gives the Board authority to set budgets, pause agents, pause work, and override any budget.
+- `doc/SPEC-implementation.md` says V1 must support monthly UTC budget windows, soft alerts, and hard auto-pause.
+- the current code only partially implements that intent.
+
+Today the system has narrow money-budget behavior:
+
+- companies track `budgetMonthlyCents` and `spentMonthlyCents`
+- agents track `budgetMonthlyCents` and `spentMonthlyCents`
+- `cost_events` ingestion increments those counters
+- when an agent exceeds its monthly budget, the agent is paused
+
+That leaves major product gaps:
+
+- no project budget model
+- no approval generated when budget is hit
+- no generic budget policy system
+- no project pause semantics tied to budget
+- no durable incident tracking to prevent duplicate alerts
+- no separation between enforceable spend budgets and advisory usage quotas
+
+This plan defines the concrete budgeting model Paperclip should implement next.
+
+## Product Goals
+
+Paperclip should let operators:
+
+1. Set budgets on agents and projects.
+2. Understand whether a budget is based on money or usage.
+3. Be warned before a budget is exhausted.
+4. Automatically pause work when a hard budget is hit.
+5. Approve, raise, or resume from a budget stop using obvious UI.
+6. See budget state on the dashboard, `/costs`, and scope detail pages.
+
+The system should make one thing very clear:
+
+- budgets are policy controls
+- quotas are usage visibility
+
+They are related, but they are not the same concept.
+
+## Product Decisions
+
+### V1 Budget Defaults
+
+For the next implementation pass, Paperclip should enforce these defaults:
+
+- agent budgets are recurring monthly budgets
+- project budgets are lifetime total budgets
+- hard-stop enforcement uses billed dollars, not tokens
+- monthly windows use UTC calendar months
+- project total budgets do not reset automatically
+
+This gives a clean mental model:
+
+- agents are ongoing workers, so monthly recurring budget is natural
+- projects are bounded workstreams, so lifetime cap is natural
+
+### Metric To Enforce First
+
+The first enforceable metric should be `billed_cents`.
+
+Reasoning:
+
+- it works across providers, billers, and models
+- it maps directly to real financial risk
+- it handles overage and metered usage consistently
+- it avoids cross-provider token normalization problems
+- it applies cleanly even when future finance events are not token-based
+
+Token budgets should not be the first hard-stop policy.
+They should come later as advisory usage controls once the money-based system is solid.
+
+### Subscription Usage Decision
+
+Paperclip should separate subscription-included usage from billed spend:
+
+- `subscription_included`
+  - visible in reporting
+  - visible in usage summaries
+  - does not count against money budget
+- `subscription_overage`
+  - visible in reporting
+  - counts against money budget
+- `metered_api`
+  - visible in reporting
+  - counts against money budget
+
+This keeps the budget system honest:
+
+- users should not see "spend" rise for usage that did not incur marginal billed cost
+- users should still see the token usage and provider quota state
+
+### Soft Alert Versus Hard Stop
+
+Paperclip should have two threshold classes:
+
+- soft alert
+  - creates visible notification state
+  - does not create an approval
+  - does not pause work
+- hard stop
+  - pauses the affected scope automatically
+  - creates an approval requiring human resolution
+  - prevents additional heartbeats or task pickup in that scope
+
+Default thresholds:
+
+- soft alert at `80%`
+- hard stop at `100%`
+
+These should be configurable per policy later, but they are good defaults now.
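+
+A small sketch of the default threshold math, using integer cents to avoid floating-point drift. `thresholdState` is a hypothetical helper, and treating a non-positive limit as "no enforceable budget" is an assumption.
+
+```typescript
+type ThresholdState = "ok" | "soft" | "hard";
+
+// Classify observed spend against a limit. Defaults: soft alert at 80%, hard stop at 100%.
+function thresholdState(
+  observedCents: number,
+  limitCents: number,
+  warnPercent: number = 80,
+): ThresholdState {
+  if (limitCents <= 0) return "ok"; // assumption: no enforceable budget configured
+  if (observedCents >= limitCents) return "hard";
+  // Compare in scaled integers: observed / limit >= warnPercent / 100.
+  if (observedCents * 100 >= limitCents * warnPercent) return "soft";
+  return "ok";
+}
+```
+
+With a 10000-cent budget, for example, 8000 cents is the first value classified as `soft` and 10000 cents is the first classified as `hard`.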
+
+## Scope Model
+
+### Supported Scope Types
+
+Budget policies should support:
+
+- `company`
+- `agent`
+- `project`
+
+This plan focuses on finishing `agent` and `project` first while preserving the existing company budget behavior.
+
+### Recommended V1.5 Policy Presets
+
+- Company
+  - metric: `billed_cents`
+  - window: `calendar_month_utc`
+- Agent
+  - metric: `billed_cents`
+  - window: `calendar_month_utc`
+- Project
+  - metric: `billed_cents`
+  - window: `lifetime`
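+
+The two window kinds can be resolved to concrete bounds as sketched below. `resolveWindow` and `BudgetWindow` are hypothetical names; representing a lifetime window with `null` bounds and using an exclusive end are assumptions.
+
+```typescript
+type WindowKind = "calendar_month_utc" | "lifetime";
+
+interface BudgetWindow {
+  start: Date | null; // inclusive; null means unbounded
+  end: Date | null; // exclusive; null means unbounded
+}
+
+function resolveWindow(kind: WindowKind, now: Date): BudgetWindow {
+  if (kind === "lifetime") {
+    return { start: null, end: null };
+  }
+  const year = now.getUTCFullYear();
+  const month = now.getUTCMonth();
+  // Date.UTC normalizes month 12 to January of the following year.
+  return {
+    start: new Date(Date.UTC(year, month, 1)),
+    end: new Date(Date.UTC(year, month + 1, 1)),
+  };
+}
+```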
+
+Future extensions can add:
+
+- token advisory policies
+- daily or weekly spend windows
+- provider- or biller-scoped budgets
+- inherited delegated budgets down the org tree
+
+## Current Implementation Baseline
+
+The current codebase is not starting from zero, but the existing shape is too ad hoc to extend safely.
+
+### What Exists Today
+
+- company and agent monthly cents counters
+- cost ingestion that updates those counters
+- agent hard-stop pause on monthly budget overrun
+
+### What Is Missing
+
+- project budgets
+- generic budget policy persistence
+- generic threshold crossing detection
+- incident deduplication per scope/window
+- approval creation on hard-stop
+- project execution blocking
+- budget timeline and incident UI
+- distinction between advisory quota and enforceable budget
+
+## Proposed Data Model
+
+### 1. `budget_policies`
+
+Create a new table for canonical budget definitions.
+
+Suggested fields:
+
+- `id`
+- `company_id`
+- `scope_type`
+- `scope_id`
+- `metric`
+- `window_kind`
+- `amount`
+- `warn_percent`
+- `hard_stop_enabled`
+- `notify_enabled`
+- `is_active`
+- `created_by_user_id`
+- `updated_by_user_id`
+- `created_at`
+- `updated_at`
+
+Notes:
+
+- `scope_type` is one of `company | agent | project`
+- `scope_id` may be null only for a company-level policy, and only if the company is implied by `company_id`; otherwise store it explicitly
+- `metric` should start with `billed_cents`
+- `window_kind` starts with `calendar_month_utc | lifetime`
+- `amount` is stored in the natural unit of the metric
+
+### 2. `budget_incidents`
+
+Create a durable record of threshold crossings.
+
+Suggested fields:
+
+- `id`
+- `company_id`
+- `policy_id`
+- `scope_type`
+- `scope_id`
+- `metric`
+- `window_kind`
+- `window_start`
+- `window_end`
+- `threshold_type`
+- `amount_limit`
+- `amount_observed`
+- `status`
+- `approval_id` nullable
+- `activity_id` nullable
+- `resolved_at` nullable
+- `created_at`
+- `updated_at`
+
+Notes:
+
+- `threshold_type`: `soft | hard`
+- `status`: `open | acknowledged | resolved | dismissed`
+- one open incident per policy per threshold per window prevents duplicate approvals and alert spam
+
+### 3. Project Pause State
+
+Projects need explicit pause semantics.
+
+Recommended approach:
+
+- extend project status or add a pause field so a project can be blocked by budget
+- preserve whether the project is paused due to budget versus manually paused
+
+Preferred shape:
+
+- keep project workflow status as-is
+- add execution-state fields:
+  - `execution_status`: `active | paused | archived`
+  - `pause_reason`: `manual | budget | system | null`
+
+If that is too large for the immediate pass, a smaller version is:
+
+- add `paused_at`
+- add `pause_reason`
+
+The key requirement is behavioral, not cosmetic:
+Paperclip must know that a project is budget-paused and enforce it.
+
+### 4. Compatibility With Existing Budget Columns
+
+Existing company and agent monthly budget columns should remain temporarily for compatibility.
+
+Migration plan:
+
+1. keep reading existing columns during transition
+2. create equivalent `budget_policies` rows
+3. switch enforcement and UI to policies
+4. later remove or deprecate legacy columns
+
+## Budget Engine
+
+Budget enforcement should move into a dedicated service.
+
+Current logic is buried inside cost ingestion.
+That is too narrow because budget checks must apply at more than one execution boundary.
+
+### Responsibilities
+
+New service: `budgetService`
+
+Responsibilities:
+
+- resolve applicable policies for a cost event
+- compute current window totals
+- detect threshold crossings
+- create incidents, activities, and approvals
+- pause affected scopes on hard-stop
+- provide preflight enforcement checks for execution entry points
+
+### Canonical Evaluation Flow
+
+When a new `cost_event` is written:
+
+1. persist the `cost_event`
+2. identify affected scopes
+   - company
+   - agent
+   - project
+3. fetch active policies for those scopes
+4. compute current observed amount for each policy window
+5. compare to thresholds
+6. create soft incident if soft threshold crossed for first time in window
+7. create hard incident if hard threshold crossed for first time in window
+8. if hard incident:
+   - pause the scope
+   - create approval
+   - create activity event
+   - emit notification state
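+
+Steps 5 through 8 can be sketched as a pure function that returns the actions to perform. This is a simplified illustration: `evaluatePolicy`, `BudgetPolicy`, and the action names are hypothetical, and the in-memory `Set` stands in for a lookup against `budget_incidents`.
+
+```typescript
+type BudgetAction =
+  | "create_soft_incident"
+  | "create_hard_incident"
+  | "pause_scope"
+  | "create_approval"
+  | "create_activity"
+  | "emit_notification";
+
+// Hypothetical subset of a budget_policies row.
+interface BudgetPolicy {
+  id: string;
+  amount: number; // limit in the metric's natural unit, e.g. billed cents
+  warnPercent: number;
+  hardStopEnabled: boolean;
+}
+
+// Decide which actions a newly observed total triggers.
+// `recordedIncidents` holds one key per policy, threshold, and window,
+// so a threshold fires at most once per window.
+function evaluatePolicy(
+  policy: BudgetPolicy,
+  observedAmount: number,
+  windowStart: string,
+  recordedIncidents: Set<string>,
+): BudgetAction[] {
+  const actions: BudgetAction[] = [];
+
+  const softKey = `${policy.id}:soft:${windowStart}`;
+  const softCrossed = observedAmount * 100 >= policy.amount * policy.warnPercent;
+  if (softCrossed && !recordedIncidents.has(softKey)) {
+    recordedIncidents.add(softKey);
+    actions.push("create_soft_incident");
+  }
+
+  const hardKey = `${policy.id}:hard:${windowStart}`;
+  const hardCrossed = policy.hardStopEnabled && observedAmount >= policy.amount;
+  if (hardCrossed && !recordedIncidents.has(hardKey)) {
+    recordedIncidents.add(hardKey);
+    actions.push(
+      "create_hard_incident",
+      "pause_scope",
+      "create_approval",
+      "create_activity",
+      "emit_notification",
+    );
+  }
+
+  return actions;
+}
+```
+
+Returning actions rather than performing them keeps the threshold logic testable without a database; the caller would execute them inside the same transaction that persists the incident.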
+
+### Preflight Enforcement Checks
+
+Budget enforcement cannot rely only on post-hoc cost ingestion.
+
+Paperclip must also block execution before new work starts.
+
+Add budget checks to:
+
+- scheduler heartbeat dispatch
+- manual invoke endpoints
+- assignment-driven wakeups
+- queued run promotion
+- issue checkout or pickup paths where applicable
+
+If a scope is budget-paused:
+
+- do not start a new heartbeat
+- do not let the agent pick up additional work
+- present a clear reason in API and UI
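+
+A preflight guard might look like the following sketch. It assumes each scope exposes a pause reason like the `pause_reason` field proposed for projects; `checkBudgetPreflight`, `ScopeState`, and the `budget_paused` code are hypothetical.
+
+```typescript
+type PauseReason = "manual" | "budget" | "system" | null;
+
+interface ScopeState {
+  scopeType: "company" | "agent" | "project";
+  scopeId: string;
+  pauseReason: PauseReason;
+}
+
+type PreflightResult =
+  | { allowed: true }
+  | { allowed: false; code: "budget_paused"; message: string };
+
+// Check every scope the new work would run under (for example the agent
+// and its project) and refuse with an explicit reason if any is budget-paused.
+function checkBudgetPreflight(scopes: ScopeState[]): PreflightResult {
+  for (const scope of scopes) {
+    if (scope.pauseReason === "budget") {
+      return {
+        allowed: false,
+        code: "budget_paused",
+        message: `${scope.scopeType} ${scope.scopeId} is paused by budget enforcement`,
+      };
+    }
+  }
+  return { allowed: true };
+}
+```
+
+Each execution entry point listed above would call the guard before starting work and surface `message` to the API caller instead of silently skipping the run.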
+
+### Active Run Behavior
+
+When a hard-stop is triggered while a run is already active:
+
+- mark scope paused immediately for future work
+- request graceful cancellation of the current run
+- allow normal cancellation timeout behavior
+- write activity explaining that pause came from budget enforcement
+
+This mirrors the general pause semantics already expected by the product.
+
+## Approval Model
+
+Budget hard-stops should create a first-class approval.
+
+### New Approval Type
+
+Add approval type:
+
+- `budget_override_required`
+
+Payload should include:
+
+- `scopeType`
+- `scopeId`
+- `scopeName`
+- `metric`
+- `windowKind`
+- `thresholdType`
+- `budgetAmount`
+- `observedAmount`
+- `windowStart`
+- `windowEnd`
+- `topDrivers`
+- `paused`
+
+### Resolution Actions
+
+The approval UI should support:
+
+- raise budget and resume
+- resume once without changing policy
+- keep paused
+
+Optional later action:
+
+- disable budget policy
+
+### Soft Alerts Do Not Need Approval
+
+Soft alerts should create:
+
+- activity event
+- dashboard alert
+- inbox notification or similar board-visible signal
+
+They should not create an approval by default.
+
+## Notification And Activity Model
+
+Budget events need obvious operator visibility.
+
+Required outputs:
+
+- activity log entry on threshold crossings
+- dashboard surface for active budget incidents
+- detail page banner on paused agent or project
+- `/costs` summary of active incidents and policy health
+
+Later channels:
+
+- email
+- webhook
+- Slack or other integrations
+
+## API Plan
+
+### Policy Management
+
+Add routes for:
+
+- list budget policies for company
+- create budget policy
+- update budget policy
+- archive or disable budget policy
+
+### Incident Surfaces
+
+Add routes for:
+
+- list active budget incidents
+- list incident history
+- get incident detail for a scope
+
+### Approval Resolution
+
+Budget approvals should use the existing approval system once the new approval type is added.
+
+Expected flows:
+
+- create approval on hard-stop
+- resolve approval by changing policy and resuming
+- resolve approval by resuming once
+
+### Execution Errors
+
+When work is blocked by budget, the API should return explicit errors.
+
+Examples:
+
+- agent invocation blocked because agent budget is paused
+- issue execution blocked because project budget is paused
+
+Do not silently no-op.
+
+## UI Plan
+
+Budgeting should be visible in the places where operators make decisions.
+
+### `/costs`
+
+Add a budget section that includes:
+
+- active budget incidents
+- policy list with scope, window, metric, and threshold state
+- progress bars for current period or total
+- clear distinction between:
+  - spend budget
+  - subscription quota
+- quick actions:
+  - raise budget
+  - open approval
+  - resume scope if permitted
+
+The page should make this visual distinction obvious:
+
+- Budget
+  - enforceable spend policy
+- Quota
+  - provider or subscription usage window
+
+### Agent Detail
+
+Add an agent budget card:
+
+- monthly budget amount
+- current month spend
+- remaining spend
+- status
+- warning or paused banner
+- link to approval if blocked
+
+### Project Detail
+
+Add a project budget card:
+
+- total budget amount
+- total spend to date
+- remaining spend
+- pause status
+- approval link
+
+Project detail should also show if issue execution is blocked because the project is budget-paused.
+
+### Dashboard
+
+Add a high-signal budget section:
+
+- active budget breaches
+- upcoming soft alerts
+- counts of paused agents and paused projects due to budget
+
+The operator should not have to visit `/costs` to learn that work has stopped.
+
+## Budget Math
+
+### What Counts Toward Budget
+
+For V1.5 enforcement, include:
+
+- `metered_api` cost events
+- `subscription_overage` cost events
+- any future request-scoped cost event with non-zero billed cents
+
+Do not include:
+
+- `subscription_included` cost events with zero billed cents
+- advisory quota rows
+- account-level finance events unless and until company-level financial budgets are added explicitly
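+
+The inclusion rules can be sketched as a filter over request-scoped cost events. `countsTowardMoneyBudget` and `budgetSpendCents` are hypothetical helpers; excluding `subscription_included` rows regardless of their recorded cents is an assumption that follows the subscription usage decision earlier in this plan.
+
+```typescript
+// Hypothetical projection of a cost_events row.
+interface CostEvent {
+  billingType: string;
+  costCents: number;
+}
+
+// Only rows with billed cents count; subscription-included usage never does.
+function countsTowardMoneyBudget(event: CostEvent): boolean {
+  if (event.billingType === "subscription_included") return false;
+  return event.costCents > 0;
+}
+
+// Total enforceable spend for a set of events already scoped to one policy window.
+function budgetSpendCents(events: CostEvent[]): number {
+  return events
+    .filter(countsTowardMoneyBudget)
+    .reduce((sum, event) => sum + event.costCents, 0);
+}
+```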
+
+### Why Not Tokens First
+
+Token budgets should not be the first hard-stop because:
+
+- providers count tokens differently
+- cached tokens complicate simple totals
+- some future charges are not token-based
+- subscription tokens do not necessarily imply spend
+- money remains the cleanest cross-provider enforcement metric
+
+### Future Budget Metrics
+
+Future policy metrics can include:
+
+- `total_tokens`
+- `input_tokens`
+- `output_tokens`
+- `requests`
+- `finance_amount_cents`
+
+But they should enter only after the money-budget path is stable.
+
+## Migration Plan
+
+### Phase 1: Foundation
+
+- add `budget_policies`
+- add `budget_incidents`
+- add new approval type
+- add project pause metadata
+
+### Phase 2: Compatibility
+
+- backfill policies from existing company and agent monthly budget columns
+- keep legacy columns readable during migration
+
+### Phase 3: Enforcement
+
+- move budget logic into dedicated service
+- add hard-stop incident creation
+- add activity and approval creation
+- add execution guards on heartbeat and invoke paths
+
+### Phase 4: UI
+
+- `/costs` budget section
+- agent detail budget card
+- project detail budget card
+- dashboard incident summary
+
+### Phase 5: Cleanup
+
+- move all reads/writes to `budget_policies`
+- reduce legacy column reliance
+- decide whether to remove old budget columns
+
+## Tests
+
+Required coverage:
+
+- agent monthly budget soft alert at 80%
+- agent monthly budget hard-stop at 100%
+- project lifetime budget soft alert
+- project lifetime budget hard-stop
+- `subscription_included` usage does not consume money budget
+- `subscription_overage` does consume money budget
+- hard-stop creates one incident per threshold per window
+- hard-stop creates approval and pauses correct scope
+- paused project blocks new issue execution
+- paused agent blocks new heartbeat dispatch
+- policy update and resume clear or resolve the active incident correctly
+- dashboard and `/costs` surface active incidents
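+
+The threshold cases and the one-incident-per-threshold-per-window case above can be pinned down with two small pure functions. This is a minimal sketch under stated assumptions: the function names, the `windowKey` string, and the in-memory key list are hypothetical, and only the 80% and 100% thresholds come from the coverage list.
+
+```ts
+type BudgetStatusSketch = "ok" | "soft_alert" | "hard_stop";
+
+// Classify current spend against a budget. softAlertPercent defaults to the 80% case.
+function evaluateBudget(
+  spentCents: number,
+  budgetCents: number,
+  softAlertPercent: number = 80
+): BudgetStatusSketch {
+  // Assumption: a non-positive budget means "not enforced".
+  if (budgetCents <= 0) return "ok";
+  if (spentCents >= budgetCents) return "hard_stop";
+  // Integer math avoids floating-point drift at the threshold boundary.
+  if (spentCents * 100 >= budgetCents * softAlertPercent) return "soft_alert";
+  return "ok";
+}
+
+// Returns true only the first time a (policy, threshold, window) triple is seen,
+// and records the key so repeated evaluations do not open a second incident.
+function shouldOpenIncident(
+  openIncidentKeys: string[],
+  policyId: string,
+  threshold: "soft_alert" | "hard_stop",
+  windowKey: string
+): boolean {
+  const key = `${policyId}:${threshold}:${windowKey}`;
+  if (openIncidentKeys.indexOf(key) !== -1) return false;
+  openIncidentKeys.push(key);
+  return true;
+}
+```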
+
+## Open Questions
+
+These should be explicitly deferred unless they block implementation:
+
+- Should project budgets also support monthly mode, or is lifetime enough for the first release?
+- Should company-level budgets eventually include `finance_events` such as OpenRouter top-up fees and Bedrock provisioned charges?
+- Should delegated budget editing be limited by org hierarchy in V1, or remain board-only in the UI even if the data model can support delegation later?
+- Do we need "resume once" immediately, or can first approval resolution be "raise budget and resume" plus "keep paused"?
+
+## Recommendation
+
+Implement the first coherent budgeting system with these rules:
+
+- Agent budget = monthly billed dollars
+- Project budget = lifetime billed dollars
+- Hard-stop = auto-pause + approval
+- Soft alert = visible warning, no approval
+- Subscription usage = visible quota and token reporting, not money-budget enforcement
+
+This solves the real operator problem without mixing together spend control, provider quota windows, and token accounting.
--- a/doc/plans/2026-03-17-memory-service-surface-api.md
+++ b/doc/plans/2026-03-17-memory-service-surface-api.md
@@ -0,0 +1,426 @@
+# Paperclip Memory Service Plan
+
+## Goal
+
+Define a Paperclip memory service and surface API that can sit above multiple memory backends, while preserving Paperclip's control-plane requirements:
+
+- company scoping
+- auditability
+- provenance back to Paperclip work objects
+- budget / cost visibility
+- plugin-first extensibility
+
+This plan is based on the external landscape summarized in `doc/memory-landscape.md` and on the current Paperclip architecture in:
+
+- `doc/SPEC-implementation.md`
+- `doc/plugins/PLUGIN_SPEC.md`
+- `doc/plugins/PLUGIN_AUTHORING_GUIDE.md`
+- `packages/plugins/sdk/src/types.ts`
+
+## Recommendation In One Sentence
+
+Paperclip should not embed one opinionated memory engine into core. It should add a company-scoped memory control plane with a small normalized adapter contract, then let built-ins and plugins implement the provider-specific behavior.
+
+## Product Decisions
+
+### 1. Memory is company-scoped by default
+
+Every memory binding belongs to exactly one company.
+
+That binding can then be:
+
+- the company default
+- an agent override
+- a project override later if we need it
+
+No cross-company memory sharing in the initial design.
+
+### 2. Providers are selected by key
+
+Each configured memory provider gets a stable key inside a company, for example:
+
+- `default`
+- `mem0-prod`
+- `local-markdown`
+- `research-kb`
+
+Agents and services resolve the active provider by key, not by hard-coded vendor logic.
+
+### 3. Plugins are the primary provider path
+
+Built-ins are useful for a zero-config local path, but most providers should arrive through the existing Paperclip plugin runtime.
+
+That keeps the core small and matches the current direction that optional knowledge-like systems live at the edges.
+
+### 4. Paperclip owns routing, provenance, and accounting
+
+Providers should not decide how Paperclip entities map to governance.
+
+Paperclip core should own:
+
+- who is allowed to call a memory operation
+- which company / agent / project scope is active
+- what issue / run / comment / document the operation belongs to
+- how usage gets recorded
+
+### 5. Automatic memory should be narrow at first
+
+Automatic capture is useful, but broad silent capture is dangerous.
+
+Initial automatic hooks should be:
+
+- post-run capture from agent runs
+- issue comment / document capture when the binding enables it
+- pre-run recall for agent context hydration
+
+Everything else should start explicit.
+
+## Proposed Concepts
+
+### Memory provider
+
+A built-in or plugin-supplied implementation that stores and retrieves memory.
+
+Examples:
+
+- local markdown + vector index
+- mem0 adapter
+- supermemory adapter
+- MemOS adapter
+
+### Memory binding
+
+A company-scoped configuration record that points to a provider and carries provider-specific config.
+
+This is the object selected by key.
+
+### Memory scope
+
+The normalized Paperclip scope passed into a provider request.
+
+At minimum:
+
+- `companyId`
+- optional `agentId`
+- optional `projectId`
+- optional `issueId`
+- optional `runId`
+- optional `subjectId` for external/user identity
+
+### Memory source reference
+
+The provenance handle that explains where a memory came from.
+
+Supported source kinds should include:
+
+- `issue_comment`
+- `issue_document`
+- `issue`
+- `run`
+- `activity`
+- `manual_note`
+- `external_document`
+
+### Memory operation
+
+A normalized write, query, browse, or delete action performed through Paperclip.
+
+Paperclip should log every operation, whether the provider is local or external.
+
+## Required Adapter Contract
+
+The required core should be small enough to fit `memsearch`, `mem0`, `Memori`, `MemOS`, or `OpenViking`.
+
+```ts
+export interface MemoryAdapterCapabilities {
+  profile?: boolean;
+  browse?: boolean;
+  correction?: boolean;
+  asyncIngestion?: boolean;
+  multimodal?: boolean;
+  providerManagedExtraction?: boolean;
+}
+
+export interface MemoryScope {
+  companyId: string;
+  agentId?: string;
+  projectId?: string;
+  issueId?: string;
+  runId?: string;
+  subjectId?: string;
+}
+
+export interface MemorySourceRef {
+  kind:
+    | "issue_comment"
+    | "issue_document"
+    | "issue"
+    | "run"
+    | "activity"
+    | "manual_note"
+    | "external_document";
+  companyId: string;
+  issueId?: string;
+  commentId?: string;
+  documentKey?: string;
+  runId?: string;
+  activityId?: string;
+  externalRef?: string;
+}
+
+export interface MemoryUsage {
+  provider: string;
+  model?: string;
+  inputTokens?: number;
+  outputTokens?: number;
+  embeddingTokens?: number;
+  costCents?: number;
+  latencyMs?: number;
+  details?: Record<string, unknown>;
+}
+
+export interface MemoryWriteRequest {
+  bindingKey: string;
+  scope: MemoryScope;
+  source: MemorySourceRef;
+  content: string;
+  metadata?: Record<string, unknown>;
+  mode?: "append" | "upsert" | "summarize";
+}
+
+export interface MemoryRecordHandle {
+  providerKey: string;
+  providerRecordId: string;
+}
+
+export interface MemoryQueryRequest {
+  bindingKey: string;
+  scope: MemoryScope;
+  query: string;
+  topK?: number;
+  intent?: "agent_preamble" | "answer" | "browse";
+  metadataFilter?: Record<string, unknown>;
+}
+
+export interface MemorySnippet {
+  handle: MemoryRecordHandle;
+  text: string;
+  score?: number;
+  summary?: string;
+  source?: MemorySourceRef;
+  metadata?: Record<string, unknown>;
+}
+
+export interface MemoryContextBundle {
+  snippets: MemorySnippet[];
+  profileSummary?: string;
+  usage?: MemoryUsage[];
+}
+
+export interface MemoryAdapter {
+  key: string;
+  capabilities: MemoryAdapterCapabilities;
+  write(req: MemoryWriteRequest): Promise<{
+    records?: MemoryRecordHandle[];
+    usage?: MemoryUsage[];
+  }>;
+  query(req: MemoryQueryRequest): Promise<MemoryContextBundle>;
+  get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null>;
+  forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: MemoryUsage[] }>;
+}
+```
+
+This contract intentionally does not force a provider to expose its internal graph, filesystem, or ontology.
+
+## Optional Adapter Surfaces
+
+These should be capability-gated, not required:
+
+- `browse(scope, filters)` for file-system / graph / timeline inspection
+- `correct(handle, patch)` for natural-language correction flows
+- `profile(scope)` when the provider can synthesize stable preferences or summaries
+- `sync(source)` for connectors or background ingestion
+- `explain(queryResult)` for providers that can expose retrieval traces
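+
+Capability gating can be as small as a lookup from surface name to capability flag. This is a minimal sketch that reuses the flag names from `MemoryAdapterCapabilities`; the `OptionalSurfaceSketch` type and the surface-to-flag mapping are hypothetical, and `sync` / `explain` are left out because the contract does not name an obvious flag for them.
+
+```ts
+// Trimmed local copy of the capability flags used by this sketch.
+interface CapabilitiesSketch {
+  profile?: boolean;
+  browse?: boolean;
+  correction?: boolean;
+}
+
+type OptionalSurfaceSketch = "browse" | "correct" | "profile";
+
+// True only when the provider explicitly advertises the matching capability.
+function isSurfaceEnabled(
+  caps: CapabilitiesSketch,
+  surface: OptionalSurfaceSketch
+): boolean {
+  switch (surface) {
+    case "browse":
+      return caps.browse === true;
+    case "correct":
+      return caps.correction === true;
+    case "profile":
+      return caps.profile === true;
+    default:
+      return false;
+  }
+}
+```
+
+Treating a missing flag as "not supported" keeps the default safe: a provider has to opt in before Paperclip exposes the richer surface.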
+
+## What Paperclip Should Persist
+
+Paperclip should not mirror the full provider memory corpus into Postgres unless the provider is a Paperclip-managed local provider.
+
+Paperclip core should persist:
+
+- memory bindings and overrides
+- provider keys and capability metadata
+- normalized memory operation logs
+- provider record handles returned by operations when available
+- source references back to issue comments, documents, runs, and activity
+- usage and cost data
+
+For external providers, the memory payload itself can remain in the provider.
+
+## Hook Model
+
+### Automatic hooks
+
+These should be low-risk and easy to reason about:
+
+1. `pre-run hydrate`
+   Before an agent run starts, Paperclip may call `query(... intent = "agent_preamble")` using the active binding.
+
+2. `post-run capture`
+   After a run finishes, Paperclip may write a summary or transcript-derived note tied to the run.
+
+3. `issue comment / document capture`
+   When enabled on the binding, Paperclip may capture selected issue comments or issue documents as memory sources.
+
+### Explicit hooks
+
+These should be tool- or UI-driven first:
+
+- `memory.search`
+- `memory.note`
+- `memory.forget`
+- `memory.correct`
+- `memory.browse`
+
+### Not automatic in the first version
+
+- broad web crawling
+- silent import of arbitrary repo files
+- cross-company memory sharing
+- automatic destructive deletion
+- provider migration between bindings
+
+## Agent UX Rules
+
+Paperclip should give agents both automatic recall and explicit tools, with simple guidance:
+
+- use `memory.search` when the task depends on prior decisions, people, projects, or long-running context that is not in the current issue thread
+- use `memory.note` when a durable fact, preference, or decision should survive this run
+- use `memory.correct` when the user explicitly says prior context is wrong
+- rely on post-run auto-capture for ordinary session residue so agents do not have to write memory notes for every trivial exchange
+
+This keeps memory available without forcing every agent prompt to become a memory-management protocol.
+
+## Browse And Inspect Surface
+
+Paperclip needs a first-class UI for memory, otherwise providers become black boxes.
+
+The initial browse surface should support:
+
+- active binding by company and agent
+- recent memory operations
+- recent write sources
+- query results with source backlinks
+- filters by agent, issue, run, source kind, and date
+- provider usage / cost / latency summaries
+
+When a provider supports richer browsing, the plugin can add deeper views through the existing plugin UI surfaces.
+
+## Cost And Evaluation
+
+Every adapter response should be able to return usage records.
+
+Paperclip should roll up:
+
+- memory inference tokens
+- embedding tokens
+- external provider cost
+- latency
+- query count
+- write count
+
+It should also record evaluation-oriented metrics where possible:
+
+- recall hit rate
+- empty query rate
+- manual correction count
+- per-binding success / failure counts
+
+This is important because a memory system that "works" but silently burns budget is not acceptable in Paperclip.
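+
+The rollup described in this section can be sketched as a single pass over logged operations. This is a minimal sketch, not a proposed schema: `OperationLogSketch`, `resultCount`, and the rollup field names are assumptions, while the token and cost fields mirror `MemoryUsage`. "Memory inference tokens" is interpreted here as input plus output tokens, which is also an assumption.
+
+```ts
+interface UsageSketch {
+  inputTokens?: number;
+  outputTokens?: number;
+  embeddingTokens?: number;
+  costCents?: number;
+}
+
+interface OperationLogSketch {
+  type: "write" | "query";
+  resultCount?: number; // snippets returned, only meaningful for queries
+  usage?: UsageSketch[];
+}
+
+interface MemoryRollupSketch {
+  inferenceTokens: number;
+  embeddingTokens: number;
+  costCents: number;
+  queryCount: number;
+  writeCount: number;
+  emptyQueryRate: number;
+}
+
+function rollUpMemoryUsage(ops: OperationLogSketch[]): MemoryRollupSketch {
+  const total: MemoryRollupSketch = {
+    inferenceTokens: 0,
+    embeddingTokens: 0,
+    costCents: 0,
+    queryCount: 0,
+    writeCount: 0,
+    emptyQueryRate: 0,
+  };
+  let emptyQueries = 0;
+  for (const op of ops) {
+    if (op.type === "query") {
+      total.queryCount += 1;
+      if ((op.resultCount || 0) === 0) emptyQueries += 1;
+    } else {
+      total.writeCount += 1;
+    }
+    // Missing usage fields are treated as zero rather than as errors.
+    for (const u of op.usage || []) {
+      total.inferenceTokens += (u.inputTokens || 0) + (u.outputTokens || 0);
+      total.embeddingTokens += u.embeddingTokens || 0;
+      total.costCents += u.costCents || 0;
+    }
+  }
+  total.emptyQueryRate =
+    total.queryCount === 0 ? 0 : emptyQueries / total.queryCount;
+  return total;
+}
+```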
+
+## Suggested Data Model Additions
+
+At the control-plane level, the likely new core tables are:
+
+- `memory_bindings`
+  - company-scoped key
+  - provider id / plugin id
+  - config blob
+  - enabled status
+
+- `memory_binding_targets`
+  - target type (`company`, `agent`, later `project`)
+  - target id
+  - binding id
+
+- `memory_operations`
+  - company id
+  - binding id
+  - operation type (`write`, `query`, `forget`, `browse`, `correct`)
+  - scope fields
+  - source refs
+  - usage / latency / cost
+  - success / error
+
+Provider-specific long-form state should stay in plugin state or the provider itself unless a built-in local provider needs its own schema.
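+
+With bindings and binding targets stored this way, resolving the active binding is an override lookup followed by a company default. This is a minimal sketch, assuming the caller has already loaded the target rows for one company; the `BindingTargetSketch` shape and the function name are hypothetical.
+
+```ts
+// Hypothetical in-memory view of a `memory_binding_targets` row,
+// with the binding id already resolved to its company-scoped key.
+interface BindingTargetSketch {
+  targetType: "company" | "agent";
+  targetId: string;
+  bindingKey: string;
+}
+
+// Agent override wins; otherwise fall back to the company default.
+// Returns null when the company has no memory binding at all.
+function resolveBindingKey(
+  targets: BindingTargetSketch[],
+  companyId: string,
+  agentId?: string
+): string | null {
+  if (agentId !== undefined) {
+    const overrides = targets.filter(
+      (t) => t.targetType === "agent" && t.targetId === agentId
+    );
+    if (overrides.length > 0) return overrides[0].bindingKey;
+  }
+  const defaults = targets.filter(
+    (t) => t.targetType === "company" && t.targetId === companyId
+  );
+  return defaults.length > 0 ? defaults[0].bindingKey : null;
+}
+```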
+
+## Recommended First Built-In
+
+The best zero-config built-in is a local markdown-first provider with optional semantic indexing.
+
+Why:
+
+- it matches Paperclip's local-first posture
+- it is inspectable
+- it is easy to back up and debug
+- it gives the system a baseline even without external API keys
+
+The design should still treat that built-in as just another provider behind the same control-plane contract.
+
+## Rollout Phases
+
+### Phase 1: Control-plane contract
+
+- add memory binding models and API types
+- add plugin capability / registration surface for memory providers
+- add operation logging and usage reporting
+
+### Phase 2: One built-in + one plugin example
+
+- ship a local markdown-first provider
+- ship one hosted adapter example to validate the external-provider path
+
+### Phase 3: UI inspection
+
+- add company / agent memory settings
+- add a memory operation explorer
+- add source backlinks to issues and runs
+
+### Phase 4: Automatic hooks
+
+- pre-run hydrate
+- post-run capture
+- selected issue comment / document capture
+
+### Phase 5: Rich capabilities
+
+- correction flows
+- provider-native browse / graph views
+- project-level overrides if needed
+- evaluation dashboards
+
+## Open Questions
+
+- Should project overrides exist in V1 of the memory service, or should we force company default + agent override first?
+- Do we want Paperclip-managed extraction pipelines at all, or should built-ins be the only place where Paperclip owns extraction?
+- Should memory usage extend the current `cost_events` model directly, or should memory operations keep a parallel usage log and roll up into `cost_events` secondarily?
+- Do we want provider install / binding changes to require approvals for some companies?
+
+## Bottom Line
+
+The right abstraction is:
+
+- Paperclip owns memory bindings, scopes, provenance, governance, and usage reporting.
+- Providers own extraction, ranking, storage, and provider-native memory semantics.
+
+That gives Paperclip a stable "memory service" without locking the product to one memory philosophy or one vendor.
--- a/doc/plans/2026-03-17-release-automation-and-versioning.md
+++ b/doc/plans/2026-03-17-release-automation-and-versioning.md
@@ -0,0 +1,491 @@
+# Release Automation and Versioning Simplification Plan
+
+## Context
+
+Paperclip's current release flow is documented in `doc/RELEASING.md` and implemented through:
+
+- `.github/workflows/release.yml`
+- `scripts/release-lib.sh`
+- `scripts/release-start.sh`
+- `scripts/release-preflight.sh`
+- `scripts/release.sh`
+- `scripts/create-github-release.sh`
+
+Today the model is:
+
+1. pick `patch`, `minor`, or `major`
+2. create `release/X.Y.Z`
+3. draft `releases/vX.Y.Z.md`
+4. publish one or more canaries from that release branch
+5. publish stable from that same branch
+6. push tag + create GitHub Release
+7. merge the release branch back to `master`
+
+That is workable, but it creates friction in exactly the places that should be cheap:
+
+- deciding `patch` vs `minor` vs `major`
+- cutting and carrying release branches
+- manually publishing canaries
+- thinking about changelog generation for canaries
+- handling npm credentials safely in a public repo
+
+The target state from this discussion is simpler:
+
+- every push to `master` publishes a canary automatically
+- stable releases are promoted deliberately from a vetted commit
+- versioning is date-driven instead of semantics-driven
+- stable publishing is secure even in a public open-source repository
+- changelog generation happens only for real stable releases
+
+## Recommendation In One Sentence
+
+Move Paperclip to semver-compatible calendar versioning, auto-publish canaries from `master`, promote stable from a chosen tested commit, and use npm trusted publishing plus GitHub environments so no long-lived npm or LLM token needs to live in Actions.
+
+## Core Decisions
+
+### 1. Use calendar versions, but keep semver syntax
+
+The repo and npm tooling still assume semver-shaped version strings in many places. That does not mean Paperclip must keep semver as a product policy. It does mean the version format should remain semver-valid.
+
+Recommended format:
+
+- stable: `YYYY.M.D`
+- canary: `YYYY.M.D-canary.N`
+
+Examples:
+
+- stable on March 17, 2026: `2026.3.17`
+- third canary on March 17, 2026: `2026.3.17-canary.2`
+
+Why this shape:
+
+- it removes `patch/minor/major` decisions
+- it is valid semver syntax
+- it stays compatible with npm, dist-tags, and existing semver validators
+- it is close to the format you actually want
+
+Important constraints:
+
+- `2026.03.17` is not the format to use
+  - numeric semver identifiers do not allow leading zeroes
+- `2026.03.16.8` is not the format to use
+  - semver has three numeric components, not four
+- the practical semver-safe equivalent of your example is `2026.3.16-canary.8`
+
+This is effectively CalVer on semver rails.
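+
+The format constraints above are easy to enforce mechanically. This is a minimal sketch of a validator, assuming a four-digit year; `parseReleaseVersion` and its return shape are hypothetical names for illustration, not existing release tooling.
+
+```ts
+// YYYY.M.D with an optional -canary.N suffix. Month, day, and N must not
+// carry leading zeroes, which is what keeps the string semver-valid.
+const RELEASE_VERSION_PATTERN =
+  /^([1-9]\d{3})\.([1-9]\d?)\.([1-9]\d?)(?:-canary\.(0|[1-9]\d*))?$/;
+
+interface ParsedReleaseVersion {
+  year: number;
+  month: number;
+  day: number;
+  canary: number | null; // null for stable versions
+}
+
+function parseReleaseVersion(version: string): ParsedReleaseVersion | null {
+  const match = RELEASE_VERSION_PATTERN.exec(version);
+  if (!match) return null;
+  const year = Number(match[1]);
+  const month = Number(match[2]);
+  const day = Number(match[3]);
+  // Round-trip through Date.UTC to reject impossible dates such as 2026.2.30.
+  const date = new Date(Date.UTC(year, month - 1, day));
+  if (
+    date.getUTCFullYear() !== year ||
+    date.getUTCMonth() !== month - 1 ||
+    date.getUTCDate() !== day
+  ) {
+    return null;
+  }
+  return {
+    year,
+    month,
+    day,
+    canary: match[4] === undefined ? null : Number(match[4]),
+  };
+}
+```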
+
+### 2. Accept that CalVer changes the compatibility contract
+
+This is not semver in spirit anymore. It is semver in syntax only.
+
+That tradeoff is probably acceptable for Paperclip, but it should be explicit:
+
+- consumers no longer infer compatibility from `major/minor/patch`
+- release notes become the compatibility signal
+- downstream users should prefer exact pins or deliberate upgrades
+
+This is especially relevant for public library packages like `@paperclipai/shared`, `@paperclipai/db`, and the adapter packages.
+
+### 3. Drop release branches for normal publishing
+
+If every merge to `master` publishes a canary, the current `release/X.Y.Z` train model becomes more ceremony than value.
+
+Recommended replacement:
+
+- `master` is the only canary train
+- every push to `master` can publish a canary
+- stable is published from a chosen commit or canary tag on `master`
+
+This matches the workflow you actually want:
+
+- merge continuously
+- let npm always have a fresh canary
+- choose a known-good canary later and promote that commit to stable
+
+### 4. Promote by source ref, not by "renaming" a canary
+
+This is the most important mechanical constraint.
+
+npm can move dist-tags, but it does not let you rename an already-published version. That means:
+
+- you can move `latest` to `paperclipai@1.2.3`
+- you cannot turn `paperclipai@2026.3.16-canary.8` into `paperclipai@2026.3.17`
+
+So "promote canary to stable" really means:
+
+1. choose the commit or canary tag you trust
+2. rebuild from that exact commit
+3. publish it again with the stable version string
+
+Because of that, the stable workflow should take a source ref, not just a bump type.
+
+Recommended stable input:
+
+- `source_ref`
+  - commit SHA, or
+  - a canary git tag such as `canary/v2026.3.16-canary.8`
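+
+Input handling for `source_ref` can be sketched as a small parser that accepts only the two shapes listed above. This is a minimal sketch under stated assumptions: the `SourceRefSketch` type is hypothetical, and requiring a full 40-character SHA is a choice made here to keep the ref unambiguous, not something this plan mandates.
+
+```ts
+type SourceRefSketch =
+  | { kind: "commit"; sha: string }
+  | { kind: "canary_tag"; tag: string; version: string };
+
+function parseSourceRef(input: string): SourceRefSketch | null {
+  const value = input.trim();
+  // Full-length hexadecimal commit SHA.
+  if (/^[0-9a-f]{40}$/i.test(value)) {
+    return { kind: "commit", sha: value.toLowerCase() };
+  }
+  // Canary tag in the canary/vYYYY.M.D-canary.N format.
+  const match =
+    /^canary\/v([1-9]\d{3}\.[1-9]\d?\.[1-9]\d?-canary\.(?:0|[1-9]\d*))$/.exec(
+      value
+    );
+  if (match) {
+    return { kind: "canary_tag", tag: value, version: match[1] };
+  }
+  // Anything else (branch names, bump types, stable tags) is rejected.
+  return null;
+}
+```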
+
+### 5. Only stable releases get release notes, tags, and GitHub Releases
+
+Canaries should stay lightweight:
+
+- publish to npm under `canary`
+- optionally create a lightweight or annotated git tag
+- do not create GitHub Releases
+- do not require `releases/v*.md`
+- do not spend LLM tokens
+
+Stable releases should remain the public narrative surface:
+
+- git tag `v2026.3.17`
+- GitHub Release `v2026.3.17`
+- stable changelog file `releases/v2026.3.17.md`
+
+## Security Model
+
+### Recommendation
+
+Use npm trusted publishing with GitHub Actions OIDC, then disable token-based publishing access for the packages.
+
+Why:
+
+- no long-lived `NPM_TOKEN` in repo or org secrets
+- no personal npm token in Actions
+- short-lived credentials minted only for the authorized workflow
+- automatic npm provenance for public packages in public repos
+
+This is the cleanest answer to the open-repo security concern.
+
+### Concrete controls
+
+#### 1. Use one release workflow file
+
+Use one workflow filename for both canary and stable publishing:
+
+- `.github/workflows/release.yml`
+
+Why:
+
+- npm trusted publishing is configured per workflow filename
+- npm currently allows one trusted publisher configuration per package
+- GitHub environments can still provide separate canary/stable approval rules inside the same workflow
+
+#### 2. Use separate GitHub environments
+
+Recommended environments:
+
+- `npm-canary`
+- `npm-stable`
+
+Recommended policy:
+
+- `npm-canary`
+  - allowed branch: `master`
+  - no human reviewer required
+- `npm-stable`
+  - allowed branch: `master`
+  - required reviewer enabled
+  - prevent self-review enabled
+  - admin bypass disabled
+
+Stable should require an explicit second human gate even if the workflow is manually dispatched.
+
+#### 3. Lock down workflow edits
+
+Add or tighten `CODEOWNERS` coverage for:
+
+- `.github/workflows/*`
+- `scripts/release*`
+- `doc/RELEASING.md`
+
+This matters because trusted publishing authorizes a workflow file. The biggest remaining risk is not secret exfiltration from forks. It is a maintainer-approved change to the release workflow itself.
+
+#### 4. Remove traditional npm token access after OIDC works
+
+After trusted publishing is verified:
+
+- set package publishing access to require 2FA and disallow tokens
+- revoke any legacy automation tokens
+
+That eliminates the "someone stole the npm token" class of failure.
+
+### What not to do
+
+- do not put your personal Claude or npm token in GitHub Actions
+- do not run release logic from `pull_request_target`
+- do not make stable publishing depend on a repo secret if OIDC can handle it
+- do not create canary GitHub Releases
+
+## Changelog Strategy
+
+### Recommendation
+
+Generate stable changelogs only, and keep LLM-assisted changelog generation out of CI for now.
+
+Reasoning:
+
+- canaries happen too often
+- canaries do not need polished public notes
+- putting a personal Claude token into Actions is not worth the risk
+- stable release cadence is low enough that a human-in-the-loop step is acceptable
+
+Recommended stable path:
+
+1. pick a canary commit or tag
+2. run changelog generation locally from a trusted machine
+3. commit `releases/vYYYY.M.D.md`
+4. run stable promotion
+
+If the notes are not ready yet, a fallback is acceptable:
+
+- publish stable
+- create a minimal GitHub Release
+- update `releases/vYYYY.M.D.md` immediately afterward
+
+But the better steady-state is to have the stable notes committed before stable publish.
+
+### Future option
+
+If you later want CI-assisted changelog drafting, do it with:
+
+- a dedicated service account
+- a token scoped only for changelog generation
+- a manual workflow
+- a dedicated environment with required reviewers
+
+That is phase-two hardening work, not a phase-one requirement.
+
+## Proposed Future Workflow
+
+### Canary workflow
+
+Trigger:
+
+- `push` on `master`
+
+Steps:
+
+1. checkout the merged `master` commit
+2. run verification on that exact commit
+3. compute canary version for current UTC date
+4. version public packages to `YYYY.M.D-canary.N`
+5. publish to npm with dist-tag `canary`
+6. create a canary git tag for traceability
+
+Recommended canary tag format:
+
+- `canary/v2026.3.17-canary.4`
+
+Outputs:
+
+- npm canary published
+- git tag created
+- no GitHub Release
+- no changelog file required
+
+### Stable workflow
+
+Trigger:
+
+- `workflow_dispatch`
+
+Inputs:
+
+- `source_ref`
+- optional `stable_date`
+- `dry_run`
+
+Steps:
+
+1. checkout `source_ref`
+2. run verification on that exact commit
+3. compute stable version from UTC date or provided override
+4. fail if `vYYYY.M.D` already exists
+5. require `releases/vYYYY.M.D.md`
+6. version public packages to `YYYY.M.D`
+7. publish to npm under `latest`
+8. create git tag `vYYYY.M.D`
+9. push tag
+10. create GitHub Release from `releases/vYYYY.M.D.md`
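+
+Steps 4 and 5 above are cheap to check before anything is published. This is a minimal sketch of that preflight as a pure function; `stablePreflightErrors` and its input shape are hypothetical, and the caller is assumed to supply the existing tag names and repository file paths.
+
+```ts
+interface StablePreflightInput {
+  version: string; // e.g. "2026.3.17"
+  existingTags: string[]; // tag names already present in the repository
+  existingFiles: string[]; // repository-relative paths at source_ref
+}
+
+// Returns a list of human-readable problems; an empty list means "safe to publish".
+function stablePreflightErrors(input: StablePreflightInput): string[] {
+  const errors: string[] = [];
+  const tag = `v${input.version}`;
+  if (input.existingTags.indexOf(tag) !== -1) {
+    errors.push(`stable tag ${tag} already exists`);
+  }
+  const notesPath = `releases/${tag}.md`;
+  if (input.existingFiles.indexOf(notesPath) === -1) {
+    errors.push(`missing release notes file ${notesPath}`);
+  }
+  return errors;
+}
+```
+
+Collecting every problem instead of failing on the first one makes a `dry_run` more useful, since the maintainer sees the full list in one pass.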
+
+Outputs:
+
+- stable npm release
+- stable git tag
+- GitHub Release
+- clean public changelog surface
+
+## Implementation Guidance
+
+### 1. Replace bump-type version math with explicit version computation
+
+The current release scripts depend on:
+
+- `patch`
+- `minor`
+- `major`
+
+That logic should be replaced with:
+
+- `compute_canary_version_for_date`
+- `compute_stable_version_for_date`
+
+For example:
+
+- `compute_stable_version_for_date(2026-03-17) -> 2026.3.17`
+- `compute_canary_version_for_date(2026-03-17) -> 2026.3.17-canary.0` when no canary has been published for that date yet
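+
+Both computations are small enough to sketch directly. This is a minimal sketch, not the release script itself: the TypeScript function names are hypothetical, and `publishedVersions` is assumed to be the union of versions already published across all public packages, so a partially completed publish still advances `N`.
+
+```ts
+// YYYY.M.D from the UTC calendar date, with no zero padding.
+function stableVersionForUtcDate(date: Date): string {
+  return `${date.getUTCFullYear()}.${date.getUTCMonth() + 1}.${date.getUTCDate()}`;
+}
+
+// Next zero-based canary for the given UTC date: one past the highest N
+// already present in publishedVersions, or 0 when none exist for that date.
+function nextCanaryForUtcDate(date: Date, publishedVersions: string[]): string {
+  const prefix = `${stableVersionForUtcDate(date)}-canary.`;
+  let highest = -1;
+  for (const version of publishedVersions) {
+    if (version.indexOf(prefix) !== 0) continue;
+    const suffix = version.slice(prefix.length);
+    // Ignore anything that is not a plain non-negative integer without leading zeroes.
+    if (!/^(0|[1-9]\d*)$/.test(suffix)) continue;
+    highest = Math.max(highest, Number(suffix));
+  }
+  return `${prefix}${highest + 1}`;
+}
+```
+
+Taking the maximum rather than counting entries means gaps in the sequence never cause a version that is already on the registry to be reused.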
+
+### 2. Stop requiring `release/X.Y.Z`
+
+These current invariants should be removed from the happy path:
+
+- "must run from branch `release/X.Y.Z`"
+- "stable and canary for `X.Y.Z` come from the same release branch"
+- `release-start.sh`
+
+Replace them with:
+
+- canary must run from `master`
+- stable may run from a pinned `source_ref`
+
+### 3. Keep Changesets only if it stays helpful
+
+The current system uses Changesets to:
+
+- rewrite package versions
+- maintain package-level `CHANGELOG.md` files
+- publish packages
+
+With CalVer, Changesets may still be useful for publish orchestration, but it should no longer own version selection.
+
+Recommended implementation order:
+
+1. keep `changeset publish` if it works with explicitly-set versions
+2. replace version computation with a small explicit versioning script
+3. if Changesets keeps fighting the model, remove it from release publishing entirely
+
+Paperclip's release problem is now "publish the whole fixed package set at one explicit version", not "derive the next semantic bump from human intent".
+
+### 4. Add a dedicated versioning script
+
+Recommended new script:
+
+- `scripts/set-release-version.mjs`
+
+Responsibilities:
+
+- set the version in all public publishable packages
+- update any internal exact-version references needed for publishing
+- update CLI version strings
+- avoid broad string replacement across unrelated files
+
+This is safer than keeping a bump-oriented changeset flow and then forcing it into a date-based scheme.
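+
+The core of such a script can be sketched as a pure transformation over package manifests, leaving file I/O to the caller. This is a minimal sketch under stated assumptions: the `ManifestSketch` shape, the function name, and the choice to rewrite only exact-version pins on sibling public packages are illustrative, not the contents of `scripts/set-release-version.mjs`.
+
+```ts
+interface ManifestSketch {
+  name: string;
+  version: string;
+  private?: boolean;
+  dependencies?: Record<string, string>;
+}
+
+function setReleaseVersion(
+  manifests: ManifestSketch[],
+  version: string
+): ManifestSketch[] {
+  const publicNames = manifests.filter((m) => !m.private).map((m) => m.name);
+  return manifests.map((m) => {
+    // Private packages are never published, so they are left untouched.
+    if (m.private) return m;
+    const next: ManifestSketch = { name: m.name, version: version };
+    if (m.dependencies) {
+      const deps: Record<string, string> = {};
+      for (const dep of Object.keys(m.dependencies)) {
+        const spec = m.dependencies[dep];
+        // Rewrite only exact pins on sibling public packages; ranges such as
+        // "^5.0.0" and specifiers such as "workspace:*" pass through unchanged.
+        const isExactPin = /^\d+\.\d+\.\d+(-[0-9A-Za-z.-]+)?$/.test(spec);
+        const isSibling = publicNames.indexOf(dep) !== -1;
+        deps[dep] = isSibling && isExactPin ? version : spec;
+      }
+      next.dependencies = deps;
+    }
+    return next;
+  });
+}
+```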
+
+### 5. Keep rollback based on dist-tags
+
+`rollback-latest.sh` should stay, but it should stop assuming a semver meaning beyond syntax.
+
+It should continue to:
+
+- repoint `latest` to a prior stable version
+- never unpublish
+
+## Tradeoffs and Risks
+
+### 1. One stable per UTC day
+
+With plain `YYYY.M.D`, you get one stable release per UTC day.
+
+That is probably fine, but it is a real product rule.
+
+If you need multiple same-day stables later, you have three options:
+
+1. accept a less pretty stable format
+2. go back to a serial patch component
+3. keep daily stable cadence and use canaries for same-day fixes
+
+My recommendation is to accept one stable per UTC day unless reality proves otherwise.
+
+### 2. Public package consumers lose semver intent signaling
+
+This is the main downside of CalVer.
+
+If that becomes a problem, one alternative is:
+
+- use CalVer for the CLI package only
+- keep semver for library packages
+
+That is more complex operationally, so I would not start there unless package consumers actually need it.
+
+### 3. Auto-canary means more publish traffic
+
+Publishing on every `master` merge means:
+
+- more npm versions
+- more git tags
+- more registry noise
+
+That is acceptable if canaries stay clearly separate:
+
+- npm dist-tag `canary`
+- no GitHub Release
+- no external announcement
+
+## Rollout Plan
+
+### Phase 1: Security foundation
+
+1. Create `release.yml`
+2. Configure npm trusted publishers for all public packages
+3. Create `npm-canary` and `npm-stable` environments
+4. Add `CODEOWNERS` protection for release files
+5. Verify OIDC publishing works
+6. Disable token-based publishing access and revoke old tokens
+
+### Phase 2: Canary automation
+
+1. Add canary workflow on `push` to `master`
+2. Add explicit calendar-version computation
+3. Add canary git tagging
+4. Remove changelog requirement from canaries
+5. Update `doc/RELEASING.md`
+
+### Phase 3: Stable promotion
+
+1. Add manual stable workflow with `source_ref`
+2. Require stable notes file
+3. Publish stable + tag + GitHub Release
+4. Update rollback docs and scripts
+5. Retire release-branch assumptions
+
+### Phase 4: Cleanup
+
+1. Remove `release-start.sh` from the primary path
+2. Remove `patch/minor/major` from maintainer docs
+3. Decide whether to keep or remove Changesets from publishing
+4. Document the CalVer compatibility contract publicly
+
+## Concrete Recommendation
+
+Paperclip should adopt this model:
+
+- stable versions: `YYYY.M.D`
+- canary versions: `YYYY.M.D-canary.N`
+- canaries auto-published on every push to `master`
+- stables manually promoted from a chosen tested commit or canary tag
+- no release branches in the default path
+- no canary changelog files
+- no canary GitHub Releases
+- no Claude token in GitHub Actions
+- no npm automation token in GitHub Actions
+- npm trusted publishing plus GitHub environments for release security
+
+That gets rid of the annoying part of semver without fighting npm, makes canaries cheap, keeps stables deliberate, and materially improves the security posture of the public repository.
+
+## External References
+
+- npm trusted publishing: https://docs.npmjs.com/trusted-publishers/
+- npm dist-tags: https://docs.npmjs.com/adding-dist-tags-to-packages/
+- npm semantic versioning guidance: https://docs.npmjs.com/about-semantic-versioning/
+- GitHub environments and deployment protection rules: https://docs.github.com/en/actions/how-tos/deploy/configure-and-manage-deployments/manage-environments
+- GitHub secrets behavior for forks: https://docs.github.com/en/actions/how-tos/write-workflows/choose-what-workflows-do/use-secrets
--- a/doc/plans/workspace-product-model-and-work-product.md
+++ b/doc/plans/workspace-product-model-and-work-product.md
--- a/doc/plans/workspace-technical-implementation.md
+++ b/doc/plans/workspace-technical-implementation.md
@@ -0,0 +1,882 @@
+# Workspace Technical Implementation Spec
+
+## Role of This Document
+
+This document translates [workspace-product-model-and-work-product.md](./workspace-product-model-and-work-product.md) into an implementation-ready engineering plan.
+
+It is intentionally concrete:
+
+- schema and migration shape
+- shared contract updates
+- route and service changes
+- UI changes
+- rollout and compatibility rules
+
+This is the implementation target for the first workspace-aware delivery slice.
+
+## Locked Decisions
+
+These decisions are treated as settled for this implementation:
+
+1. Add a new durable `execution_workspaces` table now.
+2. Each issue has at most one current execution workspace at a time.
+3. `issues` get explicit `project_workspace_id` and `execution_workspace_id`.
+4. Workspace reuse is in scope for V1.
+5. The feature is gated in the UI by `/instance/settings > Experimental > Workspaces`.
+6. The gate is UI-only. Backend model changes and migrations always ship.
+7. Existing users upgrade into compatibility-preserving defaults.
+8. `project_workspaces` evolves in place rather than being replaced.
+9. Work product is issue-first, with optional links to execution workspaces and runtime services.
+10. GitHub is the only PR provider in the first slice.
+11. Both `adapter_managed` and `cloud_sandbox` execution modes are in scope.
+12. Workspace controls ship first inside existing project properties, not in a new global navigation area.
+13. Subissues are out of scope for this implementation slice.
+
+## Non-Goals
+
+- Building a full code review system
+- Solving subissue UX in this slice
+- Implementing reusable shared workspace definitions across projects in this slice
+- Reworking all current runtime service behavior before introducing execution workspaces
+
+## Existing Baseline
+
+The repo already has:
+
+- `project_workspaces`
+- `projects.execution_workspace_policy`
+- `issues.execution_workspace_settings`
+- runtime service persistence in `workspace_runtime_services`
+- local git-worktree realization in `workspace-runtime.ts`
+
+This implementation should build on that baseline rather than fork it.
+
+## Terminology
+
+- `Project workspace`: durable configured codebase/root for a project
+- `Execution workspace`: actual runtime workspace used for one or more issues
+- `Work product`: user-facing output such as PR, preview, branch, commit, artifact, document
+- `Runtime service`: process or service owned or tracked for a workspace
+- `Compatibility mode`: existing behavior preserved for upgraded installs with no explicit workspace opt-in
+
+## Architecture Summary
+
+The first slice should introduce three explicit layers:
+
+1. `Project workspace`
+   - existing durable project-scoped codebase record
+   - extended to support local, git, non-git, and remote-managed shapes
+
+2. `Execution workspace`
+   - new durable runtime record
+   - represents shared, isolated, operator-branch, or remote-managed execution context
+
+3. `Issue work product`
+   - new durable output record
+   - stores PRs, previews, branches, commits, artifacts, and documents
+
+The issue remains the planning and ownership unit.
+The execution workspace remains the runtime unit.
+The work product remains the deliverable/output unit.
+
+## Configuration and Deployment Topology
+
+## Important correction
+
+This repo already uses `PAPERCLIP_DEPLOYMENT_MODE` for auth/deployment behavior (`local_trusted | authenticated`).
+
+Do not overload that variable for workspace execution topology.
+
+## New env var
+
+Add a separate execution-host hint:
+
+- `PAPERCLIP_EXECUTION_TOPOLOGY=local|cloud|hybrid`
+
+Default:
+
+- if unset, treat as `local`
+
+Purpose:
+
+- influences defaults and validation for workspace configuration
+- does not change current auth/deployment semantics
+- does not break existing installs
+
+### Semantics
+
+- `local`
+  - Paperclip may create host-local worktrees, processes, and paths
+- `cloud`
+  - Paperclip should assume no durable host-local execution workspace management
+  - adapter-managed and cloud-sandbox flows should be treated as first-class
+- `hybrid`
+  - both local and remote execution strategies may exist
+
+This is a guardrail and defaulting aid, not a hard policy engine in the first slice.
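+
+A minimal sketch of how the hint could be parsed, assuming a hypothetical helper named `resolveExecutionTopology`. Rejecting unknown values and lowercasing the input are assumptions of this sketch, not decisions made by this plan.
+
+```typescript
+// Sketch only: the helper name, the case-insensitive match, and the
+// error on unknown values are assumptions, not part of the plan.
+type ExecutionTopology = "local" | "cloud" | "hybrid";
+
+const TOPOLOGIES: readonly ExecutionTopology[] = ["local", "cloud", "hybrid"];
+
+function resolveExecutionTopology(raw: string | undefined): ExecutionTopology {
+  // Unset (or blank) is treated as `local`, so existing installs are unaffected.
+  if (raw === undefined || raw.trim() === "") return "local";
+
+  const value = raw.trim().toLowerCase();
+  const match = TOPOLOGIES.find((t) => t === value);
+  if (match === undefined) {
+    throw new Error(`Invalid PAPERCLIP_EXECUTION_TOPOLOGY: ${raw}`);
+  }
+  return match;
+}
+```
+
+A caller would pass `process.env.PAPERCLIP_EXECUTION_TOPOLOGY` and use the result only to pick defaults and validation rules, leaving `PAPERCLIP_DEPLOYMENT_MODE` untouched.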
+
+## Instance Settings
+
+Add a new `Experimental` section under `/instance/settings`.
+
+### New setting
+
+- `experimental.workspaces: boolean`
+
+Rules:
+
+- default `false`
+- UI-only gate
+- stored in instance config or instance settings API response
+- backend routes and migrations remain available even when false
+
+### UI behavior when off
+
+- hide workspace-specific issue controls
+- hide workspace-specific project configuration
+- hide issue `Work Product` tab if it would otherwise be empty
+- do not remove or invalidate any stored workspace data
+
+## Data Model
+
+## 1. Extend `project_workspaces`
+
+Current table exists and should evolve in place.
+
+### New columns
+
+- `source_type text not null default 'local_path'`
+  - `local_path | git_repo | non_git_path | remote_managed`
+- `default_ref text null`
+- `visibility text not null default 'default'`
+  - `default | advanced`
+- `setup_command text null`
+- `cleanup_command text null`
+- `remote_provider text null`
+  - examples: `github`, `openai`, `anthropic`, `custom`
+- `remote_workspace_ref text null`
+- `shared_workspace_key text null`
+  - reserved for future cross-project shared workspace definitions
+
+### Backfill rules
+
+- if existing row has `repo_url`, backfill `source_type='git_repo'`
+- else if existing row has `cwd`, backfill `source_type='local_path'`
+- else backfill `source_type='remote_managed'`
+- copy existing `repo_ref` into `default_ref`
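+
+The backfill rules above can be sketched as a pure function. The row shape and the function name are hypothetical, and treating an empty string the same as a missing value is an assumption.
+
+```typescript
+// Sketch only: `LegacyProjectWorkspaceRow` and `backfillProjectWorkspace`
+// are illustrative names, not existing code in the repo.
+type ProjectWorkspaceSourceType =
+  | "local_path"
+  | "git_repo"
+  | "non_git_path"
+  | "remote_managed";
+
+interface LegacyProjectWorkspaceRow {
+  repoUrl: string | null;
+  cwd: string | null;
+  repoRef: string | null;
+}
+
+interface ProjectWorkspaceBackfill {
+  sourceType: ProjectWorkspaceSourceType;
+  defaultRef: string | null;
+}
+
+function backfillProjectWorkspace(row: LegacyProjectWorkspaceRow): ProjectWorkspaceBackfill {
+  let sourceType: ProjectWorkspaceSourceType;
+  if (row.repoUrl) {
+    sourceType = "git_repo"; // a repo URL wins over a local path
+  } else if (row.cwd) {
+    sourceType = "local_path";
+  } else {
+    sourceType = "remote_managed"; // neither repo URL nor cwd
+  }
+  // `repo_ref` is copied into `default_ref` unchanged.
+  return { sourceType, defaultRef: row.repoRef };
+}
+```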
+
+### Indexes
+
+- retain current indexes
+- add `(project_id, source_type)`
+- add `(company_id, shared_workspace_key)` non-unique for future support
+
+## 2. Add `execution_workspaces`
+
+Create a new durable table.
+
+### Columns
+
+- `id uuid pk`
+- `company_id uuid not null`
+- `project_id uuid not null`
+- `project_workspace_id uuid null`
+- `source_issue_id uuid null`
+- `mode text not null`
+  - `shared_workspace | isolated_workspace | operator_branch | adapter_managed | cloud_sandbox`
+- `strategy_type text not null`
+  - `project_primary | git_worktree | adapter_managed | cloud_sandbox`
+- `name text not null`
+- `status text not null default 'active'`
+  - `active | idle | in_review | archived | cleanup_failed`
+- `cwd text null`
+- `repo_url text null`
+- `base_ref text null`
+- `branch_name text null`
+- `provider_type text not null default 'local_fs'`
+  - `local_fs | git_worktree | adapter_managed | cloud_sandbox`
+- `provider_ref text null`
+- `derived_from_execution_workspace_id uuid null`
+- `last_used_at timestamptz not null default now()`
+- `opened_at timestamptz not null default now()`
+- `closed_at timestamptz null`
+- `cleanup_eligible_at timestamptz null`
+- `cleanup_reason text null`
+- `metadata jsonb null`
+- `created_at timestamptz not null default now()`
+- `updated_at timestamptz not null default now()`
+
+### Foreign keys
+
+- `company_id -> companies.id`
+- `project_id -> projects.id`
+- `project_workspace_id -> project_workspaces.id on delete set null`
+- `source_issue_id -> issues.id on delete set null`
+- `derived_from_execution_workspace_id -> execution_workspaces.id on delete set null`
+
+### Indexes
+
+- `(company_id, project_id, status)`
+- `(company_id, project_workspace_id, status)`
+- `(company_id, source_issue_id)`
+- `(company_id, last_used_at desc)`
+- `(company_id, branch_name)` non-unique
+
+## 3. Extend `issues`
+
+Add explicit workspace linkage.
+
+### New columns
+
+- `project_workspace_id uuid null`
+- `execution_workspace_id uuid null`
+- `execution_workspace_preference text null`
+  - `inherit | shared_workspace | isolated_workspace | operator_branch | reuse_existing`
+
+### Foreign keys
+
+- `project_workspace_id -> project_workspaces.id on delete set null`
+- `execution_workspace_id -> execution_workspaces.id on delete set null`
+
+### Backfill rules
+
+- all existing issues get null values
+- null should be interpreted as compatibility/inherit behavior
+
+### Invariants
+
+- if `project_workspace_id` is set, it must belong to the issue's project and company
+- if `execution_workspace_id` is set, it must belong to the issue's company
+- if `execution_workspace_id` is set, the referenced workspace's `project_id` must match the issue's `project_id`
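+
+As an illustration, the invariants could be checked in the service layer roughly like this. All type and function names are hypothetical, and the sketch assumes the referenced rows have already been loaded.
+
+```typescript
+// Sketch only: shapes and names are assumptions for illustration.
+interface IssueWorkspaceLinks {
+  companyId: string;
+  projectId: string;
+  projectWorkspaceId: string | null;
+  executionWorkspaceId: string | null;
+}
+
+interface WorkspaceOwnership {
+  id: string;
+  companyId: string;
+  projectId: string;
+}
+
+// Returns a list of violated invariants; an empty list means the links are valid.
+function checkIssueWorkspaceInvariants(
+  issue: IssueWorkspaceLinks,
+  projectWorkspace: WorkspaceOwnership | null,
+  executionWorkspace: WorkspaceOwnership | null,
+): string[] {
+  const violations: string[] = [];
+
+  if (issue.projectWorkspaceId !== null) {
+    const pw = projectWorkspace;
+    if (pw === null || pw.id !== issue.projectWorkspaceId) {
+      violations.push("project workspace not found");
+    } else if (pw.companyId !== issue.companyId || pw.projectId !== issue.projectId) {
+      violations.push("project workspace must belong to the issue's project and company");
+    }
+  }
+
+  if (issue.executionWorkspaceId !== null) {
+    const ew = executionWorkspace;
+    if (ew === null || ew.id !== issue.executionWorkspaceId) {
+      violations.push("execution workspace not found");
+    } else {
+      if (ew.companyId !== issue.companyId) {
+        violations.push("execution workspace must belong to the issue's company");
+      }
+      if (ew.projectId !== issue.projectId) {
+        violations.push("execution workspace project must match the issue's project");
+      }
+    }
+  }
+
+  return violations;
+}
+```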
+
+## 4. Add `issue_work_products`
+
+Create a new durable table for outputs.
+
+### Columns
+
+- `id uuid pk`
+- `company_id uuid not null`
+- `project_id uuid null`
+- `issue_id uuid not null`
+- `execution_workspace_id uuid null`
+- `runtime_service_id uuid null`
+- `type text not null`
+  - `preview_url | runtime_service | pull_request | branch | commit | artifact | document`
+- `provider text not null`
+  - `paperclip | github | vercel | s3 | custom`
+- `external_id text null`
+- `title text not null`
+- `url text null`
+- `status text not null`
+  - `active | ready_for_review | approved | changes_requested | merged | closed | failed | archived`
+- `review_state text not null default 'none'`
+  - `none | needs_board_review | approved | changes_requested`
+- `is_primary boolean not null default false`
+- `health_status text not null default 'unknown'`
+  - `unknown | healthy | unhealthy`
+- `summary text null`
+- `metadata jsonb null`
+- `created_by_run_id uuid null`
+- `created_at timestamptz not null default now()`
+- `updated_at timestamptz not null default now()`
+
+### Foreign keys
+
+- `company_id -> companies.id`
+- `project_id -> projects.id on delete set null`
+- `issue_id -> issues.id on delete cascade`
+- `execution_workspace_id -> execution_workspaces.id on delete set null`
+- `runtime_service_id -> workspace_runtime_services.id on delete set null`
+- `created_by_run_id -> heartbeat_runs.id on delete set null`
+
+### Indexes
+
+- `(company_id, issue_id, type)`
+- `(company_id, execution_workspace_id, type)`
+- `(company_id, provider, external_id)`
+- `(company_id, updated_at desc)`
+
+## 5. Extend `workspace_runtime_services`
+
+This table already exists and should remain the system of record for owned/tracked services.
+
+### New column
+
+- `execution_workspace_id uuid null`
+
+### Foreign key
+
+- `execution_workspace_id -> execution_workspaces.id on delete set null`
+
+### Behavior
+
+- runtime services remain workspace-first
+- issue UIs should surface them through linked execution workspaces and work products
+
+## Shared Contracts
+
+## 1. `packages/shared`
+
+### Update project workspace types and validators
+
+Add fields:
+
+- `sourceType`
+- `defaultRef`
+- `visibility`
+- `setupCommand`
+- `cleanupCommand`
+- `remoteProvider`
+- `remoteWorkspaceRef`
+- `sharedWorkspaceKey`
+
+### Add execution workspace types and validators
+
+New shared types:
+
+- `ExecutionWorkspace`
+- `ExecutionWorkspaceMode`
+- `ExecutionWorkspaceStatus`
+- `ExecutionWorkspaceProviderType`
+
+### Add work product types and validators
+
+New shared types:
+
+- `IssueWorkProduct`
+- `IssueWorkProductType`
+- `IssueWorkProductStatus`
+- `IssueWorkProductReviewState`
+
+### Update issue types and validators
+
+Add:
+
+- `projectWorkspaceId`
+- `executionWorkspaceId`
+- `executionWorkspacePreference`
+- `workProducts?: IssueWorkProduct[]`
+
+### Extend project execution policy contract
+
+Replace the current narrow policy with a more explicit shape:
+
+- `enabled`
+- `defaultMode`
+  - `shared_workspace | isolated_workspace | operator_branch | adapter_default`
+- `allowIssueOverride`
+- `defaultProjectWorkspaceId`
+- `workspaceStrategy`
+- `branchPolicy`
+- `pullRequestPolicy`
+- `runtimePolicy`
+- `cleanupPolicy`
+
+Do not try to encode every possible provider-specific field in V1. Keep provider-specific extensibility in nested JSON where needed.
+
+## Service Layer Changes
+
+## 1. Project service
+
+Update project workspace CRUD to handle the extended schema.
+
+### Required rules
+
+- when setting a primary workspace, clear `is_primary` on siblings
+- `source_type=remote_managed` may have null `cwd`
+- local/git-backed workspaces should still require one of `cwd` or `repo_url`
+- preserve current behavior for existing callers that only send `cwd/repoUrl/repoRef`
+
+## 2. Issue service
+
+Update create/update flows to handle explicit workspace binding.
+
+### Create behavior
+
+Resolve defaults in this order:
+
+1. explicit `projectWorkspaceId` from request
+2. `project.executionWorkspacePolicy.defaultProjectWorkspaceId`
+3. project's primary workspace
+4. null
+
+Resolve `executionWorkspacePreference`:
+
+1. explicit request field
+2. project policy default
+3. compatibility fallback to `inherit`
+
+Do not create or link an execution workspace at issue creation time unless:
+
+- `reuse_existing` is explicitly chosen and `executionWorkspaceId` is provided
+
+Otherwise, workspace realization happens when execution starts.
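+
+The two resolution orders above can be sketched as a single helper. The names are hypothetical. In particular, `policyDefaultPreference` stands in for whatever the project policy yields as a preference, since this plan does not spell out how `defaultMode` maps onto `executionWorkspacePreference`.
+
+```typescript
+// Sketch only: all names here are assumptions for illustration.
+type ExecutionWorkspacePreference =
+  | "inherit"
+  | "shared_workspace"
+  | "isolated_workspace"
+  | "operator_branch"
+  | "reuse_existing";
+
+interface CreateIssueWorkspaceInput {
+  projectWorkspaceId?: string | null;
+  executionWorkspacePreference?: ExecutionWorkspacePreference | null;
+}
+
+interface ProjectWorkspaceDefaults {
+  policyDefaultProjectWorkspaceId: string | null;
+  primaryWorkspaceId: string | null;
+  policyDefaultPreference: ExecutionWorkspacePreference | null;
+}
+
+function resolveIssueWorkspaceDefaults(
+  input: CreateIssueWorkspaceInput,
+  project: ProjectWorkspaceDefaults,
+): { projectWorkspaceId: string | null; executionWorkspacePreference: ExecutionWorkspacePreference } {
+  return {
+    // request -> policy default -> primary workspace -> null
+    projectWorkspaceId:
+      input.projectWorkspaceId ??
+      project.policyDefaultProjectWorkspaceId ??
+      project.primaryWorkspaceId ??
+      null,
+    // request -> policy default -> compatibility fallback
+    executionWorkspacePreference:
+      input.executionWorkspacePreference ??
+      project.policyDefaultPreference ??
+      "inherit",
+  };
+}
+```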
+
+### Update behavior
+
+- allow changing `projectWorkspaceId` only if the workspace belongs to the same project
+- allow setting `executionWorkspaceId` only if it belongs to the same company and project
+- do not automatically destroy or relink historical work products when workspace linkage changes
+
+## 3. Workspace realization service
+
+Refactor `workspace-runtime.ts` so realization produces or reuses an `execution_workspaces` row.
+
+### New flow
+
+Input:
+
+- issue
+- project workspace
+- project execution policy
+- execution topology hint
+- adapter/runtime configuration
+
+Output:
+
+- realized execution workspace record
+- runtime cwd/provider metadata
+
+### Required modes
+
+- `shared_workspace`
+  - reuse a stable execution workspace representing the project primary/shared workspace
+- `isolated_workspace`
+  - create or reuse a derived isolated execution workspace
+- `operator_branch`
+  - create or reuse a long-lived branch workspace
+- `adapter_managed`
+  - create an execution workspace with provider references and optional null `cwd`
+- `cloud_sandbox`
+  - same as adapter-managed, but explicit remote sandbox semantics
+
+### Reuse rules
+
+When `reuse_existing` is requested:
+
+- only list active or recently used execution workspaces
+- only for the same project
+- only for the same project workspace if one is specified
+- exclude archived and cleanup-failed workspaces
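+
+A possible shape for the reuse filter, written as an in-memory sketch. The names are hypothetical, and the plan does not define "recently used", so the `recentWindowMs` parameter is an assumption.
+
+```typescript
+// Sketch only: names and the recency window are assumptions.
+type ExecutionWorkspaceStatus = "active" | "idle" | "in_review" | "archived" | "cleanup_failed";
+
+interface ExecutionWorkspaceSummary {
+  id: string;
+  projectId: string;
+  projectWorkspaceId: string | null;
+  status: ExecutionWorkspaceStatus;
+  lastUsedAt: Date;
+}
+
+interface ReuseQuery {
+  projectId: string;
+  projectWorkspaceId?: string | null;
+  now: Date;
+  recentWindowMs: number;
+}
+
+function listReuseCandidates(
+  all: ExecutionWorkspaceSummary[],
+  query: ReuseQuery,
+): ExecutionWorkspaceSummary[] {
+  return all.filter((ws) => {
+    // Same project only.
+    if (ws.projectId !== query.projectId) return false;
+    // Same project workspace, but only when one is specified.
+    if (query.projectWorkspaceId && ws.projectWorkspaceId !== query.projectWorkspaceId) return false;
+    // Archived and cleanup-failed workspaces are never offered.
+    if (ws.status === "archived" || ws.status === "cleanup_failed") return false;
+    // Active, or used within the recency window.
+    const recentlyUsed = query.now.getTime() - ws.lastUsedAt.getTime() <= query.recentWindowMs;
+    return ws.status === "active" || recentlyUsed;
+  });
+}
+```
+
+In a real service the same conditions would more likely live in the SQL query behind `reuseEligible=true` than in an in-memory filter.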
+
+### Shared workspace realization
+
+For compatibility mode and shared-workspace projects:
+
+- create a stable execution workspace per project workspace when first needed
+- reuse it for subsequent runs
+
+This avoids a special-case branch in later work product linkage.
+
+## 4. Runtime service integration
+
+When runtime services are started or reused:
+
+- populate `execution_workspace_id`
+- continue populating `project_workspace_id`, `project_id`, and `issue_id`
+
+When a runtime service yields a URL:
+
+- optionally create or update a linked `issue_work_products` row of type `runtime_service` or `preview_url`
+
+## 5. PR and preview reporting
+
+Add a service for creating/updating `issue_work_products`.
+
+### Supported V1 product types
+
+- `pull_request`
+- `preview_url`
+- `runtime_service`
+- `branch`
+- `commit`
+- `artifact`
+- `document`
+
+### GitHub PR reporting
+
+For V1, GitHub is the only provider with richer semantics.
+
+Supported statuses:
+
+- `draft`
+- `ready_for_review`
+- `approved`
+- `changes_requested`
+- `merged`
+- `closed`
+
+Represent these in `status` and `review_state` rather than inventing a separate PR table in V1.
+
+## Routes and API
+
+## 1. Project workspace routes
+
+Extend existing routes:
+
+- `GET /projects/:id/workspaces`
+- `POST /projects/:id/workspaces`
+- `PATCH /projects/:id/workspaces/:workspaceId`
+- `DELETE /projects/:id/workspaces/:workspaceId`
+
+### New accepted/returned fields
+
+- `sourceType`
+- `defaultRef`
+- `visibility`
+- `setupCommand`
+- `cleanupCommand`
+- `remoteProvider`
+- `remoteWorkspaceRef`
+
+## 2. Execution workspace routes
+
+Add:
+
+- `GET /companies/:companyId/execution-workspaces`
+  - filters:
+    - `projectId`
+    - `projectWorkspaceId`
+    - `status`
+    - `issueId`
+    - `reuseEligible=true`
+- `GET /execution-workspaces/:id`
+- `PATCH /execution-workspaces/:id`
+  - update status/metadata/cleanup fields only in V1
+
+Do not add top-level navigation for these routes yet.
+
+## 3. Work product routes
+
+Add:
+
+- `GET /issues/:id/work-products`
+- `POST /issues/:id/work-products`
+- `PATCH /work-products/:id`
+- `DELETE /work-products/:id`
+
+### V1 mutation permissions
+
+- board can create/update/delete all
+- agents can create/update for issues they are assigned or currently executing
+- deletion should generally archive rather than hard-delete once linked to historical output
+
+## 4. Issue routes
+
+Extend existing create/update payloads to accept:
+
+- `projectWorkspaceId`
+- `executionWorkspacePreference`
+- `executionWorkspaceId`
+
+Extend `GET /issues/:id` to return:
+
+- `projectWorkspaceId`
+- `executionWorkspaceId`
+- `executionWorkspacePreference`
+- `currentExecutionWorkspace`
+- `workProducts[]`
+
+## 5. Instance settings routes
+
+Add support for:
+
+- reading/writing `experimental.workspaces`
+
+This is a UI gate only.
+
+If there is no generic instance settings storage yet, the first slice can store this in the existing config/instance settings mechanism used by `/instance/settings`.
+
+## UI Changes
+
+## 1. `/instance/settings`
+
+Add section:
+
+- `Experimental`
+  - `Enable Workspaces`
+
+When off:
+
+- hide new workspace-specific affordances
+- do not alter existing project or issue behavior
+
+## 2. Project properties
+
+Do not create a separate `Code` tab yet.
+Ship inside existing project properties first.
+
+### Add or re-enable sections
+
+- `Project Workspaces`
+- `Execution Defaults`
+- `Provisioning`
+- `Pull Requests`
+- `Previews and Runtime`
+- `Cleanup`
+
+### Display rules
+
+- only show when `experimental.workspaces=true`
+- keep wording generic enough for local and remote setups
+- only show git-specific fields when `sourceType=git_repo`
+- only show local-path-specific fields when not `remote_managed`
+
+## 3. Issue create dialog
+
+When the workspace experimental flag is on and the selected project has workspace automation or workspaces, show the following fields:
+
+### Basic fields
+
+- `Codebase`
+  - select from project workspaces
+  - default to policy default or primary workspace
+- `Execution mode`
+  - `Project default`
+  - `Shared workspace`
+  - `Isolated workspace`
+  - `Operator branch`
+
+### Advanced section
+
+- `Reuse existing execution workspace`
+
+This control should list only execution workspaces from:
+
+- the same project
+- the same codebase if selected
+- active/recent workspaces
+
+Show compact labels with the branch or workspace name.
+
+Do not expose all execution workspaces in a noisy unfiltered list.
+
+## 4. Issue detail
+
+Add a `Work Product` tab when:
+
+- the experimental flag is on, or
+- the issue already has work products
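+
+The visibility rule is small enough to state as a predicate. This is a sketch with a hypothetical function name.
+
+```typescript
+// Sketch only: the function name is an assumption.
+// The tab appears when the flag is on, or when the issue already has work
+// products, so stored output stays reachable even with the flag off.
+function shouldShowWorkProductTab(
+  experimentalWorkspaces: boolean,
+  workProductCount: number,
+): boolean {
+  return experimentalWorkspaces || workProductCount > 0;
+}
+```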
+
+### Show
+
+- current execution workspace summary
+- PR cards
+- preview cards
+- branch/commit rows
+- artifacts/documents
+
+Add compact header chips:
+
+- codebase
+- workspace
+- PR count/status
+- preview status
+
+## 5. Execution workspace detail page
+
+Add a detail route but no nav item.
+
+Linked from:
+
+- issue work product tab
+- project workspace/execution panels
+
+### Show
+
+- identity and status
+- project workspace origin
+- source issue
+- linked issues
+- branch/ref/provider info
+- runtime services
+- work products
+- cleanup state
+
+## Runtime and Adapter Behavior
+
+## 1. Local adapters
+
+For local adapters:
+
+- continue to use existing cwd/worktree realization paths
+- persist the result as execution workspaces
+- attach runtime services and work products to the execution workspace and issue
+
+## 2. Remote or cloud adapters
+
+For remote adapters:
+
+- allow execution workspaces with null `cwd`
+- require provider metadata sufficient to identify the remote workspace/session
+- allow work product creation without any host-local process ownership
+
+Examples:
+
+- cloud coding agent opens a branch and PR on GitHub
+- Vercel preview URL is reported back as a preview work product
+- remote sandbox emits artifact URLs
+
+## 3. Approval-aware PR workflow
+
+V1 should support richer PR state tracking, but not a full review engine.
+
+### Required actions
+
+- `open_pr`
+- `mark_ready`
+
+### Required PR states
+
+- `draft`
+- `ready_for_review`
+- `approved`
+- `changes_requested`
+- `merged`
+- `closed`
+
+### Storage approach
+
+- represent these as `issue_work_products` with `type='pull_request'`
+- use `status` and `review_state`
+- store provider-specific details in `metadata`
+
+## Migration Plan
+
+## 1. Existing installs
+
+The migration posture is backward-compatible by default.
+
+### Guarantees
+
+- no existing project needs to be edited to keep working
+- no existing issue flow should start requiring workspace input
+- all new nullable columns must preserve current behavior when absent
+
+## 2. Project workspace migration
+
+Migrate `project_workspaces` in place.
+
+### Backfill
+
+- derive `source_type`
+- copy `repo_ref` to `default_ref`
+- leave new optional fields null
+
+## 3. Issue migration
+
+Do not backfill `project_workspace_id` or `execution_workspace_id` on existing issues.
+
+Reason:
+
+- the safest migration is to preserve current runtime behavior and bind explicitly only when new workspace-aware flows are used
+
+Interpret old issues as:
+
+- `executionWorkspacePreference = inherit`
+- compatibility/shared behavior
+
+## 4. Runtime history migration
+
+Do not attempt a perfect historical reconstruction of execution workspaces in the migration itself.
+
+Instead:
+
+- create execution workspace records forward from first new run
+- optionally add a later backfill tool for recent runtime services if it proves valuable
+
+## Rollout Order
+
+## Phase 1: Schema and shared contracts
+
+1. extend `project_workspaces`
+2. add `execution_workspaces`
+3. add `issue_work_products`
+4. extend `issues`
+5. extend `workspace_runtime_services`
+6. update shared types and validators
+
+## Phase 2: Service wiring
+
+1. update project workspace CRUD
+2. update issue create/update resolution
+3. refactor workspace realization to persist execution workspaces
+4. attach runtime services to execution workspaces
+5. add work product service and persistence
+
+## Phase 3: API and UI
+
+1. add execution workspace routes
+2. add work product routes
+3. add instance experimental settings toggle
+4. re-enable and revise project workspace UI behind the flag
+5. add issue create/update controls behind the flag
+6. add issue work product tab
+7. add execution workspace detail page
+
+## Phase 4: Provider integrations
+
+1. GitHub PR reporting
+2. preview URL reporting
+3. runtime-service-to-work-product linking
+4. remote/cloud provider references
+
+## Acceptance Criteria
+
+1. Existing installs continue to behave predictably with no required reconfiguration.
+2. Projects can define local, git, non-git, and remote-managed project workspaces.
+3. Issues can explicitly select a project workspace and execution preference.
+4. Each issue can point to one current execution workspace.
+5. Multiple issues can intentionally reuse the same execution workspace.
+6. Execution workspaces are persisted for both local and remote execution flows.
+7. Work products can be attached to issues with optional execution workspace linkage.
+8. GitHub PRs can be represented with richer lifecycle states.
+9. The main UI remains simple when the experimental flag is off.
+10. No top-level workspace navigation is required for this first slice.
+
+## Risks and Mitigations
+
+## Risk: too many overlapping workspace concepts
+
+Mitigation:
+
+- keep issue UI to `Codebase` and `Execution mode`
+- reserve execution workspace details for advanced pages
+
+## Risk: breaking current projects on upgrade
+
+Mitigation:
+
+- nullable schema additions
+- in-place `project_workspaces` migration
+- compatibility defaults
+
+## Risk: local-only assumptions leaking into cloud mode
+
+Mitigation:
+
+- make `cwd` optional for execution workspaces
+- use `provider_type` and `provider_ref`
+- use `PAPERCLIP_EXECUTION_TOPOLOGY` as a defaulting guardrail
+
+## Risk: turning PRs into a bespoke subsystem too early
+
+Mitigation:
+
+- represent PRs as work products in V1
+- keep provider-specific details in metadata
+- defer a dedicated PR table unless usage proves it necessary
+
+## Recommended First Engineering Slice
+
+If we want the narrowest useful implementation:
+
+1. extend `project_workspaces`
+2. add `execution_workspaces`
+3. extend `issues` with explicit workspace fields
+4. persist execution workspaces from existing local workspace realization
+5. add `issue_work_products`
+6. show project workspace controls and issue workspace controls behind the experimental flag
+7. add issue `Work Product` tab with PR/preview/runtime service display
+
+This slice is enough to validate the model without yet building every provider integration or cleanup workflow.
--- a/doc/plugins/PLUGIN_AUTHORING_GUIDE.md
+++ b/doc/plugins/PLUGIN_AUTHORING_GUIDE.md
@@ -0,0 +1,155 @@
+# Plugin Authoring Guide
+
+This guide describes the current, implemented way to create a Paperclip plugin in this repo.
+
+It is intentionally narrower than [PLUGIN_SPEC.md](./PLUGIN_SPEC.md). The spec includes future ideas; this guide only covers the alpha surface that exists now.
+
+## Current reality
+
+- Treat plugin workers and plugin UI as trusted code.
+- Plugin UI runs as same-origin JavaScript inside the main Paperclip app.
+- Worker-side host APIs are capability-gated.
+- Plugin UI is not sandboxed by manifest capabilities.
+- There is no host-provided shared React component kit for plugins yet.
+- `ctx.assets` is not supported in the current runtime.
+
+## Scaffold a plugin
+
+Use the scaffold package:
+
+```bash
+pnpm --filter @paperclipai/create-paperclip-plugin build
+node packages/plugins/create-paperclip-plugin/dist/index.js @yourscope/plugin-name --output ./packages/plugins/examples
+```
+
+For a plugin that lives outside the Paperclip repo:
+
+```bash
+pnpm --filter @paperclipai/create-paperclip-plugin build
+node packages/plugins/create-paperclip-plugin/dist/index.js @yourscope/plugin-name \
+  --output /absolute/path/to/plugin-repos \
+  --sdk-path /absolute/path/to/paperclip/packages/plugins/sdk
+```
+
+That creates a package with:
+
+- `src/manifest.ts`
+- `src/worker.ts`
+- `src/ui/index.tsx`
+- `tests/plugin.spec.ts`
+- `esbuild.config.mjs`
+- `rollup.config.mjs`
+
+Inside this monorepo, the scaffold uses `workspace:*` for `@paperclipai/plugin-sdk`.
+
+Outside this monorepo, the scaffold snapshots `@paperclipai/plugin-sdk` from the local Paperclip checkout into a `.paperclip-sdk/` tarball so you can build and test a plugin without publishing anything to npm first.
+
+## Recommended local workflow
+
+From the generated plugin folder:
+
+```bash
+pnpm install
+pnpm typecheck
+pnpm test
+pnpm build
+```
+
+For local development, install it into Paperclip from an absolute local path through the plugin manager or API. The server supports local filesystem installs and watches local-path plugins for file changes so worker restarts happen automatically after rebuilds.
+
+Example:
+
+```bash
+curl -X POST http://127.0.0.1:3100/api/plugins/install \
+  -H "Content-Type: application/json" \
+  -d '{"packageName":"/absolute/path/to/your-plugin","isLocalPath":true}'
+```
+
+## Supported alpha surface
+
+Worker:
+
+- config
+- events
+- jobs
+- launchers
+- http
+- secrets
+- activity
+- state
+- entities
+- projects and project workspaces
+- companies
+- issues and comments
+- agents and agent sessions
+- goals
+- data/actions
+- streams
+- tools
+- metrics
+- logger
+
+UI:
+
+- `usePluginData`
+- `usePluginAction`
+- `usePluginStream`
+- `usePluginToast`
+- `useHostContext`
+- typed slot props from `@paperclipai/plugin-sdk/ui`
+
+Mount surfaces currently wired in the host include:
+
+- `page`
+- `settingsPage`
+- `dashboardWidget`
+- `sidebar`
+- `sidebarPanel`
+- `detailTab`
+- `taskDetailView`
+- `projectSidebarItem`
+- `globalToolbarButton`
+- `toolbarButton`
+- `contextMenuItem`
+- `commentAnnotation`
+- `commentContextMenuItem`
+
+## Company routes
+
+Plugins may declare a `page` slot with `routePath` to own a company route like:
+
+```text
+/:companyPrefix/<routePath>
+```
+
+Rules:
+
+- `routePath` must be a single lowercase slug
+- it cannot collide with reserved host routes
+- it cannot duplicate another installed plugin page route
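+
+A rough sketch of these checks follows. The function name and both route sets are hypothetical inputs, and the exact slug pattern (lowercase letters, digits, and single hyphens) is an assumption about what "single lowercase slug" means. The host's real validator may differ.
+
+```typescript
+// Sketch only: the pattern and names below are assumptions, not the host's validator.
+const SLUG_PATTERN = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
+
+// Returns an error message, or null when the routePath is acceptable.
+function validateRoutePath(
+  routePath: string,
+  reservedHostRoutes: ReadonlySet<string>,
+  installedPluginRoutes: ReadonlySet<string>,
+): string | null {
+  if (!SLUG_PATTERN.test(routePath)) {
+    return "routePath must be a single lowercase slug";
+  }
+  if (reservedHostRoutes.has(routePath)) {
+    return "routePath collides with a reserved host route";
+  }
+  if (installedPluginRoutes.has(routePath)) {
+    return "routePath is already used by another installed plugin page";
+  }
+  return null;
+}
+```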
+
+## Publishing guidance
+
+- Use npm packages as the deployment artifact.
+- Treat repo-local example installs as a development workflow only.
+- Prefer keeping plugin UI self-contained inside the package.
+- Do not rely on host design-system components or undocumented app internals.
+- GitHub repository installs are not a first-class workflow today. For local development, use a checked-out local path. For production, publish to npm or a private npm-compatible registry.
+
+## Verification before handoff
+
+At minimum:
+
+```bash
+pnpm --filter <your-plugin-package> typecheck
+pnpm --filter <your-plugin-package> test
+pnpm --filter <your-plugin-package> build
+```
+
+If you changed host integration too, also run:
+
+```bash
+pnpm -r typecheck
+pnpm test:run
+pnpm build
+```
--- a/doc/plugins/PLUGIN_SPEC.md
+++ b/doc/plugins/PLUGIN_SPEC.md
@@ -8,6 +8,29 @@ It expands the brief plugin notes in [doc/SPEC.md](../SPEC.md) and should be rea
 This is not part of the V1 implementation contract in [doc/SPEC-implementation.md](../SPEC-implementation.md).
 It is the full target architecture for the plugin system that should follow V1.

+## Current implementation caveats
+
+The code in this repo now includes an early plugin runtime and admin UI, but it does not yet deliver the full deployment model described in this spec.
+
+Today, the practical deployment model is:
+
+- single-tenant
+- self-hosted
+- single-node or otherwise filesystem-persistent
+
+Current limitations to keep in mind:
+
+- Plugin UI bundles currently run as same-origin JavaScript inside the main Paperclip app. Treat plugin UI as trusted code, not a sandboxed frontend capability boundary.
+- Manifest capabilities currently gate worker-side host RPC calls. They do not prevent plugin UI code from calling ordinary Paperclip HTTP APIs directly.
+- Runtime installs assume a writable local filesystem for the plugin package directory and plugin data directory.
+- Runtime npm installs assume `npm` is available in the running environment and that the host can reach the configured package registry.
+- Published npm packages are the intended install artifact for deployed plugins.
+- The repo example plugins under `packages/plugins/examples/` are development conveniences. They work from a source checkout and should not be assumed to exist in a generic published build unless they are explicitly shipped with that build.
+- Dynamic plugin install is not yet cloud-ready for horizontally scaled or ephemeral deployments. There is no shared artifact store, install coordination, or cross-node distribution layer yet.
+- The current runtime does not yet ship a real host-provided plugin UI component kit, and it does not support plugin asset uploads/reads. Treat those as future-scope ideas in this spec, not current implementation promises.
+
+In practice, that means the current implementation is a good fit for local development and self-hosted persistent deployments, but not yet for multi-instance cloud plugin distribution.
+
 ## 1. Scope

 This spec covers:
@@ -212,6 +235,8 @@ Suggested layout:

 The package install directory and the plugin data directory are separate.

+This on-disk model is the reason the current implementation expects a persistent writable host filesystem. Cloud-safe artifact replication is future work.
+
 ## 8.2 Operator Commands

 Paperclip should add CLI commands:
@@ -237,6 +262,8 @@ The install process is:
 7. Start plugin worker and run health/validation.
 8. Mark plugin `ready` or `error`.

+For the current implementation, this install flow should be read as a single-host workflow. A successful install writes packages to the local host, and other app nodes will not automatically receive that plugin unless a future shared distribution mechanism is added.
+
 ## 9. Load Order And Precedence

 Load order must be deterministic.
--- a/docker-compose.untrusted-review.yml
+++ b/docker-compose.untrusted-review.yml
@@ -0,0 +1,33 @@
+services:
+  review:
+    build:
+      context: .
+      dockerfile: docker/untrusted-review/Dockerfile
+    init: true
+    tty: true
+    stdin_open: true
+    working_dir: /work
+    environment:
+      HOME: "/home/reviewer"
+      CODEX_HOME: "/home/reviewer/.codex"
+      CLAUDE_HOME: "/home/reviewer/.claude"
+      PAPERCLIP_HOME: "/home/reviewer/.paperclip-review"
+      OPENAI_API_KEY: "${OPENAI_API_KEY:-}"
+      ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY:-}"
+      GITHUB_TOKEN: "${GITHUB_TOKEN:-}"
+    ports:
+      - "${REVIEW_PAPERCLIP_PORT:-3100}:3100"
+      - "${REVIEW_VITE_PORT:-5173}:5173"
+    volumes:
+      - review-home:/home/reviewer
+      - review-work:/work
+    cap_drop:
+      - ALL
+    security_opt:
+      - no-new-privileges:true
+    tmpfs:
+      - /tmp:mode=1777,size=1g
+
+volumes:
+  review-home:
+  review-work:
--- a/docker/untrusted-review/Dockerfile
+++ b/docker/untrusted-review/Dockerfile
@@ -0,0 +1,44 @@
+FROM node:lts-trixie-slim
+
+RUN apt-get update \
+  && apt-get install -y --no-install-recommends \
+    bash \
+    ca-certificates \
+    curl \
+    fd-find \
+    gh \
+    git \
+    jq \
+    less \
+    openssh-client \
+    procps \
+    ripgrep \
+  && rm -rf /var/lib/apt/lists/*
+
+RUN ln -sf /usr/bin/fdfind /usr/local/bin/fd
+
+RUN corepack enable \
+  && npm install --global --omit=dev @anthropic-ai/claude-code@latest @openai/codex@latest
+
+RUN useradd --create-home --shell /bin/bash reviewer
+
+ENV HOME=/home/reviewer \
+  CODEX_HOME=/home/reviewer/.codex \
+  CLAUDE_HOME=/home/reviewer/.claude \
+  PAPERCLIP_HOME=/home/reviewer/.paperclip-review \
+  PNPM_HOME=/home/reviewer/.local/share/pnpm \
+  PATH=/home/reviewer/.local/share/pnpm:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
+
+WORKDIR /work
+
+COPY --chown=reviewer:reviewer docker/untrusted-review/bin/review-checkout-pr /usr/local/bin/review-checkout-pr
+
+RUN chmod +x /usr/local/bin/review-checkout-pr \
+  && mkdir -p /work \
+  && chown -R reviewer:reviewer /work
+
+USER reviewer
+
+EXPOSE 3100 5173
+
+CMD ["bash", "-l"]
--- a/docker/untrusted-review/bin/review-checkout-pr
+++ b/docker/untrusted-review/bin/review-checkout-pr
@@ -0,0 +1,65 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+usage() {
+  cat <<'EOF'
+Usage: review-checkout-pr <owner/repo|github-url> <pr-number> [checkout-dir]
+
+Examples:
+  review-checkout-pr paperclipai/paperclip 432
+  review-checkout-pr https://github.com/paperclipai/paperclip.git 432
+EOF
+}
+
+if [[ $# -lt 2 || $# -gt 3 ]]; then
+  usage >&2
+  exit 1
+fi
+
+normalize_repo_slug() {
+  local raw="$1"
+  raw="${raw#git@github.com:}"
+  raw="${raw#ssh://git@github.com/}"
+  raw="${raw#https://github.com/}"
+  raw="${raw#http://github.com/}"
+  raw="${raw%.git}"
+  printf '%s\n' "${raw#/}"
+}
+
+repo_slug="$(normalize_repo_slug "$1")"
+pr_number="$2"
+
+if [[ ! "$repo_slug" =~ ^[^/]+/[^/]+$ ]]; then
+  echo "Expected GitHub repo slug like owner/repo or a GitHub repo URL, got: $1" >&2
+  exit 1
+fi
+
+if [[ ! "$pr_number" =~ ^[0-9]+$ ]]; then
+  echo "PR number must be numeric, got: $pr_number" >&2
+  exit 1
+fi
+
+repo_key="${repo_slug//\//-}"
+mirror_dir="/work/repos/${repo_key}"
+checkout_dir="${3:-/work/checkouts/${repo_key}/pr-${pr_number}}"
+pr_ref="refs/remotes/origin/pr/${pr_number}"
+
+mkdir -p "$(dirname "$mirror_dir")" "$(dirname "$checkout_dir")"
+
+if [[ ! -d "$mirror_dir/.git" ]]; then
+  if command -v gh >/dev/null 2>&1; then
+    gh repo clone "$repo_slug" "$mirror_dir" -- --filter=blob:none
+  else
+    git clone --filter=blob:none "https://github.com/${repo_slug}.git" "$mirror_dir"
+  fi
+fi
+
+git -C "$mirror_dir" fetch --force origin "pull/${pr_number}/head:${pr_ref}"
+
+if [[ -e "$checkout_dir" ]]; then
+  printf '%s\n' "$checkout_dir"
+  exit 0
+fi
+
+git -C "$mirror_dir" worktree add --detach "$checkout_dir" "$pr_ref" >/dev/null
+printf '%s\n' "$checkout_dir"
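
The accepted input forms of `normalize_repo_slug` can be easier to see in a small TypeScript mirror. This is a hedged sketch, not part of the script: `stripPrefix`, `normalizeRepoSlug`, and `isValidRepoSlug` are hypothetical names, and the logic assumes the same prefix/suffix stripping order as the shell function above.

```typescript
// Hypothetical TypeScript mirror of the shell function normalize_repo_slug.
// All names here are illustrative and do not exist in the repository.
function stripPrefix(value: string, prefix: string): string {
  return value.slice(0, prefix.length) === prefix ? value.slice(prefix.length) : value;
}

function normalizeRepoSlug(raw: string): string {
  let slug = raw;
  // Same order as the shell script: SSH shorthand, SSH URL, HTTPS, HTTP.
  const prefixes = [
    "git@github.com:",
    "ssh://git@github.com/",
    "https://github.com/",
    "http://github.com/",
  ];
  for (const prefix of prefixes) {
    slug = stripPrefix(slug, prefix);
  }
  // Drop a trailing ".git", then a single leading slash.
  if (slug.slice(-4) === ".git") slug = slug.slice(0, -4);
  return stripPrefix(slug, "/");
}

// Mirrors the script's check that the result looks like owner/repo.
function isValidRepoSlug(slug: string): boolean {
  return /^[^/]+\/[^/]+$/.test(slug);
}
```

Under these assumptions, `paperclipai/paperclip` and `https://github.com/paperclipai/paperclip.git` both normalize to `paperclipai/paperclip`, while a URL on another host keeps its extra slashes and fails the slug check.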
--- a/docs/adapters/codex-local.md
+++ b/docs/adapters/codex-local.md
@@ -30,6 +30,8 @@ Codex uses `previous_response_id` for session continuity. The adapter serializes

 The adapter symlinks Paperclip skills into the global Codex skills directory (`~/.codex/skills`). Existing user skills are not overwritten.

+When Paperclip is running inside a managed worktree instance (`PAPERCLIP_IN_WORKTREE=true`), the adapter instead uses a worktree-isolated `CODEX_HOME` under the Paperclip instance so Codex skills, sessions, logs, and other runtime state do not leak across checkouts. It seeds that isolated home from the user's main Codex home for shared auth/config continuity.
+
 For manual local CLI usage outside heartbeat runs (for example running as `codexcoder` directly), use:

 ```sh
--- a/docs/adapters/creating-an-adapter.md
+++ b/docs/adapters/creating-an-adapter.md
@@ -6,7 +6,7 @@ summary: Guide to building a custom adapter
 Build a custom adapter to connect Paperclip to any agent runtime.

 <Tip>
-If you're using Claude Code, the `create-agent-adapter` skill can guide you through the full adapter creation process interactively. Just ask Claude to create a new adapter and it will walk you through each step.
+If you're using Claude Code, the `.agents/skills/create-agent-adapter` skill can guide you through the full adapter creation process interactively. Just ask Claude to create a new adapter and it will walk you through each step.
 </Tip>

 ## Package Structure
--- a/docs/api/companies.md
+++ b/docs/api/companies.md
@@ -38,10 +38,33 @@ PATCH /api/companies/{companyId}
 {
  "name": "Updated Name",
  "description": "Updated description",
-  "budgetMonthlyCents": 100000
+  "budgetMonthlyCents": 100000,
+  "logoAssetId": "b9f5e911-6de5-4cd0-8dc6-a55a13bc02f6"
 }
 ```

+## Upload Company Logo
+
+Upload an image for a company icon and store it as that company’s logo.
+
+```
+POST /api/companies/{companyId}/logo
+Content-Type: multipart/form-data
+```
+
+Valid image content types:
+
+- `image/png`
+- `image/jpeg`
+- `image/jpg`
+- `image/webp`
+- `image/gif`
+- `image/svg+xml`
+
+Company logo uploads use the normal Paperclip attachment size limit.
+
+After the upload succeeds, set the company logo by sending the returned `assetId` as `logoAssetId` in a `PATCH /api/companies/{companyId}` request.
+
 ## Archive Company

 ```
@@ -58,6 +81,8 @@ Archives a company. Archived companies are hidden from default listings.
 | `name` | string | Company name |
 | `description` | string | Company description |
 | `status` | string | `active`, `paused`, `archived` |
+| `logoAssetId` | string | Optional asset id for the stored logo image |
+| `logoUrl` | string | Optional Paperclip asset content path for the stored logo image |
 | `budgetMonthlyCents` | number | Monthly budget limit |
 | `createdAt` | string | ISO timestamp |
 | `updatedAt` | string | ISO timestamp |
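
The upload-then-PATCH flow described in the logo section can be sketched as a small client helper. This is a minimal sketch under stated assumptions: `CompanyLogoApi`, `uploadAndSetLogo`, and `isAllowedLogoContentType` are hypothetical names, the transport is injected rather than tied to a specific HTTP client, and the client-side content-type check only mirrors the documented list (the server's own validation may differ).

```typescript
// Content types listed in the "Upload Company Logo" section.
const LOGO_CONTENT_TYPES: readonly string[] = [
  "image/png",
  "image/jpeg",
  "image/jpg",
  "image/webp",
  "image/gif",
  "image/svg+xml",
];

// Hypothetical client-side pre-check; the server remains the source of truth.
function isAllowedLogoContentType(contentType: string): boolean {
  return LOGO_CONTENT_TYPES.indexOf(contentType.trim().toLowerCase()) !== -1;
}

// Hypothetical transport interface so the flow can run without a real server.
interface CompanyLogoApi {
  // Stands in for POST /api/companies/{companyId}/logo (multipart/form-data).
  uploadLogo(companyId: string, contentType: string, bytes: Uint8Array): Promise<{ assetId: string }>;
  // Stands in for PATCH /api/companies/{companyId}.
  patchCompany(companyId: string, body: { logoAssetId: string }): Promise<void>;
}

async function uploadAndSetLogo(
  api: CompanyLogoApi,
  companyId: string,
  contentType: string,
  bytes: Uint8Array,
): Promise<string> {
  if (!isAllowedLogoContentType(contentType)) {
    throw new Error(`Unsupported logo content type: ${contentType}`);
  }
  // Step 1: upload the image and read the returned assetId.
  const { assetId } = await api.uploadLogo(companyId, contentType, bytes);
  // Step 2: point the company at the stored asset.
  await api.patchCompany(companyId, { logoAssetId: assetId });
  return assetId;
}
```

Injecting the transport keeps the two-step ordering visible and testable; a real client would implement `CompanyLogoApi` on top of whatever HTTP library the caller already uses.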
--- a/docs/api/issues.md
+++ b/docs/api/issues.md
@@ -1,9 +1,9 @@
 ---
 title: Issues
-summary: Issue CRUD, checkout/release, comments, and attachments
+summary: Issue CRUD, checkout/release, comments, documents, and attachments
 ---

-Issues are the unit of work in Paperclip. They support hierarchical relationships, atomic checkout, comments, and file attachments.
+Issues are the unit of work in Paperclip. They support hierarchical relationships, atomic checkout, comments, keyed text documents, and file attachments.

 ## List Issues

@@ -29,6 +29,12 @@ GET /api/issues/{issueId}

 Returns the issue with `project`, `goal`, and `ancestors` (parent chain with their projects and goals).

+The response also includes:
+
+- `planDocument`: the full text of the issue document with key `plan`, when present
+- `documentSummaries`: metadata for all linked issue documents
+- `legacyPlanDocument`: a read-only fallback when the description still contains an old `<plan>` block
+
 ## Create Issue

 ```
@@ -100,6 +106,54 @@ POST /api/issues/{issueId}/comments

@-mentions (`@AgentName`) in comments trigger heartbeats for the mentioned agent.

+## Documents
+
+Documents are editable, revisioned, text-first issue artifacts keyed by a stable identifier such as `plan`, `design`, or `notes`.
+
+### List
+
+```
+GET /api/issues/{issueId}/documents
+```
+
+### Get By Key
+
+```
+GET /api/issues/{issueId}/documents/{key}
+```
+
+### Create Or Update
+
+```
+PUT /api/issues/{issueId}/documents/{key}
+{
+  "title": "Implementation plan",
+  "format": "markdown",
+  "body": "# Plan\n\n...",
+  "baseRevisionId": "{latestRevisionId}"
+}
+```
+
+Rules:
+
+- omit `baseRevisionId` when creating a new document
+- provide the current `baseRevisionId` when updating an existing document
+- a stale `baseRevisionId` returns `409 Conflict`
+
+### Revision History
+
+```
+GET /api/issues/{issueId}/documents/{key}/revisions
+```
+
+### Delete
+
+```
+DELETE /api/issues/{issueId}/documents/{key}
+```
+
+Delete is board-only in the current implementation.
+
 ## Attachments

 ### Upload
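
The `baseRevisionId` rules in the Documents section amount to optimistic concurrency. The sketch below models them against an in-memory map so the three outcomes are easy to see. It is illustrative only: `StoredDocument`, `UpsertResult`, and `upsertDocument` are hypothetical names, the revision id format is invented, and two cases the docs do not spell out (a base revision sent for a missing document, and an update sent without any base revision) are assumed to be conflicts.

```typescript
interface StoredDocument {
  body: string;
  latestRevisionId: string;
  latestRevisionNumber: number;
}

type UpsertResult =
  | { kind: "created"; document: StoredDocument }
  | { kind: "updated"; document: StoredDocument }
  | { kind: "conflict"; latestRevisionId: string | null };

// Hypothetical in-memory model of PUT /api/issues/{issueId}/documents/{key}.
function upsertDocument(
  store: Map<string, StoredDocument>,
  key: string,
  input: { body: string; baseRevisionId?: string | null },
): UpsertResult {
  const existing = store.get(key);
  const base = input.baseRevisionId ?? null;

  if (!existing) {
    // Assumption: a base revision for a document that does not exist is stale.
    if (base !== null) return { kind: "conflict", latestRevisionId: null };
    const created: StoredDocument = {
      body: input.body,
      latestRevisionId: `${key}-r1`, // invented id format, for illustration only
      latestRevisionNumber: 1,
    };
    store.set(key, created);
    return { kind: "created", document: created };
  }

  // A missing or mismatched base revision is rejected instead of overwriting.
  if (base !== existing.latestRevisionId) {
    return { kind: "conflict", latestRevisionId: existing.latestRevisionId };
  }

  const nextNumber = existing.latestRevisionNumber + 1;
  const updated: StoredDocument = {
    body: input.body,
    latestRevisionId: `${key}-r${nextNumber}`,
    latestRevisionNumber: nextNumber,
  };
  store.set(key, updated);
  return { kind: "updated", document: updated };
}
```

A client that receives the conflict outcome (HTTP `409 Conflict` in the real API) would typically re-fetch the document, merge or re-apply its change, and retry with the new `latestRevisionId`.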
--- a/docs/plans/2026-03-13-issue-documents-plan.md
+++ b/docs/plans/2026-03-13-issue-documents-plan.md
@@ -0,0 +1,569 @@
+# Issue Documents Plan
+
+Status: Draft  
+Owner: Backend + UI + Agent Protocol  
+Date: 2026-03-13  
+Primary issue: `PAP-448`
+
+## Summary
+
+Add first-class **documents** to Paperclip as editable, revisioned, company-scoped text artifacts that can be linked to issues.
+
+The first required convention is a document with key `plan`.
+
+This solves the immediate workflow problem in `PAP-448`:
+
+- plans should stop living inside issue descriptions as `<plan>` blocks
+- agents and board users should be able to create/update issue documents directly
+- `GET /api/issues/:id` should include the full `plan` document and expose the other available documents
+- issue detail should render documents under the description
+
+This should be built as the **text-document slice** of the broader artifact system, not as a replacement for attachments/assets.
+
+## Recommended Product Shape
+
+### Documents vs attachments vs artifacts
+
+- **Documents**: editable text content with stable keys and revision history.
+- **Attachments**: uploaded/generated opaque files backed by storage (`assets` + `issue_attachments`).
+- **Artifacts**: later umbrella/read-model that can unify documents, attachments, previews, and workspace files.
+
+Recommendation:
+
+- implement **issue documents now**
+- keep existing attachments as-is
+- defer full artifact unification until there is a second real consumer beyond issue documents + attachments
+
+This keeps `PAP-448` focused while still fitting the larger artifact direction.
+
+## Goals
+
+1. Give issues first-class keyed documents, starting with `plan`.
+2. Make documents editable by board users and same-company agents with issue access.
+3. Preserve change history with append-only revisions.
+4. Make the `plan` document automatically available in the normal issue fetch used by agents/heartbeats.
+5. Replace the current `<plan>`-in-description convention in skills/docs.
+6. Keep the design compatible with a future artifact/deliverables layer.
+
+## Non-Goals
+
+- full collaborative doc editing
+- binary-file version history
+- browser IDE or workspace editor
+- full artifact-system implementation in the same change
+- generalized polymorphic relations for every entity type on day one
+
+## Product Decisions
+
+### 1. Keyed issue documents
+
+Each issue can have multiple documents. Each document relation has a stable key:
+
+- `plan`
+- `design`
+- `notes`
+- `report`
+- custom keys later
+
+Key rules:
+
+- unique per issue, case-insensitive
+- normalized to lowercase slug form
+- machine-oriented and stable
+- title is separate and user-facing
+
+The `plan` key is conventional and reserved by Paperclip workflow/docs.
+
+### 2. Text-first v1
+
+V1 documents should be text-first, not arbitrary blobs.
+
+Recommended supported formats:
+
+- `markdown`
+- `plain_text`
+- `json`
+- `html`
+
+Recommendation:
+
+- optimize UI for `markdown`
+- allow raw editing for the others
+- keep PDFs/images/CSVs/etc as attachments/artifacts, not editable documents
+
+### 3. Revision model
+
+Every document update creates a new immutable revision.
+
+The current document row stores the latest snapshot for fast reads.
+
+### 4. Concurrency model
+
+Do not use silent last-write-wins.
+
+Updates should include `baseRevisionId`:
+
+- create: no base revision required
+- update: `baseRevisionId` must match current latest revision
+- mismatch: return `409 Conflict`
+
+This is important because both board users and agents may edit the same document.
+
+### 5. Issue fetch behavior
+
+`GET /api/issues/:id` should include:
+
+- full `planDocument` when a `plan` document exists
+- `documentSummaries` for all linked documents
+
+It should not inline every document body by default.
+
+This keeps issue fetches useful for agents without making every issue payload unbounded.
+
+### 6. Legacy `<plan>` compatibility
+
+If an issue has no `plan` document but its description contains a legacy `<plan>` block:
+
+- expose that as a legacy read-only fallback in API/UI
+- mark it as legacy/synthetic
+- prefer a real `plan` document when both exist
+
+Recommendation:
+
+- do not auto-rewrite old issue descriptions in the first rollout
+- provide an explicit import/migrate path later
+
+## Proposed Data Model
+
+Recommendation: make documents first-class, but keep issue linkage explicit via a join table.
+
+This preserves foreign keys today and gives a clean path to future `project_documents` or `company_documents` tables later.
+
+## Tables
+
+### `documents`
+
+Canonical text document record.
+
+Suggested columns:
+
+- `id`
+- `company_id`
+- `title`
+- `format`
+- `latest_body`
+- `latest_revision_id`
+- `latest_revision_number`
+- `created_by_agent_id`
+- `created_by_user_id`
+- `updated_by_agent_id`
+- `updated_by_user_id`
+- `created_at`
+- `updated_at`
+
+### `document_revisions`
+
+Append-only history.
+
+Suggested columns:
+
+- `id`
+- `company_id`
+- `document_id`
+- `revision_number`
+- `body`
+- `change_summary`
+- `created_by_agent_id`
+- `created_by_user_id`
+- `created_at`
+
+Constraints:
+
+- unique `(document_id, revision_number)`
+
+### `issue_documents`
+
+Issue relation + workflow key.
+
+Suggested columns:
+
+- `id`
+- `company_id`
+- `issue_id`
+- `document_id`
+- `key`
+- `created_at`
+- `updated_at`
+
+Constraints:
+
+- unique `(company_id, issue_id, key)`
+- unique `(document_id)` to keep one issue relation per document in v1
+
+## Why not use `assets` for this?
+
+Because `assets` solves blob storage, not:
+
+- stable keyed semantics like `plan`
+- inline text editing
+- revision history
+- optimistic concurrency
+- cheap inclusion in `GET /api/issues/:id`
+
+Documents and attachments should remain separate primitives, then meet later in a deliverables/artifact read-model.
+
+## Shared Types and API Contract
+
+## New shared types
+
+Add:
+
+- `DocumentFormat`
+- `IssueDocument`
+- `IssueDocumentSummary`
+- `DocumentRevision`
+
+Recommended `IssueDocument` shape:
+
+```ts
+type DocumentFormat = "markdown" | "plain_text" | "json" | "html";
+
+interface IssueDocument {
+  id: string;
+  companyId: string;
+  issueId: string;
+  key: string;
+  title: string | null;
+  format: DocumentFormat;
+  body: string;
+  latestRevisionId: string;
+  latestRevisionNumber: number;
+  createdByAgentId: string | null;
+  createdByUserId: string | null;
+  updatedByAgentId: string | null;
+  updatedByUserId: string | null;
+  createdAt: Date;
+  updatedAt: Date;
+}
+```
+
+Recommended `IssueDocumentSummary` shape:
+
+```ts
+interface IssueDocumentSummary {
+  id: string;
+  key: string;
+  title: string | null;
+  format: DocumentFormat;
+  latestRevisionId: string;
+  latestRevisionNumber: number;
+  updatedAt: Date;
+}
+```
+
+## Issue type enrichment
+
+Extend `Issue` with:
+
+```ts
+interface Issue {
+  ...
+  planDocument?: IssueDocument | null;
+  documentSummaries?: IssueDocumentSummary[];
+  legacyPlanDocument?: {
+    key: "plan";
+    body: string;
+    source: "issue_description";
+  } | null;
+}
+```
+
+This directly satisfies the `PAP-448` requirement for heartbeat/API issue fetches.
+
+## API endpoints
+
+Recommended endpoints:
+
+- `GET /api/issues/:issueId/documents`
+- `GET /api/issues/:issueId/documents/:key`
+- `PUT /api/issues/:issueId/documents/:key`
+- `GET /api/issues/:issueId/documents/:key/revisions`
+- `DELETE /api/issues/:issueId/documents/:key` (optionally board-only in v1)
+
+Recommended `PUT` body:
+
+```ts
+{
+  title?: string | null;
+  format: "markdown" | "plain_text" | "json" | "html";
+  body: string;
+  changeSummary?: string | null;
+  baseRevisionId?: string | null;
+}
+```
+
+Behavior:
+
+- missing document + no `baseRevisionId`: create
+- existing document + matching `baseRevisionId`: update
+- existing document + stale `baseRevisionId`: `409`
+
+## Authorization and invariants
+
+- all document records are company-scoped
+- issue relation must belong to same company
+- board access follows existing issue access rules
+- agent access follows existing same-company issue access rules
+- every mutation writes activity log entries
+
+Recommended delete rule for v1:
+
+- board can delete documents
+- agents can create/update, but not delete
+
+That keeps automated systems from removing canonical docs too easily.
+
+## UI Plan
+
+## Issue detail
+
+Add a new **Documents** section directly under the issue description.
+
+Recommended behavior:
+
+- show `plan` first when present
+- show other documents below it
+- render a gist-like header:
+  - key
+  - title
+  - last updated metadata
+  - revision number
+- support inline edit
+- support create new document by key
+- support revision history drawer or sheet
+
+Recommended presentation order:
+
+1. Description
+2. Documents
+3. Attachments
+4. Comments / activity / sub-issues
+
+This matches the request that documents live under the description while still leaving attachments available.
+
+## Editing UX
+
+Recommendation:
+
+- use markdown preview + raw edit toggle for markdown docs
+- use raw textarea editor for non-markdown docs in v1
+- show explicit save conflicts on `409`
+- show a clear empty state: "No documents yet"
+
+## Legacy plan rendering
+
+If there is no stored `plan` document but legacy `<plan>` exists:
+
+- show it in the Documents section
+- mark it `Legacy plan from description`
+- offer create/import in a later pass
+
+## Agent Protocol and Skills
+
+Update the Paperclip agent workflow so planning no longer edits the issue description.
+
+Required changes:
+
+- update `skills/paperclip/SKILL.md`
+- replace the `<plan>` instructions with document creation/update instructions
+- document the new endpoints in `docs/api/issues.md`
+- update any internal planning docs that still teach inline `<plan>` blocks
+
+New rule:
+
+- when asked to make a plan for an issue, create or update the issue document with key `plan`
+- leave a comment that the plan document was created/updated
+- do not mark the issue done
+
+## Relationship to the Artifact Plan
+
+This work should explicitly feed the broader artifact/deliverables direction.
+
+Recommendation:
+
+- keep documents as their own primitive in this change
+- add `document` to any future `ArtifactKind`
+- later build a deliverables read-model that aggregates:
+  - issue documents
+  - issue attachments
+  - preview URLs
+  - workspace-file references
+
+The artifact proposal currently has no explicit `document` kind. It should.
+
+Recommended future shape:
+
+```ts
+type ArtifactKind =
+  | "document"
+  | "attachment"
+  | "workspace_file"
+  | "preview"
+  | "report_link";
+```
+
+## Implementation Phases
+
+## Phase 1: Shared contract and schema
+
+Files:
+
+- `packages/db/src/schema/documents.ts`
+- `packages/db/src/schema/document_revisions.ts`
+- `packages/db/src/schema/issue_documents.ts`
+- `packages/db/src/schema/index.ts`
+- `packages/db/src/migrations/*`
+- `packages/shared/src/types/issue.ts`
+- `packages/shared/src/validators/issue.ts` or new document validator file
+- `packages/shared/src/index.ts`
+
+Acceptance:
+
+- schema enforces one key per issue
+- revisions are append-only
+- shared types expose plan/document fields on issue fetch
+
+## Phase 2: Server services and routes
+
+Files:
+
+- `server/src/services/issues.ts` or `server/src/services/documents.ts`
+- `server/src/routes/issues.ts`
+- `server/src/services/activity.ts` callsites
+
+Behavior:
+
+- list/get/upsert/delete documents
+- revision listing
+- `GET /api/issues/:id` returns `planDocument` + `documentSummaries`
+- company boundary checks match issue routes
+
+Acceptance:
+
+- agents and board can fetch/update same-company issue documents
+- stale edits return `409`
+- activity timeline shows document changes
+
+## Phase 3: UI issue documents surface
+
+Files:
+
+- `ui/src/api/issues.ts`
+- `ui/src/lib/queryKeys.ts`
+- `ui/src/pages/IssueDetail.tsx`
+- new reusable document UI component if needed
+
+Behavior:
+
+- render plan + documents under description
+- create/update by key
+- open revision history
+- show conflicts/errors clearly
+
+Acceptance:
+
+- board can create a `plan` doc from issue detail
+- updated plan appears immediately
+- issue detail no longer depends on description-embedded `<plan>`
+
+## Phase 4: Skills/docs migration
+
+Files:
+
+- `skills/paperclip/SKILL.md`
+- `docs/api/issues.md`
+- `doc/SPEC-implementation.md`
+- relevant plan/docs that mention `<plan>`
+
+Acceptance:
+
+- planning guidance references issue documents, not inline issue description tags
+- API docs describe the new document endpoints and issue payload additions
+
+## Phase 5: Legacy compatibility and follow-up
+
+Behavior:
+
+- read legacy `<plan>` blocks as fallback
+- optionally add explicit import/migration command later
+
+Follow-up, not required for first merge:
+
+- deliverables/artifact read-model
+- project/company documents
+- comment-linked documents
+- diff view between revisions
+
+## Test Plan
+
+### Server
+
+- document create/read/update/delete lifecycle
+- revision numbering
+- `baseRevisionId` conflict handling
+- company boundary enforcement
+- agent vs board authorization
+- issue fetch includes `planDocument` and document summaries
+- legacy `<plan>` fallback behavior
+- activity log mutation coverage
+
+### UI
+
+- issue detail shows plan document
+- create/update flows invalidate queries correctly
+- conflict and validation errors are surfaced
+- legacy plan fallback renders correctly
+
+### Verification
+
+Run before implementation is declared complete:
+
+```sh
+pnpm -r typecheck
+pnpm test:run
+pnpm build
+```
+
+## Open Questions
+
+1. Should v1 documents be markdown-only, with `json`, `html`, and `plain_text` deferred?
+   Recommendation: allow all four in API, optimize UI for markdown only.
+
+2. Should agents be allowed to create arbitrary keys, or only conventional keys?
+   Recommendation: allow arbitrary keys with normalized validation; reserve `plan` as special behavior only.
+
+3. Should delete exist in v1?
+   Recommendation: yes, but board-only.
+
+4. Should legacy `<plan>` blocks ever be auto-migrated?
+   Recommendation: no automatic mutation in the first rollout.
+
+5. Should documents appear inside a future Deliverables section or remain a top-level Issue section?
+   Recommendation: keep a dedicated Documents section now; later also expose them in Deliverables if an aggregated artifact view is added.
+
+## Final Recommendation
+
+Ship **issue documents** as a focused, text-first primitive now.
+
+Do not try to solve full artifact unification in the same implementation.
+
+Use:
+
+- first-class document tables
+- issue-level keyed linkage
+- append-only revisions
+- `planDocument` embedded in normal issue fetches
+- legacy `<plan>` fallback
+- skill/docs migration away from description-embedded plans
+
+This addresses the real planning workflow problem immediately and leaves the artifact system room to grow cleanly afterward.
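
The legacy `<plan>` compatibility rule in the plan above (prefer a real `plan` document, otherwise fall back to a read-only block parsed from the description) can be sketched as follows. This is a hedged illustration, not the planned implementation: `PlanSource`, `ResolvedPlan`, `extractLegacyPlan`, and `resolvePlan` are hypothetical names, and the exact parsing rules (first `<plan>` block only, case-sensitive tags, trimmed body) are assumptions.

```typescript
interface PlanSource {
  description: string | null;
  planDocument: { body: string } | null;
}

type ResolvedPlan =
  | { kind: "document"; body: string }
  | { kind: "legacy"; key: "plan"; body: string; source: "issue_description" }
  | null;

// Assumption: only the first <plan>...</plan> block in the description counts.
function extractLegacyPlan(description: string | null): string | null {
  if (!description) return null;
  const match = /<plan>([\s\S]*?)<\/plan>/.exec(description);
  if (!match) return null;
  const body = (match[1] ?? "").trim();
  return body.length > 0 ? body : null;
}

function resolvePlan(issue: PlanSource): ResolvedPlan {
  // A stored plan document always wins over the legacy block.
  if (issue.planDocument) {
    return { kind: "document", body: issue.planDocument.body };
  }
  const legacyBody = extractLegacyPlan(issue.description);
  if (legacyBody === null) return null;
  // Shape follows the legacyPlanDocument field proposed for the Issue type.
  return { kind: "legacy", key: "plan", body: legacyBody, source: "issue_description" };
}
```

The description is only read, never rewritten, which matches the recommendation not to auto-migrate old issue descriptions in the first rollout.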
--- a/package.json
+++ b/package.json
@@ -18,13 +18,11 @@
    "db:backup": "./scripts/backup-db.sh",
    "paperclipai": "node cli/node_modules/tsx/dist/cli.mjs cli/src/index.ts",
    "build:npm": "./scripts/build-npm.sh",
-    "release:start": "./scripts/release-start.sh",
    "release": "./scripts/release.sh",
-    "release:preflight": "./scripts/release-preflight.sh",
+    "release:canary": "./scripts/release.sh canary",
+    "release:stable": "./scripts/release.sh stable",
    "release:github": "./scripts/create-github-release.sh",
    "release:rollback": "./scripts/rollback-latest.sh",
-    "changeset": "changeset",
-    "version-packages": "changeset version",
    "check:tokens": "node scripts/check-forbidden-tokens.mjs",
    "docs:dev": "cd docs && npx mintlify dev",
    "smoke:openclaw-join": "./scripts/smoke/openclaw-join.sh",
@@ -34,7 +32,6 @@
    "test:e2e:headed": "npx playwright test --config tests/e2e/playwright.config.ts --headed"
  },
  "devDependencies": {
-    "@changesets/cli": "^2.30.0",
    "cross-env": "^10.1.0",
    "@playwright/test": "^1.58.2",
    "esbuild": "^0.27.3",
--- a/packages/adapter-utils/CHANGELOG.md
+++ b/packages/adapter-utils/CHANGELOG.md
@@ -1,5 +1,11 @@
 # @paperclipai/adapter-utils

+## 0.3.1
+
+### Patch Changes
+
+- Stable release preparation for 0.3.1
+
 ## 0.3.0

 ### Minor Changes
--- a/packages/adapter-utils/package.json
+++ b/packages/adapter-utils/package.json
@@ -1,6 +1,16 @@
 {
  "name": "@paperclipai/adapter-utils",
-  "version": "0.3.0",
+  "version": "0.3.1",
+  "license": "MIT",
+  "homepage": "https://github.com/paperclipai/paperclip",
+  "bugs": {
+    "url": "https://github.com/paperclipai/paperclip/issues"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/paperclipai/paperclip",
+    "directory": "packages/adapter-utils"
+  },
  "type": "module",
  "exports": {
    ".": "./src/index.ts",
--- a/packages/adapter-utils/src/billing.test.ts
+++ b/packages/adapter-utils/src/billing.test.ts
@@ -0,0 +1,28 @@
+import { describe, expect, it } from "vitest";
+import { inferOpenAiCompatibleBiller } from "./billing.js";
+
+describe("inferOpenAiCompatibleBiller", () => {
+  it("returns openrouter when OPENROUTER_API_KEY is present", () => {
+    expect(
+      inferOpenAiCompatibleBiller({ OPENROUTER_API_KEY: "sk-or-123" } as NodeJS.ProcessEnv, "openai"),
+    ).toBe("openrouter");
+  });
+
+  it("returns openrouter when OPENAI_BASE_URL points at OpenRouter", () => {
+    expect(
+      inferOpenAiCompatibleBiller(
+        { OPENAI_BASE_URL: "https://openrouter.ai/api/v1" } as NodeJS.ProcessEnv,
+        "openai",
+      ),
+    ).toBe("openrouter");
+  });
+
+  it("returns fallback when no OpenRouter markers are present", () => {
+    expect(
+      inferOpenAiCompatibleBiller(
+        { OPENAI_BASE_URL: "https://api.openai.com/v1" } as NodeJS.ProcessEnv,
+        "openai",
+      ),
+    ).toBe("openai");
+  });
+});
--- a/packages/adapter-utils/src/billing.ts
+++ b/packages/adapter-utils/src/billing.ts
@@ -0,0 +1,20 @@
+function readEnv(env: NodeJS.ProcessEnv, key: string): string | null {
+  const value = env[key];
+  return typeof value === "string" && value.trim().length > 0 ? value.trim() : null;
+}
+
+export function inferOpenAiCompatibleBiller(
+  env: NodeJS.ProcessEnv,
+  fallback: string | null = "openai",
+): string | null {
+  const explicitOpenRouterKey = readEnv(env, "OPENROUTER_API_KEY");
+  if (explicitOpenRouterKey) return "openrouter";
+
+  const baseUrl =
+    readEnv(env, "OPENAI_BASE_URL") ??
+    readEnv(env, "OPENAI_API_BASE") ??
+    readEnv(env, "OPENAI_API_BASE_URL");
+  if (baseUrl && /openrouter\.ai/i.test(baseUrl)) return "openrouter";
+
+  return fallback;
+}
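
The precedence in `inferOpenAiCompatibleBiller` is easy to miss from the tests alone: an explicit OpenRouter key wins over any base URL, and only the first non-empty base URL variable is inspected. The block below is a condensed, self-contained restatement for illustration. `Env` and `inferBiller` are hypothetical stand-ins (the real function takes `NodeJS.ProcessEnv`), so treat it as a sketch of the behavior rather than the shipped code.

```typescript
// Stand-in for NodeJS.ProcessEnv so the sketch has no Node type dependency.
type Env = Record<string, string | undefined>;

function readEnv(env: Env, key: string): string | null {
  const value = env[key];
  return typeof value === "string" && value.trim().length > 0 ? value.trim() : null;
}

// Condensed restatement of inferOpenAiCompatibleBiller from billing.ts.
function inferBiller(env: Env, fallback: string | null = "openai"): string | null {
  // 1. An explicit OpenRouter key wins, regardless of any base URL.
  if (readEnv(env, "OPENROUTER_API_KEY")) return "openrouter";

  // 2. Otherwise only the first non-empty base URL variable is checked.
  const baseUrl =
    readEnv(env, "OPENAI_BASE_URL") ??
    readEnv(env, "OPENAI_API_BASE") ??
    readEnv(env, "OPENAI_API_BASE_URL");
  if (baseUrl && /openrouter\.ai/i.test(baseUrl)) return "openrouter";

  // 3. No OpenRouter markers: return the caller-supplied fallback.
  return fallback;
}
```

One consequence worth noting: if `OPENAI_BASE_URL` points at OpenAI while `OPENAI_API_BASE` points at OpenRouter, the result is the fallback, because the later variable is never consulted.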
--- a/packages/adapter-utils/src/index.ts
+++ b/packages/adapter-utils/src/index.ts
@@ -17,14 +17,31 @@ export type {
  HireApprovedPayload,
  HireApprovedHookResult,
  ServerAdapterModule,
+  QuotaWindow,
+  ProviderQuotaResult,
  TranscriptEntry,
  StdoutLineParser,
  CLIAdapterModule,
  CreateConfigValues,
 } from "./types.js";
+export type {
+  SessionCompactionPolicy,
+  NativeContextManagement,
+  AdapterSessionManagement,
+  ResolvedSessionCompactionPolicy,
+} from "./session-compaction.js";
+export {
+  ADAPTER_SESSION_MANAGEMENT,
+  LEGACY_SESSIONED_ADAPTER_TYPES,
+  getAdapterSessionManagement,
+  readSessionCompactionOverride,
+  resolveSessionCompactionPolicy,
+  hasSessionCompactionThresholds,
+} from "./session-compaction.js";
 export {
  REDACTED_HOME_PATH_USER,
  redactHomePathUserSegments,
  redactHomePathUserSegmentsInValue,
  redactTranscriptEntryPaths,
 } from "./log-redaction.js";
+export { inferOpenAiCompatibleBiller } from "./billing.js";
--- a/packages/adapter-utils/src/server-utils.ts
+++ b/packages/adapter-utils/src/server-utils.ts
@@ -32,6 +32,23 @@ export const runningProcesses = new Map<string, RunningProcess>();
 export const MAX_CAPTURE_BYTES = 4 * 1024 * 1024;
 export const MAX_EXCERPT_BYTES = 32 * 1024;
 const SENSITIVE_ENV_KEY = /(key|token|secret|password|passwd|authorization|cookie)/i;
+const PAPERCLIP_SKILL_ROOT_RELATIVE_CANDIDATES = [
+  "../../skills",
+  "../../../../../skills",
+];
+
+export interface PaperclipSkillEntry {
+  name: string;
+  source: string;
+}
+
+function normalizePathSlashes(value: string): string {
+  return value.replaceAll("\\", "/");
+}
+
+function isMaintainerOnlySkillTarget(candidate: string): boolean {
+  return normalizePathSlashes(candidate).includes("/.agents/skills/");
+}

 export function parseObject(value: unknown): Record<string, unknown> {
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
@@ -95,6 +112,16 @@ export function renderTemplate(template: string, data: Record<string, unknown>)
  return template.replace(/{{\s*([a-zA-Z0-9_.-]+)\s*}}/g, (_, path) => resolvePathValue(data, path));
 }

+export function joinPromptSections(
+  sections: Array<string | null | undefined>,
+  separator = "\n\n",
+) {
+  return sections
+    .map((value) => (typeof value === "string" ? value.trim() : ""))
+    .filter(Boolean)
+    .join(separator);
+}
+
 export function redactEnvForLogs(env: Record<string, string>): Record<string, string> {
  const redacted: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
@@ -245,6 +272,136 @@ export async function ensureAbsoluteDirectory(
  }
 }

+export async function resolvePaperclipSkillsDir(
+  moduleDir: string,
+  additionalCandidates: string[] = [],
+): Promise<string | null> {
+  const candidates = [
+    ...PAPERCLIP_SKILL_ROOT_RELATIVE_CANDIDATES.map((relativePath) => path.resolve(moduleDir, relativePath)),
+    ...additionalCandidates.map((candidate) => path.resolve(candidate)),
+  ];
+  const seenRoots = new Set<string>();
+
+  for (const root of candidates) {
+    if (seenRoots.has(root)) continue;
+    seenRoots.add(root);
+    const isDirectory = await fs.stat(root).then((stats) => stats.isDirectory()).catch(() => false);
+    if (isDirectory) return root;
+  }
+
+  return null;
+}
+
+export async function listPaperclipSkillEntries(
+  moduleDir: string,
+  additionalCandidates: string[] = [],
+): Promise<PaperclipSkillEntry[]> {
+  const root = await resolvePaperclipSkillsDir(moduleDir, additionalCandidates);
+  if (!root) return [];
+
+  try {
+    const entries = await fs.readdir(root, { withFileTypes: true });
+    return entries
+      .filter((entry) => entry.isDirectory())
+      .map((entry) => ({
+        name: entry.name,
+        source: path.join(root, entry.name),
+      }));
+  } catch {
+    return [];
+  }
+}
+
+export async function readPaperclipSkillMarkdown(
+  moduleDir: string,
+  skillName: string,
+): Promise<string | null> {
+  const normalized = skillName.trim().toLowerCase();
+  if (!normalized) return null;
+
+  const entries = await listPaperclipSkillEntries(moduleDir);
+  const match = entries.find((entry) => entry.name === normalized);
+  if (!match) return null;
+
+  try {
+    return await fs.readFile(path.join(match.source, "SKILL.md"), "utf8");
+  } catch {
+    return null;
+  }
+}
+
+export async function ensurePaperclipSkillSymlink(
+  source: string,
+  target: string,
+  linkSkill: (source: string, target: string) => Promise<void> = (linkSource, linkTarget) =>
+    fs.symlink(linkSource, linkTarget),
+): Promise<"created" | "repaired" | "skipped"> {
+  const existing = await fs.lstat(target).catch(() => null);
+  if (!existing) {
+    await linkSkill(source, target);
+    return "created";
+  }
+
+  if (!existing.isSymbolicLink()) {
+    return "skipped";
+  }
+
+  const linkedPath = await fs.readlink(target).catch(() => null);
+  if (!linkedPath) return "skipped";
+
+  const resolvedLinkedPath = path.resolve(path.dirname(target), linkedPath);
+  if (resolvedLinkedPath === source) {
+    return "skipped";
+  }
+
+  const linkedPathExists = await fs.stat(resolvedLinkedPath).then(() => true).catch(() => false);
+  if (linkedPathExists) {
+    return "skipped";
+  }
+
+  await fs.unlink(target);
+  await linkSkill(source, target);
+  return "repaired";
+}
+
+export async function removeMaintainerOnlySkillSymlinks(
+  skillsHome: string,
+  allowedSkillNames: Iterable<string>,
+): Promise<string[]> {
+  const allowed = new Set(Array.from(allowedSkillNames));
+  try {
+    const entries = await fs.readdir(skillsHome, { withFileTypes: true });
+    const removed: string[] = [];
+    for (const entry of entries) {
+      if (allowed.has(entry.name)) continue;
+
+      const target = path.join(skillsHome, entry.name);
+      const existing = await fs.lstat(target).catch(() => null);
+      if (!existing?.isSymbolicLink()) continue;
+
+      const linkedPath = await fs.readlink(target).catch(() => null);
+      if (!linkedPath) continue;
+
+      const resolvedLinkedPath = path.isAbsolute(linkedPath)
+        ? linkedPath
+        : path.resolve(path.dirname(target), linkedPath);
+      if (
+        !isMaintainerOnlySkillTarget(linkedPath) &&
+        !isMaintainerOnlySkillTarget(resolvedLinkedPath)
+      ) {
+        continue;
+      }
+
+      await fs.unlink(target);
+      removed.push(entry.name);
+    }
+
+    return removed;
+  } catch {
+    return [];
+  }
+}
+
 export async function ensureCommandResolvable(command: string, cwd: string, env: NodeJS.ProcessEnv) {
  const resolved = await resolveCommandPath(command, cwd, env);
  if (resolved) return;
--- a/packages/adapter-utils/src/session-compaction.ts
+++ b/packages/adapter-utils/src/session-compaction.ts
@@ -0,0 +1,175 @@
+export interface SessionCompactionPolicy {
+  enabled: boolean;
+  maxSessionRuns: number;
+  maxRawInputTokens: number;
+  maxSessionAgeHours: number;
+}
+
+export type NativeContextManagement = "confirmed" | "likely" | "unknown" | "none";
+
+export interface AdapterSessionManagement {
+  supportsSessionResume: boolean;
+  nativeContextManagement: NativeContextManagement;
+  defaultSessionCompaction: SessionCompactionPolicy;
+}
+
+export interface ResolvedSessionCompactionPolicy {
+  policy: SessionCompactionPolicy;
+  adapterSessionManagement: AdapterSessionManagement | null;
+  explicitOverride: Partial<SessionCompactionPolicy>;
+  source: "adapter_default" | "agent_override" | "legacy_fallback";
+}
+
+const DEFAULT_SESSION_COMPACTION_POLICY: SessionCompactionPolicy = {
+  enabled: true,
+  maxSessionRuns: 200,
+  maxRawInputTokens: 2_000_000,
+  maxSessionAgeHours: 72,
+};
+
+// Adapters with native context management still participate in session resume,
+// but Paperclip should not rotate them using threshold-based compaction.
+const ADAPTER_MANAGED_SESSION_POLICY: SessionCompactionPolicy = {
+  enabled: true,
+  maxSessionRuns: 0,
+  maxRawInputTokens: 0,
+  maxSessionAgeHours: 0,
+};
+
+export const LEGACY_SESSIONED_ADAPTER_TYPES = new Set([
+  "claude_local",
+  "codex_local",
+  "cursor",
+  "gemini_local",
+  "opencode_local",
+  "pi_local",
+]);
+
+export const ADAPTER_SESSION_MANAGEMENT: Record<string, AdapterSessionManagement> = {
+  claude_local: {
+    supportsSessionResume: true,
+    nativeContextManagement: "confirmed",
+    defaultSessionCompaction: ADAPTER_MANAGED_SESSION_POLICY,
+  },
+  codex_local: {
+    supportsSessionResume: true,
+    nativeContextManagement: "confirmed",
+    defaultSessionCompaction: ADAPTER_MANAGED_SESSION_POLICY,
+  },
+  cursor: {
+    supportsSessionResume: true,
+    nativeContextManagement: "unknown",
+    defaultSessionCompaction: DEFAULT_SESSION_COMPACTION_POLICY,
+  },
+  gemini_local: {
+    supportsSessionResume: true,
+    nativeContextManagement: "unknown",
+    defaultSessionCompaction: DEFAULT_SESSION_COMPACTION_POLICY,
+  },
+  opencode_local: {
+    supportsSessionResume: true,
+    nativeContextManagement: "unknown",
+    defaultSessionCompaction: DEFAULT_SESSION_COMPACTION_POLICY,
+  },
+  pi_local: {
+    supportsSessionResume: true,
+    nativeContextManagement: "unknown",
+    defaultSessionCompaction: DEFAULT_SESSION_COMPACTION_POLICY,
+  },
+};
+
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null && !Array.isArray(value);
+}
+
+function readBoolean(value: unknown): boolean | undefined {
+  if (typeof value === "boolean") return value;
+  if (typeof value === "number") {
+    if (value === 1) return true;
+    if (value === 0) return false;
+    return undefined;
+  }
+  if (typeof value !== "string") return undefined;
+  const normalized = value.trim().toLowerCase();
+  if (normalized === "true" || normalized === "1" || normalized === "yes" || normalized === "on") {
+    return true;
+  }
+  if (normalized === "false" || normalized === "0" || normalized === "no" || normalized === "off") {
+    return false;
+  }
+  return undefined;
+}
+
+function readNumber(value: unknown): number | undefined {
+  if (typeof value === "number" && Number.isFinite(value)) {
+    return Math.max(0, Math.floor(value));
+  }
+  if (typeof value !== "string") return undefined;
+  const parsed = Number(value.trim());
+  return Number.isFinite(parsed) ? Math.max(0, Math.floor(parsed)) : undefined;
+}
+
+export function getAdapterSessionManagement(adapterType: string | null | undefined): AdapterSessionManagement | null {
+  if (!adapterType) return null;
+  return ADAPTER_SESSION_MANAGEMENT[adapterType] ?? null;
+}
+
+export function readSessionCompactionOverride(runtimeConfig: unknown): Partial<SessionCompactionPolicy> {
+  const runtime = isRecord(runtimeConfig) ? runtimeConfig : {};
+  const heartbeat = isRecord(runtime.heartbeat) ? runtime.heartbeat : {};
+  // The override may live under heartbeat.sessionCompaction, heartbeat.sessionRotation,
+  // or runtime.sessionCompaction; the first non-nullish value wins.
+  const rawCompaction =
+    heartbeat.sessionCompaction ?? heartbeat.sessionRotation ?? runtime.sessionCompaction;
+  const compaction: Record<string, unknown> = isRecord(rawCompaction) ? rawCompaction : {};
+
+  const explicit: Partial<SessionCompactionPolicy> = {};
+  const enabled = readBoolean(compaction.enabled);
+  const maxSessionRuns = readNumber(compaction.maxSessionRuns);
+  const maxRawInputTokens = readNumber(compaction.maxRawInputTokens);
+  const maxSessionAgeHours = readNumber(compaction.maxSessionAgeHours);
+
+  if (enabled !== undefined) explicit.enabled = enabled;
+  if (maxSessionRuns !== undefined) explicit.maxSessionRuns = maxSessionRuns;
+  if (maxRawInputTokens !== undefined) explicit.maxRawInputTokens = maxRawInputTokens;
+  if (maxSessionAgeHours !== undefined) explicit.maxSessionAgeHours = maxSessionAgeHours;
+
+  return explicit;
+}
+
+export function resolveSessionCompactionPolicy(
+  adapterType: string | null | undefined,
+  runtimeConfig: unknown,
+): ResolvedSessionCompactionPolicy {
+  const adapterSessionManagement = getAdapterSessionManagement(adapterType);
+  const explicitOverride = readSessionCompactionOverride(runtimeConfig);
+  const hasExplicitOverride = Object.keys(explicitOverride).length > 0;
+  const fallbackEnabled = Boolean(adapterType && LEGACY_SESSIONED_ADAPTER_TYPES.has(adapterType));
+  const basePolicy = adapterSessionManagement?.defaultSessionCompaction ?? {
+    ...DEFAULT_SESSION_COMPACTION_POLICY,
+    enabled: fallbackEnabled,
+  };
+
+  return {
+    policy: {
+      enabled: explicitOverride.enabled ?? basePolicy.enabled,
+      maxSessionRuns: explicitOverride.maxSessionRuns ?? basePolicy.maxSessionRuns,
+      maxRawInputTokens: explicitOverride.maxRawInputTokens ?? basePolicy.maxRawInputTokens,
+      maxSessionAgeHours: explicitOverride.maxSessionAgeHours ?? basePolicy.maxSessionAgeHours,
+    },
+    adapterSessionManagement,
+    explicitOverride,
+    source: hasExplicitOverride
+      ? "agent_override"
+      : adapterSessionManagement
+        ? "adapter_default"
+        : "legacy_fallback",
+  };
+}
+
+export function hasSessionCompactionThresholds(policy: Pick<
+  SessionCompactionPolicy,
+  "maxSessionRuns" | "maxRawInputTokens" | "maxSessionAgeHours"
+>) {
+  return policy.maxSessionRuns > 0 || policy.maxRawInputTokens > 0 || policy.maxSessionAgeHours > 0;
+}
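
Reviewer note on `session-compaction.ts`: the precedence in `resolveSessionCompactionPolicy` (explicit override field, else adapter default) combined with `hasSessionCompactionThresholds` can be sketched as below. This is a simplified, self-contained restatement for illustration, not the package's code; `mergePolicy` and `hasThresholds` are hypothetical stand-ins, and the two base policies copy the constants from the diff above.

```typescript
// Simplified restatement of the override-over-default merge; names are illustrative.
interface Policy {
  enabled: boolean;
  maxSessionRuns: number;
  maxRawInputTokens: number;
  maxSessionAgeHours: number;
}

// Mirrors DEFAULT_SESSION_COMPACTION_POLICY from the diff.
const thresholdDefault: Policy = {
  enabled: true,
  maxSessionRuns: 200,
  maxRawInputTokens: 2_000_000,
  maxSessionAgeHours: 72,
};

// Mirrors ADAPTER_MANAGED_SESSION_POLICY: enabled, but every threshold is 0.
const adapterManaged: Policy = {
  enabled: true,
  maxSessionRuns: 0,
  maxRawInputTokens: 0,
  maxSessionAgeHours: 0,
};

// Each field falls back to the base policy only when the override omits it.
function mergePolicy(base: Policy, override: Partial<Policy>): Policy {
  return {
    enabled: override.enabled ?? base.enabled,
    maxSessionRuns: override.maxSessionRuns ?? base.maxSessionRuns,
    maxRawInputTokens: override.maxRawInputTokens ?? base.maxRawInputTokens,
    maxSessionAgeHours: override.maxSessionAgeHours ?? base.maxSessionAgeHours,
  };
}

// A policy has thresholds when at least one limit is greater than zero.
function hasThresholds(policy: Policy): boolean {
  return (
    policy.maxSessionRuns > 0 ||
    policy.maxRawInputTokens > 0 ||
    policy.maxSessionAgeHours > 0
  );
}

// Adapter-managed default: enabled, yet no threshold is set.
const managedDefault = mergePolicy(adapterManaged, {});
// A single explicit field reintroduces a threshold for that adapter.
const managedWithOverride = mergePolicy(adapterManaged, { maxSessionRuns: 50 });
// Disabling via override leaves the numeric defaults untouched.
const thresholdDisabled = mergePolicy(thresholdDefault, { enabled: false });

console.log(hasThresholds(managedDefault), hasThresholds(managedWithOverride), thresholdDisabled);
```

Because the merge is per field, an agent override that sets only `maxSessionRuns` on an adapter-managed type keeps the other two limits at 0, so callers presumably need to check both `enabled` and the threshold helper before rotating a session.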
--- a/packages/adapter-utils/src/types.ts
+++ b/packages/adapter-utils/src/types.ts
@@ -30,7 +30,15 @@ export interface UsageSummary {
  cachedInputTokens?: number;
 }

-export type AdapterBillingType = "api" | "subscription" | "unknown";
+export type AdapterBillingType =
+  | "api"
+  | "subscription"
+  | "metered_api"
+  | "subscription_included"
+  | "subscription_overage"
+  | "credits"
+  | "fixed"
+  | "unknown";

 export interface AdapterRuntimeServiceReport {
  id?: string | null;
@@ -68,6 +76,7 @@ export interface AdapterExecutionResult {
  sessionParams?: Record<string, unknown> | null;
  sessionDisplayId?: string | null;
  provider?: string | null;
+  biller?: string | null;
  model?: string | null;
  billingType?: AdapterBillingType | null;
  costUsd?: number | null;
@@ -99,6 +108,7 @@ export interface AdapterInvocationMeta {
  commandNotes?: string[];
  env?: Record<string, string>;
  prompt?: string;
+  promptMetrics?: Record<string, number>;
  context?: Record<string, unknown>;
 }

@@ -170,11 +180,43 @@ export interface HireApprovedHookResult {
  detail?: Record<string, unknown>;
 }

+// ---------------------------------------------------------------------------
+// Quota window types — used by adapters that can report provider quota/rate-limit state
+// ---------------------------------------------------------------------------
+
+/** a single rate-limit or usage window returned by a provider quota API */
+export interface QuotaWindow {
+  /** human label, e.g. "5h", "7d", "Sonnet 7d", "Credits" */
+  label: string;
+  /** percent of the window already consumed (0-100), null when not reported */
+  usedPercent: number | null;
+  /** ISO timestamp when this window resets, null when not reported */
+  resetsAt: string | null;
+  /** free-form value label for credit-style windows, e.g. "$4.20 remaining" */
+  valueLabel: string | null;
+  /** optional supporting text, e.g. reset details or provider-specific notes */
+  detail?: string | null;
+}
+
+/** result for one provider from getQuotaWindows() */
+export interface ProviderQuotaResult {
+  /** provider slug, e.g. "anthropic", "openai" */
+  provider: string;
+  /** source label when the provider reports where the quota data came from */
+  source?: string | null;
+  /** true when the fetch succeeded and windows is populated */
+  ok: boolean;
+  /** error message when ok is false */
+  error?: string;
+  windows: QuotaWindow[];
+}
+
 export interface ServerAdapterModule {
  type: string;
  execute(ctx: AdapterExecutionContext): Promise<AdapterExecutionResult>;
  testEnvironment(ctx: AdapterEnvironmentTestContext): Promise<AdapterEnvironmentTestResult>;
  sessionCodec?: AdapterSessionCodec;
+  sessionManagement?: import("./session-compaction.js").AdapterSessionManagement;
  supportsLocalAgentJwt?: boolean;
  models?: AdapterModel[];
  listModels?: () => Promise<AdapterModel[]>;
@@ -187,6 +229,12 @@ export interface ServerAdapterModule {
    payload: HireApprovedPayload,
    adapterConfig: Record<string, unknown>,
  ) => Promise<HireApprovedHookResult>;
+  /**
+   * Optional: fetch live provider quota/rate-limit windows for this adapter.
+   * Returns a ProviderQuotaResult so the server can aggregate across adapters
+   * without knowing provider-specific credential paths or API shapes.
+   */
+  getQuotaWindows?: () => Promise<ProviderQuotaResult>;
 }

 // ---------------------------------------------------------------------------
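
Reviewer note on the quota types: the doc comment on `getQuotaWindows` says the server can aggregate results across adapters without provider-specific knowledge. A minimal sketch of one such consumer follows, assuming the two interfaces exactly as declared in this diff. `mostConstrainedWindow` is a hypothetical helper that does not appear in the PR, and the sample values are made up for illustration.

```typescript
// Local copies of the interfaces added in types.ts, so the sketch is self-contained.
interface QuotaWindow {
  label: string;
  usedPercent: number | null;
  resetsAt: string | null;
  valueLabel: string | null;
  detail?: string | null;
}

interface ProviderQuotaResult {
  provider: string;
  source?: string | null;
  ok: boolean;
  error?: string;
  windows: QuotaWindow[];
}

// Hypothetical helper: return the window with the highest reported usage,
// skipping windows that report no percentage and failed fetches entirely.
function mostConstrainedWindow(result: ProviderQuotaResult): QuotaWindow | null {
  if (!result.ok) return null;
  let worst: QuotaWindow | null = null;
  for (const window of result.windows) {
    if (window.usedPercent == null) continue;
    if (worst === null || window.usedPercent > (worst.usedPercent ?? -1)) {
      worst = window;
    }
  }
  return worst;
}

// Illustrative data only; real values come from an adapter's getQuotaWindows().
const sample: ProviderQuotaResult = {
  provider: "anthropic",
  source: "anthropic-oauth",
  ok: true,
  windows: [
    { label: "Current session", usedPercent: 12, resetsAt: null, valueLabel: null },
    { label: "Current week (all models)", usedPercent: 64, resetsAt: null, valueLabel: null },
    { label: "Extra usage", usedPercent: null, resetsAt: null, valueLabel: "Not enabled" },
  ],
};

console.log(mostConstrainedWindow(sample)?.label);
```

Treating `usedPercent: null` as "not reported" rather than as zero follows the field's doc comment; a credit-style window that only carries a `valueLabel` is skipped instead of being ranked.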
--- a/packages/adapters/claude-local/CHANGELOG.md
+++ b/packages/adapters/claude-local/CHANGELOG.md
@@ -1,5 +1,13 @@
 # @paperclipai/adapter-claude-local

+## 0.3.1
+
+### Patch Changes
+
+- Stable release preparation for 0.3.1
+- Updated dependencies
+  - @paperclipai/adapter-utils@0.3.1
+
 ## 0.3.0

 ### Minor Changes
--- a/packages/adapters/claude-local/package.json
+++ b/packages/adapters/claude-local/package.json
@@ -1,6 +1,16 @@
 {
  "name": "@paperclipai/adapter-claude-local",
-  "version": "0.3.0",
+  "version": "0.3.1",
+  "license": "MIT",
+  "homepage": "https://github.com/paperclipai/paperclip",
+  "bugs": {
+    "url": "https://github.com/paperclipai/paperclip/issues"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/paperclipai/paperclip",
+    "directory": "packages/adapters/claude-local"
+  },
  "type": "module",
  "exports": {
    ".": "./src/index.ts",
@@ -38,7 +48,9 @@
  "scripts": {
    "build": "tsc",
    "clean": "rm -rf dist",
-    "typecheck": "tsc --noEmit"
+    "typecheck": "tsc --noEmit",
+    "probe:quota": "pnpm exec tsx src/cli/quota-probe.ts --json",
+    "probe:quota:raw": "pnpm exec tsx src/cli/quota-probe.ts --json --raw-cli"
  },
  "dependencies": {
    "@paperclipai/adapter-utils": "workspace:*",
--- a/packages/adapters/claude-local/src/cli/quota-probe.ts
+++ b/packages/adapters/claude-local/src/cli/quota-probe.ts
@@ -0,0 +1,124 @@
+#!/usr/bin/env node
+
+import {
+  captureClaudeCliUsageText,
+  fetchClaudeCliQuota,
+  fetchClaudeQuota,
+  getQuotaWindows,
+  parseClaudeCliUsageText,
+  readClaudeAuthStatus,
+  readClaudeToken,
+} from "../server/quota.js";
+
+interface ProbeArgs {
+  json: boolean;
+  includeRawCli: boolean;
+  oauthOnly: boolean;
+  cliOnly: boolean;
+}
+
+function parseArgs(argv: string[]): ProbeArgs {
+  return {
+    json: argv.includes("--json"),
+    includeRawCli: argv.includes("--raw-cli"),
+    oauthOnly: argv.includes("--oauth-only"),
+    cliOnly: argv.includes("--cli-only"),
+  };
+}
+
+function stringifyError(error: unknown): string {
+  return error instanceof Error ? error.message : String(error);
+}
+
+async function main() {
+  const args = parseArgs(process.argv.slice(2));
+  if (args.oauthOnly && args.cliOnly) {
+    throw new Error("Choose either --oauth-only or --cli-only, not both.");
+  }
+
+  const authStatus = await readClaudeAuthStatus();
+  const token = await readClaudeToken();
+
+  const result: Record<string, unknown> = {
+    timestamp: new Date().toISOString(),
+    authStatus,
+    tokenAvailable: token != null,
+  };
+
+  if (!args.cliOnly) {
+    if (!token) {
+      result.oauth = {
+        ok: false,
+        error: "No Claude OAuth access token found in local credentials files.",
+        windows: [],
+      };
+    } else {
+      try {
+        result.oauth = {
+          ok: true,
+          windows: await fetchClaudeQuota(token),
+        };
+      } catch (error) {
+        result.oauth = {
+          ok: false,
+          error: stringifyError(error),
+          windows: [],
+        };
+      }
+    }
+  }
+
+  if (!args.oauthOnly) {
+    try {
+      const rawCliText = args.includeRawCli ? await captureClaudeCliUsageText() : null;
+      const windows = rawCliText ? parseClaudeCliUsageText(rawCliText) : await fetchClaudeCliQuota();
+      result.cli = rawCliText
+        ? {
+            ok: true,
+            windows,
+            rawText: rawCliText,
+          }
+        : {
+            ok: true,
+            windows,
+          };
+    } catch (error) {
+      result.cli = {
+        ok: false,
+        error: stringifyError(error),
+        windows: [],
+      };
+    }
+  }
+
+  if (!args.oauthOnly && !args.cliOnly) {
+    try {
+      result.aggregated = await getQuotaWindows();
+    } catch (error) {
+      result.aggregated = {
+        ok: false,
+        error: stringifyError(error),
+      };
+    }
+  }
+
+  const oauthOk = (result.oauth as { ok?: boolean } | undefined)?.ok === true;
+  const cliOk = (result.cli as { ok?: boolean } | undefined)?.ok === true;
+  const aggregatedOk = (result.aggregated as { ok?: boolean } | undefined)?.ok === true;
+  const ok = oauthOk || cliOk || aggregatedOk;
+
+  if (args.json || !process.stdout.isTTY) {
+    console.log(JSON.stringify({ ok, ...result }, null, 2));
+  } else {
+    console.log(`timestamp: ${result.timestamp}`);
+    console.log(`auth: ${JSON.stringify(authStatus)}`);
+    console.log(`tokenAvailable: ${token != null}`);
+    if (result.oauth) console.log(`oauth: ${JSON.stringify(result.oauth, null, 2)}`);
+    if (result.cli) console.log(`cli: ${JSON.stringify(result.cli, null, 2)}`);
+    if (result.aggregated) console.log(`aggregated: ${JSON.stringify(result.aggregated, null, 2)}`);
+  }
+
+  if (!ok) process.exitCode = 1;
+}
+
+await main();
--- a/packages/adapters/claude-local/src/server/execute.ts
+++ b/packages/adapters/claude-local/src/server/execute.ts
@@ -12,6 +12,7 @@ import {
  parseObject,
  parseJson,
  buildPaperclipEnv,
+  joinPromptSections,
  redactEnvForLogs,
  ensureAbsoluteDirectory,
  ensureCommandResolvable,
@@ -121,6 +122,7 @@ async function buildClaudeRuntimeConfig(input: ClaudeExecutionInput): Promise<Cl
  const workspaceRepoRef = asString(workspaceContext.repoRef, "") || null;
  const workspaceBranch = asString(workspaceContext.branchName, "") || null;
  const workspaceWorktreePath = asString(workspaceContext.worktreePath, "") || null;
+  const agentHome = asString(workspaceContext.agentHome, "") || null;
  const workspaceHints = Array.isArray(context.paperclipWorkspaces)
    ? context.paperclipWorkspaces.filter(
        (value): value is Record<string, unknown> => typeof value === "object" && value !== null,
@@ -215,6 +217,9 @@ async function buildClaudeRuntimeConfig(input: ClaudeExecutionInput): Promise<Cl
  if (workspaceWorktreePath) {
    env.PAPERCLIP_WORKSPACE_WORKTREE_PATH = workspaceWorktreePath;
  }
+  if (agentHome) {
+    env.AGENT_HOME = agentHome;
+  }
  if (workspaceHints.length > 0) {
    env.PAPERCLIP_WORKSPACES_JSON = JSON.stringify(workspaceHints);
  }
@@ -335,7 +340,12 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
    graceSec,
    extraArgs,
  } = runtimeConfig;
-  const billingType = resolveClaudeBillingType(env);
+  const effectiveEnv = Object.fromEntries(
+    Object.entries({ ...process.env, ...env }).filter(
+      (entry): entry is [string, string] => typeof entry[1] === "string",
+    ),
+  );
+  const billingType = resolveClaudeBillingType(effectiveEnv);
  const skillsDir = await buildSkillsDir();

  // When instructionsFilePath is configured, create a combined temp file that
@@ -363,7 +373,8 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
      `[paperclip] Claude session "${runtimeSessionId}" was saved for cwd "${runtimeSessionCwd}" and will not be resumed in "${cwd}".\n`,
    );
  }
-  const prompt = renderTemplate(promptTemplate, {
+  const bootstrapPromptTemplate = asString(config.bootstrapPromptTemplate, "");
+  const templateData = {
    agentId: agent.id,
    companyId: agent.companyId,
    runId,
@@ -371,7 +382,24 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
    agent,
    run: { id: runId, source: "on_demand" },
    context,
-  });
+  };
+  const renderedPrompt = renderTemplate(promptTemplate, templateData);
+  const renderedBootstrapPrompt =
+    !sessionId && bootstrapPromptTemplate.trim().length > 0
+      ? renderTemplate(bootstrapPromptTemplate, templateData).trim()
+      : "";
+  const sessionHandoffNote = asString(context.paperclipSessionHandoffMarkdown, "").trim();
+  const prompt = joinPromptSections([
+    renderedBootstrapPrompt,
+    sessionHandoffNote,
+    renderedPrompt,
+  ]);
+  const promptMetrics = {
+    promptChars: prompt.length,
+    bootstrapPromptChars: renderedBootstrapPrompt.length,
+    sessionHandoffChars: sessionHandoffNote.length,
+    heartbeatPromptChars: renderedPrompt.length,
+  };

  const buildClaudeArgs = (resumeSessionId: string | null) => {
    const args = ["--print", "-", "--output-format", "stream-json", "--verbose"];
@@ -416,6 +444,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
        commandNotes,
        env: redactEnvForLogs(env),
        prompt,
+        promptMetrics,
        context,
      });
    }
@@ -523,6 +552,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
      sessionParams: resolvedSessionParams,
      sessionDisplayId: resolvedSessionId,
      provider: "anthropic",
+      biller: "anthropic",
      model: parsedStream.model || asString(parsed.model, model),
      billingType,
      costUsd: parsedStream.costUsd ?? asNumber(parsed.total_cost_usd, 0),
--- a/packages/adapters/claude-local/src/server/index.ts
+++ b/packages/adapters/claude-local/src/server/index.ts
@@ -6,6 +6,18 @@ export {
  isClaudeMaxTurnsResult,
  isClaudeUnknownSessionError,
 } from "./parse.js";
+export {
+  getQuotaWindows,
+  readClaudeAuthStatus,
+  readClaudeToken,
+  fetchClaudeQuota,
+  fetchClaudeCliQuota,
+  captureClaudeCliUsageText,
+  parseClaudeCliUsageText,
+  toPercent,
+  fetchWithTimeout,
+  claudeConfigDir,
+} from "./quota.js";
 import type { AdapterSessionCodec } from "@paperclipai/adapter-utils";

 function readNonEmptyString(value: unknown): string | null {
--- a/packages/adapters/claude-local/src/server/quota.ts
+++ b/packages/adapters/claude-local/src/server/quota.ts
@@ -0,0 +1,531 @@
+import { execFile } from "node:child_process";
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+import { promisify } from "node:util";
+import type { ProviderQuotaResult, QuotaWindow } from "@paperclipai/adapter-utils";
+
+const execFileAsync = promisify(execFile);
+
+const CLAUDE_USAGE_SOURCE_OAUTH = "anthropic-oauth";
+const CLAUDE_USAGE_SOURCE_CLI = "claude-cli";
+
+export function claudeConfigDir(): string {
+  const fromEnv = process.env.CLAUDE_CONFIG_DIR;
+  if (typeof fromEnv === "string" && fromEnv.trim().length > 0) return fromEnv.trim();
+  return path.join(os.homedir(), ".claude");
+}
+
+function hasNonEmptyProcessEnv(key: string): boolean {
+  const value = process.env[key];
+  return typeof value === "string" && value.trim().length > 0;
+}
+
+function createClaudeQuotaEnv(): Record<string, string> {
+  const env: Record<string, string> = {};
+  for (const [key, value] of Object.entries(process.env)) {
+    if (typeof value !== "string") continue;
+    if (key.startsWith("ANTHROPIC_")) continue;
+    env[key] = value;
+  }
+  return env;
+}
+
+function stripBackspaces(text: string): string {
+  let out = "";
+  for (const char of text) {
+    if (char === "\b") {
+      out = out.slice(0, -1);
+    } else {
+      out += char;
+    }
+  }
+  return out;
+}
+
+function stripAnsi(text: string): string {
+  return text
+    .replace(/\u001B\][^\u0007]*(?:\u0007|\u001B\\)/g, "")
+    .replace(/\u001B(?:[@-Z\\-_]|\[[0-?]*[ -/]*[@-~])/g, "");
+}
+
+function cleanTerminalText(text: string): string {
+  return stripAnsi(stripBackspaces(text))
+    .replace(/\u0000/g, "")
+    .replace(/\r/g, "\n");
+}
+
+function normalizeForLabelSearch(text: string): string {
+  return text.toLowerCase().replace(/[^a-z0-9]+/g, "");
+}
+
+function trimToLatestUsagePanel(text: string): string | null {
+  const lower = text.toLowerCase();
+  const settingsIndex = lower.lastIndexOf("settings:");
+  if (settingsIndex < 0) return null;
+  let tail = text.slice(settingsIndex);
+  const tailLower = tail.toLowerCase();
+  if (!tailLower.includes("usage")) return null;
+  if (!tailLower.includes("current session") && !tailLower.includes("loading usage")) return null;
+  const stopMarkers = [
+    "status dialog dismissed",
+    "checking for updates",
+    "press ctrl-c again to exit",
+  ];
+  let stopIndex = -1;
+  for (const marker of stopMarkers) {
+    const markerIndex = tailLower.indexOf(marker);
+    if (markerIndex >= 0 && (stopIndex === -1 || markerIndex < stopIndex)) {
+      stopIndex = markerIndex;
+    }
+  }
+  if (stopIndex >= 0) {
+    tail = tail.slice(0, stopIndex);
+  }
+  return tail;
+}
+
+async function readClaudeTokenFromFile(credPath: string): Promise<string | null> {
+  let raw: string;
+  try {
+    raw = await fs.readFile(credPath, "utf8");
+  } catch {
+    return null;
+  }
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(raw);
+  } catch {
+    return null;
+  }
+  if (typeof parsed !== "object" || parsed === null) return null;
+  const obj = parsed as Record<string, unknown>;
+  const oauth = obj["claudeAiOauth"];
+  if (typeof oauth !== "object" || oauth === null) return null;
+  const token = (oauth as Record<string, unknown>)["accessToken"];
+  return typeof token === "string" && token.length > 0 ? token : null;
+}
+
+interface ClaudeAuthStatus {
+  loggedIn: boolean;
+  authMethod: string | null;
+  subscriptionType: string | null;
+}
+
+export async function readClaudeAuthStatus(): Promise<ClaudeAuthStatus | null> {
+  try {
+    const { stdout } = await execFileAsync("claude", ["auth", "status"], {
+      env: process.env,
+      timeout: 5_000,
+      maxBuffer: 1024 * 1024,
+    });
+    const parsed = JSON.parse(stdout) as Record<string, unknown>;
+    return {
+      loggedIn: parsed.loggedIn === true,
+      authMethod: typeof parsed.authMethod === "string" ? parsed.authMethod : null,
+      subscriptionType: typeof parsed.subscriptionType === "string" ? parsed.subscriptionType : null,
+    };
+  } catch {
+    return null;
+  }
+}
+
+function describeClaudeSubscriptionAuth(status: ClaudeAuthStatus | null): string | null {
+  if (!status?.loggedIn || status.authMethod !== "claude.ai") return null;
+  return status.subscriptionType
+    ? `Claude is logged in via claude.ai (${status.subscriptionType})`
+    : "Claude is logged in via claude.ai";
+}
+
+export async function readClaudeToken(): Promise<string | null> {
+  const configDir = claudeConfigDir();
+  for (const filename of [".credentials.json", "credentials.json"]) {
+    const token = await readClaudeTokenFromFile(path.join(configDir, filename));
+    if (token) return token;
+  }
+  return null;
+}
+
+interface AnthropicUsageWindow {
+  utilization?: number | null;
+  resets_at?: string | null;
+}
+
+interface AnthropicExtraUsage {
+  is_enabled?: boolean | null;
+  monthly_limit?: number | null;
+  used_credits?: number | null;
+  utilization?: number | null;
+  currency?: string | null;
+}
+
+interface AnthropicUsageResponse {
+  five_hour?: AnthropicUsageWindow | null;
+  seven_day?: AnthropicUsageWindow | null;
+  seven_day_sonnet?: AnthropicUsageWindow | null;
+  seven_day_opus?: AnthropicUsageWindow | null;
+  extra_usage?: AnthropicExtraUsage | null;
+}
+
+function formatCurrencyAmount(value: number, currency: string | null | undefined): string {
+  const code = typeof currency === "string" && currency.trim().length > 0 ? currency.trim().toUpperCase() : "USD";
+  return new Intl.NumberFormat("en-US", {
+    style: "currency",
+    currency: code,
+    maximumFractionDigits: 2,
+  }).format(value);
+}
+
+function formatExtraUsageLabel(extraUsage: AnthropicExtraUsage): string | null {
+  const monthlyLimit = extraUsage.monthly_limit;
+  const usedCredits = extraUsage.used_credits;
+  if (
+    typeof monthlyLimit !== "number" ||
+    !Number.isFinite(monthlyLimit) ||
+    typeof usedCredits !== "number" ||
+    !Number.isFinite(usedCredits)
+  ) {
+    return null;
+  }
+  return `${formatCurrencyAmount(usedCredits, extraUsage.currency)} / ${formatCurrencyAmount(monthlyLimit, extraUsage.currency)}`;
+}
+
+/** Convert a 0-1 utilization fraction to a 0-100 integer percent. Returns null for null/undefined input. */
+export function toPercent(utilization: number | null | undefined): number | null {
+  if (utilization == null) return null;
+  return Math.min(100, Math.round(utilization * 100));
+}
+
+/** fetch with an abort-based timeout so a hanging provider API doesn't block the response indefinitely */
+export async function fetchWithTimeout(url: string, init: RequestInit, ms = 8000): Promise<Response> {
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), ms);
+  try {
+    return await fetch(url, { ...init, signal: controller.signal });
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+export async function fetchClaudeQuota(token: string): Promise<QuotaWindow[]> {
+  const resp = await fetchWithTimeout("https://api.anthropic.com/api/oauth/usage", {
+    headers: {
+      Authorization: `Bearer ${token}`,
+      "anthropic-beta": "oauth-2025-04-20",
+    },
+  });
+  if (!resp.ok) throw new Error(`anthropic usage api returned ${resp.status}`);
+  const body = (await resp.json()) as AnthropicUsageResponse;
+  const windows: QuotaWindow[] = [];
+
+  if (body.five_hour != null) {
+    windows.push({
+      label: "Current session",
+      usedPercent: toPercent(body.five_hour.utilization),
+      resetsAt: body.five_hour.resets_at ?? null,
+      valueLabel: null,
+      detail: null,
+    });
+  }
+  if (body.seven_day != null) {
+    windows.push({
+      label: "Current week (all models)",
+      usedPercent: toPercent(body.seven_day.utilization),
+      resetsAt: body.seven_day.resets_at ?? null,
+      valueLabel: null,
+      detail: null,
+    });
+  }
+  if (body.seven_day_sonnet != null) {
+    windows.push({
+      label: "Current week (Sonnet only)",
+      usedPercent: toPercent(body.seven_day_sonnet.utilization),
+      resetsAt: body.seven_day_sonnet.resets_at ?? null,
+      valueLabel: null,
+      detail: null,
+    });
+  }
+  if (body.seven_day_opus != null) {
+    windows.push({
+      label: "Current week (Opus only)",
+      usedPercent: toPercent(body.seven_day_opus.utilization),
+      resetsAt: body.seven_day_opus.resets_at ?? null,
+      valueLabel: null,
+      detail: null,
+    });
+  }
+  if (body.extra_usage != null) {
+    windows.push({
+      label: "Extra usage",
+      usedPercent: body.extra_usage.is_enabled === false ? null : toPercent(body.extra_usage.utilization),
+      resetsAt: null,
+      valueLabel:
+        body.extra_usage.is_enabled === false
+          ? "Not enabled"
+          : formatExtraUsageLabel(body.extra_usage),
+      detail:
+        body.extra_usage.is_enabled === false
+          ? "Extra usage not enabled"
+          : "Monthly extra usage pool",
+    });
+  }
+  return windows;
+}
+
+function usageOutputLooksRelevant(text: string): boolean {
+  const normalized = normalizeForLabelSearch(text);
+  return normalized.includes("currentsession")
+    || normalized.includes("currentweek")
+    || normalized.includes("loadingusage")
+    || normalized.includes("failedtoloadusagedata")
+    || normalized.includes("tokenexpired")
+    || normalized.includes("authenticationerror")
+    || normalized.includes("ratelimited");
+}
+
+function usageOutputLooksComplete(text: string): boolean {
+  const normalized = normalizeForLabelSearch(text);
+  if (
+    normalized.includes("failedtoloadusagedata")
+    || normalized.includes("tokenexpired")
+    || normalized.includes("authenticationerror")
+    || normalized.includes("ratelimited")
+  ) {
+    return true;
+  }
+  return normalized.includes("currentsession")
+    && (normalized.includes("currentweek") || normalized.includes("extrausage"))
+    && /[0-9]{1,3}(?:\.[0-9]+)?%/i.test(text);
+}
+
+function extractUsageError(text: string): string | null {
+  const lower = text.toLowerCase();
+  const compact = lower.replace(/\s+/g, "");
+  if (lower.includes("token_expired") || lower.includes("token has expired")) {
+    return "Claude CLI token expired. Run `claude login` to refresh.";
+  }
+  if (lower.includes("authentication_error")) {
+    return "Claude CLI authentication error. Run `claude login`.";
+  }
+  if (lower.includes("rate_limit_error") || lower.includes("rate limited") || compact.includes("ratelimited")) {
+    return "Claude CLI usage endpoint is rate limited right now. Please try again later.";
+  }
+  if (lower.includes("failed to load usage data") || compact.includes("failedtoloadusagedata")) {
+    return "Claude CLI could not load usage data. Open the CLI and retry `/usage`.";
+  }
+  return null;
+}
+
+function percentFromLine(line: string): number | null {
+  const match = line.match(/([0-9]{1,3}(?:\.[0-9]+)?)\s*%/i);
+  if (!match) return null;
+  const rawValue = Number(match[1]);
+  if (!Number.isFinite(rawValue)) return null;
+  const clamped = Math.min(100, Math.max(0, rawValue));
+  const lower = line.toLowerCase();
+  if (lower.includes("remaining") || lower.includes("left") || lower.includes("available")) {
+    return Math.round(100 - clamped);
+  }
+  return Math.round(clamped);
+}
+
+function isQuotaLabel(line: string): boolean {
+  const normalized = normalizeForLabelSearch(line);
+  return normalized === "currentsession"
+    || normalized === "currentweekallmodels"
+    || normalized === "currentweeksonnetonly"
+    || normalized === "currentweeksonnet"
+    || normalized === "currentweekopusonly"
+    || normalized === "currentweekopus"
+    || normalized === "extrausage";
+}
+
+function canonicalQuotaLabel(line: string): string {
+  switch (normalizeForLabelSearch(line)) {
+    case "currentsession":
+      return "Current session";
+    case "currentweekallmodels":
+      return "Current week (all models)";
+    case "currentweeksonnetonly":
+    case "currentweeksonnet":
+      return "Current week (Sonnet only)";
+    case "currentweekopusonly":
+    case "currentweekopus":
+      return "Current week (Opus only)";
+    case "extrausage":
+      return "Extra usage";
+    default:
+      return line;
+  }
+}
+
+function formatClaudeCliDetail(label: string, lines: string[]): string | null {
+  const normalizedLabel = normalizeForLabelSearch(label);
+  if (normalizedLabel === "extrausage") {
+    const compact = lines.join(" ").replace(/\s+/g, "").toLowerCase();
+    if (compact.includes("extrausagenotenabled")) {
+      return "Extra usage not enabled • /extra-usage to enable";
+    }
+    const firstLine = lines.find((line) => line.trim().length > 0) ?? null;
+    return firstLine;
+  }
+
+  const resetLine = lines.find((line) => /^resets/i.test(line) || normalizeForLabelSearch(line).startsWith("resets"));
+  if (!resetLine) return null;
+  return resetLine
+    .replace(/^Resets/i, "Resets ")
+    .replace(/([A-Z][a-z]{2})(\d)/g, "$1 $2")
+    .replace(/(\d)at(\d)/g, "$1 at $2")
+    .replace(/(am|pm)\(/gi, "$1 (")
+    .replace(/([A-Za-z])\(/g, "$1 (")
+    .replace(/\s+/g, " ")
+    .trim();
+}
+
+export function parseClaudeCliUsageText(text: string): QuotaWindow[] {
+  const cleaned = trimToLatestUsagePanel(cleanTerminalText(text)) ?? cleanTerminalText(text);
+  const usageError = extractUsageError(cleaned);
+  if (usageError) throw new Error(usageError);
+
+  const lines = cleaned
+    .split("\n")
+    .map((line) => line.trim())
+    .filter((line) => line.length > 0);
+
+  const sections: Array<{ label: string; lines: string[] }> = [];
+  let current: { label: string; lines: string[] } | null = null;
+
+  for (const line of lines) {
+    if (isQuotaLabel(line)) {
+      if (current) sections.push(current);
+      current = { label: canonicalQuotaLabel(line), lines: [] };
+      continue;
+    }
+    if (current) current.lines.push(line);
+  }
+  if (current) sections.push(current);
+
+  const windows = sections.map<QuotaWindow>((section) => {
+    const usedPercent = section.lines.map(percentFromLine).find((value) => value != null) ?? null;
+    return {
+      label: section.label,
+      usedPercent,
+      resetsAt: null,
+      valueLabel: null,
+      detail: formatClaudeCliDetail(section.label, section.lines),
+    };
+  });
+
+  if (!windows.some((window) => normalizeForLabelSearch(window.label) === "currentsession")) {
+    throw new Error("Could not parse Claude CLI usage output.");
+  }
+  return windows;
+}
+
+function quoteForShell(value: string): string {
+  return `'${value.replace(/'/g, `'\\''`)}'`;
+}
+
+function buildClaudeCliShellProbeCommand(): string {
+  const feed = "(sleep 2; printf '/usage\\r'; sleep 6; printf '\\033'; sleep 1; printf '\\003')";
+  const claudeCommand = "claude --tools \"\"";
+  if (process.platform === "darwin") {
+    return `${feed} | script -q /dev/null ${claudeCommand}`;
+  }
+  return `${feed} | script -q -e -f -c ${quoteForShell(claudeCommand)} /dev/null`;
+}
+
+export async function captureClaudeCliUsageText(timeoutMs = 12_000): Promise<string> {
+  const command = buildClaudeCliShellProbeCommand();
+  try {
+    const { stdout, stderr } = await execFileAsync("sh", ["-c", command], {
+      env: createClaudeQuotaEnv(),
+      timeout: timeoutMs,
+      maxBuffer: 8 * 1024 * 1024,
+    });
+    const output = `${stdout}${stderr}`;
+    const cleaned = cleanTerminalText(output);
+    if (usageOutputLooksComplete(cleaned)) return output;
+    throw new Error("Claude CLI usage probe ended before rendering usage.");
+  } catch (error) {
+    const stdout =
+      typeof error === "object" && error !== null && "stdout" in error && typeof error.stdout === "string"
+        ? error.stdout
+        : "";
+    const stderr =
+      typeof error === "object" && error !== null && "stderr" in error && typeof error.stderr === "string"
+        ? error.stderr
+        : "";
+    const output = `${stdout}${stderr}`;
+    const cleaned = cleanTerminalText(output);
+    if (usageOutputLooksComplete(cleaned)) return output;
+    if (usageOutputLooksRelevant(cleaned)) {
+      throw new Error("Claude CLI usage probe ended before rendering usage.");
+    }
+    throw error instanceof Error ? error : new Error(String(error));
+  }
+}
+
+export async function fetchClaudeCliQuota(): Promise<QuotaWindow[]> {
+  const rawText = await captureClaudeCliUsageText();
+  return parseClaudeCliUsageText(rawText);
+}
+
+function formatProviderError(source: string, error: unknown): string {
+  const message = error instanceof Error ? error.message : String(error);
+  return `${source}: ${message}`;
+}
+
+export async function getQuotaWindows(): Promise<ProviderQuotaResult> {
+  const authStatus = await readClaudeAuthStatus();
+  const authDescription = describeClaudeSubscriptionAuth(authStatus);
+  const token = await readClaudeToken();
+
+  const errors: string[] = [];
+
+  if (token) {
+    try {
+      const windows = await fetchClaudeQuota(token);
+      return { provider: "anthropic", source: CLAUDE_USAGE_SOURCE_OAUTH, ok: true, windows };
+    } catch (error) {
+      errors.push(formatProviderError("Anthropic OAuth usage", error));
+    }
+  }
+
+  try {
+    const windows = await fetchClaudeCliQuota();
+    return { provider: "anthropic", source: CLAUDE_USAGE_SOURCE_CLI, ok: true, windows };
+  } catch (error) {
+    errors.push(formatProviderError("Claude CLI /usage", error));
+  }
+
+  if (hasNonEmptyProcessEnv("ANTHROPIC_API_KEY") && !authDescription) {
+    return {
+      provider: "anthropic",
+      ok: false,
+      error:
+        errors[0]
+        ?? "ANTHROPIC_API_KEY is set and no local Claude subscription session is available for quota polling",
+      windows: [],
+    };
+  }
+
+  if (authDescription) {
+    return {
+      provider: "anthropic",
+      ok: false,
+      error:
+        errors.length > 0
+          ? `${authDescription}, but quota polling failed (${errors.join("; ")})`
+          : `${authDescription}, but Paperclip could not load subscription quota data`,
+      windows: [],
+    };
+  }
+
+  return {
+    provider: "anthropic",
+    ok: false,
+    error: errors[0] ?? "no local claude auth token",
+    windows: [],
+  };
+}
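The section grouping and percent handling in `parseClaudeCliUsageText` can be illustrated with a minimal, self-contained sketch. It is not the adapter's code: `normalizeLabel` is an assumed stand-in for `normalizeForLabelSearch` (defined outside this excerpt), `KNOWN_LABELS` covers only a subset of the labels, and the sample text is hypothetical rather than captured CLI output.

```typescript
type Section = { label: string; lines: string[] };

// ASSUMPTION: stand-in for normalizeForLabelSearch, whose definition is not shown here.
function normalizeLabel(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Subset of the quota labels recognized in the diff above.
const KNOWN_LABELS = new Set(["currentsession", "currentweekallmodels", "extrausage"]);

// Start a new section at each label line; attach following lines to it.
function groupSections(text: string): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;
  for (const raw of text.split("\n")) {
    const line = raw.trim();
    if (line.length === 0) continue;
    if (KNOWN_LABELS.has(normalizeLabel(line))) {
      if (current) sections.push(current);
      current = { label: line, lines: [] };
      continue;
    }
    if (current) current.lines.push(line);
  }
  if (current) sections.push(current);
  return sections;
}

// Convert "N% used" or "N% left/remaining/available" into a used percentage.
function usedPercent(line: string): number | null {
  const match = line.match(/([0-9]{1,3}(?:\.[0-9]+)?)\s*%/);
  if (!match) return null;
  const raw = Number(match[1]);
  if (!Number.isFinite(raw)) return null;
  const clamped = Math.min(100, Math.max(0, raw));
  const lower = line.toLowerCase();
  const isRemaining = ["remaining", "left", "available"].some((word) => lower.includes(word));
  return Math.round(isRemaining ? 100 - clamped : clamped);
}

// Hypothetical sample text for illustration only.
const sample = [
  "Current session",
  "12% used",
  "Resets 3pm",
  "Current week (all models)",
  "40% left",
].join("\n");

const summary = groupSections(sample).map((section) => ({
  label: section.label,
  usedPercent: section.lines.map(usedPercent).find((value) => value != null) ?? null,
}));

console.log(summary);
```

Lines that report remaining quota are inverted so that every window carries a used percentage, which keeps the two phrasings comparable downstream.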
--- a/packages/adapters/claude-local/src/ui/build-config.ts
+++ b/packages/adapters/claude-local/src/ui/build-config.ts
@@ -67,6 +67,7 @@ export function buildClaudeLocalConfig(v: CreateConfigValues): Record<string, un
  if (v.cwd) ac.cwd = v.cwd;
  if (v.instructionsFilePath) ac.instructionsFilePath = v.instructionsFilePath;
  if (v.promptTemplate) ac.promptTemplate = v.promptTemplate;
+  if (v.bootstrapPrompt) ac.bootstrapPromptTemplate = v.bootstrapPrompt;
  if (v.model) ac.model = v.model;
  if (v.thinkingEffort) ac.effort = v.thinkingEffort;
  if (v.chrome) ac.chrome = true;
--- a/packages/adapters/codex-local/CHANGELOG.md
+++ b/packages/adapters/codex-local/CHANGELOG.md
@@ -1,5 +1,13 @@
 # @paperclipai/adapter-codex-local

+## 0.3.1
+
+### Patch Changes
+
+- Stable release preparation for 0.3.1
+- Updated dependencies
+  - @paperclipai/adapter-utils@0.3.1
+
 ## 0.3.0

 ### Minor Changes
--- a/packages/adapters/codex-local/package.json
+++ b/packages/adapters/codex-local/package.json
@@ -1,6 +1,16 @@
 {
  "name": "@paperclipai/adapter-codex-local",
-  "version": "0.3.0",
+  "version": "0.3.1",
+  "license": "MIT",
+  "homepage": "https://github.com/paperclipai/paperclip",
+  "bugs": {
+    "url": "https://github.com/paperclipai/paperclip/issues"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/paperclipai/paperclip",
+    "directory": "packages/adapters/codex-local"
+  },
  "type": "module",
  "exports": {
    ".": "./src/index.ts",
@@ -38,7 +48,8 @@
  "scripts": {
    "build": "tsc",
    "clean": "rm -rf dist",
-    "typecheck": "tsc --noEmit"
+    "typecheck": "tsc --noEmit",
+    "probe:quota": "pnpm exec tsx src/cli/quota-probe.ts --json"
  },
  "dependencies": {
    "@paperclipai/adapter-utils": "workspace:*",
--- a/packages/adapters/codex-local/src/cli/quota-probe.ts
+++ b/packages/adapters/codex-local/src/cli/quota-probe.ts
@@ -0,0 +1,112 @@
+#!/usr/bin/env node
+
+import {
+  fetchCodexQuota,
+  fetchCodexRpcQuota,
+  getQuotaWindows,
+  readCodexAuthInfo,
+  readCodexToken,
+} from "../server/quota.js";
+
+interface ProbeArgs {
+  json: boolean;
+  rpcOnly: boolean;
+  whamOnly: boolean;
+}
+
+function parseArgs(argv: string[]): ProbeArgs {
+  return {
+    json: argv.includes("--json"),
+    rpcOnly: argv.includes("--rpc-only"),
+    whamOnly: argv.includes("--wham-only"),
+  };
+}
+
+function stringifyError(error: unknown): string {
+  return error instanceof Error ? error.message : String(error);
+}
+
+async function main() {
+  const args = parseArgs(process.argv.slice(2));
+  if (args.rpcOnly && args.whamOnly) {
+    throw new Error("Choose either --rpc-only or --wham-only, not both.");
+  }
+
+  const auth = await readCodexAuthInfo();
+  const token = await readCodexToken();
+
+  const result: Record<string, unknown> = {
+    timestamp: new Date().toISOString(),
+    auth,
+    tokenAvailable: token != null,
+  };
+
+  if (!args.whamOnly) {
+    try {
+      result.rpc = {
+        ok: true,
+        ...(await fetchCodexRpcQuota()),
+      };
+    } catch (error) {
+      result.rpc = {
+        ok: false,
+        error: stringifyError(error),
+        windows: [],
+      };
+    }
+  }
+
+  if (!args.rpcOnly) {
+    if (!token) {
+      result.wham = {
+        ok: false,
+        error: "No local Codex auth token found in ~/.codex/auth.json.",
+        windows: [],
+      };
+    } else {
+      try {
+        result.wham = {
+          ok: true,
+          windows: await fetchCodexQuota(token.token, token.accountId),
+        };
+      } catch (error) {
+        result.wham = {
+          ok: false,
+          error: stringifyError(error),
+          windows: [],
+        };
+      }
+    }
+  }
+
+  if (!args.rpcOnly && !args.whamOnly) {
+    try {
+      result.aggregated = await getQuotaWindows();
+    } catch (error) {
+      result.aggregated = {
+        ok: false,
+        error: stringifyError(error),
+      };
+    }
+  }
+
+  const rpcOk = (result.rpc as { ok?: boolean } | undefined)?.ok === true;
+  const whamOk = (result.wham as { ok?: boolean } | undefined)?.ok === true;
+  const aggregatedOk = (result.aggregated as { ok?: boolean } | undefined)?.ok === true;
+  const ok = rpcOk || whamOk || aggregatedOk;
+
+  if (args.json || !process.stdout.isTTY) {
+    console.log(JSON.stringify({ ok, ...result }, null, 2));
+  } else {
+    console.log(`timestamp: ${result.timestamp}`);
+    console.log(`auth: ${JSON.stringify(auth)}`);
+    console.log(`tokenAvailable: ${token != null}`);
+    if (result.rpc) console.log(`rpc: ${JSON.stringify(result.rpc, null, 2)}`);
+    if (result.wham) console.log(`wham: ${JSON.stringify(result.wham, null, 2)}`);
+    if (result.aggregated) console.log(`aggregated: ${JSON.stringify(result.aggregated, null, 2)}`);
+  }
+
+  if (!ok) process.exitCode = 1;
+}
+
+await main();
--- a/packages/adapters/codex-local/src/server/codex-home.ts
+++ b/packages/adapters/codex-local/src/server/codex-home.ts
@@ -0,0 +1,101 @@
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+import type { AdapterExecutionContext } from "@paperclipai/adapter-utils";
+
+const TRUTHY_ENV_RE = /^(1|true|yes|on)$/i;
+const COPIED_SHARED_FILES = ["config.json", "config.toml", "instructions.md"] as const;
+const SYMLINKED_SHARED_FILES = ["auth.json"] as const;
+
+function nonEmpty(value: string | undefined): string | null {
+  return typeof value === "string" && value.trim().length > 0 ? value.trim() : null;
+}
+
+export async function pathExists(candidate: string): Promise<boolean> {
+  return fs.access(candidate).then(() => true).catch(() => false);
+}
+
+export function resolveCodexHomeDir(env: NodeJS.ProcessEnv = process.env): string {
+  const fromEnv = nonEmpty(env.CODEX_HOME);
+  if (fromEnv) return path.resolve(fromEnv);
+  return path.join(os.homedir(), ".codex");
+}
+
+function isWorktreeMode(env: NodeJS.ProcessEnv): boolean {
+  return TRUTHY_ENV_RE.test(env.PAPERCLIP_IN_WORKTREE ?? "");
+}
+
+function resolveWorktreeCodexHomeDir(env: NodeJS.ProcessEnv): string | null {
+  if (!isWorktreeMode(env)) return null;
+  const paperclipHome = nonEmpty(env.PAPERCLIP_HOME);
+  if (!paperclipHome) return null;
+  const instanceId = nonEmpty(env.PAPERCLIP_INSTANCE_ID);
+  if (instanceId) {
+    return path.resolve(paperclipHome, "instances", instanceId, "codex-home");
+  }
+  return path.resolve(paperclipHome, "codex-home");
+}
+
+async function ensureParentDir(target: string): Promise<void> {
+  await fs.mkdir(path.dirname(target), { recursive: true });
+}
+
+async function ensureSymlink(target: string, source: string): Promise<void> {
+  const existing = await fs.lstat(target).catch(() => null);
+  if (!existing) {
+    await ensureParentDir(target);
+    await fs.symlink(source, target);
+    return;
+  }
+
+  if (!existing.isSymbolicLink()) {
+    return;
+  }
+
+  const linkedPath = await fs.readlink(target).catch(() => null);
+  if (!linkedPath) return;
+
+  const resolvedLinkedPath = path.resolve(path.dirname(target), linkedPath);
+  if (resolvedLinkedPath === source) return;
+
+  await fs.unlink(target);
+  await fs.symlink(source, target);
+}
+
+async function ensureCopiedFile(target: string, source: string): Promise<void> {
+  const existing = await fs.lstat(target).catch(() => null);
+  if (existing) return;
+  await ensureParentDir(target);
+  await fs.copyFile(source, target);
+}
+
+export async function prepareWorktreeCodexHome(
+  env: NodeJS.ProcessEnv,
+  onLog: AdapterExecutionContext["onLog"],
+): Promise<string | null> {
+  const targetHome = resolveWorktreeCodexHomeDir(env);
+  if (!targetHome) return null;
+
+  const sourceHome = resolveCodexHomeDir(env);
+  if (path.resolve(sourceHome) === path.resolve(targetHome)) return targetHome;
+
+  await fs.mkdir(targetHome, { recursive: true });
+
+  for (const name of SYMLINKED_SHARED_FILES) {
+    const source = path.join(sourceHome, name);
+    if (!(await pathExists(source))) continue;
+    await ensureSymlink(path.join(targetHome, name), source);
+  }
+
+  for (const name of COPIED_SHARED_FILES) {
+    const source = path.join(sourceHome, name);
+    if (!(await pathExists(source))) continue;
+    await ensureCopiedFile(path.join(targetHome, name), source);
+  }
+
+  await onLog(
+    "stdout",
+    `[paperclip] Using worktree-isolated Codex home "${targetHome}" (seeded from "${sourceHome}").\n`,
+  );
+  return targetHome;
+}
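The precedence used when choosing a worktree-isolated Codex home can be sketched as a pure function. This is an illustrative reduction of `resolveWorktreeCodexHomeDir`, not a replacement for it: the function name `worktreeCodexHome` and the `/tmp/pc` paths are hypothetical, and the sketch skips the seeding of `auth.json` and config files.

```typescript
import * as path from "node:path";

const TRUTHY = /^(1|true|yes|on)$/i;

type Env = Record<string, string | undefined>;

// Return the trimmed value, or null when unset or blank.
function nonEmpty(value: string | undefined): string | null {
  return typeof value === "string" && value.trim().length > 0 ? value.trim() : null;
}

// Returns null unless worktree mode is on AND PAPERCLIP_HOME is set.
// An instance id, when present, nests the home under instances/<id>.
function worktreeCodexHome(env: Env): string | null {
  if (!TRUTHY.test(env.PAPERCLIP_IN_WORKTREE ?? "")) return null;
  const paperclipHome = nonEmpty(env.PAPERCLIP_HOME);
  if (!paperclipHome) return null;
  const instanceId = nonEmpty(env.PAPERCLIP_INSTANCE_ID);
  return instanceId
    ? path.resolve(paperclipHome, "instances", instanceId, "codex-home")
    : path.resolve(paperclipHome, "codex-home");
}

// Hypothetical environments for illustration.
console.log(worktreeCodexHome({}));
console.log(worktreeCodexHome({ PAPERCLIP_IN_WORKTREE: "1", PAPERCLIP_HOME: "/tmp/pc" }));
console.log(
  worktreeCodexHome({
    PAPERCLIP_IN_WORKTREE: "yes",
    PAPERCLIP_HOME: "/tmp/pc",
    PAPERCLIP_INSTANCE_ID: "dev",
  }),
);
```

In `execute.ts`, an explicit `CODEX_HOME` in the adapter's `env` config takes priority over this value, so the worktree home is only prepared when no such override is configured.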
--- a/packages/adapters/codex-local/src/server/execute.ts
+++ b/packages/adapters/codex-local/src/server/execute.ts
@@ -1,8 +1,7 @@
 import fs from "node:fs/promises";
-import os from "node:os";
 import path from "node:path";
 import { fileURLToPath } from "node:url";
-import type { AdapterExecutionContext, AdapterExecutionResult } from "@paperclipai/adapter-utils";
+import { inferOpenAiCompatibleBiller, type AdapterExecutionContext, type AdapterExecutionResult } from "@paperclipai/adapter-utils";
 import {
  asString,
  asNumber,
@@ -13,17 +12,18 @@ import {
  redactEnvForLogs,
  ensureAbsoluteDirectory,
  ensureCommandResolvable,
+  ensurePaperclipSkillSymlink,
  ensurePathInEnv,
+  listPaperclipSkillEntries,
+  removeMaintainerOnlySkillSymlinks,
  renderTemplate,
+  joinPromptSections,
  runChildProcess,
 } from "@paperclipai/adapter-utils/server-utils";
 import { parseCodexJsonl, isCodexUnknownSessionError } from "./parse.js";
+import { pathExists, prepareWorktreeCodexHome, resolveCodexHomeDir } from "./codex-home.js";

 const __moduleDir = path.dirname(fileURLToPath(import.meta.url));
-const PAPERCLIP_SKILLS_CANDIDATES = [
-  path.resolve(__moduleDir, "../../skills"),         // published: <pkg>/dist/server/ -> <pkg>/skills/
-  path.resolve(__moduleDir, "../../../../../skills"), // dev: src/server/ -> repo root/skills/
-];
 const CODEX_ROLLOUT_NOISE_RE =
  /^\d{4}-\d{2}-\d{2}T[^\s]+\s+ERROR\s+codex_core::rollout::list:\s+state db missing rollout path for thread\s+[a-z0-9-]+$/i;

@@ -61,39 +61,101 @@ function resolveCodexBillingType(env: Record<string, string>): "api" | "subscrip
  return hasNonEmptyEnvValue(env, "OPENAI_API_KEY") ? "api" : "subscription";
 }

-function codexHomeDir(): string {
-  const fromEnv = process.env.CODEX_HOME;
-  if (typeof fromEnv === "string" && fromEnv.trim().length > 0) return fromEnv.trim();
-  return path.join(os.homedir(), ".codex");
+function resolveCodexBiller(env: Record<string, string>, billingType: "api" | "subscription"): string {
+  const openAiCompatibleBiller = inferOpenAiCompatibleBiller(env, "openai");
+  if (openAiCompatibleBiller === "openrouter") return "openrouter";
+  return billingType === "subscription" ? "chatgpt" : openAiCompatibleBiller ?? "openai";
 }

-async function resolvePaperclipSkillsDir(): Promise<string | null> {
-  for (const candidate of PAPERCLIP_SKILLS_CANDIDATES) {
-    const isDir = await fs.stat(candidate).then((s) => s.isDirectory()).catch(() => false);
-    if (isDir) return candidate;
+async function isLikelyPaperclipRepoRoot(candidate: string): Promise<boolean> {
+  const [hasWorkspace, hasPackageJson, hasServerDir, hasAdapterUtilsDir] = await Promise.all([
+    pathExists(path.join(candidate, "pnpm-workspace.yaml")),
+    pathExists(path.join(candidate, "package.json")),
+    pathExists(path.join(candidate, "server")),
+    pathExists(path.join(candidate, "packages", "adapter-utils")),
+  ]);
+
+  return hasWorkspace && hasPackageJson && hasServerDir && hasAdapterUtilsDir;
+}
+
+async function isLikelyPaperclipRuntimeSkillSource(candidate: string, skillName: string): Promise<boolean> {
+  if (path.basename(candidate) !== skillName) return false;
+  const skillsRoot = path.dirname(candidate);
+  if (path.basename(skillsRoot) !== "skills") return false;
+  if (!(await pathExists(path.join(candidate, "SKILL.md")))) return false;
+
+  let cursor = path.dirname(skillsRoot);
+  for (let depth = 0; depth < 6; depth += 1) {
+    if (await isLikelyPaperclipRepoRoot(cursor)) return true;
+    const parent = path.dirname(cursor);
+    if (parent === cursor) break;
+    cursor = parent;
  }
-  return null;
+
+  return false;
 }

-async function ensureCodexSkillsInjected(onLog: AdapterExecutionContext["onLog"]) {
-  const skillsDir = await resolvePaperclipSkillsDir();
-  if (!skillsDir) return;
+type EnsureCodexSkillsInjectedOptions = {
+  skillsHome?: string;
+  skillsEntries?: Awaited<ReturnType<typeof listPaperclipSkillEntries>>;
+  linkSkill?: (source: string, target: string) => Promise<void>;
+};

-  const skillsHome = path.join(codexHomeDir(), "skills");
+export async function ensureCodexSkillsInjected(
+  onLog: AdapterExecutionContext["onLog"],
+  options: EnsureCodexSkillsInjectedOptions = {},
+) {
+  const skillsEntries = options.skillsEntries ?? await listPaperclipSkillEntries(__moduleDir);
+  if (skillsEntries.length === 0) return;
+
+  const skillsHome = options.skillsHome ?? path.join(resolveCodexHomeDir(process.env), "skills");
  await fs.mkdir(skillsHome, { recursive: true });
-  const entries = await fs.readdir(skillsDir, { withFileTypes: true });
-  for (const entry of entries) {
-    if (!entry.isDirectory()) continue;
-    const source = path.join(skillsDir, entry.name);
+  const removedSkills = await removeMaintainerOnlySkillSymlinks(
+    skillsHome,
+    skillsEntries.map((entry) => entry.name),
+  );
+  for (const skillName of removedSkills) {
+    await onLog(
+      "stdout",
+      `[paperclip] Removed maintainer-only Codex skill "${skillName}" from ${skillsHome}\n`,
+    );
+  }
+  const linkSkill = options.linkSkill;
+  for (const entry of skillsEntries) {
    const target = path.join(skillsHome, entry.name);
-    const existing = await fs.lstat(target).catch(() => null);
-    if (existing) continue;

    try {
-      await fs.symlink(source, target);
+      const existing = await fs.lstat(target).catch(() => null);
+      if (existing?.isSymbolicLink()) {
+        const linkedPath = await fs.readlink(target).catch(() => null);
+        const resolvedLinkedPath = linkedPath
+          ? path.resolve(path.dirname(target), linkedPath)
+          : null;
+        if (
+          resolvedLinkedPath &&
+          resolvedLinkedPath !== entry.source &&
+          (await isLikelyPaperclipRuntimeSkillSource(resolvedLinkedPath, entry.name))
+        ) {
+          await fs.unlink(target);
+          if (linkSkill) {
+            await linkSkill(entry.source, target);
+          } else {
+            await fs.symlink(entry.source, target);
+          }
+          await onLog(
+            "stdout",
+            `[paperclip] Repaired Codex skill "${entry.name}" into ${skillsHome}\n`,
+          );
+          continue;
+        }
+      }
+
+      const result = await ensurePaperclipSkillSymlink(entry.source, target, linkSkill);
+      if (result === "skipped") continue;
+
      await onLog(
-        "stderr",
-        `[paperclip] Injected Codex skill "${entry.name}" into ${skillsHome}\n`,
+        "stdout",
+        `[paperclip] ${result === "repaired" ? "Repaired" : "Injected"} Codex skill "${entry.name}" into ${skillsHome}\n`,
      );
    } catch (err) {
      await onLog(
@@ -132,6 +194,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  const workspaceRepoRef = asString(workspaceContext.repoRef, "");
  const workspaceBranch = asString(workspaceContext.branchName, "");
  const workspaceWorktreePath = asString(workspaceContext.worktreePath, "");
+  const agentHome = asString(workspaceContext.agentHome, "");
  const workspaceHints = Array.isArray(context.paperclipWorkspaces)
    ? context.paperclipWorkspaces.filter(
        (value): value is Record<string, unknown> => typeof value === "object" && value !== null,
@@ -152,12 +215,25 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  const useConfiguredInsteadOfAgentHome = workspaceSource === "agent_home" && configuredCwd.length > 0;
  const effectiveWorkspaceCwd = useConfiguredInsteadOfAgentHome ? "" : workspaceCwd;
  const cwd = effectiveWorkspaceCwd || configuredCwd || process.cwd();
-  await ensureAbsoluteDirectory(cwd, { createIfMissing: true });
-  await ensureCodexSkillsInjected(onLog);
  const envConfig = parseObject(config.env);
+  const configuredCodexHome =
+    typeof envConfig.CODEX_HOME === "string" && envConfig.CODEX_HOME.trim().length > 0
+      ? path.resolve(envConfig.CODEX_HOME.trim())
+      : null;
+  await ensureAbsoluteDirectory(cwd, { createIfMissing: true });
+  const preparedWorktreeCodexHome =
+    configuredCodexHome ? null : await prepareWorktreeCodexHome(process.env, onLog);
+  const effectiveCodexHome = configuredCodexHome ?? preparedWorktreeCodexHome;
+  await ensureCodexSkillsInjected(
+    onLog,
+    effectiveCodexHome ? { skillsHome: path.join(effectiveCodexHome, "skills") } : {},
+  );
  const hasExplicitApiKey =
    typeof envConfig.PAPERCLIP_API_KEY === "string" && envConfig.PAPERCLIP_API_KEY.trim().length > 0;
  const env: Record<string, string> = { ...buildPaperclipEnv(agent) };
+  if (effectiveCodexHome) {
+    env.CODEX_HOME = effectiveCodexHome;
+  }
  env.PAPERCLIP_RUN_ID = runId;
  const wakeTaskId =
    (typeof context.taskId === "string" && context.taskId.trim().length > 0 && context.taskId.trim()) ||
@@ -224,6 +300,9 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  if (workspaceWorktreePath) {
    env.PAPERCLIP_WORKSPACE_WORKTREE_PATH = workspaceWorktreePath;
  }
+  if (agentHome) {
+    env.AGENT_HOME = agentHome;
+  }
  if (workspaceHints.length > 0) {
    env.PAPERCLIP_WORKSPACES_JSON = JSON.stringify(workspaceHints);
  }
@@ -242,8 +321,13 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  if (!hasExplicitApiKey && authToken) {
    env.PAPERCLIP_API_KEY = authToken;
  }
-  const billingType = resolveCodexBillingType(env);
-  const runtimeEnv = ensurePathInEnv({ ...process.env, ...env });
+  const effectiveEnv = Object.fromEntries(
+    Object.entries({ ...process.env, ...env }).filter(
+      (entry): entry is [string, string] => typeof entry[1] === "string",
+    ),
+  );
+  const billingType = resolveCodexBillingType(effectiveEnv);
+  const runtimeEnv = ensurePathInEnv(effectiveEnv);
  await ensureCommandResolvable(command, cwd, runtimeEnv);

  const timeoutSec = asNumber(config.timeoutSec, 0);
@@ -270,6 +354,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  const instructionsFilePath = asString(config.instructionsFilePath, "").trim();
  const instructionsDir = instructionsFilePath ? `${path.dirname(instructionsFilePath)}/` : "";
  let instructionsPrefix = "";
+  let instructionsChars = 0;
  if (instructionsFilePath) {
    try {
      const instructionsContents = await fs.readFile(instructionsFilePath, "utf8");
@@ -277,8 +362,9 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
        `${instructionsContents}\n\n` +
        `The above agent instructions were loaded from ${instructionsFilePath}. ` +
        `Resolve any relative file references from ${instructionsDir}.\n\n`;
+      instructionsChars = instructionsPrefix.length;
      await onLog(
-        "stderr",
+        "stdout",
        `[paperclip] Loaded agent instructions file: ${instructionsFilePath}\n`,
      );
    } catch (err) {
@@ -301,7 +387,8 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
      `Configured instructionsFilePath ${instructionsFilePath}, but file could not be read; continuing without injected instructions.`,
    ];
  })();
-  const renderedPrompt = renderTemplate(promptTemplate, {
+  const bootstrapPromptTemplate = asString(config.bootstrapPromptTemplate, "");
+  const templateData = {
    agentId: agent.id,
    companyId: agent.companyId,
    runId,
@@ -309,8 +396,26 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
    agent,
    run: { id: runId, source: "on_demand" },
    context,
-  });
-  const prompt = `${instructionsPrefix}${renderedPrompt}`;
+  };
+  const renderedPrompt = renderTemplate(promptTemplate, templateData);
+  const renderedBootstrapPrompt =
+    !sessionId && bootstrapPromptTemplate.trim().length > 0
+      ? renderTemplate(bootstrapPromptTemplate, templateData).trim()
+      : "";
+  const sessionHandoffNote = asString(context.paperclipSessionHandoffMarkdown, "").trim();
+  const prompt = joinPromptSections([
+    instructionsPrefix,
+    renderedBootstrapPrompt,
+    sessionHandoffNote,
+    renderedPrompt,
+  ]);
+  const promptMetrics = {
+    promptChars: prompt.length,
+    instructionsChars,
+    bootstrapPromptChars: renderedBootstrapPrompt.length,
+    sessionHandoffChars: sessionHandoffNote.length,
+    heartbeatPromptChars: renderedPrompt.length,
+  };

  const buildArgs = (resumeSessionId: string | null) => {
    const args = ["exec", "--json"];
@@ -338,6 +443,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
        }),
        env: redactEnvForLogs(env),
        prompt,
+        promptMetrics,
        context,
      });
    }
@@ -413,6 +519,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
      sessionParams: resolvedSessionParams,
      sessionDisplayId: resolvedSessionId,
      provider: "openai",
+      biller: resolveCodexBiller(effectiveEnv, billingType),
      model,
      billingType,
      costUsd: null,
--- a/packages/adapters/codex-local/src/server/index.ts
+++ b/packages/adapters/codex-local/src/server/index.ts
@@ -1,6 +1,17 @@
-export { execute } from "./execute.js";
+export { execute, ensureCodexSkillsInjected } from "./execute.js";
 export { testEnvironment } from "./test.js";
 export { parseCodexJsonl, isCodexUnknownSessionError } from "./parse.js";
+export {
+  getQuotaWindows,
+  readCodexAuthInfo,
+  readCodexToken,
+  fetchCodexQuota,
+  fetchCodexRpcQuota,
+  mapCodexRpcQuota,
+  secondsToWindowLabel,
+  fetchWithTimeout,
+  codexHomeDir,
+} from "./quota.js";
 import type { AdapterSessionCodec } from "@paperclipai/adapter-utils";

 function readNonEmptyString(value: unknown): string | null {
--- a/packages/adapters/codex-local/src/server/quota.ts
+++ b/packages/adapters/codex-local/src/server/quota.ts
@@ -0,0 +1,556 @@
+import { spawn } from "node:child_process";
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+import type { ProviderQuotaResult, QuotaWindow } from "@paperclipai/adapter-utils";
+
+const CODEX_USAGE_SOURCE_RPC = "codex-rpc";
+const CODEX_USAGE_SOURCE_WHAM = "codex-wham";
+
+export function codexHomeDir(): string {
+  const fromEnv = process.env.CODEX_HOME;
+  if (typeof fromEnv === "string" && fromEnv.trim().length > 0) return fromEnv.trim();
+  return path.join(os.homedir(), ".codex");
+}
+
+interface CodexLegacyAuthFile {
+  accessToken?: string | null;
+  accountId?: string | null;
+}
+
+interface CodexTokenBlock {
+  id_token?: string | null;
+  access_token?: string | null;
+  refresh_token?: string | null;
+  account_id?: string | null;
+}
+
+interface CodexModernAuthFile {
+  OPENAI_API_KEY?: string | null;
+  tokens?: CodexTokenBlock | null;
+  last_refresh?: string | null;
+}
+
+export interface CodexAuthInfo {
+  accessToken: string;
+  accountId: string | null;
+  refreshToken: string | null;
+  idToken: string | null;
+  email: string | null;
+  planType: string | null;
+  lastRefresh: string | null;
+}
+
+function base64UrlDecode(input: string): string | null {
+  try {
+    let normalized = input.replace(/-/g, "+").replace(/_/g, "/");
+    const remainder = normalized.length % 4;
+    if (remainder > 0) normalized += "=".repeat(4 - remainder);
+    return Buffer.from(normalized, "base64").toString("utf8");
+  } catch {
+    return null;
+  }
+}
+
+function decodeJwtPayload(token: string | null | undefined): Record<string, unknown> | null {
+  if (typeof token !== "string" || token.trim().length === 0) return null;
+  const parts = token.split(".");
+  if (parts.length < 2) return null;
+  const decoded = base64UrlDecode(parts[1] ?? "");
+  if (!decoded) return null;
+  try {
+    const parsed = JSON.parse(decoded) as unknown;
+    return typeof parsed === "object" && parsed !== null ? parsed as Record<string, unknown> : null;
+  } catch {
+    return null;
+  }
+}
+
+function readNestedString(record: Record<string, unknown>, pathSegments: string[]): string | null {
+  let current: unknown = record;
+  for (const segment of pathSegments) {
+    if (typeof current !== "object" || current === null || Array.isArray(current)) return null;
+    current = (current as Record<string, unknown>)[segment];
+  }
+  return typeof current === "string" && current.trim().length > 0 ? current.trim() : null;
+}
+
+function parsePlanAndEmailFromToken(idToken: string | null, accessToken: string | null): {
+  email: string | null;
+  planType: string | null;
+} {
+  const payloads = [decodeJwtPayload(idToken), decodeJwtPayload(accessToken)].filter(
+    (value): value is Record<string, unknown> => value != null,
+  );
+  for (const payload of payloads) {
+    const directEmail = typeof payload.email === "string" ? payload.email : null;
+    const authBlock =
+      typeof payload["https://api.openai.com/auth"] === "object" &&
+      payload["https://api.openai.com/auth"] !== null &&
+      !Array.isArray(payload["https://api.openai.com/auth"])
+        ? payload["https://api.openai.com/auth"] as Record<string, unknown>
+        : null;
+    const profileBlock =
+      typeof payload["https://api.openai.com/profile"] === "object" &&
+      payload["https://api.openai.com/profile"] !== null &&
+      !Array.isArray(payload["https://api.openai.com/profile"])
+        ? payload["https://api.openai.com/profile"] as Record<string, unknown>
+        : null;
+    const email =
+      directEmail
+      ?? (typeof profileBlock?.email === "string" ? profileBlock.email : null)
+      ?? (typeof authBlock?.chatgpt_user_email === "string" ? authBlock.chatgpt_user_email : null);
+    const planType =
+      typeof authBlock?.chatgpt_plan_type === "string" ? authBlock.chatgpt_plan_type : null;
+    if (email || planType) return { email: email ?? null, planType };
+  }
+  return { email: null, planType: null };
+}
+
+export async function readCodexAuthInfo(): Promise<CodexAuthInfo | null> {
+  const authPath = path.join(codexHomeDir(), "auth.json");
+  let raw: string;
+  try {
+    raw = await fs.readFile(authPath, "utf8");
+  } catch {
+    return null;
+  }
+  let parsed: unknown;
+  try {
+    parsed = JSON.parse(raw);
+  } catch {
+    return null;
+  }
+  if (typeof parsed !== "object" || parsed === null) return null;
+  const obj = parsed as Record<string, unknown>;
+  const modern = obj as CodexModernAuthFile;
+  const legacy = obj as CodexLegacyAuthFile;
+
+  const accessToken =
+    legacy.accessToken
+    ?? modern.tokens?.access_token
+    ?? readNestedString(obj, ["tokens", "access_token"]);
+  if (typeof accessToken !== "string" || accessToken.length === 0) return null;
+
+  const accountId =
+    legacy.accountId
+    ?? modern.tokens?.account_id
+    ?? readNestedString(obj, ["tokens", "account_id"]);
+  const refreshToken =
+    modern.tokens?.refresh_token
+    ?? readNestedString(obj, ["tokens", "refresh_token"]);
+  const idToken =
+    modern.tokens?.id_token
+    ?? readNestedString(obj, ["tokens", "id_token"]);
+  const { email, planType } = parsePlanAndEmailFromToken(idToken, accessToken);
+
+  return {
+    accessToken,
+    accountId:
+      typeof accountId === "string" && accountId.trim().length > 0 ? accountId.trim() : null,
+    refreshToken:
+      typeof refreshToken === "string" && refreshToken.trim().length > 0 ? refreshToken.trim() : null,
+    idToken:
+      typeof idToken === "string" && idToken.trim().length > 0 ? idToken.trim() : null,
+    email,
+    planType,
+    lastRefresh:
+      typeof modern.last_refresh === "string" && modern.last_refresh.trim().length > 0
+        ? modern.last_refresh.trim()
+        : null,
+  };
+}
+
+export async function readCodexToken(): Promise<{ token: string; accountId: string | null } | null> {
+  const auth = await readCodexAuthInfo();
+  if (!auth) return null;
+  return { token: auth.accessToken, accountId: auth.accountId };
+}
+
+interface WhamWindow {
+  used_percent?: number | null;
+  limit_window_seconds?: number | null;
+  reset_at?: string | number | null;
+}
+
+interface WhamCredits {
+  balance?: number | null;
+  unlimited?: boolean | null;
+}
+
+interface WhamUsageResponse {
+  plan_type?: string | null;
+  rate_limit?: {
+    primary_window?: WhamWindow | null;
+    secondary_window?: WhamWindow | null;
+  } | null;
+  credits?: WhamCredits | null;
+}
+
+/**
+ * Map a window duration in seconds to a human-readable label.
+ * Falls back to the provided fallback string when seconds is null/undefined.
+ */
+export function secondsToWindowLabel(
+  seconds: number | null | undefined,
+  fallback: string,
+): string {
+  if (seconds == null) return fallback;
+  const hours = seconds / 3600;
+  if (hours < 6) return "5h";
+  if (hours <= 24) return "24h";
+  if (hours <= 168) return "7d";
+  return `${Math.round(hours / 24)}d`;
+}
+
+/** Fetch with an abort-based timeout so a hanging provider API doesn't block the response indefinitely. */
+export async function fetchWithTimeout(
+  url: string,
+  init: RequestInit,
+  ms = 8000,
+): Promise<Response> {
+  const controller = new AbortController();
+  const timer = setTimeout(() => controller.abort(), ms);
+  try {
+    return await fetch(url, { ...init, signal: controller.signal });
+  } finally {
+    clearTimeout(timer);
+  }
+}
+
+function normalizeCodexUsedPercent(rawPct: number | null | undefined): number | null {
+  if (rawPct == null) return null;
+  return Math.min(100, Math.round(rawPct < 1 ? rawPct * 100 : rawPct));
+}
+
+export async function fetchCodexQuota(
+  token: string,
+  accountId: string | null,
+): Promise<QuotaWindow[]> {
+  const headers: Record<string, string> = {
+    Authorization: `Bearer ${token}`,
+  };
+  if (accountId) headers["ChatGPT-Account-Id"] = accountId;
+
+  const resp = await fetchWithTimeout("https://chatgpt.com/backend-api/wham/usage", { headers });
+  if (!resp.ok) throw new Error(`chatgpt wham api returned ${resp.status}`);
+  const body = (await resp.json()) as WhamUsageResponse;
+  const windows: QuotaWindow[] = [];
+
+  const rateLimit = body.rate_limit;
+  if (rateLimit?.primary_window != null) {
+    const w = rateLimit.primary_window;
+    windows.push({
+      label: "5h limit",
+      usedPercent: normalizeCodexUsedPercent(w.used_percent),
+      resetsAt:
+        typeof w.reset_at === "number"
+          ? unixSecondsToIso(w.reset_at)
+          : (w.reset_at ?? null),
+      valueLabel: null,
+      detail: null,
+    });
+  }
+  if (rateLimit?.secondary_window != null) {
+    const w = rateLimit.secondary_window;
+    windows.push({
+      label: "Weekly limit",
+      usedPercent: normalizeCodexUsedPercent(w.used_percent),
+      resetsAt:
+        typeof w.reset_at === "number"
+          ? unixSecondsToIso(w.reset_at)
+          : (w.reset_at ?? null),
+      valueLabel: null,
+      detail: null,
+    });
+  }
+  if (body.credits != null && body.credits.unlimited !== true) {
+    const balance = body.credits.balance;
+    const valueLabel = balance != null ? `$${(balance / 100).toFixed(2)} remaining` : "N/A";
+    windows.push({
+      label: "Credits",
+      usedPercent: null,
+      resetsAt: null,
+      valueLabel,
+      detail: null,
+    });
+  }
+  return windows;
+}
+
+interface CodexRpcWindow {
+  usedPercent?: number | null;
+  windowDurationMins?: number | null;
+  resetsAt?: number | null;
+}
+
+interface CodexRpcCredits {
+  hasCredits?: boolean | null;
+  unlimited?: boolean | null;
+  balance?: string | number | null;
+}
+
+interface CodexRpcLimit {
+  limitId?: string | null;
+  limitName?: string | null;
+  primary?: CodexRpcWindow | null;
+  secondary?: CodexRpcWindow | null;
+  credits?: CodexRpcCredits | null;
+  planType?: string | null;
+}
+
+interface CodexRpcRateLimitsResult {
+  rateLimits?: CodexRpcLimit | null;
+  rateLimitsByLimitId?: Record<string, CodexRpcLimit> | null;
+}
+
+interface CodexRpcAccountResult {
+  account?: {
+    type?: string | null;
+    email?: string | null;
+    planType?: string | null;
+  } | null;
+  requiresOpenaiAuth?: boolean | null;
+}
+
+export interface CodexRpcQuotaSnapshot {
+  windows: QuotaWindow[];
+  email: string | null;
+  planType: string | null;
+}
+
+function unixSecondsToIso(value: number | null | undefined): string | null {
+  if (typeof value !== "number" || !Number.isFinite(value)) return null;
+  return new Date(value * 1000).toISOString();
+}
+
+function buildCodexRpcWindow(label: string, window: CodexRpcWindow | null | undefined): QuotaWindow | null {
+  if (!window) return null;
+  return {
+    label,
+    usedPercent: normalizeCodexUsedPercent(window.usedPercent),
+    resetsAt: unixSecondsToIso(window.resetsAt),
+    valueLabel: null,
+    detail: null,
+  };
+}
+
+function parseCreditBalance(value: string | number | null | undefined): string | null {
+  if (typeof value === "number" && Number.isFinite(value)) {
+    return `$${value.toFixed(2)} remaining`;
+  }
+  if (typeof value === "string" && value.trim().length > 0) {
+    const parsed = Number(value);
+    if (Number.isFinite(parsed)) {
+      return `$${parsed.toFixed(2)} remaining`;
+    }
+    return value.trim();
+  }
+  return null;
+}
+
+export function mapCodexRpcQuota(result: CodexRpcRateLimitsResult, account?: CodexRpcAccountResult | null): CodexRpcQuotaSnapshot {
+  const windows: QuotaWindow[] = [];
+  const limitOrder = ["codex"];
+  const limitsById = result.rateLimitsByLimitId ?? {};
+  for (const key of Object.keys(limitsById)) {
+    if (!limitOrder.includes(key)) limitOrder.push(key);
+  }
+
+  const rootLimit = result.rateLimits ?? null;
+  const allLimits = new Map<string, CodexRpcLimit>();
+  if (rootLimit?.limitId) allLimits.set(rootLimit.limitId, rootLimit);
+  for (const [key, value] of Object.entries(limitsById)) {
+    allLimits.set(key, value);
+  }
+  if (!allLimits.has("codex") && rootLimit) allLimits.set("codex", rootLimit);
+
+  for (const limitId of limitOrder) {
+    const limit = allLimits.get(limitId);
+    if (!limit) continue;
+    const prefix =
+      limitId === "codex"
+        ? ""
+        : `${limit.limitName ?? limitId} · `;
+    const primary = buildCodexRpcWindow(`${prefix}5h limit`, limit.primary);
+    if (primary) windows.push(primary);
+    const secondary = buildCodexRpcWindow(`${prefix}Weekly limit`, limit.secondary);
+    if (secondary) windows.push(secondary);
+    if (limitId === "codex" && limit.credits && limit.credits.unlimited !== true) {
+      windows.push({
+        label: "Credits",
+        usedPercent: null,
+        resetsAt: null,
+        valueLabel: parseCreditBalance(limit.credits.balance) ?? "N/A",
+        detail: null,
+      });
+    }
+  }
+
+  return {
+    windows,
+    email:
+      typeof account?.account?.email === "string" && account.account.email.trim().length > 0
+        ? account.account.email.trim()
+        : null,
+    planType:
+      typeof account?.account?.planType === "string" && account.account.planType.trim().length > 0
+        ? account.account.planType.trim()
+        : (typeof rootLimit?.planType === "string" && rootLimit.planType.trim().length > 0 ? rootLimit.planType.trim() : null),
+  };
+}
+
+type PendingRequest = {
+  resolve: (value: Record<string, unknown>) => void;
+  reject: (error: Error) => void;
+  timer: NodeJS.Timeout;
+};
+
+class CodexRpcClient {
+  private proc = spawn(
+    "codex",
+    ["-s", "read-only", "-a", "untrusted", "app-server"],
+    { stdio: ["pipe", "pipe", "pipe"], env: process.env },
+  );
+
+  private nextId = 1;
+  private buffer = "";
+  private pending = new Map<number, PendingRequest>();
+  private stderr = "";
+
+  constructor() {
+    this.proc.stdout.setEncoding("utf8");
+    this.proc.stderr.setEncoding("utf8");
+    this.proc.stdout.on("data", (chunk: string) => this.onStdout(chunk));
+    this.proc.stderr.on("data", (chunk: string) => {
+      this.stderr += chunk;
+    });
+    this.proc.on("exit", () => {
+      for (const request of this.pending.values()) {
+        clearTimeout(request.timer);
+        request.reject(new Error(this.stderr.trim() || "codex app-server closed unexpectedly"));
+      }
+      this.pending.clear();
+    });
+  }
+
+  private onStdout(chunk: string) {
+    this.buffer += chunk;
+    while (true) {
+      const newlineIndex = this.buffer.indexOf("\n");
+      if (newlineIndex < 0) break;
+      const line = this.buffer.slice(0, newlineIndex).trim();
+      this.buffer = this.buffer.slice(newlineIndex + 1);
+      if (!line) continue;
+      let parsed: Record<string, unknown>;
+      try {
+        parsed = JSON.parse(line) as Record<string, unknown>;
+      } catch {
+        continue;
+      }
+      const id = typeof parsed.id === "number" ? parsed.id : null;
+      if (id == null) continue;
+      const pending = this.pending.get(id);
+      if (!pending) continue;
+      this.pending.delete(id);
+      clearTimeout(pending.timer);
+      pending.resolve(parsed);
+    }
+  }
+
+  private request(method: string, params: Record<string, unknown> = {}, timeoutMs = 6_000): Promise<Record<string, unknown>> {
+    const id = this.nextId++;
+    const payload = JSON.stringify({ id, method, params }) + "\n";
+    return new Promise<Record<string, unknown>>((resolve, reject) => {
+      const timer = setTimeout(() => {
+        this.pending.delete(id);
+        reject(new Error(`codex app-server timed out on ${method}`));
+      }, timeoutMs);
+      this.pending.set(id, { resolve, reject, timer });
+      this.proc.stdin.write(payload);
+    });
+  }
+
+  private notify(method: string, params: Record<string, unknown> = {}) {
+    this.proc.stdin.write(JSON.stringify({ method, params }) + "\n");
+  }
+
+  async initialize() {
+    await this.request("initialize", {
+      clientInfo: {
+        name: "paperclip",
+        version: "0.0.0",
+      },
+    });
+    this.notify("initialized", {});
+  }
+
+  async fetchRateLimits(): Promise<CodexRpcRateLimitsResult> {
+    const message = await this.request("account/rateLimits/read");
+    return (message.result as CodexRpcRateLimitsResult | undefined) ?? {};
+  }
+
+  async fetchAccount(): Promise<CodexRpcAccountResult | null> {
+    try {
+      const message = await this.request("account/read");
+      return (message.result as CodexRpcAccountResult | undefined) ?? null;
+    } catch {
+      return null;
+    }
+  }
+
+  async shutdown() {
+    this.proc.kill("SIGTERM");
+  }
+}
+
+export async function fetchCodexRpcQuota(): Promise<CodexRpcQuotaSnapshot> {
+  const client = new CodexRpcClient();
+  try {
+    await client.initialize();
+    const [limits, account] = await Promise.all([
+      client.fetchRateLimits(),
+      client.fetchAccount(),
+    ]);
+    return mapCodexRpcQuota(limits, account);
+  } finally {
+    await client.shutdown();
+  }
+}
+
+function formatProviderError(source: string, error: unknown): string {
+  const message = error instanceof Error ? error.message : String(error);
+  return `${source}: ${message}`;
+}
+
+export async function getQuotaWindows(): Promise<ProviderQuotaResult> {
+  const errors: string[] = [];
+
+  try {
+    const rpc = await fetchCodexRpcQuota();
+    if (rpc.windows.length > 0) {
+      return { provider: "openai", source: CODEX_USAGE_SOURCE_RPC, ok: true, windows: rpc.windows };
+    }
+  } catch (error) {
+    errors.push(formatProviderError("Codex app-server", error));
+  }
+
+  const auth = await readCodexToken();
+  if (auth) {
+    try {
+      const windows = await fetchCodexQuota(auth.token, auth.accountId);
+      return { provider: "openai", source: CODEX_USAGE_SOURCE_WHAM, ok: true, windows };
+    } catch (error) {
+      errors.push(formatProviderError("ChatGPT WHAM usage", error));
+    }
+  } else {
+    errors.push("no local codex auth token");
+  }
+
+  return {
+    provider: "openai",
+    ok: false,
+    error: errors.join("; "),
+    windows: [],
+  };
+}
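
The two small quota helpers in the file above are easier to follow with concrete inputs. The sketch below copies `secondsToWindowLabel` and `normalizeCodexUsedPercent` from the diff so it runs on its own. The sample inputs are illustrative assumptions, not values from a real WHAM or app-server response.

```typescript
// Local copies of two helpers from the diff above, so this sketch is self-contained.
function secondsToWindowLabel(
  seconds: number | null | undefined,
  fallback: string,
): string {
  if (seconds == null) return fallback;
  const hours = seconds / 3600;
  if (hours < 6) return "5h";
  if (hours <= 24) return "24h";
  if (hours <= 168) return "7d";
  return `${Math.round(hours / 24)}d`;
}

function normalizeCodexUsedPercent(rawPct: number | null | undefined): number | null {
  if (rawPct == null) return null;
  // Values strictly below 1 are read as fractions (0.5 -> 50); all others as percentages.
  return Math.min(100, Math.round(rawPct < 1 ? rawPct * 100 : rawPct));
}

// Illustrative window durations (assumed inputs).
const labels = [
  secondsToWindowLabel(5 * 3600, "unknown"), // 5 hours
  secondsToWindowLabel(24 * 3600, "unknown"), // 1 day
  secondsToWindowLabel(7 * 24 * 3600, "unknown"), // 1 week
  secondsToWindowLabel(30 * 24 * 3600, "unknown"), // 30 days
  secondsToWindowLabel(null, "unknown"), // missing duration
];

// Illustrative usage values (assumed inputs).
const percents = [
  normalizeCodexUsedPercent(0.5), // fraction form
  normalizeCodexUsedPercent(42), // percentage form
  normalizeCodexUsedPercent(150), // clamped at the upper bound
  normalizeCodexUsedPercent(1), // boundary value, not treated as a fraction
  normalizeCodexUsedPercent(null), // missing value
];

console.log(labels.join(", "));
console.log(percents.join(", "));
```

One consequence of the `rawPct < 1` heuristic is that an input of exactly `1` is kept as 1%, while a genuine sub-1% reading such as `0.5` would be scaled to 50%. The source does not state which form the upstream API sends, so treat that boundary as something to verify.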
--- a/packages/adapters/codex-local/src/ui/build-config.ts
+++ b/packages/adapters/codex-local/src/ui/build-config.ts
@@ -71,6 +71,7 @@ export function buildCodexLocalConfig(v: CreateConfigValues): Record<string, unk
  if (v.cwd) ac.cwd = v.cwd;
  if (v.instructionsFilePath) ac.instructionsFilePath = v.instructionsFilePath;
  if (v.promptTemplate) ac.promptTemplate = v.promptTemplate;
+  if (v.bootstrapPrompt) ac.bootstrapPromptTemplate = v.bootstrapPrompt;
  ac.model = v.model || DEFAULT_CODEX_LOCAL_MODEL;
  if (v.thinkingEffort) ac.modelReasoningEffort = v.thinkingEffort;
  ac.timeoutSec = 0;
--- a/packages/adapters/cursor-local/CHANGELOG.md
+++ b/packages/adapters/cursor-local/CHANGELOG.md
@@ -1,5 +1,13 @@
 # @paperclipai/adapter-cursor-local

+## 0.3.1
+
+### Patch Changes
+
+- Stable release preparation for 0.3.1
+- Updated dependencies
+  - @paperclipai/adapter-utils@0.3.1
+
 ## 0.3.0

 ### Minor Changes
--- a/packages/adapters/cursor-local/package.json
+++ b/packages/adapters/cursor-local/package.json
@@ -1,6 +1,16 @@
 {
  "name": "@paperclipai/adapter-cursor-local",
-  "version": "0.3.0",
+  "version": "0.3.1",
+  "license": "MIT",
+  "homepage": "https://github.com/paperclipai/paperclip",
+  "bugs": {
+    "url": "https://github.com/paperclipai/paperclip/issues"
+  },
+  "repository": {
+    "type": "git",
+    "url": "https://github.com/paperclipai/paperclip",
+    "directory": "packages/adapters/cursor-local"
+  },
  "type": "module",
  "exports": {
    ".": "./src/index.ts",
--- a/packages/adapters/cursor-local/src/server/execute.ts
+++ b/packages/adapters/cursor-local/src/server/execute.ts
@@ -1,9 +1,8 @@
 import fs from "node:fs/promises";
-import type { Dirent } from "node:fs";
 import os from "node:os";
 import path from "node:path";
 import { fileURLToPath } from "node:url";
-import type { AdapterExecutionContext, AdapterExecutionResult } from "@paperclipai/adapter-utils";
+import { inferOpenAiCompatibleBiller, type AdapterExecutionContext, type AdapterExecutionResult } from "@paperclipai/adapter-utils";
 import {
  asString,
  asNumber,
@@ -13,8 +12,12 @@ import {
  redactEnvForLogs,
  ensureAbsoluteDirectory,
  ensureCommandResolvable,
+  ensurePaperclipSkillSymlink,
  ensurePathInEnv,
+  listPaperclipSkillEntries,
+  removeMaintainerOnlySkillSymlinks,
  renderTemplate,
+  joinPromptSections,
  runChildProcess,
 } from "@paperclipai/adapter-utils/server-utils";
 import { DEFAULT_CURSOR_LOCAL_MODEL } from "../index.js";
@@ -23,10 +26,6 @@ import { normalizeCursorStreamLine } from "../shared/stream.js";
 import { hasCursorTrustBypassArg } from "../shared/trust.js";

 const __moduleDir = path.dirname(fileURLToPath(import.meta.url));
-const PAPERCLIP_SKILLS_CANDIDATES = [
-  path.resolve(__moduleDir, "../../skills"),
-  path.resolve(__moduleDir, "../../../../../skills"),
-];

 function firstNonEmptyLine(text: string): string {
  return (
@@ -48,6 +47,17 @@ function resolveCursorBillingType(env: Record<string, string>): "api" | "subscri
    : "subscription";
 }

+function resolveCursorBiller(
+  env: Record<string, string>,
+  billingType: "api" | "subscription",
+  provider: string | null,
+): string {
+  const openAiCompatibleBiller = inferOpenAiCompatibleBiller(env, null);
+  if (openAiCompatibleBiller === "openrouter") return "openrouter";
+  if (billingType === "subscription") return "cursor";
+  return provider ?? "cursor";
+}
+
 function resolveProviderFromModel(model: string): string | null {
  const trimmed = model.trim().toLowerCase();
  if (!trimmed) return null;
@@ -82,16 +92,9 @@ function cursorSkillsHome(): string {
  return path.join(os.homedir(), ".cursor", "skills");
 }

-async function resolvePaperclipSkillsDir(): Promise<string | null> {
-  for (const candidate of PAPERCLIP_SKILLS_CANDIDATES) {
-    const isDir = await fs.stat(candidate).then((s) => s.isDirectory()).catch(() => false);
-    if (isDir) return candidate;
-  }
-  return null;
-}
-
 type EnsureCursorSkillsInjectedOptions = {
  skillsDir?: string | null;
+  skillsEntries?: Array<{ name: string; source: string }>;
  skillsHome?: string;
  linkSkill?: (source: string, target: string) => Promise<void>;
 };
@@ -100,8 +103,13 @@ export async function ensureCursorSkillsInjected(
  onLog: AdapterExecutionContext["onLog"],
  options: EnsureCursorSkillsInjectedOptions = {},
 ) {
-  const skillsDir = options.skillsDir ?? await resolvePaperclipSkillsDir();
-  if (!skillsDir) return;
+  const skillsEntries = options.skillsEntries
+    ?? (options.skillsDir
+      ? (await fs.readdir(options.skillsDir, { withFileTypes: true }))
+          .filter((entry) => entry.isDirectory())
+          .map((entry) => ({ name: entry.name, source: path.join(options.skillsDir!, entry.name) }))
+      : await listPaperclipSkillEntries(__moduleDir));
+  if (skillsEntries.length === 0) return;

  const skillsHome = options.skillsHome ?? cursorSkillsHome();
  try {
@@ -113,31 +121,26 @@ export async function ensureCursorSkillsInjected(
    );
    return;
  }
-
-  let entries: Dirent[];
-  try {
-    entries = await fs.readdir(skillsDir, { withFileTypes: true });
-  } catch (err) {
+  const removedSkills = await removeMaintainerOnlySkillSymlinks(
+    skillsHome,
+    skillsEntries.map((entry) => entry.name),
+  );
+  for (const skillName of removedSkills) {
    await onLog(
      "stderr",
-      `[paperclip] Failed to read Paperclip skills from ${skillsDir}: ${err instanceof Error ? err.message : String(err)}\n`,
+      `[paperclip] Removed maintainer-only Cursor skill "${skillName}" from ${skillsHome}\n`,
    );
-    return;
  }
-
  const linkSkill = options.linkSkill ?? ((source: string, target: string) => fs.symlink(source, target));
-  for (const entry of entries) {
-    if (!entry.isDirectory()) continue;
-    const source = path.join(skillsDir, entry.name);
+  for (const entry of skillsEntries) {
    const target = path.join(skillsHome, entry.name);
-    const existing = await fs.lstat(target).catch(() => null);
-    if (existing) continue;
-
    try {
-      await linkSkill(source, target);
+      const result = await ensurePaperclipSkillSymlink(entry.source, target, linkSkill);
+      if (result === "skipped") continue;
+
      await onLog(
        "stderr",
-        `[paperclip] Injected Cursor skill "${entry.name}" into ${skillsHome}\n`,
+        `[paperclip] ${result === "repaired" ? "Repaired" : "Injected"} Cursor skill "${entry.name}" into ${skillsHome}\n`,
      );
    } catch (err) {
      await onLog(
@@ -165,6 +168,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  const workspaceId = asString(workspaceContext.workspaceId, "");
  const workspaceRepoUrl = asString(workspaceContext.repoUrl, "");
  const workspaceRepoRef = asString(workspaceContext.repoRef, "");
+  const agentHome = asString(workspaceContext.agentHome, "");
  const workspaceHints = Array.isArray(context.paperclipWorkspaces)
    ? context.paperclipWorkspaces.filter(
        (value): value is Record<string, unknown> => typeof value === "object" && value !== null,
@@ -238,6 +242,9 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  if (workspaceRepoRef) {
    env.PAPERCLIP_WORKSPACE_REPO_REF = workspaceRepoRef;
  }
+  if (agentHome) {
+    env.AGENT_HOME = agentHome;
+  }
  if (workspaceHints.length > 0) {
    env.PAPERCLIP_WORKSPACES_JSON = JSON.stringify(workspaceHints);
  }
@@ -247,8 +254,13 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  if (!hasExplicitApiKey && authToken) {
    env.PAPERCLIP_API_KEY = authToken;
  }
-  const billingType = resolveCursorBillingType(env);
-  const runtimeEnv = ensurePathInEnv({ ...process.env, ...env });
+  const effectiveEnv = Object.fromEntries(
+    Object.entries({ ...process.env, ...env }).filter(
+      (entry): entry is [string, string] => typeof entry[1] === "string",
+    ),
+  );
+  const billingType = resolveCursorBillingType(effectiveEnv);
+  const runtimeEnv = ensurePathInEnv(effectiveEnv);
  await ensureCommandResolvable(command, cwd, runtimeEnv);

  const timeoutSec = asNumber(config.timeoutSec, 0);
@@ -277,6 +289,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
  const instructionsFilePath = asString(config.instructionsFilePath, "").trim();
  const instructionsDir = instructionsFilePath ? `${path.dirname(instructionsFilePath)}/` : "";
  let instructionsPrefix = "";
+  let instructionsChars = 0;
  if (instructionsFilePath) {
    try {
      const instructionsContents = await fs.readFile(instructionsFilePath, "utf8");
@@ -284,6 +297,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
        `${instructionsContents}\n\n` +
        `The above agent instructions were loaded from ${instructionsFilePath}. ` +
        `Resolve any relative file references from ${instructionsDir}.\n\n`;
+      instructionsChars = instructionsPrefix.length;
      await onLog(
        "stderr",
        `[paperclip] Loaded agent instructions file: ${instructionsFilePath}\n`,
@@ -316,7 +330,8 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
    return notes;
  })();

-  const renderedPrompt = renderTemplate(promptTemplate, {
+  const bootstrapPromptTemplate = asString(config.bootstrapPromptTemplate, "");
+  const templateData = {
    agentId: agent.id,
    companyId: agent.companyId,
    runId,
@@ -324,9 +339,29 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
    agent,
    run: { id: runId, source: "on_demand" },
    context,
-  });
+  };
+  const renderedPrompt = renderTemplate(promptTemplate, templateData);
+  const renderedBootstrapPrompt =
+    !sessionId && bootstrapPromptTemplate.trim().length > 0
+      ? renderTemplate(bootstrapPromptTemplate, templateData).trim()
+      : "";
+  const sessionHandoffNote = asString(context.paperclipSessionHandoffMarkdown, "").trim();
  const paperclipEnvNote = renderPaperclipEnvNote(env);
-  const prompt = `${instructionsPrefix}${paperclipEnvNote}${renderedPrompt}`;
+  const prompt = joinPromptSections([
+    instructionsPrefix,
+    renderedBootstrapPrompt,
+    sessionHandoffNote,
+    paperclipEnvNote,
+    renderedPrompt,
+  ]);
+  const promptMetrics = {
+    promptChars: prompt.length,
+    instructionsChars,
+    bootstrapPromptChars: renderedBootstrapPrompt.length,
+    sessionHandoffChars: sessionHandoffNote.length,
+    runtimeNoteChars: paperclipEnvNote.length,
+    heartbeatPromptChars: renderedPrompt.length,
+  };

  const buildArgs = (resumeSessionId: string | null) => {
    const args = ["-p", "--output-format", "stream-json", "--workspace", cwd];
@@ -349,6 +384,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
        commandArgs: args,
        env: redactEnvForLogs(env),
        prompt,
+        promptMetrics,
        context,
      });
    }
@@ -454,6 +490,7 @@ export async function execute(ctx: AdapterExecutionContext): Promise<AdapterExec
      sessionParams: resolvedSessionParams,
      sessionDisplayId: resolvedSessionId,
      provider: providerFromModel,
+      biller: resolveCursorBiller(effectiveEnv, billingType, providerFromModel),
      model,
      billingType,
      costUsd: attempt.parsed.costUsd,
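
The prompt assembly introduced in `execute.ts` above can be sketched in isolation. This is a minimal illustration under stated assumptions: `joinSectionsSketch` is a hypothetical stand-in for `joinPromptSections` from `@paperclipai/adapter-utils` (this diff does not show its separator or trimming rules), and `assemblePromptSketch` and its input shape are invented for the example.

```typescript
// Hypothetical stand-in for joinPromptSections; the real helper's separator
// and trimming behavior are not shown in this diff and may differ.
function joinSectionsSketch(sections: string[]): string {
  return sections
    .map((section) => section.trim())
    .filter((section) => section.length > 0)
    .join("\n\n");
}

// Hypothetical input shape; field names loosely follow the variables in the diff.
interface PromptInputSketch {
  sessionId: string | null;
  instructionsPrefix: string;
  bootstrapPrompt: string;
  sessionHandoffNote: string;
  envNote: string;
  heartbeatPrompt: string;
}

function assemblePromptSketch(input: PromptInputSketch) {
  // As in the diff, the bootstrap prompt is only included when no session is being resumed.
  const bootstrap = !input.sessionId ? input.bootstrapPrompt.trim() : "";
  const prompt = joinSectionsSketch([
    input.instructionsPrefix,
    bootstrap,
    input.sessionHandoffNote,
    input.envNote,
    input.heartbeatPrompt,
  ]);
  return {
    prompt,
    promptMetrics: {
      promptChars: prompt.length,
      instructionsChars: input.instructionsPrefix.length,
      bootstrapPromptChars: bootstrap.length,
      sessionHandoffChars: input.sessionHandoffNote.length,
      runtimeNoteChars: input.envNote.length,
      heartbeatPromptChars: input.heartbeatPrompt.length,
    },
  };
}

const base = {
  instructionsPrefix: "",
  bootstrapPrompt: "Set up.",
  sessionHandoffNote: "",
  envNote: "ENV",
  heartbeatPrompt: "Continue.",
};
const fresh = assemblePromptSketch({ ...base, sessionId: null });
const resumed = assemblePromptSketch({ ...base, sessionId: "abc" });

console.log(JSON.stringify(fresh.prompt));
console.log(JSON.stringify(resumed.prompt));
```

With a fresh run the bootstrap text leads the prompt. With a resumed session it is omitted and `bootstrapPromptChars` is zero, which is the case the new `promptMetrics` field makes visible in logs.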
--- a/packages/adapters/cursor-local/src/ui/build-config.ts
+++ b/packages/adapters/cursor-local/src/ui/build-config.ts
@@ -62,6 +62,7 @@ export function buildCursorLocalConfig(v: CreateConfigValues): Record<string, un
  if (v.cwd) ac.cwd = v.cwd;
  if (v.instructionsFilePath) ac.instructionsFilePath = v.instructionsFilePath;
  if (v.promptTemplate) ac.promptTemplate = v.promptTemplate;
+  if (v.bootstrapPrompt) ac.bootstrapPromptTemplate = v.bootstrapPrompt;
  ac.model = v.model || DEFAULT_CURSOR_LOCAL_MODEL;
  const mode = normalizeMode(v.thinkingEffort);
  if (mode) ac.mode = mode;