docs: plan memory service surface API

Co-Authored-By: Paperclip <noreply@paperclip.ing>
2026-03-17 12:07:14 -05:00
parent 71de1c5877
commit 7b9718cbaa
2 changed files with 598 additions and 0 deletions
--- a/doc/memory-landscape.md
+++ b/doc/memory-landscape.md
@@ -0,0 +1,172 @@
 # Memory Landscape
 Date: 2026-03-17
 This document summarizes the memory systems referenced in task `PAP-530` and extracts the design patterns that matter for Paperclip.
 ## What Paperclip Needs From This Survey
 Paperclip is not trying to become a single opinionated memory engine. The more useful target is a control-plane memory surface that:
 - stays company-scoped
 - lets each company choose a default memory provider
 - lets specific agents override that default
 - keeps provenance back to Paperclip runs, issues, comments, and documents
 - records memory-related cost and latency the same way the rest of the control plane records work
 - works with plugin-provided providers, not only built-ins
 The question is not "which memory project wins?" The question is "what is the smallest Paperclip contract that can sit above several very different memory systems without flattening away the useful differences?"
 ## Quick Grouping
 ### Hosted memory APIs
 - `mem0`
 - `supermemory`
 - `Memori`
 These optimize for a simple application integration story: send conversation/content plus an identity, then query for relevant memory or user context later.
 ### Agent-centric memory frameworks / memory OSes
 - `MemOS`
 - `memU`
 - `EverMemOS`
 - `OpenViking`
 These treat memory as an agent runtime subsystem, not only as a search index. They usually add task memory, profiles, filesystem-style organization, async ingestion, or skill/resource management.
 ### Local-first memory stores / indexes
 - `nuggets`
 - `memsearch`
 These emphasize local persistence, inspectability, and low operational overhead. They are useful because Paperclip is local-first today and needs at least one zero-config path.
 ## Per-Project Notes
 | Project | Shape | Notable API / model | Strong fit for Paperclip | Main mismatch |
 |---|---|---|---|---|
 | [nuggets](https://github.com/NeoVertex1/nuggets) | local memory engine + messaging gateway | topic-scoped HRR memory with `remember`, `recall`, `forget`, fact promotion into `MEMORY.md` | good example of lightweight local memory and automatic promotion | very specific architecture; not a general multi-tenant service |
 | [mem0](https://github.com/mem0ai/mem0) | hosted + OSS SDK | `add`, `search`, `getAll`, `get`, `update`, `delete`, `deleteAll`; entity partitioning via `user_id`, `agent_id`, `run_id`, `app_id` | closest to a clean provider API with identities and metadata filters | provider owns extraction heavily; Paperclip should not assume every backend behaves like mem0 |
 | [MemOS](https://github.com/MemTensor/MemOS) | memory OS / framework | unified add-retrieve-edit-delete, memory cubes, multimodal memory, tool memory, async scheduler, feedback/correction | strong source for optional capabilities beyond plain search | much broader than the minimal contract Paperclip should standardize first |
 | [supermemory](https://github.com/supermemoryai/supermemory) | hosted memory + context API | `add`, `profile`, `search.memories`, `search.documents`, document upload, settings; automatic profile building and forgetting | strong example of "context bundle" rather than raw search results | heavily productized around its own ontology and hosted flow |
 | [memU](https://github.com/NevaMind-AI/memU) | proactive agent memory framework | file-system metaphor, proactive loop, intent prediction, always-on companion model | good source for when memory should trigger agent behavior, not just retrieval | proactive assistant framing is broader than Paperclip's task-centric control plane |
 | [Memori](https://github.com/MemoriLabs/Memori) | hosted memory fabric + SDK wrappers | registers against LLM SDKs, attribution via `entity_id` + `process_id`, sessions, cloud + BYODB | strong example of automatic capture around model clients | wrapper-centric design does not map 1:1 to Paperclip's run / issue / comment lifecycle |
 | [EverMemOS](https://github.com/EverMind-AI/EverMemOS) | conversational long-term memory system | MemCell extraction, structured narratives, user profiles, hybrid retrieval / reranking | useful model for provenance-rich structured memories and evolving profiles | focused on conversational memory rather than generalized control-plane events |
 | [memsearch](https://github.com/zilliztech/memsearch) | markdown-first local memory index | markdown as source of truth, `index`, `search`, `watch`, transcript parsing, plugin hooks | excellent baseline for a local built-in provider and inspectable provenance | intentionally simple; no hosted service semantics or rich correction workflow |
 | [OpenViking](https://github.com/volcengine/OpenViking) | context database | filesystem-style organization of memories/resources/skills, tiered loading, visualized retrieval trajectories | strong source for browse/inspect UX and context provenance | treats "context database" as a larger product surface than Paperclip should own |
 ## Common Primitives Across The Landscape
 Even though the systems disagree on architecture, they converge on a few primitives:
 - `ingest`: add memory from text, messages, documents, or transcripts
 - `query`: search or retrieve memory given a task, question, or scope
 - `scope`: partition memory by user, agent, project, process, or session
 - `provenance`: carry enough metadata to explain where a memory came from
 - `maintenance`: update, forget, dedupe, compact, or correct memories over time
 - `context assembly`: turn raw memories into a prompt-ready bundle for the agent
 If Paperclip does not expose these, it will not adapt well to the systems above.
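 `context assembly` is the least obvious of these primitives. The sketch below is a minimal version of it, assuming recall results arrive as scored text records. The `RawMemory` shape, the bullet output format, and the character budget are illustrative assumptions and are not taken from any project above. It ranks by score, drops exact duplicates, and stops at the first memory that would exceed the budget.
 ```ts
 // Illustrative record shape; real providers return richer objects.
 interface RawMemory {
   id: string;
   text: string;
   score: number;
 }
 // Turn raw recall results into a prompt-ready block: highest score first,
 // exact-duplicate text removed, and output capped at maxChars characters.
 function assembleContext(memories: RawMemory[], maxChars: number): string {
   const seen = new Set<string>();
   const lines: string[] = [];
   let used = 0;
   const ranked = [...memories].sort((a, b) => b.score - a.score);
   for (const m of ranked) {
     const text = m.text.trim();
     if (text.length === 0 || seen.has(text)) continue;
     const line = `- ${text}`;
     // Count the joining newline for every line after the first.
     const cost = line.length + (lines.length > 0 ? 1 : 0);
     if (used + cost > maxChars) break;
     seen.add(text);
     lines.push(line);
     used += cost;
   }
   return lines.join("\n");
 }
 ```
 Providers such as `supermemory` already return something closer to an assembled bundle. With plain-search backends, this step falls to the caller.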
 ## Where The Systems Differ
 These differences are exactly why Paperclip needs a layered contract instead of a single hard-coded engine.
 ### 1. Who owns extraction?
 - `mem0`, `supermemory`, and `Memori` expect the provider to infer memories from conversations.
 - `memsearch` expects the host to decide what markdown to write, then indexes it.
 - `MemOS`, `memU`, `EverMemOS`, and `OpenViking` sit somewhere in between and often expose richer memory construction pipelines.
 Paperclip should support both:
 - provider-managed extraction
 - Paperclip-managed extraction with provider-managed storage / retrieval
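 A write path could branch on extraction ownership roughly as follows. This is a minimal sketch. The `FACT:` / `DECISION:` line convention is a hypothetical placeholder for a Paperclip-managed extractor and is not an existing Paperclip feature.
 ```ts
 type ExtractionOwner = "provider" | "paperclip";
 interface PlannedWrite {
   content: string;
   extractedBy: ExtractionOwner;
 }
 // Hypothetical placeholder extractor: keep only lines explicitly tagged FACT: or DECISION:.
 function extractTaggedLines(transcript: string): string[] {
   return transcript
     .split("\n")
     .map((line) => line.trim())
     .filter((line) => /^(FACT|DECISION):/.test(line));
 }
 function planWrites(transcript: string, providerManagedExtraction: boolean): PlannedWrite[] {
   if (providerManagedExtraction) {
     // mem0 / supermemory / Memori style: send raw content and let the provider infer memories.
     return [{ content: transcript, extractedBy: "provider" }];
   }
   // memsearch style: the host decides what to store; the provider only indexes and retrieves it.
   return extractTaggedLines(transcript).map(
     (content): PlannedWrite => ({ content, extractedBy: "paperclip" }),
   );
 }
 ```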
 ### 2. What is the source of truth?
 - `memsearch` and `nuggets` make the source inspectable on disk.
 - hosted APIs often make the provider store canonical.
 - filesystem-style systems like `OpenViking` and `memU` treat hierarchy itself as part of the memory model.
 Paperclip should not require a single storage shape. It should require normalized references back to Paperclip entities.
 ### 3. Is memory just search, or also profile and planning state?
 - `mem0` and `memsearch` center on search and CRUD.
 - `supermemory` adds user profiles as a first-class output.
 - `MemOS`, `memU`, `EverMemOS`, and `OpenViking` expand into tool traces, task memory, resources, and skills.
 Paperclip should make plain search the minimum contract and treat richer outputs as optional capabilities.
 ### 4. Is memory synchronous or asynchronous?
 - local tools often work synchronously in-process.
 - larger systems add schedulers, background indexing, compaction, or sync jobs.
 Paperclip needs both direct request/response operations and background maintenance hooks.
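 The split between direct operations and background maintenance can be illustrated with a minimal in-process sketch. It assumes a hypothetical `MaintenanceQueue`: writes return immediately, and deferred jobs (compaction, re-indexing, sync) run when the host drains the queue. A real deployment would need a durable scheduler instead of an in-memory array.
 ```ts
 // Hypothetical maintenance job; real systems would persist these durably.
 interface MaintenanceJob {
   kind: "compact" | "reindex" | "sync";
   providerKey: string;
 }
 class MaintenanceQueue {
   private jobs: MaintenanceJob[] = [];
   // Coalesce duplicates so a burst of writes schedules one job, not many.
   enqueue(job: MaintenanceJob): void {
     const exists = this.jobs.some((j) => j.kind === job.kind && j.providerKey === job.providerKey);
     if (!exists) this.jobs.push(job);
   }
   get pending(): number {
     return this.jobs.length;
   }
   // Run pending jobs in FIFO order and report how many ran.
   async drain(run: (job: MaintenanceJob) => Promise<void>): Promise<number> {
     let count = 0;
     let job = this.jobs.shift();
     while (job) {
       await run(job);
       count++;
       job = this.jobs.shift();
     }
     return count;
   }
 }
 // Direct request/response path: the write is stored immediately and
 // re-indexing is deferred to the background queue.
 function writeNote(store: string[], queue: MaintenanceQueue, providerKey: string, text: string): number {
   store.push(text);
   queue.enqueue({ kind: "reindex", providerKey });
   return store.length;
 }
 ```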
 ## Paperclip-Specific Takeaways
 ### Paperclip should own these concerns
 - binding a provider to a company and optionally overriding it per agent
 - mapping Paperclip entities into provider scopes
 - provenance back to issue comments, documents, runs, and activity
 - cost / token / latency reporting for memory work
 - browse and inspect surfaces in the Paperclip UI
 - governance on destructive operations
 ### Providers should own these concerns
 - extraction heuristics
 - embedding / indexing strategy
 - ranking and reranking
 - profile synthesis
 - contradiction resolution and forgetting logic
 - storage engine details
 ### The control-plane contract should stay small
 Paperclip does not need to standardize every feature from every provider. It needs:
 - a required portable core
 - optional capability flags for richer providers
 - a way to record provider-native ids and metadata without pretending all providers are equivalent internally
 ## Recommended Direction
 Paperclip should adopt a two-layer memory model:
 1. `Memory binding + control plane layer`
   Paperclip decides which provider key is in effect for a company, agent, or project, and it logs every memory operation with provenance and usage.
 2. `Provider adapter layer`
   A built-in or plugin-supplied adapter turns Paperclip memory requests into provider-specific calls.
 The portable core should cover:
 - ingest / write
 - search / recall
 - browse / inspect
 - get by provider record handle
 - forget / correction
 - usage reporting
 Optional capabilities can cover:
 - profile synthesis
 - async ingestion
 - multimodal content
 - tool / resource / skill memory
 - provider-native graph browsing
 That is enough to support:
 - a local markdown-first baseline similar to `memsearch`
 - hosted services similar to `mem0`, `supermemory`, or `Memori`
 - richer agent-memory systems like `MemOS` or `OpenViking`
 without forcing Paperclip itself to become a monolithic memory engine.
--- a/doc/plans/2026-03-17-memory-service-surface-api.md
+++ b/doc/plans/2026-03-17-memory-service-surface-api.md
@@ -0,0 +1,426 @@
 # Paperclip Memory Service Plan
 ## Goal
 Define a Paperclip memory service and surface API that can sit above multiple memory backends, while preserving Paperclip's control-plane requirements:
 - company scoping
 - auditability
 - provenance back to Paperclip work objects
 - budget / cost visibility
 - plugin-first extensibility
 This plan is based on the external landscape summarized in `doc/memory-landscape.md` and on the current Paperclip architecture in:
 - `doc/SPEC-implementation.md`
 - `doc/plugins/PLUGIN_SPEC.md`
 - `doc/plugins/PLUGIN_AUTHORING_GUIDE.md`
 - `packages/plugins/sdk/src/types.ts`
 ## Recommendation In One Sentence
 Paperclip should not embed one opinionated memory engine into core. It should add a company-scoped memory control plane with a small normalized adapter contract, then let built-ins and plugins implement the provider-specific behavior.
 ## Product Decisions
 ### 1. Memory is company-scoped by default
 Every memory binding belongs to exactly one company.
 That binding can then be:
 - the company default
 - an agent override
 - a project override later if we need it
 No cross-company memory sharing in the initial design.
 ### 2. Providers are selected by key
 Each configured memory provider gets a stable key inside a company, for example:
 - `default`
 - `mem0-prod`
 - `local-markdown`
 - `research-kb`
 Agents and services resolve the active provider by key, not by hard-coded vendor logic.
 ### 3. Plugins are the primary provider path
 Built-ins are useful for a zero-config local path, but most providers should arrive through the existing Paperclip plugin runtime.
 That keeps the core small and matches the current direction that optional knowledge-like systems live at the edges.
 ### 4. Paperclip owns routing, provenance, and accounting
 Providers should not decide how Paperclip entities map to governance.
 Paperclip core should own:
 - who is allowed to call a memory operation
 - which company / agent / project scope is active
 - what issue / run / comment / document the operation belongs to
 - how usage gets recorded
 ### 5. Automatic memory should be narrow at first
 Automatic capture is useful, but broad silent capture is dangerous.
 Initial automatic hooks should be:
 - post-run capture from agent runs
 - issue comment / document capture when the binding enables it
 - pre-run recall for agent context hydration
 Everything else should start explicit.
 ## Proposed Concepts
 ### Memory provider
 A built-in or plugin-supplied implementation that stores and retrieves memory.
 Examples:
 - local markdown + vector index
 - mem0 adapter
 - supermemory adapter
 - MemOS adapter
 ### Memory binding
 A company-scoped configuration record that points to a provider and carries provider-specific config.
 This is the object selected by key.
 ### Memory scope
 The normalized Paperclip scope passed into a provider request.
 At minimum:
 - `companyId`
 - optional `agentId`
 - optional `projectId`
 - optional `issueId`
 - optional `runId`
 - optional `subjectId` for external/user identity
 ### Memory source reference
 The provenance handle that explains where a memory came from.
 Supported source kinds should include:
 - `issue_comment`
 - `issue_document`
 - `issue`
 - `run`
 - `activity`
 - `manual_note`
 - `external_document`
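 A source reference only works as provenance if each kind carries the ids needed to link back. The sketch below is a minimal validator. Its per-kind required fields are an assumption inferred from the optional fields on `MemorySourceRef` in the adapter contract; this plan does not specify them.
 ```ts
 type SourceKind =
   | "issue_comment"
   | "issue_document"
   | "issue"
   | "run"
   | "activity"
   | "manual_note"
   | "external_document";
 interface SourceRef {
   kind: SourceKind;
   companyId: string;
   issueId?: string;
   commentId?: string;
   documentKey?: string;
   runId?: string;
   activityId?: string;
   externalRef?: string;
 }
 // Assumed minimum ids per kind; manual notes only need the company.
 const REQUIRED_FIELDS: Record<SourceKind, (keyof SourceRef)[]> = {
   issue_comment: ["issueId", "commentId"],
   issue_document: ["issueId", "documentKey"],
   issue: ["issueId"],
   run: ["runId"],
   activity: ["activityId"],
   manual_note: [],
   external_document: ["externalRef"],
 };
 // Returns the missing field names; an empty array means the reference can be linked back.
 function missingSourceFields(ref: SourceRef): string[] {
   const missing: string[] = ref.companyId ? [] : ["companyId"];
   for (const field of REQUIRED_FIELDS[ref.kind]) {
     if (!ref[field]) missing.push(field);
   }
   return missing;
 }
 ```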
 ### Memory operation
 A normalized write, query, browse, or delete action performed through Paperclip.
 Paperclip should log every operation, whether the provider is local or external.
 ## Required Adapter Contract
 The required core should be small enough to fit `memsearch`, `mem0`, `Memori`, `MemOS`, or `OpenViking`.
 ```ts
 export interface MemoryAdapterCapabilities {
  profile?: boolean;
  browse?: boolean;
  correction?: boolean;
  asyncIngestion?: boolean;
  multimodal?: boolean;
  providerManagedExtraction?: boolean;
 }
 export interface MemoryScope {
  companyId: string;
  agentId?: string;
  projectId?: string;
  issueId?: string;
  runId?: string;
  subjectId?: string;
 }
 export interface MemorySourceRef {
  kind:
    | "issue_comment"
    | "issue_document"
    | "issue"
    | "run"
    | "activity"
    | "manual_note"
    | "external_document";
  companyId: string;
  issueId?: string;
  commentId?: string;
  documentKey?: string;
  runId?: string;
  activityId?: string;
  externalRef?: string;
 }
 export interface MemoryUsage {
  provider: string;
  model?: string;
  inputTokens?: number;
  outputTokens?: number;
  embeddingTokens?: number;
  costCents?: number;
  latencyMs?: number;
  details?: Record<string, unknown>;
 }
 export interface MemoryWriteRequest {
  bindingKey: string;
  scope: MemoryScope;
  source: MemorySourceRef;
  content: string;
  metadata?: Record<string, unknown>;
  mode?: "append" | "upsert" | "summarize";
 }
 export interface MemoryRecordHandle {
  providerKey: string;
  providerRecordId: string;
 }
 export interface MemoryQueryRequest {
  bindingKey: string;
  scope: MemoryScope;
  query: string;
  topK?: number;
  intent?: "agent_preamble" | "answer" | "browse";
  metadataFilter?: Record<string, unknown>;
 }
 export interface MemorySnippet {
  handle: MemoryRecordHandle;
  text: string;
  score?: number;
  summary?: string;
  source?: MemorySourceRef;
  metadata?: Record<string, unknown>;
 }
 export interface MemoryContextBundle {
  snippets: MemorySnippet[];
  profileSummary?: string;
  usage?: MemoryUsage[];
 }
 export interface MemoryAdapter {
  key: string;
  capabilities: MemoryAdapterCapabilities;
  write(req: MemoryWriteRequest): Promise<{
    records?: MemoryRecordHandle[];
    usage?: MemoryUsage[];
  }>;
  query(req: MemoryQueryRequest): Promise<MemoryContextBundle>;
  get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null>;
  forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: MemoryUsage[] }>;
 }
 ```
 This contract intentionally does not force a provider to expose its internal graph, filesystem, or ontology.
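 To check that the required core can be implemented without a real backend, here is a minimal in-memory adapter sketch. It restates a trimmed subset of the types above so that it compiles on its own. The `InMemoryAdapter` name, the keyword-overlap scoring, and the rule that agent-tagged records are private to that agent are all illustrative choices, not a proposed built-in.
 ```ts
 // Trimmed restatement of the contract types (only the fields this sketch uses).
 interface MemoryScope {
   companyId: string;
   agentId?: string;
 }
 interface MemoryRecordHandle {
   providerKey: string;
   providerRecordId: string;
 }
 interface MemorySnippet {
   handle: MemoryRecordHandle;
   text: string;
   score?: number;
 }
 interface StoredRecord {
   id: string;
   scope: MemoryScope;
   text: string;
 }
 // Lowercased word tokens; a naive stand-in for real embedding or keyword retrieval.
 function tokenize(text: string): string[] {
   return text.toLowerCase().split(/\W+/).filter(Boolean);
 }
 class InMemoryAdapter {
   readonly capabilities = { browse: false, providerManagedExtraction: false };
   private records: StoredRecord[] = [];
   private nextId = 1;
   constructor(readonly key: string) {}
   async write(req: { scope: MemoryScope; content: string }): Promise<{ records: MemoryRecordHandle[] }> {
     const id = String(this.nextId++);
     this.records.push({ id, scope: req.scope, text: req.content });
     return { records: [this.handleFor(id)] };
   }
   async query(req: { scope: MemoryScope; query: string; topK?: number }): Promise<{ snippets: MemorySnippet[] }> {
     const terms = new Set(tokenize(req.query));
     const snippets = this.visible(req.scope)
       .map((r) => ({ r, score: tokenize(r.text).filter((t) => terms.has(t)).length }))
       .filter((x) => x.score > 0)
       .sort((a, b) => b.score - a.score)
       .slice(0, req.topK ?? 5)
       .map(({ r, score }) => ({ handle: this.handleFor(r.id), text: r.text, score }));
     return { snippets };
   }
   async get(handle: MemoryRecordHandle, scope: MemoryScope): Promise<MemorySnippet | null> {
     if (handle.providerKey !== this.key) return null;
     const r = this.visible(scope).find((x) => x.id === handle.providerRecordId);
     return r ? { handle, text: r.text } : null;
   }
   async forget(handles: MemoryRecordHandle[], scope: MemoryScope): Promise<{ usage?: unknown[] }> {
     const ids = new Set(handles.filter((h) => h.providerKey === this.key).map((h) => h.providerRecordId));
     // Only records visible in the caller's scope can be removed.
     const removable = new Set(this.visible(scope).filter((r) => ids.has(r.id)).map((r) => r.id));
     this.records = this.records.filter((r) => !removable.has(r.id));
     return {};
   }
   // Company isolation always applies. Records written without an agentId are shared
   // across the company; agent-tagged records are visible only to that agent.
   private visible(scope: MemoryScope): StoredRecord[] {
     return this.records.filter(
       (r) =>
         r.scope.companyId === scope.companyId &&
         (!r.scope.agentId || r.scope.agentId === scope.agentId),
     );
   }
   private handleFor(id: string): MemoryRecordHandle {
     return { providerKey: this.key, providerRecordId: id };
   }
 }
 ```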
 ## Optional Adapter Surfaces
 These should be capability-gated, not required:
 - `browse(scope, filters)` for file-system / graph / timeline inspection
 - `correct(handle, patch)` for natural-language correction flows
 - `profile(scope)` when the provider can synthesize stable preferences or summaries
 - `sync(source)` for connectors or background ingestion
 - `explain(queryResult)` for providers that can expose retrieval traces
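 Capability gating can be enforced once in the control plane instead of in every caller. The sketch below assumes that optional surfaces appear as optional methods on the adapter object, and that an unsupported call should fail loudly instead of returning an empty result. `supports` and `assertSupports` are hypothetical helper names. Note that the `correction` flag maps to the `correct` method.
 ```ts
 type OptionalCapability = "browse" | "correction" | "profile";
 // Capability flag -> adapter method name.
 const METHOD_FOR: Record<OptionalCapability, "browse" | "correct" | "profile"> = {
   browse: "browse",
   correction: "correct",
   profile: "profile",
 };
 interface AdapterLike {
   key: string;
   capabilities: Partial<Record<OptionalCapability, boolean>>;
   browse?: (filters: Record<string, unknown>) => Promise<unknown[]>;
   correct?: (handleId: string, patch: string) => Promise<void>;
   profile?: () => Promise<string>;
 }
 // True only when the adapter both declares the flag and actually implements the method.
 function supports(adapter: AdapterLike, capability: OptionalCapability): boolean {
   return (
     adapter.capabilities[capability] === true &&
     typeof adapter[METHOD_FOR[capability]] === "function"
   );
 }
 function assertSupports(adapter: AdapterLike, capability: OptionalCapability): void {
   if (!supports(adapter, capability)) {
     throw new Error(`memory adapter "${adapter.key}" does not support "${capability}"`);
   }
 }
 ```
 Requiring both the flag and the method guards against a plugin that advertises a capability it never implemented.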
 ## What Paperclip Should Persist
 Paperclip should not mirror the full provider memory corpus into Postgres unless the provider is a Paperclip-managed local provider.
 Paperclip core should persist:
 - memory bindings and overrides
 - provider keys and capability metadata
 - normalized memory operation logs
 - provider record handles returned by operations when available
 - source references back to issue comments, documents, runs, and activity
 - usage and cost data
 For external providers, the memory payload itself can remain in the provider.
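 One way to guarantee that every operation is logged, whatever the provider, is to route every adapter call through a single wrapper. This is a minimal sketch. The `MemoryOperationLog` row shape is a hypothetical mirror of the persistence list above, and the injected `sink` stands in for whatever would write to Postgres.
 ```ts
 type OperationType = "write" | "query" | "forget" | "browse" | "correct";
 interface UsageRecord {
   provider: string;
   costCents?: number;
 }
 // Hypothetical log row mirroring the persistence list above.
 interface MemoryOperationLog {
   companyId: string;
   bindingKey: string;
   operation: OperationType;
   ok: boolean;
   error?: string;
   latencyMs: number;
   usage: UsageRecord[];
 }
 interface OperationMeta {
   companyId: string;
   bindingKey: string;
   operation: OperationType;
 }
 // Run one adapter call, time it, and emit exactly one log row whether it succeeds or fails.
 async function withOperationLog<T extends { usage?: UsageRecord[] }>(
   meta: OperationMeta,
   sink: (row: MemoryOperationLog) => void,
   call: () => Promise<T>,
 ): Promise<T> {
   const started = Date.now();
   let result: T;
   try {
     result = await call();
   } catch (err) {
     const error = err instanceof Error ? err.message : String(err);
     sink({ ...meta, ok: false, error, latencyMs: Date.now() - started, usage: [] });
     throw err;
   }
   // Logged outside the try block so a failing sink is not misreported as a provider error.
   sink({ ...meta, ok: true, latencyMs: Date.now() - started, usage: result.usage ?? [] });
   return result;
 }
 ```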
 ## Hook Model
 ### Automatic hooks
 These should be low-risk and easy to reason about:
 1. `pre-run hydrate`
  Before an agent run starts, Paperclip may call `query()` with `intent: "agent_preamble"` using the active binding.
 2. `post-run capture`
   After a run finishes, Paperclip may write a summary or transcript-derived note tied to the run.
 3. `issue comment / document capture`
   When enabled on the binding, Paperclip may capture selected issue comments or issue documents as memory sources.
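 The `pre-run hydrate` hook must eventually turn a `MemoryContextBundle` into text that the agent sees. The sketch below covers only that formatting step. It uses a trimmed copy of the bundle types and a hypothetical `describeSource` helper that renders provenance as a short backlink label. The label and header formats are illustrative.
 ```ts
 interface SourceRef {
   kind: string;
   issueId?: string;
   commentId?: string;
   runId?: string;
 }
 interface Snippet {
   text: string;
   score?: number;
   source?: SourceRef;
 }
 interface ContextBundle {
   snippets: Snippet[];
   profileSummary?: string;
 }
 // Hypothetical provenance label, e.g. "issue_comment issue=i1 comment=c9".
 function describeSource(src?: SourceRef): string {
   if (!src) return "unknown source";
   const parts = [src.kind];
   if (src.issueId) parts.push(`issue=${src.issueId}`);
   if (src.commentId) parts.push(`comment=${src.commentId}`);
   if (src.runId) parts.push(`run=${src.runId}`);
   return parts.join(" ");
 }
 // Render recalled memory as a preamble block. Returns "" when there is nothing
 // to add, so an empty recall leaves the run prompt unchanged.
 function renderPreamble(bundle: ContextBundle): string {
   if (bundle.snippets.length === 0 && !bundle.profileSummary) return "";
   const lines = ["Relevant memory (from Paperclip memory service):"];
   if (bundle.profileSummary) lines.push(`Profile: ${bundle.profileSummary}`);
   for (const s of bundle.snippets) {
     lines.push(`- ${s.text} [${describeSource(s.source)}]`);
   }
   return lines.join("\n");
 }
 ```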
 ### Explicit hooks
 These should be tool- or UI-driven first:
 - `memory.search`
 - `memory.note`
 - `memory.forget`
 - `memory.correct`
 - `memory.browse`
 ### Not automatic in the first version
 - broad web crawling
 - silent import of arbitrary repo files
 - cross-company memory sharing
 - automatic destructive deletion
 - provider migration between bindings
 ## Agent UX Rules
 Paperclip should give agents both automatic recall and explicit tools, with simple guidance:
 - use `memory.search` when the task depends on prior decisions, people, projects, or long-running context that is not in the current issue thread
 - use `memory.note` when a durable fact, preference, or decision should survive this run
 - use `memory.correct` when the user explicitly says prior context is wrong
 - rely on post-run auto-capture for ordinary session residue so agents do not have to write memory notes for every trivial exchange
 This keeps memory available without forcing every agent prompt to become a memory-management protocol.
 ## Browse And Inspect Surface
 Paperclip needs a first-class UI for memory; otherwise, providers become black boxes.
 The initial browse surface should support:
 - active binding by company and agent
 - recent memory operations
 - recent write sources
 - query results with source backlinks
 - filters by agent, issue, run, source kind, and date
 - provider usage / cost / latency summaries
 When a provider supports richer browsing, the plugin can add deeper views through the existing plugin UI surfaces.
 ## Cost And Evaluation
 Every adapter response should be able to return usage records.
 Paperclip should roll up:
 - memory inference tokens
 - embedding tokens
 - external provider cost
 - latency
 - query count
 - write count
 It should also record evaluation-oriented metrics where possible:
 - recall hit rate
 - empty query rate
 - manual correction count
 - per-binding success / failure counts
 This is important because a memory system that "works" but silently burns budget is not acceptable in Paperclip.
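 The roll-up itself is a simple aggregation over the usage records that adapters return. This is a minimal sketch that assumes a hypothetical per-operation row. It treats "memory inference tokens" as input plus output tokens, and it counts an empty query as a successful query that returned no snippets. This plan does not define either metric precisely.
 ```ts
 interface UsageRow {
   bindingKey: string;
   operation: "write" | "query";
   ok: boolean;
   inputTokens?: number;
   outputTokens?: number;
   embeddingTokens?: number;
   costCents?: number;
   latencyMs?: number;
   snippetsReturned?: number; // queries only
 }
 interface BindingRollup {
   inferenceTokens: number;
   embeddingTokens: number;
   costCents: number;
   totalLatencyMs: number;
   queries: number;
   writes: number;
   emptyQueries: number;
   failures: number;
 }
 // Aggregate usage and evaluation counters per binding key.
 function rollUp(rows: UsageRow[]): Record<string, BindingRollup> {
   const out: Record<string, BindingRollup> = {};
   for (const r of rows) {
     let acc = out[r.bindingKey];
     if (!acc) {
       acc = { inferenceTokens: 0, embeddingTokens: 0, costCents: 0, totalLatencyMs: 0, queries: 0, writes: 0, emptyQueries: 0, failures: 0 };
       out[r.bindingKey] = acc;
     }
     acc.inferenceTokens += (r.inputTokens ?? 0) + (r.outputTokens ?? 0);
     acc.embeddingTokens += r.embeddingTokens ?? 0;
     acc.costCents += r.costCents ?? 0;
     acc.totalLatencyMs += r.latencyMs ?? 0;
     if (!r.ok) acc.failures++;
     if (r.operation === "write") {
       acc.writes++;
     } else {
       acc.queries++;
       if (r.ok && (r.snippetsReturned ?? 0) === 0) acc.emptyQueries++;
     }
   }
   return out;
 }
 // Empty query rate = empty successful queries / all queries (0 when there were none).
 function emptyQueryRate(r: BindingRollup): number {
   return r.queries === 0 ? 0 : r.emptyQueries / r.queries;
 }
 ```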
 ## Suggested Data Model Additions
 At the control-plane level, the likely new core tables are:
 - `memory_bindings`
   - company-scoped key
   - provider id / plugin id
   - config blob
   - enabled status
 - `memory_binding_targets`
   - target type (`company`, `agent`, later `project`)
   - target id
   - binding id
 - `memory_operations`
   - company id
   - binding id
   - operation type (`write`, `query`, `forget`, `browse`, `correct`)
   - scope fields
   - source refs
   - usage / latency / cost
   - success / error
 Provider-specific long-form state should stay in plugin state or the provider itself unless a built-in local provider needs its own schema.
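 With these tables, resolving the binding in effect for an agent is a small lookup. An agent-level target takes precedence over the company-level target, and disabled bindings or bindings from another company are never returned. The sketch below operates on in-memory rows. Its field names are assumptions that follow the column descriptions above, not a final schema. Falling back to the company default when an agent override is unusable is also a design choice made only for this sketch.
 ```ts
 interface MemoryBindingRow {
   id: string;
   companyId: string;
   key: string; // company-scoped key, e.g. "default" or "mem0-prod"
   providerId: string;
   enabled: boolean;
 }
 interface MemoryBindingTargetRow {
   targetType: "company" | "agent";
   targetId: string; // company id or agent id
   bindingId: string;
 }
 // Agent override first, then company default; disabled or cross-company bindings are skipped.
 function resolveBinding(
   companyId: string,
   agentId: string | undefined,
   bindings: MemoryBindingRow[],
   targets: MemoryBindingTargetRow[],
 ): MemoryBindingRow | null {
   const usable = (bindingId: string) =>
     bindings.find((b) => b.id === bindingId && b.companyId === companyId && b.enabled);
   const lookup = (type: "company" | "agent", id: string) => {
     for (const t of targets) {
       if (t.targetType !== type || t.targetId !== id) continue;
       const b = usable(t.bindingId);
       if (b) return b;
     }
     return undefined;
   };
   return (agentId ? lookup("agent", agentId) : undefined) ?? lookup("company", companyId) ?? null;
 }
 ```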
 ## Recommended First Built-In
 The best zero-config built-in is a local markdown-first provider with optional semantic indexing.
 Why:
 - it matches Paperclip's local-first posture
 - it is inspectable
 - it is easy to back up and debug
 - it gives the system a baseline even without external API keys
 The design should still treat that built-in as just another provider behind the same control-plane contract.
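 To make "inspectable" concrete, a markdown-first provider could store each note as a plain markdown section with its provenance inline, so a human can read or grep the notes without tooling. This is a minimal sketch. The heading-plus-`key: value` layout is a hypothetical format that this plan does not specify.
 ```ts
 interface NoteSource {
   kind: string;
   companyId: string;
   issueId?: string;
   runId?: string;
 }
 // Render one memory note as a markdown section: timestamp heading, provenance lines, body.
 function renderNote(source: NoteSource, content: string, createdAt: Date): string {
   const lines = [`## ${createdAt.toISOString()}`, `- kind: ${source.kind}`, `- company: ${source.companyId}`];
   if (source.issueId) lines.push(`- issue: ${source.issueId}`);
   if (source.runId) lines.push(`- run: ${source.runId}`);
   lines.push("", content.trim(), "");
   return lines.join("\n");
 }
 // Read provenance back out of a rendered note. Parsing stops at the first blank
 // line, so list items inside the note body are not mistaken for provenance.
 function parseNoteProvenance(note: string): Record<string, string> {
   const fields: Record<string, string> = {};
   for (const line of note.split("\n")) {
     if (line === "") break;
     const m = /^- (\w+): (.*)$/.exec(line);
     if (m) fields[m[1]] = m[2];
   }
   return fields;
 }
 ```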
 ## Rollout Phases
 ### Phase 1: Control-plane contract
 - add memory binding models and API types
 - add plugin capability / registration surface for memory providers
 - add operation logging and usage reporting
 ### Phase 2: One built-in + one plugin example
 - ship a local markdown-first provider
 - ship one hosted adapter example to validate the external-provider path
 ### Phase 3: UI inspection
 - add company / agent memory settings
 - add a memory operation explorer
 - add source backlinks to issues and runs
 ### Phase 4: Automatic hooks
 - pre-run hydrate
 - post-run capture
 - selected issue comment / document capture
 ### Phase 5: Rich capabilities
 - correction flows
 - provider-native browse / graph views
 - project-level overrides if needed
 - evaluation dashboards
 ## Open Questions
 - Should project overrides exist in V1 of the memory service, or should we force company default + agent override first?
 - Do we want Paperclip-managed extraction pipelines at all, or should built-ins be the only place where Paperclip owns extraction?
 - Should memory usage extend the current `cost_events` model directly, or should memory operations keep a parallel usage log and roll up into `cost_events` secondarily?
 - Do we want provider install / binding changes to require approvals for some companies?
 ## Bottom Line
 The right abstraction is:
 - Paperclip owns memory bindings, scopes, provenance, governance, and usage reporting.
 - Providers own extraction, ranking, storage, and provider-native memory semantics.
 That gives Paperclip a stable "memory service" without locking the product to one memory philosophy or one vendor.