
🔀 CMV: Contextual Memory Virtualisation

Git-like version control for LLM conversations — structural trimming without semantic loss

The Problem: Context Ephemerality

After 30+ minutes of deep work, an LLM builds a mental model of your codebase — architecture mapped, trade-offs weighed, conventions learned. When the context window fills:

flowchart LR
    subgraph BEFORE["Before Autocompaction"]
        B1[132k tokens]
        B2[76% capacity]
        B3[Full understanding]
    end
    
    subgraph AFTER["After Autocompaction"]
        A1[2.3k tokens]
        A2[12% capacity]
        A3[Brief summary]
    end
    
    BEFORE -->|"Native /compact"| AFTER
    
    style BEFORE fill:#1f3d1f,stroke:#3fb950
    style AFTER fill:#3d1f1f,stroke:#f85149
            
98% reduction. Hours of accumulated understanding reduced to a few sentences. Each new session starts from scratch.

The Insight: Virtual Memory for LLMs

Just as an operating system's virtual memory abstracts away physical RAM limits, CMV abstracts away context window limits:

Operating System

  • Pages swapped to disk
  • Process sees infinite memory
  • Demand paging on access
  • Memory-mapped files

CMV

  • Context snapshots to disk
  • Agent sees infinite sessions
  • Branch from any snapshot
  • Re-read files on demand

The DAG State Model

CMV models session history as a Directed Acyclic Graph — like Git for conversations:

flowchart TB
    subgraph ROOT["Initial Session"]
        R1[40 min deep work]
        R2[80k tokens]
        R3[Full codebase mental model]
    end
    
    ROOT -->|snapshot| S1["📸 Snapshot: 'arch-complete'"]
    
    S1 -->|branch| B1["🔀 Auth Work"]
    S1 -->|branch| B2["🔀 API Refactor"]
    S1 -->|branch| B3["🔀 Perf Tuning"]
    
    B1 -->|snapshot| S2["📸 'auth-done'"]
    B2 -->|snapshot| S3["📸 'api-v2'"]
    
    S2 -->|branch| B4["🔀 OAuth Integration"]
    
    style ROOT fill:#1f3d1f,stroke:#3fb950
    style S1 fill:#1f2d3d,stroke:#58a6ff
    style S2 fill:#1f2d3d,stroke:#58a6ff
    style S3 fill:#1f2d3d,stroke:#58a6ff
    style B1 fill:#2d1f3d,stroke:#a371f7
    style B2 fill:#2d1f3d,stroke:#a371f7
    style B3 fill:#2d1f3d,stroke:#a371f7
    style B4 fill:#2d1f3d,stroke:#a371f7
            

Core Operations

Operation              | Description                                              | Git Equivalent
-----------------------|----------------------------------------------------------|----------------------
Snapshot(session)      | Copy the JSONL conversation to immutable storage         | git commit
Branch(snapshot, trim) | Create a new session from a snapshot, optionally trimmed | git checkout -b
Trim(session)          | Snapshot + trim + branch in one step                     | git stash + checkout
Tree()                 | Visualize the full DAG with lineage                      | git log --graph
Key benefit: Spend 40 minutes building understanding once, then branch unlimited times without repeating the context-building phase.
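The four operations can be sketched over a simple on-disk DAG store. This is a minimal illustration, assuming sessions are JSONL files and snapshots live in a flat directory with a JSON lineage index; names like `SnapshotStore` are hypothetical, not CMV's actual API.

```python
import json
import shutil
import uuid
from pathlib import Path

class SnapshotStore:
    """Illustrative DAG store: snapshots are immutable JSONL copies,
    lineage is a name -> parent map (all names here are assumptions)."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.index = self.root / "dag.json"  # snapshot name -> parent lineage
        if not self.index.exists():
            self.index.write_text("{}")

    def snapshot(self, session: Path, name: str, parent=None) -> str:
        """Copy a JSONL session into immutable storage (like `git commit`)."""
        shutil.copy(session, self.root / f"{name}.jsonl")
        dag = json.loads(self.index.read_text())
        dag[name] = {"parent": parent}
        self.index.write_text(json.dumps(dag))
        return name

    def branch(self, name: str, workdir: Path, trim=None) -> Path:
        """Start a new session from a snapshot (like `git checkout -b`),
        optionally passing the lines through a trim function first."""
        lines = (self.root / f"{name}.jsonl").read_text().splitlines()
        if trim is not None:
            lines = trim(lines)
        new = workdir / f"{uuid.uuid4().hex[:8]}.jsonl"
        new.write_text("\n".join(lines) + "\n")
        return new

    def tree(self) -> dict:
        """Return the full DAG with lineage (like `git log --graph`)."""
        return json.loads(self.index.read_text())
```

A `Trim(session)` would then just be `snapshot` followed by `branch` with a trim function supplied.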

Three-Pass Structurally Lossless Trimming

The core algorithm strips mechanical bloat while preserving every user message and assistant response verbatim:

Pass 1: Compaction Boundary Detection

Fast string scan to find the last native compaction marker. Everything before it is already summarized — skip it.

Pass 2: Pre-Boundary Tool ID Collection

Collect all tool_use IDs from before the boundary. Their corresponding results will be orphaned and must be stripped to maintain API correctness.

Pass 3: Stream-Process with Trim Rules

Process each line, applying trim rules. Write cleaned output.
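The three passes can be sketched over a list of JSONL lines. Field names here (`isCompactSummary` as the compaction marker, `content` blocks with a `tool_use` type) are assumptions about the session format, not confirmed details of CMV's implementation.

```python
import json

COMPACT_MARKER = "isCompactSummary"  # assumed marker string, illustrative

def find_boundary(lines):
    """Pass 1: fast string scan for the last native compaction marker."""
    last = 0
    for i, line in enumerate(lines):
        if COMPACT_MARKER in line:
            last = i
    return last

def pre_boundary_tool_ids(lines, boundary):
    """Pass 2: collect tool_use IDs from before the boundary; their
    results after the boundary will be orphaned and must be stripped."""
    ids = set()
    for line in lines[:boundary]:
        record = json.loads(line)
        for block in record.get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_use":
                ids.add(block["id"])
    return ids

def trim(lines, rules):
    """Pass 3: stream each post-boundary line through the trim rules,
    dropping records the rules reject, and return the cleaned output."""
    boundary = find_boundary(lines)
    orphans = pre_boundary_tool_ids(lines, boundary)
    out = []
    for line in lines[boundary:]:
        cleaned = rules(json.loads(line), orphans)
        if cleaned is not None:
            out.append(json.dumps(cleaned))
    return out
```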

What Gets Trimmed?

Content Type              | Action | Rationale
--------------------------|--------|----------------------------------------
User messages             | KEEP   | Your intent, always preserved
Assistant responses       | KEEP   | The model's synthesis and reasoning
Tool invocations          | KEEP   | What files/commands were accessed
Tool results (>500 chars) | STUB   | Replaced with "[Trimmed: ~N chars]"
Base64 images             | STRIP  | Massive; can be re-sent if needed
Thinking blocks           | STRIP  | Non-portable signatures
File history metadata     | STRIP  | Internal bookkeeping
Orphaned tool results     | STRIP  | API correctness (no matching tool_use)
"Structurally lossless" = If the model needs file contents again after trimming, it simply re-reads the file. The conversation (intent + synthesis) is never touched.

Empirical Results (76 Real Sessions)

  • Mean reduction: 20%
  • Max reduction: 86%
  • Mixed tool sessions: 39%
  • Turns to break even: 10
flowchart LR
    subgraph SESSIONS["Session Types"]
        CONV["Conversational
(<15% tool bytes)
~12% reduction"]
        MIXED["Mixed Tool Use
(≥15% tool bytes)
~39% reduction"]
    end
    
    subgraph TRIMMED["What's Removed"]
        T1["Tool outputs"]
        T2["Base64 images"]
        T3["File history"]
        T4["Thinking blocks"]
    end
    
    MIXED --> TRIMMED
    CONV -.->|"less bloat"| TRIMMED
    
    style CONV fill:#1f3d1f,stroke:#3fb950
    style MIXED fill:#3d3d1f,stroke:#d29922
    style TRIMMED fill:#3d1f1f,stroke:#f85149

Break-Even Analysis

Trimming invalidates the prompt cache, incurring a one-time penalty. But smaller context = lower per-turn cost:

# Cost model (with 90% cache hit rate)

Session: 84k tokens → 46k tokens (46% reduction)

Turn 1 (cold cache):  $0.53   ← penalty, full write rate
Turn 2+ (cached):     $0.04   ← now caching smaller prefix

Break-even: Turn 6
After 20 turns: $0.55 saved vs untrimmed

Sessions with >30% reduction hit break-even within 15 turns. Sessions with minimal bloat correctly show trimming is unnecessary.
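The break-even logic can be sketched as a small cost model: pay a cache write on the trimmed context once, then compare cumulative cached-read costs against the untrimmed session. The per-token rates below are illustrative placeholders, not actual API pricing, so the computed break-even turn will differ from the figures above.

```python
# Illustrative per-token rates (assumptions, not real API pricing)
CACHE_WRITE = 3.75e-6  # $/token on the first (cold) turn after trimming
CACHE_READ = 0.30e-6   # $/token on subsequent cached turns

def cumulative_cost(context_tokens, turns, cold_start):
    """Total input cost over `turns` turns for a fixed-size context.
    A cold start pays the cache-write rate on turn 1, reads after."""
    cost = 0.0
    for turn in range(turns):
        rate = CACHE_WRITE if (cold_start and turn == 0) else CACHE_READ
        cost += context_tokens * rate
    return cost

def break_even_turn(full_tokens, trimmed_tokens, max_turns=100):
    """First turn at which the trimmed session is cheaper overall,
    or None if it never catches up within max_turns."""
    for t in range(1, max_turns + 1):
        trimmed = cumulative_cost(trimmed_tokens, t, cold_start=True)
        full = cumulative_cost(full_tokens, t, cold_start=False)
        if trimmed < full:
            return t
    return None
```

The shape of the result matches the analysis above: the bigger the reduction, the sooner the smaller cached prefix amortizes the one-time write penalty.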

CMV vs LCM: Different Problems

Aspect         | CMV                      | LCM
---------------|--------------------------|----------------------------
Core metaphor  | Git for conversations    | Memory hierarchy
Compression    | Structural (strip bloat) | Semantic (LLM summaries)
What's removed | Tool outputs, images     | Nothing (compressed)
Retrieval      | Re-read files from disk  | Expand summary to original
Best for       | Branching workflows      | Infinite single sessions
LLM cost       | Zero (no summarization)  | Per compaction cycle
They're complementary! Run CMV's structural trim first (strip tool outputs), then LCM's semantic compression (summarize the rest). Maximum compression with minimal information loss.

The Unquantified Cost: Context Rebuilding

The paper's most important observation:

# Starting fresh vs branching from snapshot

Fresh session:
- Re-read files
- Re-derive architecture
- Re-establish conventions
- 10-20 turns, 15-30 minutes
- Cumulative cost grows quadratically

Branch from snapshot:
- Full prior context in one prompt
- $0.53 at cache-write rate (84k tokens)
- $0.04 on subsequent cache hits
- Instant
The primary value isn't trimming — it's avoiding context rebuilding entirely. Trimming just makes the branched sessions more economical to continue.
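The quadratic claim follows from the rebuild phase re-sending a growing prefix every turn: a rough sketch under assumed per-turn numbers (all parameters below are hypothetical, chosen only to show the shape of the curves).

```python
def fresh_session_cost(turns, rebuild_turns=15, tokens_per_turn=4000, rate=3e-6):
    """Fresh start: pay `rebuild_turns` of context building, and every
    turn re-sends the growing prefix, so cost grows quadratically in turns."""
    total, context = 0.0, 0
    for _ in range(rebuild_turns + turns):
        context += tokens_per_turn
        total += context * rate
    return total

def branched_cost(turns, snapshot_tokens=84000, write=3.75e-6, read=0.30e-6):
    """Branch from snapshot: one cache write of the full prior context,
    then cached reads on every subsequent turn."""
    return snapshot_tokens * write + max(turns - 1, 0) * snapshot_tokens * read
```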

Example Workflow

sequenceDiagram
    participant U as User
    participant CC as Claude Code
    participant CMV as CMV
    
    Note over U,CC: Deep work session (40 min)
    U->>CC: Map codebase architecture
    CC-->>U: Understanding built (80k tokens)
    
    U->>CMV: snapshot("arch-complete")
    CMV-->>U: ✓ Saved to DAG
    
    Note over U,CMV: Later: need to work on auth
    U->>CMV: branch("arch-complete", trim=true)
    CMV->>CMV: Three-pass trim
    CMV-->>CC: New session with 52k tokens
    
    U->>CC: Implement OAuth
    Note over CC: Has full arch context!
    
    Note over U,CMV: Parallel: API work
    U->>CMV: branch("arch-complete", trim=true)
    CMV-->>CC: Another session, same root
    U->>CC: Refactor API layer
            

Limitations

⚠️ Blind Trimming

  • Strips by type, not importance
  • If stripped content is needed, model may hallucinate or re-read
  • Mitigated: the model's synthesis is preserved

📊 Single-User Study

  • 76 sessions from one user
  • May not generalize
  • Byte-to-token estimation imprecise for images


Built for the Ori group chat — March 2026
