
🔀 CMV: Contextual Memory Virtualisation

Git-like version control for LLM conversations — structural trimming without semantic loss

The Problem: Context Ephemerality

After 30+ minutes of deep work, an LLM builds a mental model of your codebase — architecture mapped, trade-offs weighed, conventions learned. When the context window fills:

flowchart LR
    subgraph BEFORE["Before Autocompaction"]
        B1[132k tokens]
        B2[76% capacity]
        B3[Full understanding]
    end
    
    subgraph AFTER["After Autocompaction"]
        A1[2.3k tokens]
        A2[12% capacity]
        A3[Brief summary]
    end
    
    BEFORE -->|"Native /compact"| AFTER
    
    style BEFORE fill:#1f3d1f,stroke:#3fb950
    style AFTER fill:#3d1f1f,stroke:#f85149
            
98% reduction. Hours of accumulated understanding reduced to a few sentences. Each new session starts from scratch.

The Insight: Virtual Memory for LLMs

Just as an operating system's virtual memory abstracts away physical RAM limits, CMV abstracts away context window limits:

Operating System

  • Pages swapped to disk
  • Process sees infinite memory
  • Demand paging on access
  • Memory-mapped files

CMV

  • Context snapshots to disk
  • Agent sees infinite sessions
  • Branch from any snapshot
  • Re-read files on demand

The DAG State Model

CMV models session history as a Directed Acyclic Graph — like Git for conversations:

flowchart TB
    subgraph ROOT["Initial Session"]
        R1[40 min deep work]
        R2[80k tokens]
        R3[Full codebase mental model]
    end
    
    ROOT -->|snapshot| S1["📸 Snapshot: 'arch-complete'"]
    
    S1 -->|branch| B1["🔀 Auth Work"]
    S1 -->|branch| B2["🔀 API Refactor"]
    S1 -->|branch| B3["🔀 Perf Tuning"]
    
    B1 -->|snapshot| S2["📸 'auth-done'"]
    B2 -->|snapshot| S3["📸 'api-v2'"]
    
    S2 -->|branch| B4["🔀 OAuth Integration"]
    
    style ROOT fill:#1f3d1f,stroke:#3fb950
    style S1 fill:#1f2d3d,stroke:#58a6ff
    style S2 fill:#1f2d3d,stroke:#58a6ff
    style S3 fill:#1f2d3d,stroke:#58a6ff
    style B1 fill:#2d1f3d,stroke:#a371f7
    style B2 fill:#2d1f3d,stroke:#a371f7
    style B3 fill:#2d1f3d,stroke:#a371f7
    style B4 fill:#2d1f3d,stroke:#a371f7
            

Core Operations

Operation              | Description                                              | Git Equivalent
-----------------------|----------------------------------------------------------|----------------------
Snapshot(session)      | Copy the JSONL conversation to immutable storage         | git commit
Branch(snapshot, trim) | Create a new session from a snapshot, optionally trimmed | git checkout -b
Trim(session)          | Snapshot + trim + branch in one step                     | git stash + checkout
Tree()                 | Visualize the full DAG with lineage                      | git log --graph
Key benefit: Spend 40 minutes building understanding once, then branch unlimited times without repeating the context-building phase.
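The four operations can be sketched over a simple on-disk DAG store. This is a minimal illustration, assuming sessions are JSONL files and snapshots live in a flat directory with a JSON lineage index; names like `SnapshotStore` are hypothetical, not CMV's actual API.

```python
import json
import shutil
import uuid
from pathlib import Path

class SnapshotStore:
    """Illustrative DAG store: snapshots are immutable JSONL copies,
    lineage is a name -> parent map (all names here are assumptions)."""

    def __init__(self, root: Path):
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self.index = self.root / "dag.json"  # snapshot name -> parent lineage
        if not self.index.exists():
            self.index.write_text("{}")

    def snapshot(self, session: Path, name: str, parent=None) -> str:
        """Copy a JSONL session into immutable storage (like `git commit`)."""
        shutil.copy(session, self.root / f"{name}.jsonl")
        dag = json.loads(self.index.read_text())
        dag[name] = {"parent": parent}
        self.index.write_text(json.dumps(dag))
        return name

    def branch(self, name: str, workdir: Path, trim=None) -> Path:
        """Start a new session from a snapshot (like `git checkout -b`),
        optionally passing the lines through a trim function first."""
        lines = (self.root / f"{name}.jsonl").read_text().splitlines()
        if trim is not None:
            lines = trim(lines)
        new = workdir / f"{uuid.uuid4().hex[:8]}.jsonl"
        new.write_text("\n".join(lines) + "\n")
        return new

    def tree(self) -> dict:
        """Return the full DAG with lineage (like `git log --graph`)."""
        return json.loads(self.index.read_text())
```

A `Trim(session)` would then just be `snapshot` followed by `branch` with a trim function supplied.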

Three-Pass Structurally Lossless Trimming

The core algorithm strips mechanical bloat while preserving every user message and assistant response verbatim:

Pass 1: Compaction Boundary Detection

Fast string scan to find the last native compaction marker. Everything before it is already summarized — skip it.

Pass 2: Pre-Boundary Tool ID Collection

Collect all tool_use IDs from before the boundary. Their corresponding results will be orphaned and must be stripped to maintain API correctness.

Pass 3: Stream-Process with Trim Rules

Process each line, applying trim rules. Write cleaned output.
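The three passes can be sketched over a list of JSONL lines. Field names here (`isCompactSummary` as the compaction marker, `content` blocks with a `tool_use` type) are assumptions about the session format, not confirmed details of CMV's implementation.

```python
import json

COMPACT_MARKER = "isCompactSummary"  # assumed marker string, illustrative

def find_boundary(lines):
    """Pass 1: fast string scan for the last native compaction marker."""
    last = 0
    for i, line in enumerate(lines):
        if COMPACT_MARKER in line:
            last = i
    return last

def pre_boundary_tool_ids(lines, boundary):
    """Pass 2: collect tool_use IDs from before the boundary; their
    results after the boundary will be orphaned and must be stripped."""
    ids = set()
    for line in lines[:boundary]:
        record = json.loads(line)
        for block in record.get("content", []):
            if isinstance(block, dict) and block.get("type") == "tool_use":
                ids.add(block["id"])
    return ids

def trim(lines, rules):
    """Pass 3: stream each post-boundary line through the trim rules,
    dropping records the rules reject, and return the cleaned output."""
    boundary = find_boundary(lines)
    orphans = pre_boundary_tool_ids(lines, boundary)
    out = []
    for line in lines[boundary:]:
        cleaned = rules(json.loads(line), orphans)
        if cleaned is not None:
            out.append(json.dumps(cleaned))
    return out
```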

What Gets Trimmed?

Content Type              | Action | Rationale
--------------------------|--------|----------------------------------------
User messages             | KEEP   | Your intent, always preserved
Assistant responses       | KEEP   | The model's synthesis and reasoning
Tool invocations          | KEEP   | What files/commands were accessed
Tool results (>500 chars) | STUB   | Replaced with "[Trimmed: ~N chars]"
Base64 images             | STRIP  | Massive; can be re-sent if needed
Thinking blocks           | STRIP  | Non-portable signatures
File history metadata     | STRIP  | Internal bookkeeping
Orphaned tool results     | STRIP  | API correctness (no matching tool_use)
"Structurally lossless" = If the model needs file contents again after trimming, it simply re-reads the file. The conversation (intent + synthesis) is never touched.

Empirical Results (76 Real Sessions)

  • Mean reduction: 20%
  • Max reduction: 86%
  • Mixed tool sessions: 39%
  • Turns to break even: 10
flowchart LR
    subgraph SESSIONS["Session Types"]
        CONV["Conversational
(<15% tool bytes)
~12% reduction"]
        MIXED["Mixed Tool Use
(≥15% tool bytes)
~39% reduction"]
    end
    
    subgraph TRIMMED["What's Removed"]
        T1["Tool outputs"]
        T2["Base64 images"]
        T3["File history"]
        T4["Thinking blocks"]
    end
    
    MIXED --> TRIMMED
    CONV -.->|"less bloat"| TRIMMED
    
    style CONV fill:#1f3d1f,stroke:#3fb950
    style MIXED fill:#3d3d1f,stroke:#d29922
    style TRIMMED fill:#3d1f1f,stroke:#f85149

Break-Even Analysis

Trimming invalidates the prompt cache, incurring a one-time penalty. But smaller context = lower per-turn cost:

# Cost model (with 90% cache hit rate)

Session: 84k tokens → 46k tokens (46% reduction)

Turn 1 (cold cache):  $0.53   ← penalty, full write rate
Turn 2+ (cached):     $0.04   ← now caching smaller prefix

Break-even: Turn 6
After 20 turns: $0.55 saved vs untrimmed

Sessions with >30% reduction hit break-even within 15 turns. Sessions with minimal bloat correctly show trimming is unnecessary.
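The break-even logic can be sketched as a small cost model: pay a cache write on the trimmed context once, then compare cumulative cached-read costs against the untrimmed session. The per-token rates below are illustrative placeholders, not actual API pricing, so the computed break-even turn will differ from the figures above.

```python
# Illustrative per-token rates (assumptions, not real API pricing)
CACHE_WRITE = 3.75e-6  # $/token on the first (cold) turn after trimming
CACHE_READ = 0.30e-6   # $/token on subsequent cached turns

def cumulative_cost(context_tokens, turns, cold_start):
    """Total input cost over `turns` turns for a fixed-size context.
    A cold start pays the cache-write rate on turn 1, reads after."""
    cost = 0.0
    for turn in range(turns):
        rate = CACHE_WRITE if (cold_start and turn == 0) else CACHE_READ
        cost += context_tokens * rate
    return cost

def break_even_turn(full_tokens, trimmed_tokens, max_turns=100):
    """First turn at which the trimmed session is cheaper overall,
    or None if it never catches up within max_turns."""
    for t in range(1, max_turns + 1):
        trimmed = cumulative_cost(trimmed_tokens, t, cold_start=True)
        full = cumulative_cost(full_tokens, t, cold_start=False)
        if trimmed < full:
            return t
    return None
```

The shape of the result matches the analysis above: the bigger the reduction, the sooner the smaller cached prefix amortizes the one-time write penalty.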

CMV vs LCM: Different Problems

Aspect         | CMV                      | LCM
---------------|--------------------------|----------------------------
Core metaphor  | Git for conversations    | Memory hierarchy
Compression    | Structural (strip bloat) | Semantic (LLM summaries)
What's removed | Tool outputs, images     | Nothing (compressed)
Retrieval      | Re-read files from disk  | Expand summary to original
Best for       | Branching workflows      | Infinite single sessions
LLM cost       | Zero (no summarization)  | Per compaction cycle
They're complementary! Run CMV's structural trim first (strip tool outputs), then LCM's semantic compression (summarize the rest). Maximum compression with minimal information loss.

The Unquantified Cost: Context Rebuilding

The paper's most important observation:

# Starting fresh vs branching from snapshot

Fresh session:
- Re-read files
- Re-derive architecture
- Re-establish conventions
- 10-20 turns, 15-30 minutes
- Cumulative cost grows quadratically

Branch from snapshot:
- Full prior context in one prompt
- $0.53 at cache-write rate (84k tokens)
- $0.04 on subsequent cache hits
- Instant
The primary value isn't trimming — it's avoiding context rebuilding entirely. Trimming just makes the branched sessions more economical to continue.
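The quadratic claim follows from the rebuild phase re-sending a growing prefix every turn: a rough sketch under assumed per-turn numbers (all parameters below are hypothetical, chosen only to show the shape of the curves).

```python
def fresh_session_cost(turns, rebuild_turns=15, tokens_per_turn=4000, rate=3e-6):
    """Fresh start: pay `rebuild_turns` of context building, and every
    turn re-sends the growing prefix, so cost grows quadratically in turns."""
    total, context = 0.0, 0
    for _ in range(rebuild_turns + turns):
        context += tokens_per_turn
        total += context * rate
    return total

def branched_cost(turns, snapshot_tokens=84000, write=3.75e-6, read=0.30e-6):
    """Branch from snapshot: one cache write of the full prior context,
    then cached reads on every subsequent turn."""
    return snapshot_tokens * write + max(turns - 1, 0) * snapshot_tokens * read
```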

Example Workflow

sequenceDiagram
    participant U as User
    participant CC as Claude Code
    participant CMV as CMV
    
    Note over U,CC: Deep work session (40 min)
    U->>CC: Map codebase architecture
    CC-->>U: Understanding built (80k tokens)
    
    U->>CMV: snapshot("arch-complete")
    CMV-->>U: ✓ Saved to DAG
    
    Note over U,CMV: Later: need to work on auth
    U->>CMV: branch("arch-complete", trim=true)
    CMV->>CMV: Three-pass trim
    CMV-->>CC: New session with 52k tokens
    
    U->>CC: Implement OAuth
    Note over CC: Has full arch context!
    
    Note over U,CMV: Parallel: API work
    U->>CMV: branch("arch-complete", trim=true)
    CMV-->>CC: Another session, same root
    U->>CC: Refactor API layer
            

Limitations

⚠️ Blind Trimming

  • Strips by type, not importance
  • If stripped content is needed, model may hallucinate or re-read
  • Mitigated: the model's synthesis is preserved

📊 Single-User Study

  • 76 sessions from one user
  • May not generalize
  • Byte-to-token estimation imprecise for images


Built for the Ori group chat — March 2026
