Git-like version control for LLM conversations — structural trimming without semantic loss
After 30+ minutes of deep work, an LLM builds a mental model of your codebase — architecture mapped, trade-offs weighed, conventions learned. When the context window fills:
```mermaid
flowchart LR
    subgraph BEFORE["Before Autocompaction"]
        B1[132k tokens]
        B2[76% capacity]
        B3[Full understanding]
    end
    subgraph AFTER["After Autocompaction"]
        A1[2.3k tokens]
        A2[12% capacity]
        A3[Brief summary]
    end
    BEFORE -->|"Native /compact"| AFTER
    style BEFORE fill:#1f3d1f,stroke:#3fb950
    style AFTER fill:#3d1f1f,stroke:#f85149
```
Just as OS virtual memory abstracts away physical RAM limits, CMV abstracts away context window limits.
CMV models session history as a Directed Acyclic Graph — like Git for conversations:
```mermaid
flowchart TB
    subgraph ROOT["Initial Session"]
        R1[40 min deep work]
        R2[80k tokens]
        R3[Full codebase mental model]
    end
    ROOT -->|snapshot| S1["📸 Snapshot: 'arch-complete'"]
    S1 -->|branch| B1["🔀 Auth Work"]
    S1 -->|branch| B2["🔀 API Refactor"]
    S1 -->|branch| B3["🔀 Perf Tuning"]
    B1 -->|snapshot| S2["📸 'auth-done'"]
    B2 -->|snapshot| S3["📸 'api-v2'"]
    S2 -->|branch| B4["🔀 OAuth Integration"]
    style ROOT fill:#1f3d1f,stroke:#3fb950
    style S1 fill:#1f2d3d,stroke:#58a6ff
    style S2 fill:#1f2d3d,stroke:#58a6ff
    style S3 fill:#1f2d3d,stroke:#58a6ff
    style B1 fill:#2d1f3d,stroke:#a371f7
    style B2 fill:#2d1f3d,stroke:#a371f7
    style B3 fill:#2d1f3d,stroke:#a371f7
    style B4 fill:#2d1f3d,stroke:#a371f7
```
| Operation | Description | Git Equivalent |
|---|---|---|
| `Snapshot(session)` | Copy JSONL conversation to immutable storage | `git commit` |
| `Branch(snapshot, trim)` | Create new session from snapshot, optionally trimmed | `git checkout -b` |
| `Trim(session)` | Snapshot + trim + branch in one step | `git stash` + `checkout` |
| `Tree()` | Visualize full DAG with lineage | `git log --graph` |
The core algorithm strips mechanical bloat while preserving every user message and assistant response verbatim. It runs in three passes:

1. **Boundary scan.** A fast string scan finds the last native compaction marker. Everything before it is already summarized, so it is skipped.
2. **Orphan collection.** All tool_use IDs from before the boundary are collected. Their corresponding results would be orphaned and must be stripped to maintain API correctness.
3. **Rewrite.** Each remaining line is processed with the trim rules, and the cleaned output is written.
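The boundary scan and orphan stripping can be sketched as follows, assuming a Claude Code-style JSONL schema; `COMPACT_MARKER` and the field names are placeholders for whatever the real format uses.

```python
import json

# Placeholder: the real compaction marker depends on the JSONL format.
COMPACT_MARKER = "isCompactSummary"

def trim_session(lines: list[str]) -> list[str]:
    # Pass 1: fast string scan for the last native compaction marker.
    # Everything before it is already summarized, so skip it.
    start = 0
    for i, line in enumerate(lines):
        if COMPACT_MARKER in line:
            start = i
    # Pass 2: collect tool_use IDs from before the boundary. Once those lines
    # are dropped, any later results referencing them become orphans.
    orphan_ids = set()
    for line in lines[:start]:
        content = json.loads(line).get("message", {}).get("content")
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_use":
                    orphan_ids.add(block["id"])
    # Pass 3: rewrite each kept line, dropping orphaned tool_result blocks
    # to keep the transcript API-valid.
    out = []
    for line in lines[start:]:
        rec = json.loads(line)
        content = rec.get("message", {}).get("content")
        if isinstance(content, list):
            rec["message"]["content"] = [
                b for b in content
                if not (isinstance(b, dict)
                        and b.get("type") == "tool_result"
                        and b.get("tool_use_id") in orphan_ids)
            ]
        out.append(json.dumps(rec))
    return out
```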
| Content Type | Action | Rationale |
|---|---|---|
| User messages | KEEP | Your intent, always preserved |
| Assistant responses | KEEP | Model's synthesis and reasoning |
| Tool invocations | KEEP | What files/commands were accessed |
| Tool results (>500 chars) | STUB | Replace with "[Trimmed: ~N chars]" |
| Base64 images | STRIP | Massive, can be re-sent if needed |
| Thinking blocks | STRIP | Non-portable signatures |
| File history metadata | STRIP | Internal bookkeeping |
| Orphaned tool results | STRIP | API correctness (no matching tool_use) |
```mermaid
flowchart LR
    subgraph SESSIONS["Session Types"]
        CONV["Conversational<br/>(<15% tool bytes)<br/>~12% reduction"]
        MIXED["Mixed Tool Use<br/>(≥15% tool bytes)<br/>~39% reduction"]
    end
    subgraph TRIMMED["What's Removed"]
        T1["Tool outputs"]
        T2["Base64 images"]
        T3["File history"]
        T4["Thinking blocks"]
    end
    MIXED --> TRIMMED
    CONV -.->|"less bloat"| TRIMMED
    style CONV fill:#1f3d1f,stroke:#3fb950
    style MIXED fill:#3d3d1f,stroke:#d29922
    style TRIMMED fill:#3d1f1f,stroke:#f85149
```
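The ≥15% split suggests a simple classifier. A sketch, assuming bloat is measured as the byte share of tool_result blocks (the exact metric CMV uses is an assumption):

```python
import json

TOOL_BYTE_THRESHOLD = 0.15  # sessions with >=15% tool bytes count as "mixed"

def classify_session(lines: list[str]) -> str:
    """Classify a JSONL session by the fraction of bytes inside tool_result blocks."""
    total = tool = 0
    for line in lines:
        total += len(line)
        content = json.loads(line).get("message", {}).get("content")
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and block.get("type") == "tool_result":
                    tool += len(json.dumps(block))
    return "mixed" if total and tool / total >= TOOL_BYTE_THRESHOLD else "conversational"
```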
Trimming invalidates the prompt cache, incurring a one-time penalty, but a smaller context means a lower cost on every subsequent turn:
Sessions with >30% reduction hit break-even within 15 turns. Sessions with minimal bloat correctly show trimming is unnecessary.
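To make the break-even claim concrete, here is illustrative arithmetic under a deliberately simplified cost model: the per-token prices are placeholders, not real API pricing, and cache-write surcharges and output tokens are ignored.

```python
def break_even_turns(tokens_before: int, tokens_after: int,
                     input_price: float = 3.00,    # $/M tokens, uncached (placeholder)
                     cached_price: float = 0.30    # $/M tokens, cache read (placeholder)
                     ) -> float:
    """Turns needed before per-turn savings offset the one-time cache rebuild."""
    # One-time penalty: the trimmed context is re-sent uncached once.
    penalty = tokens_after * (input_price - cached_price) / 1e6
    # Per-turn saving afterwards: fewer cached tokens re-read on every turn.
    saving_per_turn = (tokens_before - tokens_after) * cached_price / 1e6
    return penalty / saving_per_turn

# A 132k-token session trimmed to 80k (~39% reduction):
print(round(break_even_turns(132_000, 80_000), 1))  # → 13.8 turns
```

With these placeholder prices, a ~39% reduction breaks even in about 14 turns, consistent with the within-15-turns claim above.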
| Aspect | CMV | LCM |
|---|---|---|
| Core metaphor | Git for conversations | Memory hierarchy |
| Compression | Structural (strip bloat) | Semantic (LLM summaries) |
| What's removed | Tool outputs, images | Nothing (compressed) |
| Retrieval | Re-read files from disk | Expand summary to original |
| Best for | Branching workflows | Infinite single sessions |
| LLM cost | Zero (no summarization) | Per compaction cycle |
The paper's most important observation, shown as a workflow: one expensive deep-work session can seed multiple trimmed branches, each inheriting the full architectural context.
```mermaid
sequenceDiagram
    participant U as User
    participant CC as Claude Code
    participant CMV as CMV
    Note over U,CC: Deep work session (40 min)
    U->>CC: Map codebase architecture
    CC-->>U: Understanding built (80k tokens)
    U->>CMV: snapshot("arch-complete")
    CMV-->>U: ✓ Saved to DAG
    Note over U,CMV: Later: need to work on auth
    U->>CMV: branch("arch-complete", trim=true)
    CMV->>CMV: Three-pass trim
    CMV-->>CC: New session with 52k tokens
    U->>CC: Implement OAuth
    Note over CC: Has full arch context!
    Note over U,CMV: Parallel: API work
    U->>CMV: branch("arch-complete", trim=true)
    CMV-->>CC: Another session, same root
    U->>CC: Refactor API layer
```
Built for the Ori group chat — March 2026