Streaming Repair Guide
Batch repair waits for the full output. Streaming repair processes JSON token by token as it arrives, with no buffering delay.
Why Streaming Matters
The Problem:
- LLMs stream tokens one at a time
- Your UI wants to show progress
- OutputFixingParser waits for the full output (2-5s delay)
How It Works
Three-Step Process
- Start Session → Get a session_id
- Push Chunks → Send tokens as they arrive and get the current state back
- Finalize → Get the repaired result with a confidence score
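Seen from the client side, the three steps reduce to a short loop. The sketch below targets a hypothetical StreamingRepairClient interface. That interface, its method names, and the FakeClient stand-in are assumptions for illustration, not the official TypeScript SDK. The fake only calls JSON.parse on the accumulated buffer and performs no repair, and its confidence value is a placeholder.

```typescript
// Hypothetical client-side shapes; field names follow this guide,
// but the interface itself is an assumption, not the official SDK.
interface StreamingState {
  structurally_complete: boolean;
  json_parseable: boolean;
  safe_to_emit: boolean;
  partial: unknown; // parsed object, or null
}

interface FinalResult {
  result: unknown;
  confidence: number;
}

interface StreamingRepairClient {
  start(): { session_id: string };
  push(sessionId: string, chunk: string): StreamingState;
  finalize(sessionId: string): FinalResult;
}

// Start once, push each chunk as it arrives, finalize at the end.
function repairStream(
  client: StreamingRepairClient,
  chunks: readonly string[],
  onPreview: (partial: unknown) => void,
): FinalResult {
  const { session_id } = client.start();              // 1. Start Session
  for (const chunk of chunks) {
    const state = client.push(session_id, chunk);     // 2. Push Chunks
    if (state.safe_to_emit) onPreview(state.partial); // show progress early
  }
  return client.finalize(session_id);                 // 3. Finalize
}

// Toy stand-in so the loop runs offline. It only tries JSON.parse on the
// accumulated buffer: no repair, and the confidence value is a placeholder.
class FakeClient implements StreamingRepairClient {
  private buffers: Record<string, string> = {};
  private nextId = 0;

  start(): { session_id: string } {
    const session_id = "session-" + this.nextId++;
    this.buffers[session_id] = "";
    return { session_id };
  }

  push(sessionId: string, chunk: string): StreamingState {
    const buffer = this.buffers[sessionId] + chunk;
    this.buffers[sessionId] = buffer;
    let parseable = true;
    let partial: unknown = null;
    try {
      partial = JSON.parse(buffer);
    } catch {
      parseable = false;
    }
    return {
      structurally_complete: parseable, // simplification for the fake
      json_parseable: parseable,
      safe_to_emit: parseable,
      partial,
    };
  }

  finalize(sessionId: string): FinalResult {
    const result: unknown = JSON.parse(this.buffers[sessionId]);
    delete this.buffers[sessionId];
    return { result, confidence: 1 }; // placeholder score
  }
}

const previews: unknown[] = [];
const outcome = repairStream(new FakeClient(), ['{"a": ', "1", "}"], (p) => {
  previews.push(p);
});
console.log(outcome.result, previews.length);
```

With a real service, safe_to_emit can turn on before the last chunk, so onPreview would fire several times. The fake only reports it once the whole buffer parses.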
Session Lifecycle
State Flags
Each push returns a StreamingState object:
| Flag | Meaning | Use Case |
|---|---|---|
| structurally_complete | Braces balanced, not inside a string | Safe to finalize |
| json_parseable | JSON.parse() will succeed | Safe to parse |
| safe_to_emit | Parseable + no critical issues | Safe to show user |
| partial | Parsed object (or null) | Preview data |
Example State Progression
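To make the progression concrete, the sketch below recomputes two of the flags locally after each chunk. It is an approximation written for illustration: a bracket-depth scanner plus JSON.parse, not the engine's actual algorithm. It omits safe_to_emit because this page does not define "critical issues".

```typescript
// Local approximation of two StreamingState flags (illustration only).
interface Flags {
  structurally_complete: boolean; // every { and [ closed, not inside a string
  json_parseable: boolean;        // JSON.parse succeeds on the buffer as-is
}

function flagsFor(buffer: string): Flags {
  let depth = 0;
  let inString = false;
  let escaped = false;
  let sawOpener = false;
  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      // Inside a string, only an unescaped quote ends it.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      depth++;
      sawOpener = true;
    } else if (ch === "}" || ch === "]") {
      depth--;
    }
  }
  let parseable = true;
  try {
    JSON.parse(buffer);
  } catch {
    parseable = false;
  }
  return {
    structurally_complete: sawOpener && depth === 0 && !inString,
    json_parseable: parseable,
  };
}

// Append chunks one at a time and record the flags after each push.
function progression(chunks: string[]): Array<Flags & { buffer: string }> {
  let buffer = "";
  return chunks.map((chunk) => {
    buffer += chunk;
    return { buffer, ...flagsFor(buffer) };
  });
}

for (const step of progression(['{"user": "Al', 'ice", "age": 3', "0}"])) {
  console.log(JSON.stringify(step.buffer), step.structurally_complete, step.json_parseable);
}
for (const step of progression(['{"tags": ["a", ', '"b",', "]}"])) {
  console.log(JSON.stringify(step.buffer), step.structurally_complete, step.json_parseable);
}
```

In the first run, both flags turn on only with the last chunk. The second run ends structurally complete but not parseable because of the trailing comma. That is the kind of buffer left for the finalize step to repair.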
Buffer Bottleneck Solution
Problem: Waiting for the full output adds latency.
Solution: Shim’s streaming engine:
- Strips markdown fences early
- Holds incomplete tokens (e.g., "0.")
- Attempts a parse on every chunk
- Returns partial as soon as the buffer is parseable
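The four behaviors above can be sketched as a preview function that runs on every chunk. This is an approximation, not Shim's implementation. The fence regexes, the rule of holding back any trailing number until a delimiter follows it, and closing open brackets to obtain a parseable partial are all assumptions. Holding the number keeps a score streamed as "0" and then ".95" from briefly rendering as 0.

```typescript
// A markdown code fence is three backticks.
const FENCE = "`".repeat(3);
const OPEN_FENCE = new RegExp("^\\s*" + FENCE + "[a-zA-Z]*\\s*");
const CLOSE_FENCE = new RegExp("\\s*" + FENCE + "\\s*$");

// 1. Strip a leading fence (optionally language-tagged) and a closing fence once it arrives.
function stripFences(text: string): string {
  return text.replace(OPEN_FENCE, "").replace(CLOSE_FENCE, "");
}

// 2. Hold back a trailing number that may still grow ("0" -> "0.95").
const TRAILING_NUMBER = /-?\d+(\.\d*)?([eE][+-]?\d*)?$/;
function holdTrailingNumber(text: string): string {
  return text.replace(TRAILING_NUMBER, "");
}

// 3. Close any open { or [ so the prefix can be parsed; null inside a string.
function closeOpen(text: string): string | null {
  const closers: string[] = [];
  let inString = false;
  let escaped = false;
  for (let i = 0; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === "{" || ch === "[") {
      closers.push(ch === "{" ? "}" : "]");
    } else if (ch === "}" || ch === "]") {
      closers.pop();
    }
  }
  if (inString) return null;
  // Drop trailing whitespace and a dangling comma before closing.
  return text.replace(/[\s,]+$/, "") + closers.reverse().join("");
}

// 4. Try a parse on every chunk; return the partial value or null.
function tryPreview(buffer: string): unknown {
  const candidate = closeOpen(holdTrailingNumber(stripFences(buffer)));
  if (candidate === null || candidate === "") return null;
  try {
    return JSON.parse(candidate);
  } catch {
    return null;
  }
}

let streamed = "";
let lastShown: unknown = null;
const pieces = [FENCE + "json\n", '{"score": 0', '.95, "tags": ["a"', ', "b"', "]}", "\n" + FENCE];
for (const piece of pieces) {
  streamed += piece;
  const preview = tryPreview(streamed);
  if (preview !== null) lastShown = preview; // keep the last good preview
  console.log(JSON.stringify(lastShown));
}
```

The trade-off is that a value which is only a bare number stays hidden until more input arrives. In this sketch, that case is left to finalization.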
Numerical Gyrations
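Partial numbers can cause UI flicker: a value that streams in as "0" and then "0.95" would briefly render as 0 if it were shown as soon as it parsed.
Session Management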
Expiration
Sessions expire after 60 seconds of inactivity.
Circuit Breaker
Sessions terminate when the buffer reaches 5MB (protection against hallucination loops).
Complete Example
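A minimal end-to-end sketch of the lifecycle (start, push, finalize) against an in-memory stand-in that enforces both session rules. MockStreamingService, its error messages, and the injectable clock are hypothetical. The 60-second and 5MB limits come from this page, but the sketch counts characters rather than bytes. Its finalize step only parses, whereas the real service would repair the buffer and score the result.

```typescript
// Hypothetical in-memory stand-in; not the Shim API.
const IDLE_LIMIT_MS = 60 * 1000;          // sessions expire after 60 s idle
const MAX_BUFFER_CHARS = 5 * 1024 * 1024; // ~5MB circuit breaker (counts chars, not bytes)

interface Session {
  buffer: string;
  lastActivity: number;
}

class MockStreamingService {
  private sessions: Record<string, Session> = {};
  private counter = 0;
  private readonly now: () => number;

  // The clock is injectable so expiry can be shown without waiting 60 s.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  start(): string {
    this.counter += 1;
    const id = "session-" + this.counter;
    this.sessions[id] = { buffer: "", lastActivity: this.now() };
    return id;
  }

  // Look up a session, expiring it if it has been idle too long.
  private live(id: string): Session {
    const session = this.sessions[id];
    if (session === undefined) throw new Error("session_not_found");
    if (this.now() - session.lastActivity > IDLE_LIMIT_MS) {
      delete this.sessions[id];
      throw new Error("session_expired");
    }
    return session;
  }

  push(id: string, chunk: string): { json_parseable: boolean } {
    const session = this.live(id);
    if (session.buffer.length + chunk.length > MAX_BUFFER_CHARS) {
      delete this.sessions[id]; // circuit breaker: terminate the session
      throw new Error("buffer_limit_exceeded");
    }
    session.buffer += chunk;
    session.lastActivity = this.now();
    let parseable = true;
    try {
      JSON.parse(session.buffer);
    } catch {
      parseable = false;
    }
    return { json_parseable: parseable };
  }

  finalize(id: string): unknown {
    const session = this.live(id);
    delete this.sessions[id];
    return JSON.parse(session.buffer); // a real service would repair here
  }
}

let clock = 0;
const service = new MockStreamingService(() => clock);

const id = service.start();
for (const chunk of ['{"status": ', '"ok"', "}"]) {
  clock += 100; // chunks arrive 100 ms apart
  console.log(service.push(id, chunk).json_parseable);
}
console.log(service.finalize(id));

const idle = service.start();
clock += 61 * 1000; // more than 60 s with no push
try {
  service.push(idle, "{");
} catch (err) {
  console.log((err as Error).message);
}
```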
Best Practices
1. Check safe_to_emit Before Displaying
2. Handle Session Expiration
3. Set Max Tokens on LLM
Capping the LLM's max tokens prevents hallucination loops.
4. Finalize When structurally_complete
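Practices 1, 2 and 4 can be folded into one decision per step, sketched below as a pure function. The PushState shape mirrors the state flags above. The NextStep actions, restarting a session once the idle window has passed, and also finalizing when the token stream ends are assumptions about how a client might react, not documented behavior. Practice 3 is configured on the LLM call itself and is not shown.

```typescript
// Latest state returned by a push (flag names from the State Flags table).
interface PushState {
  structurally_complete: boolean;
  safe_to_emit: boolean;
  partial: unknown;
}

// Hypothetical actions a client loop might take next.
type NextStep =
  | { kind: "restart" }                // idle past the expiry window: open a new session
  | { kind: "finalize" }               // structurally complete, or no more tokens coming
  | { kind: "render"; value: unknown } // safe_to_emit: show partial to the user
  | { kind: "wait" };                  // keep pushing; leave the UI unchanged

const EXPIRY_MS = 60 * 1000; // sessions expire after 60 s of inactivity

// idleMs: time since the previous successful push.
function nextStep(state: PushState, idleMs: number, streamEnded: boolean): NextStep {
  // Practice 2: an idle session has already expired on the service side.
  if (idleMs > EXPIRY_MS) return { kind: "restart" };
  // Practice 4: finalize once braces balance (or when the stream ends).
  if (state.structurally_complete || streamEnded) return { kind: "finalize" };
  // Practice 1: only display partial data flagged safe_to_emit.
  if (state.safe_to_emit) return { kind: "render", value: state.partial };
  return { kind: "wait" };
}

const step = nextStep({ structurally_complete: false, safe_to_emit: true, partial: { a: 1 } }, 250, false);
console.log(step.kind);
```

Keeping the decision pure makes it easy to unit-test apart from the network calls that start, push, and finalize a session.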
Performance
- Latency: <1ms per push
- Memory: O(m) buffer, where m = total output size (circuit breaker at 5MB)
- Throughput: 1M+ chunks/sec per Worker
Next Steps
Streaming API Reference
Full API documentation
TypeScript SDK
Use the official SDK
Error Handling
Handle errors gracefully
Confidence Levels
Understand confidence scoring
