Streaming Engine
Shim’s streaming engine solves three core problems with LLM streaming outputs.
The Three Problems
1. Buffer Bottleneck
Problem: Most JSON parsers require the complete string, so waiting for the full output adds 2-5 seconds of latency.
Solution: After every chunk, Shim attempts `JSON.parse()` on the buffered output and returns a partial object as soon as the buffer is parseable.
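A minimal sketch of the parse-on-every-push loop. It assumes nothing about Shim's internals: the class name `NaiveStreamParser` is hypothetical, and it calls plain `JSON.parse()` on the whole buffer after each chunk, keeping the last successful result. Without repair, a preview appears only once the buffer is already valid JSON. Returning partial objects earlier needs the repair steps listed under Key Algorithms.

```typescript
// Hypothetical sketch: re-attempt a parse of the whole buffer after every push.
// No repair is done, so JSON.parse only succeeds once the buffer is valid JSON.
class NaiveStreamParser {
  private buffer = "";
  private latest: unknown = undefined;

  // Appends a chunk and returns the most recent successfully parsed value
  // (undefined until the buffer parses for the first time).
  push(chunk: string): unknown {
    this.buffer += chunk;
    try {
      this.latest = JSON.parse(this.buffer);
    } catch {
      // Not parseable yet: keep the previous preview instead of throwing.
    }
    return this.latest;
  }
}

const parser = new NaiveStreamParser();
for (const chunk of ['{"name": "Ada"', ', "age": 36', "}"]) {
  console.log(parser.push(chunk));
}
```

In the loop above, the first two pushes return `undefined` and only the third returns the complete object. That delay is exactly the latency the buffer bottleneck describes.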
2. Markdown Fence Trap
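Problem: LLMs often wrap JSON in markdown fences or surround it with extra text:
- `` ```json ... ``` ``
- `` ``` ... ``` ``
- Leading or trailing text, such as prose before the opening `{` or `[`
Solution: Shim ignores all data before the first `{` or `[` (see Junk Seek under Safety Features).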
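The fence and leading-text cases can be handled by seeking to the first `{` or `[` and cutting the buffer at a closing fence once one arrives. The helper `extractJsonCandidate` below is a hypothetical sketch, not Shim's code. It assumes the JSON payload never contains three backticks (a string value that does would be truncated), and it leaves unfenced trailing prose in place.

```typescript
// Hypothetical helper: strips wrapper text around a (possibly partial) JSON payload.
// Returns null while no '{' or '[' has arrived yet.
function extractJsonCandidate(raw: string): string | null {
  // Junk seek: drop everything before the first '{' or '['.
  const start = raw.search(/[{\[]/);
  if (start === -1) return null;
  let body = raw.slice(start);

  // Once a closing fence arrives, cut it off along with the whitespace before it.
  // Assumption: the payload itself never contains ``` (e.g. inside a string value).
  const fence = body.indexOf("```");
  if (fence !== -1) body = body.slice(0, fence).replace(/\s+$/, "");
  return body;
}

const wrapped = 'Sure! Here is the data:\n```json\n{"ok": true}\n```';
console.log(JSON.parse(extractJsonCandidate(wrapped) ?? "null"));
```

Whitespace is trimmed only when a fence is found, so a partial string value such as `"hello ` keeps its trailing space while streaming.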
3. Numerical Gyrations
Problem: Partial numbers and strings cause UI flicker:
- Trailing decimal: `0.`
- Leading decimal: `.5`
- Partial escape: `\u00`
- Partial string: `"incomplete`
Architecture
Session State Machine
State Transitions
Key Algorithms
Structural Completeness Check
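One plausible reading of a structural completeness check, assumed here rather than taken from Shim, is that the root `{` or `[` has been closed, counting only brackets outside string literals. The function name `isStructurallyComplete` is hypothetical, and the sketch assumes the root is an object or array, as Junk Seek implies.

```typescript
// Hypothetical check: true once the root object/array has been closed.
// Brackets inside string literals are ignored; backslash escapes are honored.
function isStructurallyComplete(json: string): boolean {
  let depth = 0;
  let inString = false;
  let escaped = false;

  for (let i = 0; i < json.length; i++) {
    const ch = json[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{" || ch === "[") depth++;
    else if (ch === "}" || ch === "]") {
      depth--;
      if (depth < 0) return false; // stray closer: not a well-formed prefix
      if (depth === 0) return true; // root closed; anything after it is trailing text
    }
  }
  return false;
}
```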
Incomplete Token Detection
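A sketch of incomplete token detection covering the four fragments listed under Numerical Gyrations. The function `detectIncompleteTail` and its return labels are hypothetical. It treats any number at the very end of the buffer as unfinished because the next chunk may still extend it (for example, `12` becoming `123`). Partial literals such as `tru` are not handled.

```typescript
// Hypothetical classifier for the token the buffer currently ends in.
type IncompleteTail = "number" | "string" | "escape" | null;

function detectIncompleteTail(buffer: string): IncompleteTail {
  let inString = false;
  let escaped = false;  // just saw a backslash inside a string
  let hexLeft = 0;      // hex digits still expected after \u
  let inNumber = false; // currently inside a number token outside strings

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (hexLeft > 0) hexLeft--;
      else if (escaped) {
        escaped = false;
        if (ch === "u") hexLeft = 4; // \uXXXX needs four more characters
      } else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') {
      inString = true;
      inNumber = false;
    } else if (inNumber) {
      inNumber = /[0-9.eE+-]/.test(ch); // digits, decimal point, exponent, sign
    } else {
      inNumber = /[0-9.-]/.test(ch); // '.' is accepted so that ".5" is caught
    }
  }

  if (inString) return escaped || hexLeft > 0 ? "escape" : "string";
  return inNumber ? "number" : null;
}
```

A caller could, for example, skip a preview update while the tail is `"number"`, so that a value is shown only after a delimiter confirms it.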
Brace Tracking
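Brace tracking can be sketched as a stack of expected closers. Each `{` or `[` outside a string pushes `}` or `]`, each closer pops, and the reversed stack is the suffix needed to close the partial value. `closingSuffix` and `tryPreview` are hypothetical names. The sketch assumes well-formed input, and it does not close an unterminated string or drop a dangling comma or colon, so such buffers yield no preview.

```typescript
// Hypothetical brace tracker: records unclosed '{' / '[' (ignoring string contents)
// and returns the suffix that closes them, e.g. '{"a": [1' -> ']}'.
function closingSuffix(buffer: string): string {
  const stack: string[] = [];
  let inString = false;
  let escaped = false;

  for (let i = 0; i < buffer.length; i++) {
    const ch = buffer[i];
    if (inString) {
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") stack.push("}");
    else if (ch === "[") stack.push("]");
    else if (ch === "}" || ch === "]") stack.pop();
  }
  return stack.reverse().join("");
}

// Preview: close the open containers and try to parse the result.
function tryPreview(buffer: string): unknown {
  try {
    return JSON.parse(buffer + closingSuffix(buffer));
  } catch {
    return undefined; // e.g. buffer ends mid-token or after a dangling comma
  }
}
```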
Performance Characteristics
Time Complexity
| Operation | Complexity | Notes |
|---|---|---|
| Push chunk | O(n) | Linear scan of chunk; n = chunk size |
| Parse attempt | O(m) | m = buffer size |
| Finalize | O(m) | Full repair pipeline |
Memory Usage
- Buffer: O(m) where m = total output size
- State: O(1) constant overhead
- Circuit breaker: 5MB max buffer
Throughput
- Chunks/sec: 1M+ per Worker
- Latency per push: <1ms average
- Parse attempts: 1 per push
Safety Features
Circuit Breaker
Terminates a session when its buffer reaches 5MB.
Session Expiration
Sessions expire after 60 seconds.
Junk Seek
Ignores all data before the first `{` or `[`.
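The circuit breaker and session expiration limits can be sketched as a guard around the buffer. `StreamSessionGuard` is hypothetical, and two details are assumptions: size is approximated by string length (UTF-16 code units) rather than encoded bytes, and the 60-second lifetime is measured from session creation rather than from the last push. Both limits are constructor parameters so they can be exercised with small values.

```typescript
// Hypothetical guard enforcing a 5MB buffer cap and a 60-second session lifetime.
const MAX_BUFFER_CHARS = 5 * 1024 * 1024; // assumption: 5MB approximated as 5 Mi characters
const SESSION_TTL_MS = 60 * 1000;         // assumption: measured from session creation

class StreamSessionGuard {
  private buffer = "";
  private readonly createdAt: number;
  private readonly maxChars: number;
  private readonly ttlMs: number;

  constructor(
    createdAt: number = Date.now(),
    maxChars: number = MAX_BUFFER_CHARS,
    ttlMs: number = SESSION_TTL_MS,
  ) {
    this.createdAt = createdAt;
    this.maxChars = maxChars;
    this.ttlMs = ttlMs;
  }

  // Appends a chunk, or throws if the session has expired or would exceed the cap.
  push(chunk: string, now: number = Date.now()): string {
    if (now - this.createdAt > this.ttlMs) {
      throw new Error("session expired");
    }
    if (this.buffer.length + chunk.length > this.maxChars) {
      throw new Error("circuit breaker: buffer limit exceeded");
    }
    this.buffer += chunk;
    return this.buffer;
  }
}
```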
Comparison: Streaming vs Batch
| Metric | Streaming | Batch |
|---|---|---|
| Latency | <1ms per chunk | 2-10ms total |
| Memory | O(m) buffer | O(m) input |
| Use case | Real-time UIs | Completed outputs |
| Complexity | Stateful sessions | Stateless calls |
| Preview | Yes (partial) | No |
When to Use Streaming
Use Streaming When:
- LLM is streaming tokens
- You want real-time UI updates
- Output size is large (>1MB)
- You need progress indicators

Use Batch When:
- You have the full output
- Real-time updates aren’t needed
- Output is small (<100KB)
- You want simplicity
Next Steps
- Streaming Guide: Best practices for streaming
- Streaming API: API reference
- Confidence Levels: Understand confidence scoring
- TypeScript SDK: Use the official SDK
