Streaming Engine

Shim’s streaming engine solves three core problems with streamed LLM output.

The Three Problems

1. Buffer Bottleneck

Problem: Most JSON parsers require the complete string, so waiting for the full output adds 2-5 seconds of latency.

Solution: Shim attempts JSON.parse() on every chunk and returns a partial object as soon as the buffer is parseable.

// Chunk 1: '{"name": "Jo'
state.json_parseable = false;  // Can't parse yet
state.partial = null;

// Chunk 2: 'hn", "age": 30}'
state.json_parseable = true;   // Parseable now
state.partial = { name: "John", age: 30 };
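
As a rough illustration of the per-chunk parse attempt, the sketch below appends each chunk to a buffer and tries JSON.parse() on the result. The `ParseState` shape and the `pushChunk` function are hypothetical names chosen for this example, not Shim's API, and the sketch leaves out any repair step.

```typescript
// Hypothetical state shape; field names mirror the example above.
interface ParseState {
  buffer: string;
  json_parseable: boolean;
  partial: unknown;
}

// Append a chunk, then attempt to parse the whole buffer.
function pushChunk(state: ParseState, chunk: string): ParseState {
  const buffer = state.buffer + chunk;
  try {
    return { buffer, json_parseable: true, partial: JSON.parse(buffer) };
  } catch {
    // Not valid JSON yet: keep accumulating.
    return { buffer, json_parseable: false, partial: null };
  }
}

const initial: ParseState = { buffer: '', json_parseable: false, partial: null };
const first = pushChunk(initial, '{"name": "Jo');    // json_parseable: false
const second = pushChunk(first, 'hn", "age": 30}');  // json_parseable: true
```

Returning a new state object on each push keeps the sketch easy to test; a real session would more likely mutate its state in place.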

2. Markdown Fence Trap

Problem: LLMs often wrap JSON in markdown fences:

```json
{"data": "value"}
```

Parsers fail because of the fence. Solution: Shim strips fences early in the pipeline:

// Input: '```json\n{"name": "John"}\n```'
// After fence removal: '{"name": "John"}'

Fence patterns detected:

```json ... ```
``` ... ```
Leading/trailing text before { or [
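
One way to approximate this step is sketched below. `stripFences` is a hypothetical helper, not Shim's implementation: it drops everything before the first { or [ (which covers an opening fence, a language tag, and leading prose) and removes a closing fence at the end of the input. It does not handle prose that follows the closing fence.

```typescript
// Hypothetical helper: drops fence markers and leading prose around a JSON payload.
function stripFences(input: string): string {
  // Everything before the first { or [ is treated as junk.
  const start = input.search(/[{\[]/);
  if (start < 0) {
    return ''; // No JSON has arrived yet.
  }
  // Remove a closing fence (three backticks) at the end, if one has arrived.
  return input.slice(start).replace(/\s*`{3}\s*$/, '');
}
```

Because the closing fence is optional in this sketch, the same function can run on a partial buffer whose closing fence has not streamed in yet.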

3. Numerical Gyrations

Problem: Partial numbers cause UI flicker:

// Chunk 1: '{"score": 0'
partial = { score: 0 };  // UI shows 0

// Chunk 2: '.5'
partial = { score: 0.5 }; // UI updates to 0.5 (flicker)

Solution: Shim detects incomplete tokens and holds them:

// Chunk 1: '{"score": 0'
state.incomplete_token = "0";  // Held
state.partial = null;           // Not emitted

// Chunk 2: '.5'
state.incomplete_token = "";    // Released
state.partial = { score: 0.5 }; // Emitted complete

Incomplete token patterns:

Trailing decimal: 0.
Leading decimal: .5
Partial escape: \u00
Partial string: "incomplete
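
The example above holds a bare trailing digit ("0") as well as the listed patterns. A minimal sketch of how such a hold might be detected follows; `trailingNumber` and `canEmit` are hypothetical names, and the regular expression is an assumption about what counts as a number that could still grow.

```typescript
// Hypothetical helper: returns the numeric text at the end of the buffer
// that could still grow with the next chunk, or '' if there is none.
function trailingNumber(buffer: string): string {
  const match = buffer.match(/-?\d+(?:\.\d*)?(?:[eE][+-]?\d*)?$/);
  return match ? match[0] : '';
}

// Emit a partial only when no number is left dangling at the end of the buffer.
function canEmit(buffer: string): boolean {
  return trailingNumber(buffer) === '';
}
```

A number is treated as settled only once a following character such as `,` or `}` arrives. Digits at the end of an unclosed string are held too, which is harmless because the unclosed string is itself incomplete.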

Architecture

Session State Machine

START → ACCUMULATING → PARSEABLE → FINALIZED
  ↓          ↓             ↓            ↓
buffer   structural    partial      repaired
empty    complete      object       + metadata
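
Read as code, the diagram could be expressed as a function that derives the phase from session state. This is only one interpretation: the `Phase` type, the `PhaseInput` shape, and the `finalized` flag are assumptions made for the sketch.

```typescript
type Phase = 'START' | 'ACCUMULATING' | 'PARSEABLE' | 'FINALIZED';

interface PhaseInput {
  buffer: string;
  json_parseable: boolean;
  finalized: boolean; // hypothetical flag, set once finalize() has run
}

// Later phases take priority over earlier ones.
function phaseOf(s: PhaseInput): Phase {
  if (s.finalized) return 'FINALIZED';
  if (s.json_parseable) return 'PARSEABLE';
  if (s.buffer.length > 0) return 'ACCUMULATING';
  return 'START';
}
```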

State Transitions

// 1. START
const session = new StreamingRepairSession();
// buffer: ""
// brace_depth: 0
// in_string: false

// 2. Push: '{"name": "Jo'
session.push('{"name": "Jo');
// buffer: '{"name": "Jo'
// brace_depth: 1
// in_string: true
// structurally_complete: false

// 3. Push: 'hn", "age": 30}'
session.push('hn", "age": 30}');
// buffer: '{"name": "John", "age": 30}'
// brace_depth: 0
// in_string: false
// structurally_complete: true
// json_parseable: true
// partial: { name: "John", age: 30 }

// 4. Finalize
const result = session.finalize();
// success: true
// repaired: { name: "John", age: 30 }
// metadata.confidence: "high"

Key Algorithms

Structural Completeness Check

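// Fields read by the check; the type name is illustrative.
interface SessionState {
  buffer: string;
  brace_depth: number;
  in_string: boolean;
  incomplete_token: string;
}

function isStructurallyComplete(state: SessionState): boolean {
  return (
    state.brace_depth === 0 &&
    !state.in_string &&
    state.incomplete_token === '' &&
    state.buffer.trim().length > 0
  );
}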

Incomplete Token Detection

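function detectIncompleteToken(buffer: string): string {
  // Trailing decimal: "0."
  const decimal = buffer.match(/\d\.$/);
  if (decimal) {
    return decimal[0];
  }

  // Partial escape: "\u00"
  const partialEscape = buffer.match(/\\u[0-9a-fA-F]{0,3}$/);
  if (partialEscape) {
    return partialEscape[0];
  }

  // Unclosed string: '"text
  // Track quote state across the buffer so a closed string is not reported
  // and an escaped quote (\") does not end the string.
  let open = -1;
  for (let i = 0; i < buffer.length; i++) {
    if (open >= 0 && buffer[i] === '\\') {
      i++; // Skip the escaped character
    } else if (buffer[i] === '"') {
      open = open >= 0 ? -1 : i;
    }
  }

  return open >= 0 ? buffer.slice(open) : '';
}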

Brace Tracking

function updateBraceDepth(
  char: string,
  state: { in_string: boolean; brace_depth: number }
): void {
  if (state.in_string) {
    return; // Ignore braces in strings
  }

  if (char === '{' || char === '[') {
    state.brace_depth++;
  }

  if (char === '}' || char === ']') {
    state.brace_depth--;
  }
}
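
Brace tracking depends on `in_string` being kept up to date, including across escaped quotes. The sketch below shows one way to do both in a single pass over a chunk. `scanChunk`, `ScanState`, and the `escaped` flag are assumptions made for this example rather than names from Shim.

```typescript
// Hypothetical scanner state; brace_depth and in_string follow the walkthrough above.
interface ScanState {
  brace_depth: number;
  in_string: boolean;
  escaped: boolean; // assumption: true when the previous character was a backslash inside a string
}

// Scan one chunk character by character, updating the state in place.
function scanChunk(chunk: string, state: ScanState): void {
  for (let i = 0; i < chunk.length; i++) {
    const char = chunk.charAt(i);
    if (state.in_string) {
      if (state.escaped) {
        state.escaped = false;   // This character is escaped; consume it.
      } else if (char === '\\') {
        state.escaped = true;    // The next character is escaped.
      } else if (char === '"') {
        state.in_string = false; // Closing quote.
      }
    } else if (char === '"') {
      state.in_string = true;    // Opening quote.
    } else if (char === '{' || char === '[') {
      state.brace_depth++;
    } else if (char === '}' || char === ']') {
      state.brace_depth--;
    }
  }
}
```

Keeping `escaped` in the state rather than in a local variable means a backslash at the very end of one chunk still applies to the first character of the next.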

Performance Characteristics

Time Complexity

| Operation | Complexity | Notes |
| --- | --- | --- |
| Push chunk | O(n) | Linear scan of chunk |
| Parse attempt | O(m) | m = buffer size |
| Finalize | O(m) | Full repair pipeline |
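
These per-operation costs also imply a cumulative cost. As a back-of-the-envelope estimate, assume the output arrives as k chunks of roughly equal size c (so m = kc) and that every push triggers a parse attempt over the whole buffer. The symbols k and c are introduced here for illustration only.

```latex
\sum_{i=1}^{k} O(i \cdot c)
  = O\!\left(c \cdot \frac{k(k+1)}{2}\right)
  = O\!\left(c\,k^{2}\right)
  = O\!\left(\frac{m^{2}}{c}\right)
```

Under those assumptions the total parsing work grows quadratically with output size, and larger chunks reduce it proportionally.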

Memory Usage

Buffer: O(m) where m = total output size
State: O(1) constant overhead
Circuit breaker: 5MB max buffer

Throughput

Chunks/sec: 1M+ per Worker
Latency/push: <1ms average
Parse attempts: 1 per push

Safety Features

Circuit Breaker

Terminates a session once its buffer exceeds 5MB:

if (buffer.length > MAX_BUFFER_SIZE) {
  throw {
    code: 'BUFFER_SIZE_EXCEEDED',
    message: 'Possible hallucination loop',
    severity: 'critical'
  };
}
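
A self-contained version of this guard might look like the sketch below. The `EngineError` interface name, the `appendChunk` helper, and the reading of "5MB" as 5 * 1024 * 1024 are assumptions. Note also that a string's `length` counts UTF-16 code units rather than bytes, so the limit is approximate for non-ASCII output.

```typescript
// Error shape taken from the snippet above; the interface name is hypothetical.
interface EngineError {
  code: string;
  message: string;
  severity: string;
}

const MAX_BUFFER_SIZE = 5 * 1024 * 1024; // Assumes "5MB" means 5 * 1024 * 1024.

// Append a chunk, refusing to grow the buffer past the limit.
function appendChunk(buffer: string, chunk: string, limit: number = MAX_BUFFER_SIZE): string {
  const next = buffer + chunk;
  if (next.length > limit) {
    const error: EngineError = {
      code: 'BUFFER_SIZE_EXCEEDED',
      message: 'Possible hallucination loop',
      severity: 'critical',
    };
    throw error;
  }
  return next;
}
```

Taking the limit as a parameter lets callers and tests exercise the breaker without allocating megabytes of text.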

Session Expiration

Sessions expire after 60 seconds:

if (Date.now() - session.createdAt > 60000) {
  throw {
    code: 'SESSION_NOT_FOUND',
    message: 'Session expired',
    severity: 'critical'
  };
}
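
The expiry test itself can be isolated into a small pure function, which makes it checkable without waiting 60 seconds. `isExpired` and `SESSION_TTL_MS` are hypothetical names for this sketch.

```typescript
const SESSION_TTL_MS = 60000; // 60 seconds, per the text above.

// Hypothetical helper: `now` is a parameter so the check can be tested with a fixed clock.
function isExpired(createdAt: number, now: number = Date.now(), ttlMs: number = SESSION_TTL_MS): boolean {
  return now - createdAt > ttlMs;
}
```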

Junk Seek

Ignores any data before the first { or [:

const jsonStart = buffer.search(/[{\[]/);

if (jsonStart > 0) {
  buffer = buffer.substring(jsonStart);
}

Comparison: Streaming vs Batch

| Metric | Streaming | Batch |
| --- | --- | --- |
| Latency | `<1ms` per chunk | 2-10ms total |
| Memory | O(m) buffer | O(m) input |
| Use case | Real-time UIs | Completed outputs |
| Complexity | Stateful sessions | Stateless calls |
| Preview | Yes (`partial`) | No |

When to Use Streaming

Use Streaming When:

LLM is streaming tokens
You want real-time UI updates
Output size is large (>1MB)
You need progress indicators

Use Batch When:

You have the full output
Real-time updates aren’t needed
Output is small (<100KB)
You want simplicity

Next Steps

Streaming Guide: Best practices for streaming
Streaming API: API reference
Confidence Levels: Understand confidence scoring
TypeScript SDK: Use the official SDK
