Streaming Repair Guide

Batch repair waits for the full output before it starts. Streaming repair processes JSON incrementally as chunks arrive, so there is no buffering delay.

Why Streaming Matters

The Problem:

LLMs stream tokens one at a time
Your UI wants to show progress
OutputFixingParser waits for full output (2-5s delay)

The Solution: Shim repairs JSON incrementally. Push chunks as they arrive and get parseable JSON back as soon as the buffer can be parsed.

How It Works

Three-Step Process

Start Session → Get session_id
Push Chunks → Send tokens as they arrive, get state back
Finalize → Get repaired result with confidence score

Session Lifecycle

// 1. Start session
const { session_id } = await shim.stream.start({
  schema: mySchema,
  mode: 'strict'
});

// 2. Push chunks (as LLM streams)
for await (const chunk of llmStream) {
  const { state } = await shim.stream.push({
    session_id,
    chunk
  });

  // state.safe_to_emit → safe to show to user
  // state.partial → parsed object (or null)
}

// 3. Finalize
const result = await shim.stream.finalize({ session_id });
console.log(result.repaired);

State Flags

Each push returns a response whose `state` field is a StreamingState object:

Field	Meaning	Use Case
`structurally_complete`	Braces balanced, not in string	Safe to finalize
`json_parseable`	`JSON.parse()` will succeed	Safe to parse
`safe_to_emit`	Parseable + no critical issues	Safe to show user
`partial`	Parsed object (or null)	Preview data
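For TypeScript users, the fields above can be written down as a type. The sketch below is inferred from the table and the example states in this guide, so treat `StreamingState` as an assumption rather than the SDK's published type. `nextAction` is a hypothetical helper showing one way to branch on the flags after each push.

```typescript
// Assumed shape, inferred from the flags table and example states in this
// guide; the SDK's real StreamingState type may differ.
interface StreamingState {
  structurally_complete: boolean;
  json_parseable: boolean;
  safe_to_emit: boolean;
  partial: unknown; // parsed object, or null
  buffered?: string;
  in_string?: boolean;
  brace_depth?: number;
  incomplete_token?: string;
}

type Action = "finalize" | "emit" | "wait";

// Hypothetical helper: pick the client's next step from one pushed state.
function nextAction(state: StreamingState): Action {
  // Braces balanced and not inside a string: the output looks finished.
  if (state.structurally_complete) return "finalize";
  // Parseable, no critical issues, and there is something to show.
  if (state.safe_to_emit && state.partial !== null) return "emit";
  return "wait";
}
```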

Example State Progression

// Push: '{"name": "Jo'
{
  structurally_complete: false,  // Still in string
  json_parseable: false,          // Not parseable yet
  safe_to_emit: false,            // Don't show yet
  partial: null,
  buffered: '{"name": "Jo',
  in_string: true,
  brace_depth: 1
}

// Push: 'hn", "age": 30}'
{
  structurally_complete: true,   // Braces balanced
  json_parseable: true,           // Can parse
  safe_to_emit: true,             // Safe to show
  partial: { name: "John", age: 30 },
  buffered: '{"name": "John", "age": 30}',
  in_string: false,
  brace_depth: 0
}
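The `in_string` and `brace_depth` values above can be tracked with a single pass over each new chunk. The following is a minimal sketch under that assumption, not Shim's implementation: `StructureTracker` is a hypothetical class that counts `{`/`[` against `}`/`]` outside strings, honors backslash escapes, and tries `JSON.parse` on the whole buffer after every push. It does no repair, so its `partial` stays `null` until the buffer is valid JSON as-is.

```typescript
// Hypothetical sketch of incremental structure tracking; not Shim's code.
class StructureTracker {
  buffered = "";
  inString = false;
  depth = 0; // combined depth of open { and [ (the guide only shows braces)
  private escaped = false;

  push(chunk: string) {
    for (const ch of chunk) {
      if (this.inString) {
        // Inside a string: only an unescaped quote ends it.
        if (this.escaped) this.escaped = false;
        else if (ch === "\\") this.escaped = true;
        else if (ch === '"') this.inString = false;
      } else if (ch === '"') {
        this.inString = true;
      } else if (ch === "{" || ch === "[") {
        this.depth++;
      } else if (ch === "}" || ch === "]") {
        this.depth--;
      }
    }
    this.buffered += chunk;
    return this.state();
  }

  state() {
    const complete =
      this.buffered.trim().length > 0 && !this.inString && this.depth === 0;
    let partial: unknown = null;
    let parseable = false;
    try {
      partial = JSON.parse(this.buffered);
      parseable = true;
    } catch {
      // Not valid JSON yet; partial stays null.
    }
    return {
      structurally_complete: complete,
      json_parseable: parseable,
      partial,
      in_string: this.inString,
      brace_depth: this.depth,
    };
  }
}
```

Running the two pushes from the example through it gives the same `in_string`, `brace_depth`, `structurally_complete` and `json_parseable` values.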

Buffer Bottleneck Solution

Problem: Waiting for the full output adds latency.

Solution: Shim’s streaming engine:

Strips markdown fences early
Holds incomplete tokens (e.g., "0.")
Attempts a parse on every chunk
Returns `partial` as soon as the buffer is parseable
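Two of the steps listed above, stripping fences and attempting a parse on every chunk, can be sketched in a few lines. `stripFences` and `tryParse` are hypothetical helpers; the fence pattern (an optional language tag such as `json`) is an assumption about what models typically emit. The fence string is built at runtime only so the snippet contains no literal triple backtick. This sketch does no repair, so it returns `null` until the buffer is complete JSON.

```typescript
// Built at runtime so the snippet itself contains no literal fence.
const FENCE = "`".repeat(3);
// Assumed fence shapes: an opening fence with an optional language tag,
// and a closing fence at the very end of the buffer.
const OPENING = new RegExp("^\\s*" + FENCE + "[A-Za-z]*[ \\t]*\\r?\\n?");
const CLOSING = new RegExp("\\r?\\n?" + FENCE + "\\s*$");

// Hypothetical helper: remove markdown fences wrapped around the JSON.
function stripFences(text: string): string {
  return text.replace(OPENING, "").replace(CLOSING, "");
}

// Hypothetical helper: try a parse after each chunk. Returns null while the
// buffer is not valid JSON (note: a literal JSON `null` is also returned as null).
function tryParse(buffer: string): unknown {
  try {
    return JSON.parse(stripFences(buffer));
  } catch {
    return null;
  }
}
```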

Numerical Gyrations

Partial numbers can cause UI flicker:

// Chunk 1: '{"score": 0'
partial: { score: 0 }  // UI shows 0

// Chunk 2: '.5'
partial: { score: 0.5 } // UI updates to 0.5

Shim detects incomplete numbers and holds them:

// Chunk 1: '{"score": 0'
{
  partial: null,           // Not emitted yet
  incomplete_token: "0"    // Held
}

// Chunk 2: '.5'
{
  partial: { score: 0.5 }, // Emitted once complete
  incomplete_token: ""
}
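One way to implement the hold is to look for a number token at the very end of the buffer, outside any string, and keep it back until the next chunk shows where it ends. The sketch below is an assumption, not Shim's documented rule: it treats a number as complete only once a delimiter such as `,`, `}`, `]` or whitespace follows, so unlike the example above it would still hold `0.5` after chunk 2 until such a delimiter arrives. `splitTrailingNumber` and `endsInsideString` are hypothetical helpers.

```typescript
// Hypothetical helper: true if the text ends inside an unterminated string.
function endsInsideString(text: string): boolean {
  let inString = false;
  let escaped = false;
  for (const ch of text) {
    if (escaped) { escaped = false; continue; }
    if (inString && ch === "\\") { escaped = true; continue; }
    if (ch === '"') inString = !inString;
  }
  return inString;
}

// Hypothetical helper: split the buffer into a stable prefix and a trailing
// number token that might still grow (e.g. "0" -> "0.5", or a lone "-").
function splitTrailingNumber(buffer: string): { stable: string; incomplete: string } {
  // Digits at the end of a string value are text, not a number token.
  if (endsInsideString(buffer)) return { stable: buffer, incomplete: "" };
  const match = /-?\d[\d.eE+-]*$|-$/.exec(buffer);
  if (!match) return { stable: buffer, incomplete: "" };
  return { stable: buffer.slice(0, match.index), incomplete: match[0] };
}
```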

Session Management

Expiration

Sessions expire after 60 seconds of inactivity.

const { session_id, expires_at } = await shim.stream.start();
// expires_at: Unix timestamp (ms)

// Push within 60s
await shim.stream.push({ session_id, chunk });
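Because the timeout is based on inactivity, a client can estimate locally whether its session is still alive before pushing. This is a sketch under the assumption that the 60-second window restarts on every push. The server remains authoritative, so `SessionClock` (a hypothetical name) only predicts expiry and does not replace handling `SESSION_NOT_FOUND`.

```typescript
// Assumption: the inactivity window restarts on each push (or on start).
const SESSION_TTL_MS = 60_000;

// Hypothetical client-side estimate of session liveness.
class SessionClock {
  private lastActivity: number;

  constructor(now: number = Date.now()) {
    this.lastActivity = now;
  }

  // Call after every successful push.
  touch(now: number = Date.now()): void {
    this.lastActivity = now;
  }

  remainingMs(now: number = Date.now()): number {
    return Math.max(0, this.lastActivity + SESSION_TTL_MS - now);
  }

  isLikelyExpired(now: number = Date.now()): boolean {
    return this.remainingMs(now) === 0;
  }
}
```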

Circuit Breaker

Sessions terminate once the buffer exceeds 5MB. This limit protects against hallucination loops, where a model keeps generating output indefinitely.

{
  "code": "BUFFER_SIZE_EXCEEDED",
  "message": "Buffer size exceeded 5MB limit",
  "severity": "critical"
}
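If you would rather stop pushing before the server trips the breaker, you can count bytes on the client. The guide does not say whether 5MB means 5,000,000 or 5 * 1024 * 1024 bytes, or that it is measured in UTF-8 bytes; this sketch assumes UTF-8 and the smaller figure so it errs on the early side. `BufferBudget` is a hypothetical helper.

```typescript
// Assumption: limit counted in UTF-8 bytes; 5,000,000 is the smaller of the
// two plausible readings of "5MB", so this guard trips slightly early.
const MAX_BUFFER_BYTES = 5_000_000;
const encoder = new TextEncoder();

// Hypothetical client-side guard against hitting BUFFER_SIZE_EXCEEDED.
class BufferBudget {
  private used = 0;
  private readonly limit: number;

  constructor(limit: number = MAX_BUFFER_BYTES) {
    this.limit = limit;
  }

  // Returns false, recording nothing, if this chunk would exceed the limit.
  tryConsume(chunk: string): boolean {
    const bytes = encoder.encode(chunk).length;
    if (this.used + bytes > this.limit) return false;
    this.used += bytes;
    return true;
  }

  get usedBytes(): number {
    return this.used;
  }
}
```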

Complete Example

import { ShimClient } from 'shim-sdk';

const shim = new ShimClient({ apiKey: process.env.SHIM_API_KEY });

async function streamingRepair(llmStream) {
  // Start session
  const { session_id } = await shim.stream.start({
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        age: { type: 'number' }
      }
    },
    mode: 'strict'
  });

  // Push chunks
  for await (const chunk of llmStream) {
    const { state } = await shim.stream.push({
      session_id,
      chunk
    });

    // Show progress to user
    if (state.safe_to_emit && state.partial) {
      console.log('Preview:', state.partial);
    }
  }

  // Finalize and get result
  const result = await shim.stream.finalize({ session_id });

  if (result.success) {
    console.log('Final:', result.repaired);
    console.log('Confidence:', result.metadata.confidence);
  }

  return result;
}
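To exercise `streamingRepair` without calling a model, you can feed it a fixed string split into small chunks. `fakeLlmStream` is a hypothetical test helper, assuming, as the loop above does, that each chunk is a plain string.

```typescript
// Hypothetical test helper: replays a fixed string as an async stream of
// chunks, optionally pausing between them to mimic token latency.
async function* fakeLlmStream(
  text: string,
  chunkSize = 4,
  delayMs = 0
): AsyncGenerator<string> {
  for (let i = 0; i < text.length; i += chunkSize) {
    if (delayMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
    yield text.slice(i, i + chunkSize);
  }
}
```

For example: `await streamingRepair(fakeLlmStream('{"name": "John", "age": 30}', 5));`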

Best Practices

1. Check `safe_to_emit` Before Displaying

if (state.safe_to_emit && state.partial) {
  updateUI(state.partial);
}
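Consecutive pushes can return an unchanged `partial` (for instance while a number is being held), so it can help to re-render only when the preview actually changes. `makePreviewGate` is a hypothetical helper that combines the `safe_to_emit` check above with a change check based on `JSON.stringify`, which assumes previews are plain JSON data.

```typescript
// Minimal state shape needed here; mirrors the fields used in this guide.
type PreviewState = { safe_to_emit: boolean; partial: unknown };

// Hypothetical helper: returns a function that forwards a preview to
// `render` only when it is safe to show and differs from the last one.
function makePreviewGate(render: (value: unknown) => void) {
  let last: string | undefined;
  return (state: PreviewState): boolean => {
    if (!state.safe_to_emit || state.partial == null) return false;
    const snapshot = JSON.stringify(state.partial);
    if (snapshot === last) return false; // unchanged: skip the re-render
    last = snapshot;
    render(state.partial);
    return true;
  };
}
```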

2. Handle Session Expiration

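// Declare session_id with let so a restart can reassign it
let { session_id } = await shim.stream.start();

const { state } = await shim.stream.push({ session_id, chunk });

if (state.error?.code === 'SESSION_NOT_FOUND') {
  // Restart: reassign the outer session_id instead of declaring a new
  // block-scoped one. The new session is separate, so re-push earlier chunks.
  ({ session_id } = await shim.stream.start());
}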
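A restarted session has a new `session_id`, and the guide does not say that it inherits the old buffer, so the safe assumption is that earlier chunks must be pushed again. Below is a sketch of keeping them on the client. `ChunkLog` is hypothetical, and the `push` callback stands in for whatever call sends one chunk to the new session (for example a wrapper around `shim.stream.push`). The log holds a copy of everything pushed, so it uses about as much memory as the server-side buffer.

```typescript
// Hypothetical helper: records pushed chunks so they can be replayed into a
// fresh session after SESSION_NOT_FOUND.
class ChunkLog {
  private chunks: string[] = [];

  record(chunk: string): void {
    this.chunks.push(chunk);
  }

  // Sends every recorded chunk, in order, through the supplied push function.
  async replay(push: (chunk: string) => Promise<unknown>): Promise<number> {
    for (const chunk of this.chunks) {
      await push(chunk);
    }
    return this.chunks.length;
  }
}
```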

3. Set Max Tokens on LLM

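Cap the model's output so a runaway generation stops long before the 5MB buffer limit:

import OpenAI from 'openai';

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: prompt }],
  max_tokens: 1000,  // Hard cap on generated tokens: a circuit breaker for runaway output
  stream: true
});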
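With `stream: true`, the OpenAI Node SDK yields chunk objects rather than strings, and the generated text is in `choices[0].delta.content`. The examples in this guide push `chunk` directly, so if your `llmStream` is an OpenAI stream you likely need to extract the text first. The adapter below is a sketch; `ChatCompletionChunkLike` is a local stand-in that models only the fields used, not the SDK's own type.

```typescript
// Local stand-in for the part of a streamed chat-completion chunk used here.
interface ChatCompletionChunkLike {
  choices: Array<{ delta: { content?: string | null } }>;
}

// Turns a stream of chunk objects into the plain text pieces to push.
// Chunks without text (role-only deltas, empty final chunks) are skipped.
async function* textDeltas(
  stream: AsyncIterable<ChatCompletionChunkLike>
): AsyncGenerator<string> {
  for await (const chunk of stream) {
    const text = chunk.choices[0]?.delta?.content;
    if (text) yield text;
  }
}
```

With this in place, `streamingRepair(textDeltas(stream))` pushes only text.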

4. Finalize When `structurally_complete`

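for await (const chunk of llmStream) {
  const { state } = await shim.stream.push({ session_id, chunk });
  // Output looks complete: stop reading and finalize
  if (state.structurally_complete) break;
}

const result = await shim.stream.finalize({ session_id });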

Performance

Latency: <1ms processing per push (server side; excludes network round-trip)
Memory: O(m) buffer where m = total output size (circuit breaker at 5MB)
Throughput: 1M+ chunks/sec per Worker

Next Steps

Streaming API Reference: Full API documentation
TypeScript SDK: Use the official SDK
Error Handling: Handle errors gracefully
Confidence Levels: Understand confidence scoring
