
Streaming Repair Guide

Batch repair waits for the full output before it starts. Streaming repair processes JSON token by token as it arrives, so there is no buffering delay.

Why Streaming Matters

The Problem:
  • LLMs stream tokens one at a time
  • Your UI wants to show progress
  • OutputFixingParser waits for full output (2-5s delay)
The Solution: Shim repairs JSON incrementally. Push chunks as they arrive. Get parseable JSON immediately.

How It Works

Three-Step Process

  1. Start Session → Get session_id
  2. Push Chunks → Send tokens as they arrive, get state back
  3. Finalize → Get repaired result with confidence score

Session Lifecycle

// 1. Start session
const { session_id } = await shim.stream.start({
  schema: mySchema,
  mode: 'strict'
});

// 2. Push chunks (as LLM streams)
for await (const chunk of llmStream) {
  const { state } = await shim.stream.push({
    session_id,
    chunk
  });

  // state.safe_to_emit → safe to show to user
  // state.partial → parsed object (or null)
}

// 3. Finalize
const result = await shim.stream.finalize({ session_id });
console.log(result.repaired);

State Flags

Each push returns a StreamingState object:
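| Flag                  | Meaning                        | Use Case          |
|-----------------------|--------------------------------|-------------------|
| structurally_complete | Braces balanced, not in string | Safe to finalize  |
| json_parseable        | JSON.parse() will succeed      | Safe to parse     |
| safe_to_emit          | Parseable + no critical issues | Safe to show user |
| partial               | Parsed object (or null)        | Preview data      |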
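
The flags can be folded into a single decision per push. The sketch below is one possible way to do that, assuming the state object has the fields listed in the table. `nextAction` and its return values are hypothetical names for illustration and are not part of the Shim SDK.

```javascript
// Hypothetical helper (not part of the Shim SDK): maps the flags of a
// StreamingState-like object to one action for the caller to take.
function nextAction(state) {
  // Previewing requires a safe, non-null parsed object.
  const canRender = Boolean(state.safe_to_emit) && state.partial != null;

  if (state.structurally_complete) {
    // The output looks finished, so finalizing is reasonable either way.
    return canRender ? 'render-and-finalize' : 'finalize';
  }
  return canRender ? 'render' : 'wait';
}
```

A caller could switch on the returned string inside the push loop, which keeps the flag checks in one place.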

Example State Progression

// Push: '{"name": "Jo'
{
  structurally_complete: false,  // Still in string
  json_parseable: false,          // Not parseable yet
  safe_to_emit: false,            // Don't show yet
  partial: null,
  buffered: '{"name": "Jo',
  in_string: true,
  brace_depth: 1
}

// Push: 'hn", "age": 30}'
{
  structurally_complete: true,   // Braces balanced
  json_parseable: true,           // Can parse
  safe_to_emit: true,             // Safe to show
  partial: { name: "John", age: 30 },
  buffered: '{"name": "John", "age": 30}',
  in_string: false,
  brace_depth: 0
}
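
To make `in_string` and `brace_depth` concrete, here is a minimal sketch of how such values could be derived from the buffered text. It is an illustration under simple assumptions (only curly braces are counted, as in the flag description above) and is not Shim's actual implementation. `scanStructure` is a hypothetical name.

```javascript
// Illustrative scanner: tracks string state and brace depth over a buffer.
// An assumption about how such flags could be computed, not Shim's code.
function scanStructure(buffered) {
  let inString = false;
  let escaped = false;
  let braceDepth = 0;

  for (const ch of buffered) {
    if (inString) {
      if (escaped) escaped = false;           // char after a backslash is literal
      else if (ch === '\\') escaped = true;   // start of an escape sequence
      else if (ch === '"') inString = false;  // closing quote
      continue;                               // braces inside strings do not count
    }
    if (ch === '"') inString = true;
    else if (ch === '{') braceDepth += 1;
    else if (ch === '}') braceDepth -= 1;
  }

  return {
    in_string: inString,
    brace_depth: braceDepth,
    structurally_complete:
      !inString && braceDepth === 0 && buffered.trim() !== ''
  };
}
```

Run against the two buffers in the progression above, this reports an open string at depth 1 for the first and a balanced, closed structure for the second.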

Buffer Bottleneck Solution

Problem: Waiting for the full output adds latency.
Solution: Shim’s streaming engine:
  1. Strips markdown fences early
  2. Holds incomplete tokens (e.g., "0.")
  3. Attempts parse on every chunk
  4. Returns partial as soon as parseable
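
The steps above can be approximated locally to see why they remove the wait. The sketch below covers fence stripping and the parse attempt on every chunk; it is a rough illustration, not Shim's engine, and `stripFence`, `tryParse`, and `createBuffer` are hypothetical names.

```javascript
// Illustrative only: a local approximation of the steps above.
const FENCE = '`'.repeat(3); // markdown code fence marker

function stripFence(text) {
  // Drop a leading fence line (optionally tagged, e.g. "json") and a trailing fence.
  const opening = new RegExp('^\\s*' + FENCE + '[a-zA-Z]*\\s*');
  const closing = new RegExp('\\s*' + FENCE + '\\s*$');
  return text.replace(opening, '').replace(closing, '');
}

function tryParse(text) {
  try {
    return JSON.parse(text);
  } catch {
    return null; // not parseable yet
  }
}

function createBuffer() {
  let buffered = '';
  return {
    // Append the chunk, then attempt a parse of everything seen so far.
    push(chunk) {
      buffered += chunk;
      return { partial: tryParse(stripFence(buffered)) };
    }
  };
}
```

Because the parse is attempted on each push, a preview becomes available on the first chunk that completes the JSON rather than when the stream ends.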

Numerical Gyrations

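Partial numbers can cause UI flicker. If a number is split across chunks and emitted as soon as it looks valid, the UI shows an intermediate value first: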
// Chunk 1: '{"score": 0'
partial: { score: 0 }  // UI shows 0

// Chunk 2: '.5'
partial: { score: 0.5 } // UI updates to 0.5
Shim detects incomplete numbers and holds them:
// Chunk 1: '{"score": 0'
{
  partial: null,           // Not emitted yet
  incomplete_token: "0"    // Held
}

// Chunk 2: '.5'
{
  partial: { score: 0.5 }, // Emitted once complete
  incomplete_token: ""
}
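
One way to picture the holding step is a check for a numeric token at the very end of the buffer, since more digits could still arrive. The sketch below is a heuristic written for illustration only; it is not how Shim is known to do it, and `incompleteNumberToken` and `endsInsideString` are hypothetical names.

```javascript
// Illustrative heuristic, not Shim's implementation.
function endsInsideString(buffered) {
  let inString = false;
  let escaped = false;
  for (const ch of buffered) {
    if (escaped) escaped = false;
    else if (ch === '\\' && inString) escaped = true;
    else if (ch === '"') inString = !inString;
  }
  return inString;
}

// Returns the trailing numeric token if it could still grow, else ''.
function incompleteNumberToken(buffered) {
  // Digits inside a string are text, not a number value.
  if (endsInsideString(buffered)) return '';
  // A number that directly follows ':', ',' or '[' and runs to the end
  // of the buffer has no terminator yet, so it may be incomplete.
  const match = buffered.match(/[:,\[]\s*(-?\d*\.?\d*(?:[eE][+-]?\d*)?)$/);
  return match ? match[1] : '';
}
```

A caller would withhold the preview while this returns a non-empty token and emit once a terminator such as `,` or `}` arrives.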

Session Management

Expiration

Sessions expire after 60 seconds of inactivity.
const { session_id, expires_at } = await shim.stream.start();
// expires_at: Unix timestamp (ms)

// Push within 60s
await shim.stream.push({ session_id, chunk });
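
If the LLM can pause for long stretches, it may help to check how much time is left before pushing. The helper below is a sketch that assumes `expires_at` is a Unix timestamp in milliseconds, as noted above. Whether a push extends the expiry is not stated here, so treat the value from `start` as a lower bound. `msUntilExpiry` and `isExpiringSoon` are hypothetical names.

```javascript
// Hypothetical helpers: work out how close a session is to expiring,
// assuming expires_at is a Unix timestamp in milliseconds.
function msUntilExpiry(expiresAt, now = Date.now()) {
  return Math.max(0, expiresAt - now);
}

function isExpiringSoon(expiresAt, marginMs = 5000, now = Date.now()) {
  return msUntilExpiry(expiresAt, now) <= marginMs;
}
```

The `now` parameter exists so the functions can be tested with fixed timestamps; in normal use it can be omitted.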

Circuit Breaker

Sessions terminate when the buffer reaches 5MB, which protects against hallucination loops. The error looks like this:
{
  "code": "BUFFER_SIZE_EXCEEDED",
  "message": "Buffer size exceeded 5MB limit",
  "severity": "critical"
}
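
A client can also track how much it has sent and stop before the limit is reached. The sketch below counts UTF-8 bytes per chunk; it assumes "5MB" means 5 × 1024 × 1024 bytes and that the limit applies to the raw pushed text, neither of which is confirmed here. `createSizeGuard` is a hypothetical name.

```javascript
// Assumption: "5MB" is interpreted as 5 MiB of UTF-8 encoded text.
const LIMIT_BYTES = 5 * 1024 * 1024;

// Hypothetical client-side guard that counts bytes pushed so far.
function createSizeGuard(limitBytes = LIMIT_BYTES) {
  const encoder = new TextEncoder();
  let total = 0;
  return {
    // Adds the chunk's byte length; returns true while under the limit.
    accept(chunk) {
      total += encoder.encode(chunk).length;
      return total <= limitBytes;
    },
    get bytes() {
      return total;
    }
  };
}
```

When `accept` returns false, the caller could abort the LLM stream instead of waiting for the session to be terminated.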

Complete Example

import { ShimClient } from 'shim-sdk';

const shim = new ShimClient({ apiKey: process.env.SHIM_API_KEY });

async function streamingRepair(llmStream) {
  // Start session
  const { session_id } = await shim.stream.start({
    schema: {
      type: 'object',
      properties: {
        name: { type: 'string' },
        age: { type: 'number' }
      }
    },
    mode: 'strict'
  });

  // Push chunks
  for await (const chunk of llmStream) {
    const { state } = await shim.stream.push({
      session_id,
      chunk
    });

    // Show progress to user
    if (state.safe_to_emit && state.partial) {
      console.log('Preview:', state.partial);
    }
  }

  // Finalize and get result
  const result = await shim.stream.finalize({ session_id });

  if (result.success) {
    console.log('Final:', result.repaired);
    console.log('Confidence:', result.metadata.confidence);
  }

  return result;
}

Best Practices

1. Check safe_to_emit Before Displaying

if (state.safe_to_emit && state.partial) {
  updateUI(state.partial);
}

2. Handle Session Expiration

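// session_id must be declared with `let` so it can be reassigned below
const { state } = await shim.stream.push({ session_id, chunk });

if (state.error?.code === 'SESSION_NOT_FOUND') {
  // Session expired: start a new one and update the outer variable.
  // (Redeclaring with `const` here would shadow it and lose the new ID.)
  ({ session_id } = await shim.stream.start());
}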
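
A new session starts with an empty buffer, so chunks pushed before the expiry would presumably need to be sent again. The sketch below replays them; that recovery strategy is an assumption, not documented behavior. `pushWithRestart` and the `session` object (`id`, `options`, `sent`) are hypothetical, and `client` is any object exposing the `stream.start` and `stream.push` calls used in this guide.

```javascript
// Sketch only: replaying earlier chunks is an assumed recovery strategy.
// session = { id: string, options: object, sent: string[] }
async function pushWithRestart(client, session, chunk) {
  session.sent.push(chunk);
  let { state } = await client.stream.push({ session_id: session.id, chunk });

  if (state.error?.code === 'SESSION_NOT_FOUND') {
    // The old session is gone: start over and resend everything so far,
    // including the chunk that just failed.
    const started = await client.stream.start(session.options);
    session.id = started.session_id;
    for (const earlier of session.sent) {
      ({ state } = await client.stream.push({
        session_id: session.id,
        chunk: earlier
      }));
    }
  }
  return state; // state from the last successful push
}
```

Keeping the sent chunks on the session object costs memory proportional to the output size, which is the same order as the server-side buffer.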

3. Set Max Tokens on LLM

Prevent hallucination loops:
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: prompt }],
  max_tokens: 1000,  // Circuit breaker
  stream: true
});

4. Finalize When structurally_complete

if (state.structurally_complete) {
  // LLM output looks complete, finalize now
  const result = await shim.stream.finalize({ session_id });
}

Performance

  • Latency: <1ms per push
  • Memory: O(m) buffer where m = total output size (circuit breaker at 5MB)
  • Throughput: 1M+ chunks/sec per Worker

Next Steps

  • Streaming API Reference: Full API documentation
  • TypeScript SDK: Use the official SDK
  • Error Handling: Handle errors gracefully
  • Confidence Levels: Understand confidence scoring