Data Privacy

Shim never stores your raw LLM outputs or repaired JSON. Zero persistence. GDPR compliant by design.

Zero-Persistence Promise

What We Never Store

Raw LLM outputs (the raw_output field)
Repaired JSON (the repaired field)
Field names (e.g., name, email, address)
Field values (e.g., "John Doe", "user@example.com")

What We Store

Metadata only (90-day retention):

{
  "api_key_id": "key_abc123",
  "timestamp": 1705780800000,
  "latency_ms": 5,
  "input_size_bytes": 1024,
  "output_size_bytes": 1050,
  "confidence": "high",
  "repair_types": ["closed_bracket", "trailing_comma"],
  "success": true
}

No content. No PII. Only operational metrics.
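
As a rough illustration of what "metadata only" means in practice, the sketch below derives a record of this shape from a request without copying any content into it. The helper name `buildMetadataRecord` and its parameters are hypothetical and not part of Shim's API; only the field names come from the record above.

```javascript
// Hypothetical helper: builds the metadata-only record shown above.
// Sizes are measured in UTF-8 bytes; the strings themselves are never copied in.
function buildMetadataRecord({
  apiKeyId, rawOutput, repaired, startedAt, finishedAt,
  confidence, repairTypes, success,
}) {
  const encoder = new TextEncoder();
  return {
    api_key_id: apiKeyId,
    timestamp: finishedAt,
    latency_ms: finishedAt - startedAt,
    input_size_bytes: encoder.encode(rawOutput).length,
    output_size_bytes: encoder.encode(repaired).length,
    confidence,
    repair_types: repairTypes,
    success,
  };
}

const record = buildMetadataRecord({
  apiKeyId: "key_abc123",
  rawOutput: '{"name": "José",',
  repaired: '{"name": "José"}',
  startedAt: 1705780799995,
  finishedAt: 1705780800000,
  confidence: "high",
  repairTypes: ["closed_bracket", "trailing_comma"],
  success: true,
});
console.log(record);
```

The record carries sizes and timings, so it can answer "how large and how slow" without being able to answer "what was in it".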

Architecture

In-Memory Processing

All repairs happen in memory. Request content never touches disk; only the metadata record above is written to storage.

// 1. Request arrives
const { raw_output, schema } = request.body;

// 2. Repair in memory (0 disk writes)
const repaired = repairSyntax(raw_output);

// 3. Validate against schema
const validated = repairWithSchema(repaired, schema);

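// 4. Log metadata only
// (api_key_id and metadata are assumed to be set earlier in the handler)
await logRepairToD1({
  api_key_id,
  // UTF-8 byte count; raw_output.length would count UTF-16 code units
  input_size: new TextEncoder().encode(raw_output).length,
  confidence: metadata.confidence,
  // No raw_output or repaired stored
});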

// 5. Return response
return { success: true, repaired: validated };

Cloudflare Workers

Serverless functions with no persistent filesystem:

Runtime: V8 isolates (ephemeral)
Memory: Cleared after request
Storage: KV (metadata only) + D1 (logs)
Edge: Runs close to user, no backhaul

What Gets Logged

Batch Repair

INSERT INTO repair_logs (
  api_key_id,
  timestamp,
  latency_ms,
  input_size_bytes,
  output_size_bytes,
  confidence,
  repair_types,
  success
) VALUES (?, ?, ?, ?, ?, ?, ?, ?);

No raw_output or repaired column exists in the schema.
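
A minimal sketch of how such an insert might be issued from a Worker, assuming `db` is a Cloudflare D1 binding (for example `env.DB`). The function name `logRepairSketch`, the JSON encoding of `repair_types`, and the 1/0 encoding of `success` are assumptions for illustration, not Shim's actual implementation.

```javascript
// Column list mirrors the INSERT statement above: eight columns, no content.
const INSERT_SQL = `INSERT INTO repair_logs (
  api_key_id, timestamp, latency_ms, input_size_bytes,
  output_size_bytes, confidence, repair_types, success
) VALUES (?, ?, ?, ?, ?, ?, ?, ?)`;

// Hypothetical logger: binds one value per placeholder and runs the statement.
async function logRepairSketch(db, record) {
  return db
    .prepare(INSERT_SQL)
    .bind(
      record.api_key_id,
      record.timestamp,
      record.latency_ms,
      record.input_size_bytes,
      record.output_size_bytes,
      record.confidence,
      JSON.stringify(record.repair_types), // assumption: stored as a JSON string
      record.success ? 1 : 0               // assumption: stored as an integer flag
    )
    .run();
}
```

Because values are bound to placeholders rather than concatenated into the SQL string, the statement itself stays fixed and cannot grow an extra column by accident.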

Streaming Repair

Streaming repairs log the same metadata fields as batch repairs. The session ID is ephemeral and expires after 60 seconds.

GDPR Compliance

Data Processing

Processor: Shim acts as a data processor (you are the controller)
Purpose: JSON repair only (no analysis, no training)
Duration: In-memory only (milliseconds)
Location: Cloudflare edge network (regional processing)

User Rights

Right to erasure: No data to erase. Logs contain no PII.
Right to access: Metadata logs available via dashboard.
Right to portability: Export metadata via API (coming soon).

Data Residency

Cloudflare Workers run in the region closest to the request:

EU requests: EU data centers
US requests: US data centers
Asia requests: Asia data centers

No cross-border data transfer of content.

Threat Model

What Shim Protects Against

1. Data Breach

No stored content → Nothing to breach
Only metadata exposed in worst case

2. Unauthorized Access

API keys hashed (SHA-256)
No plaintext credentials in database

3. Compliance Violations

No PII storage → No GDPR violations
Metadata retention: 90 days (configurable)

What Shim Doesn’t Protect Against

1. Network-Level Attacks

Use HTTPS (TLS 1.3) for API calls
Shim enforces HTTPS-only

2. Your LLM Provider

Providers such as OpenAI or Anthropic may retain your prompts under their own policies
Shim doesn’t change this

3. Your Application

You control what happens after repair
Shim doesn’t store your data

Security Headers

All responses include security headers:

Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
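
One way a Worker could attach these headers to every response is sketched below. The header values are taken from the list above; the helper name `withSecurityHeaders` is hypothetical, and the snippet relies only on the standard `Headers` and `Response` classes.

```javascript
// Header values are taken from the list above.
const SECURITY_HEADERS = {
  "Strict-Transport-Security": "max-age=31536000",
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
  "X-XSS-Protection": "1; mode=block",
};

// Hypothetical helper: returns a copy of the response with the headers set.
function withSecurityHeaders(response) {
  const headers = new Headers(response.headers);
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    headers.set(name, value);
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}

const secured = withSecurityHeaders(new Response('{"success":true}', { status: 200 }));
console.log(secured.headers.get("X-Frame-Options")); // prints "DENY"
```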

Observability vs Privacy

What We Track

Operational metrics:

Repair success rate
Latency percentiles
Error rates
Confidence distribution
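
Metrics like these can be computed from the metadata fields alone. As an illustration, the sketch below derives latency percentiles from a list of `latency_ms` values using the nearest-rank method; the function name and the sample values are made up for the example, and Shim's actual aggregation method is not documented here.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
function percentile(values, p) {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// Sample latency_ms values (illustrative only).
const latencies = [5, 7, 4, 12, 6, 5, 9, 30, 5, 6];
console.log(percentile(latencies, 50)); // 6
console.log(percentile(latencies, 95)); // 30
```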

Business metrics:

API usage counts
Tier limits
Overage billing

What We Don’t Track

Sensitive data:

User identifiers (beyond API key hash)
Field names or values
Schema structures
Error messages with content

Comparison: Shim vs Others

Feature               Shim            OutputFixingParser   Custom Parser
Stores raw output     No              No                   Depends
Stores repaired JSON  No              No                   Depends
Logs metadata         Yes (90 days)   No                   Depends
Edge processing       Yes             No                   No
In-memory             Yes             Yes                  Depends
GDPR compliant        Yes             N/A                  Depends

Retention Policy

Metadata Logs

Retention: 90 days (configurable per tier)

After expiration:

Automatic deletion
No backups kept
No recovery possible
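
A purge of this kind could look roughly like the sketch below. It assumes `timestamp` is stored in epoch milliseconds, as in the metadata example, and that `db` is a Cloudflare D1 binding; the function names and the idea of a scheduled purge job are assumptions, not a description of Shim's internals.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Rows with a timestamp older than this cutoff are past retention.
function retentionCutoff(nowMs, retentionDays = 90) {
  return nowMs - retentionDays * DAY_MS;
}

// Hypothetical purge job: deletes every log row older than the cutoff.
async function purgeExpiredLogs(db, nowMs = Date.now(), retentionDays = 90) {
  return db
    .prepare("DELETE FROM repair_logs WHERE timestamp < ?")
    .bind(retentionCutoff(nowMs, retentionDays))
    .run();
}

console.log(retentionCutoff(1705780800000)); // 1698004800000
```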

API Keys

Retention: Until revoked

Storage:

Hashed (SHA-256)
Salted per key
Never plaintext
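
For readers unfamiliar with salted hashing, the following sketch shows the general technique using Node's built-in `crypto` module. The helper names, the 16-byte salt, the salt-then-key concatenation, and the example key are all assumptions; Shim's actual key format and salt handling are not documented here.

```javascript
const nodeCrypto = require("node:crypto");

// Hypothetical helper: hashes a key with a per-key random salt.
function hashApiKey(apiKey, salt = nodeCrypto.randomBytes(16).toString("hex")) {
  const hash = nodeCrypto.createHash("sha256").update(salt + apiKey).digest("hex");
  return { salt, hash }; // only these two values are stored, never the key
}

// Re-hashes the presented key with the stored salt and compares digests.
function verifyApiKey(apiKey, stored) {
  const candidate = hashApiKey(apiKey, stored.salt).hash;
  // Constant-time comparison avoids leaking how much of the digest matched.
  return nodeCrypto.timingSafeEqual(
    Buffer.from(candidate, "hex"),
    Buffer.from(stored.hash, "hex")
  );
}

const storedKey = hashApiKey("shim_live_example_key");
console.log(verifyApiKey("shim_live_example_key", storedKey)); // true
console.log(verifyApiKey("wrong_key", storedKey));             // false
```

Because the salt differs per key, two identical keys would still produce different stored hashes.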

Sessions (Streaming)

Retention: 60 seconds

Storage:

KV namespace (ephemeral)
Auto-expires
No persistent backups
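
Auto-expiry of this kind maps naturally onto the `expirationTtl` option of Workers KV. The sketch below assumes `kv` is a KV namespace binding; the helper name, the `session:` key prefix, and the shape of the stored state are hypothetical.

```javascript
const SESSION_TTL_SECONDS = 60;

// Hypothetical session helper. The `state` argument stands in for whatever
// bookkeeping a streaming session needs; its real contents are not documented here.
async function saveSession(kv, sessionId, state) {
  await kv.put(`session:${sessionId}`, JSON.stringify(state), {
    expirationTtl: SESSION_TTL_SECONDS, // KV removes the key after this many seconds
  });
}
```

With a TTL set at write time, no separate cleanup job is needed for sessions.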

Audit Log

Coming in v1.2 (Enterprise tier):

{
  "event": "repair_requested",
  "timestamp": 1705780800000,
  "api_key_id": "key_abc123",
  "user_id": "user_xyz789",
  "input_hash": "sha256:abc123...",
  "confidence": "high",
  "repairs": ["closed_bracket"]
}

Still no content. Just hashed fingerprints.
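
A fingerprint in the `input_hash` format shown above could be produced as in the sketch below, using Node's built-in `crypto` module. The function name is hypothetical, and whether Shim hashes the raw bytes exactly this way is an assumption.

```javascript
const nodeCrypto = require("node:crypto");

// Hypothetical helper: returns "sha256:<hex digest>" for the given input string.
function fingerprint(rawOutput) {
  const digest = nodeCrypto.createHash("sha256").update(rawOutput, "utf8").digest("hex");
  return `sha256:${digest}`;
}

console.log(fingerprint('{"name": "John Doe",'));
```

The same input always yields the same fingerprint, which lets two audit events be matched to one request without either event containing the content.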

Guardian Tier (Coming Soon)

Real-Time PII Scrubbing

Shim will detect and redact PII before repair:

// Input: '{"name": "John Doe", "ssn": "123-45-6789"}'
// After scrubbing: '{"name": "[REDACTED]", "ssn": "[REDACTED]"}'

Patterns detected:

Social Security numbers (US) and National Insurance numbers (UK)
Credit card numbers
Email addresses
Phone numbers
IP addresses

Still zero persistence. Scrubbing happens in-memory.
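
Since Guardian is not released yet, the following is only a sketch of pattern-based redaction in general, covering three of the listed patterns. The regular expressions and the `scrubPii` name are assumptions, deliberately simplified, and would miss many real-world formats. Note that a pattern-based pass like this would not redact a free-text name such as "John Doe"; that would need field-aware rules.

```javascript
// Simplified, illustrative patterns; not Shim's actual detection rules.
const PII_PATTERNS = [
  /\b\d{3}-\d{2}-\d{4}\b/g,                          // US SSN, e.g. 123-45-6789
  /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, // email address
  /\b(?:\d{1,3}\.){3}\d{1,3}\b/g,                    // IPv4 address
];

// Replaces every match of every pattern with a fixed placeholder, in memory.
function scrubPii(text) {
  return PII_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, "[REDACTED]"),
    text
  );
}

console.log(scrubPii('{"ssn": "123-45-6789", "ip": "10.0.0.1"}'));
// prints {"ssn": "[REDACTED]", "ip": "[REDACTED]"}
```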

FAQs

Does Shim train models on my data?

No. Shim is a rule-based repair engine. No ML training.

Can Shim employees see my data?

No. Shim has no access to content. Only metadata in logs.

What if I need to prove deletion?

Metadata logs show no content was stored. Deletion is inherent.

Is Shim SOC 2 compliant?

Not yet. SOC 2 is on the roadmap for Q3 2026. Shim is GDPR compliant today.

Can I self-host Shim?

Not currently. Shim runs only as a hosted service on Cloudflare Workers.

What about encryption at rest?

No content stored → Nothing to encrypt at rest.

Are API keys encrypted?

They are hashed rather than encrypted. Each key is stored as a salted SHA-256 hash and never in plaintext.

Next Steps

Authentication: Secure your API keys
Rate Limits: Usage tracking (metadata only)
Error Handling: No sensitive data in errors
Console: View metadata logs
