
Data Privacy

Shim never stores your raw LLM outputs or repaired JSON. Zero persistence. GDPR compliant by design.

Zero-Persistence Promise

What We Never Store

  • Raw LLM outputs (the raw_output field)
  • Repaired JSON (the repaired field)
  • Field names (e.g., name, email, address)
  • Field values (e.g., "John Doe" or an email address)

What We Store

Metadata only (90-day retention):
{
  "api_key_id": "key_abc123",
  "timestamp": 1705780800000,
  "latency_ms": 5,
  "input_size_bytes": 1024,
  "output_size_bytes": 1050,
  "confidence": "high",
  "repair_types": ["closed_bracket", "trailing_comma"],
  "success": true
}
No content. No PII. Only operational metrics.
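Here is a minimal TypeScript sketch of how a record with the fields above could be assembled. The `RepairMetadata` type and the `buildMetadata` helper are hypothetical names, not Shim's API. Sizes are measured in UTF-8 bytes to match the `_bytes` field names, so `raw_output` and `repaired` contribute only their lengths and never their text.

```typescript
// Hypothetical shape mirroring the metadata record above.
interface RepairMetadata {
  api_key_id: string;
  timestamp: number; // epoch milliseconds
  latency_ms: number;
  input_size_bytes: number;
  output_size_bytes: number;
  confidence: string; // e.g. "high"
  repair_types: string[];
  success: boolean;
}

const encoder = new TextEncoder();

// Hypothetical helper: only byte sizes are derived from rawOutput and repaired;
// the strings themselves never appear in the returned record.
function buildMetadata(
  apiKeyId: string,
  rawOutput: string,
  repaired: string,
  startedAt: number,
  finishedAt: number,
  confidence: string,
  repairTypes: string[],
  success: boolean,
): RepairMetadata {
  return {
    api_key_id: apiKeyId,
    timestamp: startedAt,
    latency_ms: finishedAt - startedAt,
    input_size_bytes: encoder.encode(rawOutput).length,
    output_size_bytes: encoder.encode(repaired).length,
    confidence,
    repair_types: repairTypes,
    success,
  };
}
```

A multi-byte character such as `é` counts as two bytes, so using `string.length` (UTF-16 code units) would under-report sizes for non-ASCII output.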

Architecture

In-Memory Processing

All repairs happen in memory. Nothing touches disk.
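// Simplified request flow. repairSyntax, repairWithSchema and logRepairToD1
// are Shim internals; their signatures are abbreviated here.
async function handleRepair(request, api_key_id) {
  // 1. Request arrives (a Workers request body is a stream, so parse it)
  const { raw_output, schema } = await request.json();

  // 2. Repair in memory (0 disk writes)
  const repaired = repairSyntax(raw_output);

  // 3. Validate against schema (assumed to also return repair metadata)
  const { result: validated, metadata } = repairWithSchema(repaired, schema);

  // 4. Log metadata only
  await logRepairToD1({
    api_key_id,
    input_size_bytes: new TextEncoder().encode(raw_output).length, // bytes, not chars
    confidence: metadata.confidence,
    // No raw_output or repaired stored
  });

  // 5. Return response
  return Response.json({ success: true, repaired: validated });
}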

Cloudflare Workers

Serverless functions with no persistent filesystem:
  • Runtime: V8 isolates (ephemeral)
  • Memory: Request data is discarded once the response is sent
  • Storage: KV (metadata only) + D1 (logs)
  • Edge: Runs close to the user, with no backhaul to a central server

What Gets Logged

Batch Repair

INSERT INTO repair_logs (
  api_key_id,
  timestamp,
  latency_ms,
  input_size_bytes,
  output_size_bytes,
  confidence,
  repair_types,
  success
) VALUES (?, ?, ?, ?, ?, ?, ?, ?);
No raw_output or repaired column exists in the schema.
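In a Worker, a statement like this would typically run through D1's prepared-statement API (`prepare(...).bind(...).run()`). The sketch below assumes a D1 binding and serializes `repair_types` as a JSON string, since the column type is not documented here. `D1Like`, `RepairLog`, and `logRepair` are hypothetical names; `D1Like` is a minimal stand-in so the example runs without Cloudflare's type packages.

```typescript
// Minimal subset of Cloudflare D1's prepared-statement API (assumed binding).
interface D1Like {
  prepare(query: string): {
    bind(...values: unknown[]): { run(): Promise<unknown> };
  };
}

interface RepairLog {
  api_key_id: string;
  timestamp: number;
  latency_ms: number;
  input_size_bytes: number;
  output_size_bytes: number;
  confidence: string;
  repair_types: string[];
  success: boolean;
}

const INSERT_REPAIR_LOG = `INSERT INTO repair_logs (
  api_key_id, timestamp, latency_ms, input_size_bytes,
  output_size_bytes, confidence, repair_types, success
) VALUES (?, ?, ?, ?, ?, ?, ?, ?)`;

// Only metadata fields are bound; there is no parameter that could carry content.
async function logRepair(db: D1Like, log: RepairLog): Promise<void> {
  await db
    .prepare(INSERT_REPAIR_LOG)
    .bind(
      log.api_key_id,
      log.timestamp,
      log.latency_ms,
      log.input_size_bytes,
      log.output_size_bytes,
      log.confidence,
      JSON.stringify(log.repair_types), // assumed TEXT column
      log.success ? 1 : 0, // SQLite has no boolean type
    )
    .run();
}
```

Binding values as parameters (rather than interpolating them into the SQL string) also keeps the logging path safe from injection.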

Streaming Repair

Same as batch. Session ID is ephemeral (expires after 60s).

GDPR Compliance

Data Processing

  • Processor: Shim acts as a data processor (you are the controller)
  • Purpose: JSON repair only (no analysis, no training)
  • Duration: In-memory only (milliseconds)
  • Location: Cloudflare edge network (regional processing)

User Rights

  • Right to erasure: No data to erase. Logs contain no PII.
  • Right to access: Metadata logs available via dashboard.
  • Right to portability: Export metadata via API (coming soon).

Data Residency

Cloudflare Workers run in the Cloudflare data center that receives the request, normally the one closest to the user:
  • EU requests: EU data centers
  • US requests: US data centers
  • Asia requests: Asia data centers
No cross-border data transfer of content.

Threat Model

What Shim Protects Against

1. Data Breach
  • No stored content → Nothing to breach
  • Only metadata exposed in worst case
2. Unauthorized Access
  • API keys hashed (SHA-256)
  • No plaintext credentials in database
3. Compliance Violations
  • No PII storage → No GDPR violations
  • Metadata retention: 90 days (configurable)

What Shim Doesn’t Protect Against

1. Network-Level Attacks
  • Use HTTPS (TLS 1.3) for API calls
  • Shim enforces HTTPS-only
2. Your LLM Provider
  • Providers such as OpenAI and Anthropic may retain your prompts under their own data policies
  • Shim doesn’t change this
3. Your Application
  • You control what happens after repair
  • Shim doesn’t store your data
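The HTTPS-only rule could be enforced in a Worker with a check like the one below, run before any request body is parsed. This is a sketch: `requireHttps` is a hypothetical helper, and Shim's actual enforcement may live elsewhere (for example in Cloudflare's zone settings).

```typescript
// Reject any request that did not arrive over HTTPS (hypothetical helper).
// Returns a response to send back, or null if the request may proceed.
function requireHttps(request: Request): Response | null {
  const url = new URL(request.url);
  if (url.protocol !== "https:") {
    return new Response("HTTPS required", { status: 403 });
  }
  return null;
}
```

Rejecting instead of redirecting suits an API: a redirect would quietly succeed for clients that keep sending payloads over plain HTTP, while a 403 surfaces the misconfiguration.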

Security Headers

All responses include security headers:
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
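One way to attach these headers in a Worker is to wrap every outgoing `Response`. The sketch is illustrative (`withSecurityHeaders` is a hypothetical name). It copies the headers because responses obtained from `fetch()` can have immutable headers. `max-age=31536000` is one year in seconds (365 × 86,400).

```typescript
// The four headers listed above.
const SECURITY_HEADERS: Record<string, string> = {
  "Strict-Transport-Security": "max-age=31536000",
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
  "X-XSS-Protection": "1; mode=block",
};

// Return a copy of the response with the security headers set.
function withSecurityHeaders(response: Response): Response {
  const headers = new Headers(response.headers); // mutable copy
  for (const [name, value] of Object.entries(SECURITY_HEADERS)) {
    headers.set(name, value);
  }
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```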

Observability vs Privacy

What We Track

Operational metrics:
  • Repair success rate
  • Latency percentiles
  • Error rates
  • Confidence distribution
Business metrics:
  • API usage counts
  • Tier limits
  • Overage billing
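Because `latency_ms` is part of every metadata record, latency percentiles can be computed without touching any content. A minimal sketch using the nearest-rank method (an assumption; dashboards often interpolate between samples instead):

```typescript
// Nearest-rank percentile over latency_ms values taken from metadata logs.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new RangeError("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b); // do not mutate the input
  // Smallest value with at least p% of samples at or below it.
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}
```

For example, `percentile(latencies, 95)` over a day's `latency_ms` values gives the p95 latency.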

What We Don’t Track

Sensitive data:
  • User identifiers (beyond API key hash)
  • Field names or values
  • Schema structures
  • Error messages with content

Comparison: Shim vs Others

Feature | Shim | OutputFixingParser | Custom Parser
--- | --- | --- | ---
Stores raw output | No | No | Depends
Stores repaired JSON | No | No | Depends
Logs metadata | Yes (90 days) | No | Depends
Edge processing | Yes | No | No
In-memory | Yes | Yes | Depends
GDPR compliant | Yes | N/A | Depends

Retention Policy

Metadata Logs

Retention: 90 days (configurable per tier)
After expiration:
  • Automatic deletion
  • No backups kept
  • No recovery possible
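Assuming the logs live in D1 (which, like SQLite, has no per-row expiry), automatic deletion could be a scheduled job, for example one triggered by a Workers Cron Trigger. `retentionCutoff`, `purgeExpiredLogs`, and `D1Like` are hypothetical names; the cutoff uses epoch milliseconds to match the `timestamp` field.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Timestamps in repair_logs are epoch milliseconds, as in the metadata record.
function retentionCutoff(nowMs: number, retentionDays: number): number {
  return nowMs - retentionDays * DAY_MS;
}

// Minimal subset of Cloudflare D1's prepared-statement API (assumed binding).
interface D1Like {
  prepare(query: string): {
    bind(...values: unknown[]): { run(): Promise<unknown> };
  };
}

// Hypothetical cleanup job; retentionDays would vary per tier.
async function purgeExpiredLogs(db: D1Like, nowMs: number, retentionDays = 90): Promise<void> {
  await db
    .prepare("DELETE FROM repair_logs WHERE timestamp < ?")
    .bind(retentionCutoff(nowMs, retentionDays))
    .run();
}
```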

API Keys

Retention: Until revoked
Storage:
  • Hashed (SHA-256)
  • Salted per key
  • Never plaintext
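How the per-key salt is combined with the key is not documented, so the sketch below assumes `SHA-256(salt + secret)` computed with the Web Crypto API (available in Workers and in recent Node.js versions). `hashApiKey`, `newSalt`, and `verifyApiKey` are hypothetical names.

```typescript
// Hex-encode raw bytes.
function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Assumed scheme: SHA-256 over salt + secret.
async function hashApiKey(secret: string, salt: string): Promise<string> {
  const data = new TextEncoder().encode(salt + secret);
  return toHex(new Uint8Array(await crypto.subtle.digest("SHA-256", data)));
}

// Fresh random 16-byte salt per key, stored next to the hash.
function newSalt(): string {
  return toHex(crypto.getRandomValues(new Uint8Array(16)));
}

// Compare hex digests without exiting early on the first mismatch.
function timingSafeEqualHex(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  return diff === 0;
}

async function verifyApiKey(presented: string, salt: string, storedHash: string): Promise<boolean> {
  return timingSafeEqualHex(await hashApiKey(presented, salt), storedHash);
}
```

A fast hash like SHA-256 is reasonable here because API keys are long random strings. Low-entropy secrets such as passwords would call for a deliberately slow KDF instead.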

Sessions (Streaming)

Retention: 60 seconds
Storage:
  • KV namespace (ephemeral)
  • Auto-expires
  • No persistent backups
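Here is a sketch of creating a streaming session in Workers KV with the 60-second lifetime. `KVLike`, `StreamSession`, and `createSession` are hypothetical. The real session contents are not documented, so this record holds only bookkeeping fields. 60 seconds is also the smallest `expirationTtl` that KV accepts.

```typescript
// Minimal subset of the Workers KV put() API (assumed binding).
interface KVLike {
  put(key: string, value: string, options?: { expirationTtl?: number }): Promise<void>;
}

// Matches the documented session lifetime.
const SESSION_TTL_SECONDS = 60;

// Hypothetical session record: bookkeeping only, no content.
interface StreamSession {
  api_key_id: string;
  created_at: number; // epoch milliseconds
}

async function createSession(kv: KVLike, apiKeyId: string, now = Date.now()): Promise<string> {
  const id = crypto.randomUUID();
  const session: StreamSession = { api_key_id: apiKeyId, created_at: now };
  // KV deletes the entry on its own once the TTL elapses.
  await kv.put(`session:${id}`, JSON.stringify(session), { expirationTtl: SESSION_TTL_SECONDS });
  return id;
}
```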

Audit Log

Coming in v1.2 (Enterprise tier):
{
  "event": "repair_requested",
  "timestamp": 1705780800000,
  "api_key_id": "key_abc123",
  "user_id": "user_xyz789",
  "input_hash": "sha256:abc123...",
  "confidence": "high",
  "repairs": ["closed_bracket"]
}
Still no content. Just hashed fingerprints.
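An `input_hash` in the `sha256:<hex>` form shown in the event could be derived as follows. `inputFingerprint` is a hypothetical name, and the exact input Shim would hash is an assumption.

```typescript
// Hypothetical fingerprint in the "sha256:<hex>" form used by the audit event.
async function inputFingerprint(rawOutput: string): Promise<string> {
  const bytes = new TextEncoder().encode(rawOutput);
  const digest = new Uint8Array(await crypto.subtle.digest("SHA-256", bytes));
  const hex = Array.from(digest, (b) => b.toString(16).padStart(2, "0")).join("");
  return `sha256:${hex}`;
}
```

Identical inputs yield identical fingerprints, which is what makes correlation possible. A plain hash of a short or guessable input can be confirmed by hashing candidates. A keyed hash (HMAC with a server-side secret) would resist that.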

Guardian Tier (Coming Soon)

Real-Time PII Scrubbing

Shim will detect and redact PII before repair:
// Input: '{"name": "John Doe", "ssn": "123-45-6789"}'
// After scrubbing: '{"name": "[REDACTED]", "ssn": "[REDACTED]"}'
Patterns detected:
  • Social Security numbers (US) and National Insurance numbers (UK)
  • Credit card numbers
  • Email addresses
  • Phone numbers
  • IP addresses
Still zero persistence. Scrubbing happens in-memory.
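Here is a minimal regex-based sketch of in-memory scrubbing for a subset of the listed patterns: US SSNs, Luhn-checked card numbers, emails, IPv4 addresses, and North American phone numbers. The pattern set and `scrubPII` are hypothetical, not the Guardian tier's detector. Names (as in the example above) are not pattern-detectable and are not covered here.

```typescript
// Luhn checksum: filters out digit runs that cannot be card numbers.
function luhnValid(digits: string): boolean {
  let sum = 0;
  let double = false;
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = digits.charCodeAt(i) - 48; // '0' is char code 48
    if (double) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
    double = !double;
  }
  return sum % 10 === 0;
}

const REDACTED = "[REDACTED]";

// 13-19 digits, optionally separated by spaces or dashes.
const CARD = /\b(?:\d[ -]?){12,18}\d\b/g;

// Hypothetical, deliberately small pattern set; SSN runs before phone.
const PATTERNS: RegExp[] = [
  /\b\d{3}-\d{2}-\d{4}\b/g, // US SSN, e.g. 123-45-6789
  /\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b/g, // email
  /\b(?:\d{1,3}\.){3}\d{1,3}\b/g, // IPv4
  /(?:\+1[ .-]?)?\(?\b\d{3}\)?[ .-]?\d{3}[ .-]\d{4}\b/g, // North American phone
];

function scrubPII(text: string): string {
  // Card candidates first, redacting only Luhn-valid matches.
  let out = text.replace(CARD, (m) => (luhnValid(m.replace(/\D/g, "")) ? REDACTED : m));
  for (const re of PATTERNS) out = out.replace(re, REDACTED);
  return out;
}
```

Regexes trade recall for simplicity. For example, a 13-digit number inside JSON that happens to pass the Luhn check would be redacted too.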

FAQs

Does Shim train models on my data?

No. Shim is a rule-based repair engine. No ML training.

Can Shim employees see my data?

No. Shim does not store content, so there is nothing for employees to view. Only metadata appears in logs.

What if I need to prove deletion?

Deletion is inherent: content is never written to storage, and the metadata log schema has no content columns.

Is Shim SOC 2 compliant?

Not yet. SOC 2 is on the roadmap for Q3 2026. Shim is GDPR compliant today.

Can I self-host Shim?

Not currently. Shim runs only as a hosted service on Cloudflare Workers.

What about encryption at rest?

No content stored → Nothing to encrypt at rest.

Are API keys encrypted?

They are hashed rather than encrypted: each key is stored as a salted SHA-256 hash, never in plaintext.

Next Steps

  • Authentication: Secure your API keys
  • Rate Limits: Usage tracking (metadata only)
  • Error Handling: No sensitive data in errors
  • Console: View metadata logs