Data Privacy
Shim never stores your raw LLM outputs or repaired JSON. Zero persistence. GDPR compliant by design.
Zero-Persistence Promise
What We Never Store
- Raw LLM outputs (the raw_output field)
- Repaired JSON (the repaired field)
- Field names (e.g., name, email, address)
- Field values (e.g., "John Doe", "[email protected]")
What We Store
Metadata only, retained for 90 days.
Architecture
In-Memory Processing
All repairs happen in memory. No request content touches disk.
Cloudflare Workers
Serverless functions with no persistent filesystem:
- Runtime: V8 isolates (ephemeral)
- Memory: Cleared after request
- Storage: KV (metadata only) + D1 (logs)
- Edge: Runs close to user, no backhaul
What Gets Logged
Batch Repair
No raw_output or repaired column exists in the schema.
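As an illustration of what a content-free log row could look like, here is a minimal sketch. The `RepairLogRow` and `RepairResult` types, the `toLogRow` helper, and every field name are assumptions made for this example, not Shim's actual schema. The point is only that the row is built from operational values and never copies the raw or repaired text.

```typescript
// Hypothetical shape of a metadata-only log row. Field names are assumptions.
interface RepairLogRow {
  requestId: string;
  apiKeyHash: string; // hash of the API key, never the key itself
  success: boolean;
  latencyMs: number;
  confidence: number; // repair confidence in the range 0..1
  createdAt: string; // ISO 8601 timestamp
}

// Hypothetical in-memory result of a repair. It lives only for the request.
interface RepairResult {
  rawOutput: string;
  repaired: string;
  success: boolean;
  confidence: number;
}

// Build the log row from operational values only.
// rawOutput and repaired are deliberately never read here.
function toLogRow(
  result: RepairResult,
  ctx: { requestId: string; apiKeyHash: string; latencyMs: number; now: Date },
): RepairLogRow {
  return {
    requestId: ctx.requestId,
    apiKeyHash: ctx.apiKeyHash,
    success: result.success,
    latencyMs: ctx.latencyMs,
    confidence: result.confidence,
    createdAt: ctx.now.toISOString(),
  };
}
```

Because the row type has no content fields, a row that carried raw or repaired text would fail type checking rather than slip into storage unnoticed.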
Streaming Repair
Same as batch. The session ID is ephemeral (expires after 60 seconds).
GDPR Compliance
Data Processing
- Processor: Shim acts as a data processor (you are the controller)
- Purpose: JSON repair only (no analysis, no training)
- Duration: In-memory only (milliseconds)
- Location: Cloudflare edge network (regional processing)
User Rights
- Right to erasure: No data to erase. Logs contain no PII.
- Right to access: Metadata logs available via dashboard.
- Right to portability: Export metadata via API (coming soon).
Data Residency
Cloudflare Workers run in the region closest to the request:
- EU requests: EU data centers
- US requests: US data centers
- Asia requests: Asia data centers
Threat Model
What Shim Protects Against
1. Data Breach
- No stored content → Nothing to breach
- Only metadata exposed in worst case
- API keys hashed (SHA-256)
- No plaintext credentials in database
- No PII storage → No GDPR violations
- Metadata retention: 90 days (configurable)
What Shim Doesn’t Protect Against
1. Network-Level Attacks
- Use HTTPS (TLS 1.3) for API calls
- Shim enforces HTTPS-only
- OpenAI/Anthropic store your prompts
- Shim doesn’t change this
- You control what happens after repair
- Shim doesn’t store your data
Security Headers
All responses include security headers.
Observability vs Privacy
What We Track
Operational metrics:
- Repair success rate
- Latency percentiles
- Error rates
- Confidence distribution
- API usage counts
- Tier limits
- Overage billing
What We Don’t Track
Sensitive data:
- User identifiers (beyond API key hash)
- Field names or values
- Schema structures
- Error messages with content
Comparison: Shim vs Others
| Feature | Shim | OutputFixingParser | Custom Parser |
|---|---|---|---|
| Stores raw output | No | No | Depends |
| Stores repaired | No | No | Depends |
| Logs metadata | Yes (90d) | No | Depends |
| Edge processing | Yes | No | No |
| In-memory | Yes | Yes | Depends |
| GDPR compliant | Yes | N/A | Depends |
Retention Policy
Metadata Logs
Retention: 90 days (configurable per tier)
After expiration:
- Automatic deletion
- No backups kept
- No recovery possible
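The retention rule above can be sketched as a simple age check. The function names, the row shape, and the idea of a periodic purge job are assumptions for illustration. The source only states that metadata is deleted automatically after the retention window.

```typescript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// True once a row has reached the end of its retention window.
// The default of 90 days mirrors the stated policy; tiers may override it.
function isExpired(createdAt: Date, now: Date, retentionDays: number = 90): boolean {
  return now.getTime() - createdAt.getTime() >= retentionDays * MS_PER_DAY;
}

// Keep only rows that are still inside the retention window.
// A hypothetical scheduled job could call this, or issue an equivalent DELETE.
function purgeExpired<T extends { createdAt: Date }>(
  rows: T[],
  now: Date,
  retentionDays: number = 90,
): T[] {
  return rows.filter((row) => !isExpired(row.createdAt, now, retentionDays));
}
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to test at the exact 90-day boundary.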
API Keys
Retention: Until revoked
Storage:
- Hashed (SHA-256)
- Salted per key
- Never plaintext
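A salted SHA-256 scheme like the one listed above could look roughly like this. It is a sketch using Node's built-in `node:crypto` module. The `StoredKey` shape, the 16-byte salt length, and the function names are assumptions, and code running directly on the Workers runtime would more likely use the Web Crypto API for the same digest.

```typescript
import { createHash, randomBytes, timingSafeEqual } from "node:crypto";

// What would be persisted for a key: the salt and the digest, both hex-encoded.
// The plaintext key is never part of this record.
interface StoredKey {
  salt: string;
  hash: string;
}

// Hash an API key with a per-key random salt (16 bytes is an assumed length).
function hashApiKey(apiKey: string, salt: string = randomBytes(16).toString("hex")): StoredKey {
  const hash = createHash("sha256").update(salt).update(apiKey).digest("hex");
  return { salt, hash };
}

// Re-hash the presented key with the stored salt and compare in constant time.
function verifyApiKey(candidate: string, stored: StoredKey): boolean {
  const { hash } = hashApiKey(candidate, stored.salt);
  return timingSafeEqual(Buffer.from(hash, "hex"), Buffer.from(stored.hash, "hex"));
}
```

The per-key salt means two customers with identical keys still produce different stored hashes, and the constant-time comparison avoids leaking how many leading bytes matched.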
Sessions (Streaming)
Retention: 60 seconds
Storage:
- KV namespace (ephemeral)
- Auto-expires
- No persistent backups
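The 60-second session lifetime can be modeled with a small in-memory store. This is an illustrative stand-in, not Shim's code: the class name and methods are hypothetical, and in Workers KV the same effect would typically come from setting `expirationTtl` when the session key is written.

```typescript
// In-memory model of an auto-expiring session store.
class EphemeralSessionStore {
  // sessionId -> expiry time in milliseconds since the epoch
  private readonly sessions = new Map<string, number>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be tested without waiting.
  constructor(ttlMs: number = 60_000, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Register a streaming session; it expires ttlMs after this call.
  start(sessionId: string): void {
    this.sessions.set(sessionId, this.now() + this.ttlMs);
  }

  // A session is active only while its expiry time lies in the future.
  isActive(sessionId: string): boolean {
    const expiresAt = this.sessions.get(sessionId);
    if (expiresAt === undefined) return false;
    if (this.now() >= expiresAt) {
      this.sessions.delete(sessionId); // lazily drop the expired entry
      return false;
    }
    return true;
  }
}
```

Only the session ID and its expiry time are held, so nothing about the streamed content outlives the request.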
Audit Log
Coming in v1.2 (Enterprise tier).
Guardian Tier (Coming Soon)
Real-Time PII Scrubbing
Shim will detect and redact PII before repair:
- SSN (US/UK)
- Credit card numbers
- Email addresses
- Phone numbers
- IP addresses
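Since this tier is not yet released, its detection rules are unknown. The following is only a rough sketch of how pattern-based redaction of the categories above might work. Every pattern and placeholder label is an assumption, and the patterns are deliberately simplified: US-format SSNs only, 16-digit card numbers, US-style phone numbers, and IPv4 addresses.

```typescript
// Simplified, hypothetical redaction rules. Real detectors need validation
// (for example Luhn checks for cards) and broader format coverage.
const PII_RULES: Array<{ label: string; pattern: RegExp }> = [
  { label: "[EMAIL]", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "[CARD]", pattern: /\b\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}\b/g },
  { label: "[SSN]", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
  { label: "[PHONE]", pattern: /\b\d{3}[-.]\d{3}[-.]\d{4}\b/g },
  { label: "[IP]", pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

// Apply each rule in order, replacing every match with its placeholder.
function scrubPii(text: string): string {
  return PII_RULES.reduce((acc, rule) => acc.replace(rule.pattern, rule.label), text);
}
```

Running the scrubber before repair would mean that even the transient in-memory copy handed to the repair engine carries placeholders instead of the original values.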
FAQs
Does Shim train models on my data?
No. Shim is a rule-based repair engine. No ML training.
Can Shim employees see my data?
No. Shim has no access to content. Only metadata appears in logs.
What if I need to prove deletion?
Metadata logs show no content was stored. Deletion is inherent.
Is Shim SOC 2 compliant?
Not yet. SOC 2 is on the roadmap for Q3 2026. Shim is GDPR compliant today.
Can I self-host Shim?
Not currently. Shim runs only as a hosted service on Cloudflare Workers.
What about encryption at rest?
No content stored → Nothing to encrypt at rest.
Are API keys encrypted?
They are hashed rather than encrypted: each key is stored as a salted SHA-256 hash, never in plaintext.
Next Steps
- Authentication: Secure your API keys
- Rate Limits: Usage tracking (metadata only)
- Error Handling: No sensitive data in errors
- Console: View metadata logs
