Cloak your prompts. Prove your compliance.
Open-source PII protection middleware for LLMs. Detect sensitive data, replace it with reversible tokens, and maintain tamper-evident audit logs — all before your prompts leave your infrastructure.
$ pip install cloakllm
$ npm install cloakllm
Your LLM prompts are plaintext confessions.
Every API call to an LLM sends raw customer data — names, emails, SSNs — to third-party servers. Under the EU AI Act, that's a compliance liability.
The Risk
PII in prompts means your users' personal data is processed by third-party LLM providers — often without consent or safeguards.
The Deadline
December 2, 2027 — EU AI Act Article 12 transparency requirements take effect for high-risk AI systems.
The Penalty
Non-compliance fines of up to €35 million or 7% of global annual turnover, whichever is higher.
The Article 12 Paradox
Why GDPR and the EU AI Act cannot both be satisfied without PII middleware.
The EU AI Act requires logging every high-risk AI interaction. GDPR prohibits retaining personal data. This whitepaper explains the structural conflict and the architectural middleware layer that resolves it — with no legal trade-offs.
- Article 12 logging vs. GDPR data minimisation — a structural, unavoidable conflict
- Deterministic tokenization as GDPR-recognised pseudonymisation (Recital 26, Art. 4(5))
- Behavioral traceability vs. identity traceability — what regulators actually require
- Article 4a readiness: pseudonymised special-category data for bias detection
3-Pass Detection Pipeline
Layered detection minimizes the chance that PII slips through: each pass catches what the previous one missed.
Regex
High-precision pattern matching for structured data.
spaCy NER
Named entity recognition for names, orgs, and locations. (Python only)
Ollama LLM
Local LLM-based semantic detection for contextual PII. (opt-in)
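In spirit, the first pass is plain pattern matching over the prompt text. A minimal sketch of that stage (the patterns and the `regex_pass` function are illustrative, not cloakllm's actual internals, which cover all nine built-in categories with tighter patterns):

```python
import re

# Illustrative patterns for the regex pass -- cloakllm ships its own,
# more precise set covering SSNs, IBANs, API keys, JWTs, and more.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\+?\d{1,2}[-\s]?\d{3}[-\s]?\d{4}"),
}

def regex_pass(text):
    """Return (category, match) pairs found by high-precision patterns."""
    hits = []
    for category, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            hits.append((category, m.group()))
    return hits
```

Anything structured enough for a regex is caught here; names and contextual PII fall through to the NER and LLM passes.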
Help me write a follow-up email
to Sarah Johnson (sarah.j@techcorp.io)
about the Q3 security audit.
Her direct line is +1-555-0142.

Help me write a follow-up email
to [PERSON_0] ([EMAIL_0])
about the Q3 security audit.
Her direct line is [PHONE_0].

Everything you need to protect PII
Drop-in middleware that works with your existing LLM stack. No vendor lock-in, no cloud dependencies.
9 Detection Categories
Emails, SSNs, credit cards, phone numbers, API keys, IBANs, JWTs, AWS keys, and IP addresses — all detected out of the box.
Reversible Tokenization
Deterministic [CATEGORY_N] tokens preserve context for the LLM. Desanitize to restore originals in responses.
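The token-map idea fits in a few lines of plain Python. A sketch under assumed names (`sanitize` and `desanitize` here are illustrative; cloakllm's actual API may differ):

```python
def sanitize(text, entities):
    """Swap each detected (category, value) pair for a deterministic token.

    The same value always yields the same token, so the LLM keeps
    referential context without ever seeing the raw PII.
    """
    token_map, counters = {}, {}
    for category, value in entities:
        if value not in token_map:
            n = counters.get(category, 0)
            counters[category] = n + 1
            token_map[value] = f"[{category}_{n}]"
        text = text.replace(value, token_map[value])
    return text, token_map

def desanitize(text, token_map):
    """Restore the original values in the LLM's response."""
    for value, token in token_map.items():
        text = text.replace(token, value)
    return text
```

Because the mapping is deterministic, the same map can desanitize any later response that mentions `[EMAIL_0]`.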
Tamper-Evident Audit Logs
Hash-chained JSONL entries with SHA-256 and per-entity metadata. No PII stored — just hashes and counts. EU AI Act Article 12 ready.
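Hash chaining is the standard trick here: each log entry commits to the hash of the one before it, so editing any past entry invalidates everything after it. A stdlib-only sketch of the mechanism (not cloakllm's exact entry format):

```python
import hashlib
import json

def append_entry(log, event):
    """Append a hash-chained entry: each entry commits to the previous hash."""
    prev_hash = log[-1]["hash"] if log else "0" * 64
    body = {"event": event, "prev": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    log.append({**body, "hash": digest})
    return log

def verify_chain(log):
    """Recompute every hash; any edit to an earlier entry breaks the chain."""
    prev = "0" * 64
    for entry in log:
        body = {"event": entry["event"], "prev": entry["prev"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if entry["prev"] != prev or entry["hash"] != digest:
            return False
        prev = digest
    return True
```

Since only hashes and counts are logged, the chain proves what happened without retaining any PII.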
One-Line Integration
cloakllm.enable() wraps LiteLLM (Python) or the OpenAI SDK (JS). Works with Vercel AI SDK middleware too.
Multi-Language Detection
13 locales (DE, FR, ES, IT, PT, NL, PL, SE, NO, DK, FI, GB, AU) with country-specific PII patterns for SSNs, tax IDs, and more.
Local LLM Detection
Opt-in Ollama integration catches addresses, medical terms, DOBs, and more. Data never leaves your machine.
Cryptographic Attestation
Ed25519 signed sanitization certificates prove compliance. Merkle tree batch proofs. Cross-language compatible.
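A Merkle tree lets a whole batch of certificates be committed with one root hash, so a single signature can cover many sanitization events. The root computation, sketched with stdlib hashing (cloakllm's tree layout and leaf encoding may differ):

```python
import hashlib

def merkle_root(leaves):
    """Hash each leaf, then pairwise-hash levels up to a single root.

    Signing the root attests to every certificate in the batch at once.
    """
    level = [hashlib.sha256(leaf.encode()).digest() for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [hashlib.sha256(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0].hex()
```

An auditor can then verify any single certificate against the signed root with a logarithmic-size proof path.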
Incremental Streaming
StreamDesanitizer replaces tokens as chunks arrive — no buffering the full response. All middleware paths stream incrementally.
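The tricky part of streaming desanitization is a token split across two chunks. One way to handle it is to hold back a trailing partial token until the next chunk arrives; a sketch of that approach (illustrative, not the actual StreamDesanitizer implementation):

```python
import re

TOKEN_RE = re.compile(r"\[[A-Z_]+_\d+\]")

class StreamReplacer:
    """Replace [CATEGORY_N] tokens as chunks arrive, buffering only a
    possible partial token at the end of each chunk."""

    def __init__(self, token_map):
        self.token_map = token_map  # token -> original value
        self.buffer = ""

    def feed(self, chunk):
        self.buffer += chunk
        # Hold back a trailing partial token like "[EMA" for the next chunk.
        cut = len(self.buffer)
        open_idx = self.buffer.rfind("[")
        if open_idx != -1 and "]" not in self.buffer[open_idx:]:
            cut = open_idx
        out, self.buffer = self.buffer[:cut], self.buffer[cut:]
        return TOKEN_RE.sub(lambda m: self.token_map.get(m.group(), m.group()), out)

    def flush(self):
        out, self.buffer = self.buffer, ""
        return out
```

Memory use stays bounded by the longest possible token rather than the full response.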
Context Risk Analysis
Scores re-identification risk in sanitized text. Detects token density, identifying descriptors, and relationship edges that could reveal identity.
Normalized Token Standard
Formal spec (TOKEN_SPEC.md) with validation utilities, canonical regex, and 62 built-in categories. Both SDKs produce identical tokens.
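Per the spec, a normalized token is an upper-case category name plus a numeric index in square brackets. A validator for that shape might look like this (the regex below is illustrative; TOKEN_SPEC.md defines the canonical one):

```python
import re

# Illustrative token pattern: upper-case category, underscore, numeric index.
CANONICAL_TOKEN = re.compile(r"^\[([A-Z][A-Z0-9_]*)_(\d+)\]$")

def parse_token(token):
    """Return (category, index) for a well-formed token, or None."""
    m = CANONICAL_TOKEN.match(token)
    if not m:
        return None
    return m.group(1), int(m.group(2))
```

Because both SDKs emit the same token grammar, a map produced in Python can desanitize output flowing through the JavaScript SDK and vice versa.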
Pluggable Backends
DetectorBackend base class lets you swap or extend the default regex→NER→LLM pipeline with custom detection stages.
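A custom stage just needs to produce (category, value) pairs like the built-in passes do. A toy backend, assuming an interface roughly of this shape (check the SDK source for the real `DetectorBackend` signature):

```python
from abc import ABC, abstractmethod

class DetectorBackend(ABC):
    """Hypothetical shape of a pluggable detection stage."""

    @abstractmethod
    def detect(self, text):
        """Return (category, value) pairs found in text."""

class KeywordBackend(DetectorBackend):
    """Toy custom stage: flag internal project codenames as sensitive."""

    def __init__(self, codenames):
        self.codenames = codenames

    def detect(self, text):
        return [("CODENAME", c) for c in self.codenames if c in text]
```

A stage like this could slot in after the regex pass to catch organization-specific terms no generic detector knows about.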
Article 12 Compliance Mode
Formal EU AI Act compliance profile (v0.6) with tamper-detectable compliance fields, complianceSummary() API, and structured COMPLIANT/NON_COMPLIANT verify reports for auditors.
Enterprise Key Management (experimental)
KMS provider scaffolding for AWS KMS, GCP KMS, Azure Key Vault, HashiCorp Vault. Disabled in v0.6.1 pending rebuild in v0.7.0 — use LocalKeyProvider for now.
Security Hardened
Ollama SSRF prevention, CLI PII redaction by default, thread-safe internals, and redacted analysis output.
One line to protect your LLM calls
Drop-in middleware for every major LLM framework. No code rewrites needed.
from cloakllm import enable_openai
from openai import OpenAI
client = OpenAI()
enable_openai(client) # Wraps OpenAI SDK — all calls are now protected
response = client.chat.completions.create(
model="gpt-4o-mini",
messages=[{
"role": "user",
"content": "Help me email Sarah Johnson (sarah.j@techcorp.io)"
}],
)
# PII automatically restored in the response
print(response.choices[0].message.content)

Get started in seconds
Install the SDK for your language and start protecting PII immediately.
Python
$ pip install cloakllm
$ python -m spacy download en_core_web_sm

JavaScript / TypeScript
$ npm install cloakllm

MCP Server
$ pip install cloakllm-mcp

SDK Comparison
Three SDKs, same core protection. Pick the one that fits your stack.
| Feature | Python | JavaScript | MCP |
|---|---|---|---|
| Regex PII Detection |  |  |  |
| spaCy NER (PERSON, ORG, GPE) |  |  |  |
| Ollama LLM Detection (opt-in) |  |  |  |
| Reversible Tokenization |  |  |  |
| Redaction Mode |  |  |  |
| Hash-Chained Audit Logs |  |  |  |
| CLI (scan / verify / stats) |  |  |  |
| Multi-Turn Token Maps |  |  |  |
| Custom Patterns |  |  |  |
| Field-Level PII Metadata |  |  |  |
| Batch Processing |  |  |  |
| Performance Metrics |  |  |  |
| Incremental Streaming |  |  |  |
| Middleware Integration | OpenAI / LiteLLM | OpenAI / Vercel | Claude Desktop |
| Zero Runtime Dependencies |  |  |  |