The product
Middleware in front of your LLM calls: detect PII, tokenize it, log tamper-evident evidence — all before the prompt leaves your infrastructure. Open source, MIT, self-hosted.
3-Pass Detection Pipeline
Multiple layers of detection ensure no PII slips through. Each pass catches what the previous one missed.
Regex
High-precision pattern matching for structured data.
spaCy NER
Named entity recognition for names, orgs, and locations. (Python only)
Ollama LLM
Local LLM-based semantic detection for contextual PII. (opt-in)
Help me write a follow-up email
to Sarah Johnson (sarah.j@techcorp.io)
about the Q3 security audit.
Her direct line is +1-555-0142.

Help me write a follow-up email
to [PERSON_0] ([EMAIL_0])
about the Q3 security audit.
Her direct line is [PHONE_0].

Everything you need to protect PII
Drop-in middleware that works with your existing LLM stack. No vendor lock-in, no cloud dependencies.
9 Detection Categories
Emails, SSNs, credit cards, phone numbers, API keys, IBANs, JWTs, AWS keys, and IP addresses — all detected out of the box.
Reversible Tokenization
Deterministic [CATEGORY_N] tokens preserve context for the LLM. Desanitize to restore originals in responses.
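As a rough illustration of the idea, the sketch below tokenizes two kinds of PII and restores them afterwards. It is not CloakLLM's implementation: the patterns, the `sanitize` and `desanitize` helpers, and the token map layout are all assumptions made for this example.

```python
import re

# Hypothetical patterns for illustration; not CloakLLM's actual detectors.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+\d{1,3}-\d{3}-\d{4}"),
}

def sanitize(text):
    """Replace each detected value with a [CATEGORY_N] token; return text and token map."""
    token_map = {}  # token -> original value
    seen = {}       # (category, value) -> token, so a repeated value reuses its token
    counters = {}   # category -> next index
    for category, pattern in PATTERNS.items():
        def repl(match, category=category):
            key = (category, match.group(0))
            if key not in seen:
                n = counters.get(category, 0)
                counters[category] = n + 1
                seen[key] = f"[{category}_{n}]"
                token_map[seen[key]] = match.group(0)
            return seen[key]
        text = pattern.sub(repl, text)
    return text, token_map

def desanitize(text, token_map):
    """Restore the original value for every known token; leave unknown tokens alone."""
    return re.sub(r"\[[A-Z_]+_\d+\]",
                  lambda m: token_map.get(m.group(0), m.group(0)), text)

prompt = "Email sarah.j@techcorp.io or call +1-555-0142; cc sarah.j@techcorp.io."
clean, mapping = sanitize(prompt)
print(clean)                       # Email [EMAIL_0] or call [PHONE_0]; cc [EMAIL_0].
print(desanitize(clean, mapping))  # the original prompt
```

Because the same value always maps to the same token, the model can still tell that both email mentions refer to one address.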
Tamper-Evident Audit Logs
Hash-chained JSONL entries with SHA-256 and per-entity metadata. No PII stored — just hashes and counts. EU AI Act Article 12 ready.
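To make "hash-chained" concrete, here is a minimal sketch of how such a log can be built and checked. The field names (`seq`, `prev_hash`, `event`, `hash`), the all-zero genesis value, and the canonical JSON encoding are assumptions for illustration, not CloakLLM's actual log schema.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry

def entry_hash(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def append_entry(log, event):
    """Append an event and link it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    body = {"seq": len(log), "prev_hash": prev, "event": event}
    log.append({**body, "hash": entry_hash(body)})

def verify_chain(log):
    """Return True only if every entry's own hash and back-link are intact."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

log = []
append_entry(log, {"entity_counts": {"EMAIL": 1, "PERSON": 1}})  # counts only, no PII
append_entry(log, {"entity_counts": {"PHONE": 1}})
jsonl = "\n".join(json.dumps(e, sort_keys=True) for e in log)    # one JSON object per line

print(verify_chain(log))                   # True
log[0]["event"]["entity_counts"]["EMAIL"] = 5
print(verify_chain(log))                   # False: the edit breaks entry 0's hash
```

Editing or deleting any entry changes its hash, which no longer matches the `prev_hash` stored in the entry after it, so tampering is detectable without storing the PII itself.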
One-Line Integration
cloakllm.enable() wraps LiteLLM (Python) or the OpenAI SDK (JS). Works with Vercel AI SDK middleware too.
Multi-Language Detection
13 locales (DE, FR, ES, IT, PT, NL, PL, SE, NO, DK, FI, GB, AU) with country-specific PII patterns for SSNs, tax IDs, and more.
Local LLM Detection
Opt-in Ollama integration catches addresses, medical terms, DOBs, and more. Data never leaves your machine.
Cryptographic Attestation
Ed25519 signed sanitization certificates prove compliance. Merkle tree batch proofs. Cross-language compatible.
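The batch-proof part can be sketched with the standard library alone. The example below builds a Merkle root over a batch of certificate payloads and checks an inclusion proof for one of them. It is an assumption-laden illustration: the leaf encoding, the duplicate-last-node rule for odd levels, and the function names are not taken from CloakLLM, and the Ed25519 signature over the root is omitted.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root_and_proof(leaves, index):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [h(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:          # odd-sized level: duplicate the last node
            level.append(level[-1])
        sibling = index ^ 1         # the neighbour in the same pair
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof

def verify_proof(leaf, proof, root):
    """Recompute the path from one leaf up to the root."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root

batch = [b"cert-0", b"cert-1", b"cert-2"]  # stand-ins for serialized certificates
root, proof = merkle_root_and_proof(batch, 2)
print(verify_proof(b"cert-2", proof, root))  # True
print(verify_proof(b"forged", proof, root))  # False
```

With this shape, only the root needs to be signed; each certificate in the batch can then be proven with a handful of sibling hashes rather than its own signature.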
Incremental Streaming
StreamDesanitizer replaces tokens as chunks arrive — no buffering the full response. All middleware paths stream incrementally.
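The tricky case in streaming is a token split across two chunks, such as `[PER` followed by `SON_0]`. The sketch below shows one way to handle it by holding back only a trailing fragment that might still become a token. `StreamingRestorer` is a hypothetical stand-in written for this example; it does not reproduce the real `StreamDesanitizer` API, and the 40-character bound is an assumption.

```python
import re

TOKEN = re.compile(r"\[[A-Z_]+_\d+\]")
MAX_TOKEN_LEN = 40  # assumed upper bound on token length, to cap how much is held back

class StreamingRestorer:
    """Hypothetical illustration of incremental token replacement."""

    def __init__(self, token_map):
        self.token_map = token_map
        self.pending = ""  # trailing text that may be the start of a token

    def _restore(self, text):
        return TOKEN.sub(lambda m: self.token_map.get(m.group(0), m.group(0)), text)

    def feed(self, chunk):
        """Accept one chunk and return the text that is safe to emit now."""
        self.pending += chunk
        cut = self.pending.rfind("[")
        unfinished = (
            cut != -1
            and "]" not in self.pending[cut:]
            and len(self.pending) - cut <= MAX_TOKEN_LEN
        )
        if unfinished:
            ready, self.pending = self.pending[:cut], self.pending[cut:]
        else:
            ready, self.pending = self.pending, ""
        return self._restore(ready)

    def flush(self):
        """Emit whatever is still held back once the stream ends."""
        ready, self.pending = self.pending, ""
        return self._restore(ready)

restorer = StreamingRestorer({"[PERSON_0]": "Sarah Johnson",
                              "[EMAIL_0]": "sarah.j@techcorp.io"})
chunks = ["Hi [PER", "SON_0], mail [EMAIL", "_0] now"]
out = "".join(restorer.feed(c) for c in chunks) + restorer.flush()
print(out)  # Hi Sarah Johnson, mail sarah.j@techcorp.io now
```

Only the few characters after an unclosed `[` are ever delayed, so the rest of each chunk reaches the caller immediately.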
Context Risk Analysis
Scores re-identification risk in sanitized text. Detects token density, identifying descriptors, and relationship edges that could reveal identity.
Normalized Token Standard
Formal spec (TOKEN_SPEC.md) with validation utilities, canonical regex, and 62 built-in categories. Both SDKs produce identical tokens.
Pluggable Backends
DetectorBackend base class lets you swap or extend the default regex→NER→LLM pipeline with custom detection stages.
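The real `DetectorBackend` interface is not shown here, so the sketch below uses a hypothetical `Stage` base class to illustrate the general pattern: each stage sees what earlier stages found, and only spans that do not overlap an earlier hit are added. All class names, method signatures, and the `EMP-` pattern are assumptions for this example.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    start: int
    end: int
    category: str

class Stage(ABC):
    """Hypothetical detection stage; not CloakLLM's DetectorBackend."""

    @abstractmethod
    def detect(self, text: str, found: list) -> list:
        """Return candidate spans; `found` holds spans from earlier stages."""

class RegexStage(Stage):
    def __init__(self, category, pattern):
        self.category = category
        self.pattern = re.compile(pattern)

    def detect(self, text, found):
        return [Span(m.start(), m.end(), self.category)
                for m in self.pattern.finditer(text)]

class EmployeeIdStage(RegexStage):
    """A custom stage for an organization-specific identifier (made up here)."""

    def __init__(self):
        super().__init__("EMPLOYEE_ID", r"\bEMP-\d{6}\b")

def run_pipeline(text, stages):
    found = []
    for stage in stages:
        for span in stage.detect(text, found):
            # A later stage only contributes spans that earlier stages did not cover.
            if not any(span.start < f.end and f.start < span.end for f in found):
                found.append(span)
    return sorted(found, key=lambda s: s.start)

stages = [RegexStage("EMAIL", r"[\w.+-]+@[\w-]+\.[\w-]+"), EmployeeIdStage()]
for span in run_pipeline("Ask EMP-004211 via a@b.io", stages):
    print(span)
```

Ordering the stages from most to least precise means a cheap, exact match wins over a fuzzier later guess for the same text.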
EU AI Act Compliance Reports
Per-article compliance reports (Art 12/19/4a/50) with COMPLIANT / NON_COMPLIANT verdicts and an honest, machine-readable coverage matrix — what CloakLLM provides and what remains your responsibility. JSON, Markdown, or PDF.
Independent Verification
A standalone verifier (cloakllm-verifier, Python + JS) re-checks the hash chain, RFC 3161 timestamps, key provenance, and compliance reports — without the SDK and without trusting our code.
Security Hardened
Ollama SSRF prevention, CLI PII redaction by default, thread-safe internals, and redacted analysis output.
One line to protect your LLM calls
Drop-in middleware for every major LLM framework. No code rewrites needed.
from cloakllm import enable_openai
from openai import OpenAI

client = OpenAI()
enable_openai(client)  # Wraps the OpenAI SDK; all calls are now protected

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": "Help me email Sarah Johnson (sarah.j@techcorp.io)"
    }],
)

# PII automatically restored in the response
print(response.choices[0].message.content)

Get started in seconds
Install the SDK for your language and start protecting PII immediately.
Python
$ pip install "cloakllm[detection]"
$ python -m spacy download en_core_web_sm
The base pip install cloakllm ships without NER (regex detection + audit logging only); the [detection] extra adds spaCy NER (PERSON/ORG/GPE).
JavaScript / TypeScript
$ npm install cloakllm
MCP Server & Verifier
$ pip install cloakllm-mcp
$ pip install cloakllm-verifier
SDK Comparison
Three SDKs, same core protection. Pick the one that fits your stack.
| Feature | Python | JavaScript | MCP |
|---|---|---|---|
| Regex PII Detection | |||
| spaCy NER (PERSON, ORG, GPE) | |||
| Ollama LLM Detection (opt-in) | |||
| Reversible Tokenization | |||
| Redaction Mode | |||
| Hash-Chained Audit Logs | |||
| CLI (scan / verify / stats) | |||
| Multi-Turn Token Maps | |||
| Custom Patterns | |||
| Field-Level PII Metadata | |||
| Batch Processing | |||
| Performance Metrics | |||
| Incremental Streaming | |||
| Middleware Integration | OpenAI / LiteLLM | OpenAI / Vercel | Claude Desktop |
| Zero Runtime Dependencies | |||
Using CloakLLM? Preparing for a 2027 audit?
We're talking to teams deploying AI under the EU AI Act — what you're building, what an audit will ask of you, and what's missing. Auditors and certification bodies especially welcome.
Talk to us