CloakLLM Documentation

Open-source PII protection middleware for LLMs. Detect, tokenize, and audit — before prompts leave your infrastructure.

Welcome to the CloakLLM documentation. CloakLLM is open-source PII protection middleware for LLMs that detects sensitive data, replaces it with reversible tokens, and maintains tamper-evident audit logs — all before your prompts leave your infrastructure.

📄 New whitepaper: The Article 12 Paradox — why GDPR and the EU AI Act cannot both be satisfied without PII middleware.

Quick Install

Python:

pip install cloakllm

JavaScript / TypeScript:

npm install cloakllm

MCP Server:

pip install cloakllm-mcp
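To make the detect → tokenize → restore flow concrete before diving into the SDK, here is a minimal, self-contained sketch of the idea in plain Python. It is an illustration only, not the CloakLLM API: the regex, function names, and single-category handling are simplifications, though the `[CATEGORY_N]` token shape mirrors the library's Normalized Token Standard.

```python
import re

# Conceptual sketch of sanitize/restore with deterministic [EMAIL_N] tokens.
# This is NOT the CloakLLM API -- just the underlying idea, for one category.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w.-]+\.\w{2,}")

def sanitize(text):
    """Replace each distinct email with a deterministic [EMAIL_N] token."""
    mapping = {}
    def repl(match):
        value = match.group(0)
        if value not in mapping:
            mapping[value] = f"[EMAIL_{len(mapping) + 1}]"
        return mapping[value]
    return EMAIL_RE.sub(repl, text), mapping

def desanitize(text, mapping):
    """Restore the original values from the token mapping."""
    for value, token in mapping.items():
        text = text.replace(token, value)
    return text

clean, mapping = sanitize("Contact alice@example.com or bob@example.com.")
# clean == "Contact [EMAIL_1] or [EMAIL_2]."
assert desanitize(clean, mapping) == "Contact alice@example.com or bob@example.com."
```

Because the tokens are deterministic and category-labeled, the LLM still sees that two distinct email addresses are involved, while the real values never leave your infrastructure.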

Key Features

  • 3-pass PII detection: regex, spaCy NER (Python), and optional Ollama LLM
  • Multi-language detection: 13 locales (de, fr, es, it, pt, nl, pl, se, no, dk, fi, gb, au) with locale-specific PII patterns
  • Reversible tokenization: deterministic [CATEGORY_N] tokens preserve context for the LLM
  • Cryptographic attestation: Ed25519 signed sanitization certificates with Merkle tree batch proofs
  • Tamper-evident audit logs: hash-chained JSONL with per-entity metadata, EU AI Act Article 12 ready
  • Incremental streaming: StreamDesanitizer replaces tokens as chunks arrive
  • Context risk analysis: scores re-identification risk in sanitized text by analyzing token density, identifying descriptors, and relationship edges
  • Normalized Token Standard: formal spec with validation utilities (validateToken, parseToken), canonical regex, and 62 built-in categories
  • Pluggable detection backends: DetectorBackend base class for custom detection pipelines; swap or extend the default regex→NER→LLM pipeline
  • Article 12 Compliance Mode: formal EU AI Act compliance profile (v0.6.0) with compliance_summary(), export_compliance_config(), and structured verify_audit(output_format="compliance_report")
  • Enterprise Key Management: opt-in HSM/KMS signing keys (AWS KMS, GCP KMS, Azure Key Vault, HashiCorp Vault) — Python SDK
  • Security hardening: Ollama SSRF prevention, CLI PII redaction, thread-safe internals, redacted analysis output
  • One-line integration: wraps OpenAI SDK, LiteLLM, Vercel AI SDK, and MCP

Next Steps

Read the complete usage guide covering installation, configuration, middleware integration, audit logs, and more.
