How It Works

The three-step pipeline that keeps your data private.

Overview

Noirdoc operates as a transparent reverse proxy between your application and your LLM provider. Every request passes through a three-step pipeline that ensures the language model never sees real personal data.

Your App
Noirdoc Proxy
LLM Provider
DETECT
Detect & Replace
PII detected and replaced with pseudonyms
FORWARD
Forward
Sanitized request sent to the model
RESTORE
Restore
Pseudonyms restored to original values

The entire process is invisible to your application. You send a normal API request and receive a normal response — Noirdoc handles everything in between.

The three-step pipeline

Step 1: Detect & Replace

When Noirdoc receives your request, it scans every message in the conversation for personal data. Detected entities — names, emails, phone numbers, IBANs, and more — are replaced with deterministic pseudonyms.

For example, a message like:

Please draft a reply to Max Mustermann (max.mustermann@example.com) regarding invoice #4021.

becomes:

Please draft a reply to <<PERSON_1>> (<<EMAIL_1>>) regarding invoice #4021.

The mapping between real values and pseudonyms is stored in an encrypted session state, so it can be reversed later.

Step 2: Forward

The pseudonymized request is forwarded to your configured LLM provider — OpenAI, Anthropic, Azure OpenAI, or OpenRouter. The model processes the request using only the pseudonymized data. It has no access to the original values.

Noirdoc also injects a brief system prompt that informs the model about the pseudonym format. This helps the model treat tokens like <<PERSON_1>> as proper nouns and reference them consistently throughout its response.

Step 3: Restore

When the LLM responds, Noirdoc scans the response for pseudonym tokens and replaces them with the original values from the session mapping. The restored response is returned to your application as if the model had worked with real data all along.

If the model produces <<PERSON_1>> in its reply, it becomes Max Mustermann again before your application ever sees it.

Session state and mapping persistence

Pseudonym mappings are not one-off replacements. They persist across the entire conversation session, which means:

  • <<PERSON_1>> always refers to the same person within a session
  • If Max Mustermann appears in message 1 and again in message 5, both occurrences map to <<PERSON_1>>
  • New entities get the next available index — a second person becomes <<PERSON_2>>

This consistency is critical for multi-turn conversations. The model can reason about <<PERSON_1>> across multiple exchanges without ever learning who that person actually is.

Mappings are retained for a configurable period (default: 30 days) controlled by the mapping_ttl_days setting. After the TTL expires, the mapping is deleted and the same real value would receive a new pseudonym in future sessions. Setting TTL to 0 disables persistence entirely — mappings exist only for the duration of a single request.

Multi-layer detection

Noirdoc uses multiple complementary detection methods running in parallel to maximize accuracy:

  • Pattern-based detection catches structured entities with predictable formats — email addresses, phone numbers, IBANs, credit card numbers, tax IDs, and similar data. This layer is fast, deterministic, and produces very few false positives on structured data.

  • Context-sensitive detection understands the meaning of surrounding text to detect entities that do not follow a fixed pattern, such as person names, organizations, and locations. For example, it recognizes that “Schwarz” in “Dr. Schwarz called yesterday” is a person name, not a color.

Both methods run on every request. Their results are merged and deduplicated. Each detection carries a confidence score — only entities that meet the configured threshold are pseudonymized, reducing false positives while maintaining high recall.

What the model sees

To illustrate the full picture, here is what actually arrives at the LLM provider:

Your original request:

Summarize the case file for Anna Schmidt, DOB 12.04.1990,
insured under SVNR 1234 120490. Her physician Dr. Weber
can be reached at weber@praxis-berlin.de.

What the model receives:

Summarize the case file for <<PERSON_1>>, DOB <<DATE_1>>,
insured under SVNR <<SVNR_1>>. Her physician <<PERSON_2>>
can be reached at <<EMAIL_1>>.

The model generates a coherent response using the pseudonyms. Noirdoc then restores the original values before returning the response to your application. At no point does the LLM provider have access to real personal data.