Trust

Where the model runs.

Sparcle does not ship a model. We ship the runtime that drives a model you chose. The prompt assembly, the access-controlled retrieval, the PII masking, and the audit log all live inside your perimeter — regardless of where the model itself lives. And much of the platform never calls a model at all: launcher recognition, manifest→MCP utility calls, and search are deterministic and on-device — no LLM, no egress, in any mode. This page explains the three deployment modes our customers actually use, the privacy taxonomy the runtime applies to every endpoint, and the masking guarantee that holds even when the customer points us at a public LLM API.

The architectural decomposition

The runtime is invariant. Only the model location changes.

Bolt's runtime — the agent loop, the prompt assembly, the retrieval ACL, the PII masking, the audit log — runs inside the customer's perimeter in every shipping topology. The LLM adapter is a thin HTTP client whose base URL points at whichever inference endpoint the customer configured. Swapping the model location does not change the security posture of anything else.

Bolt deployment topology — three model-location modes The customer perimeter contains the Bolt UI, agent runtime, Aeira retrieval with ACL pre-filter, PII masking layer, and the LLM adapter. The LLM adapter exits the perimeter and routes to one of three modes: Mode A on-prem inference, Mode B in-tenant cloud LLM, or Mode C public vendor API. CUSTOMER PERIMETER Bolt UI user query Agent runtime orchestration Aeira retrieval + ACL pre-filter identity-bound prompt + scoped context PII masking layer tokenize before every LLM call LLM adapter OpenAI-compatible HTTP client outbound HTTP base URL = customer-configured MODE A On-prem inference private hostname inside your perimeter MODE B In-tenant cloud LLM Azure / Bedrock / Vertex private endpoint, your tenant MODE C Public vendor API OpenAI / Anthropic / Gemini PII-masked prompt egresses

The adapter speaks the OpenAI-compatible chat-completions API. That single interface covers every major inference server in production today — your own GPU stack, your cloud tenant's managed LLM, or any public vendor API — without changing the Bolt runtime.

The three modes

What customers actually configure.

Ordered from most private to least private. Mode A and Mode B are the configurations regulated industries adopt. Mode C is useful for evaluation and lower-sensitivity workloads, and degrades gracefully via the masking layer described below.

Mode A

On-prem inference

Strongest data-residency. Customer-owned GPUs.

Customer setup
vLLM, llama.cpp, TGI, Ollama — any OpenAI-compatible server on your hardware.
What leaves the perimeter
Nothing. Stays inside your perimeter.
Endpoint privacy tier
Local-only
Suited for
Classified, ITAR, air-gapped, sovereign cloud.
Mode B

In-tenant cloud LLM

No GPUs to run. Prompts never leave your cloud tenant.

Customer setup
Azure OpenAI (Private Endpoints), Bedrock (VPC interface), Vertex (Private Service Connect), or equivalent.
What leaves the perimeter
Stays inside your cloud tenant. No public internet.
Endpoint privacy tier
No-train-default
Suited for
The common regulated config. Reuses your existing cloud-LLM contract.
Mode C

Public vendor LLM API

Single-user evaluation, lower-sensitivity workloads, accepted vendor-API contract.

Customer setup
OpenAI, Anthropic, OpenRouter, Gemini — any OpenAI-compatible public API.
What leaves the perimeter
PII-masked prompt to vendor API. Unmasked record stays in your audit log.
Endpoint privacy tier
No-train-default · Trains-by-default (Gemini free)
Suited for
Eval, internal-only, public-scope content. Not the posture for regulated data.

Endpoint privacy taxonomy

Every configured endpoint gets a tier.

The Bolt runtime derives a privacy tier from the provider, the base URL, and the model. The derivation is a pure function — no I/O, no provider call. The tiers are an ordinal scale so 'minimum acceptable tier' is a single comparison.

  1. 1. Local-only. No data leaves the host. Local inference servers running on customer hardware.
  2. 2. ZDR. Contractual zero-data-retention. Either ZDR routes on a routing layer, or an enterprise ZDR contract the customer holds with the provider.
  3. 3. No-train-default. Provider does not train on inputs by default. May retain briefly for abuse review.
  4. 4. Trains-by-default. Provider may train on inputs unless the customer has opted out elsewhere.
  5. 5. Unknown. Endpoint policy is not knowable from the request side. Custom base URLs that are not obviously local fall here.

The derivation is conservative. When in doubt, the runtime labels an endpoint as less private than it might actually be. A customer with a ZDR contract on top of an OpenAI deployment, for example, will see the endpoint labelled No-train-default in the runtime; the contractual upgrade lives in their record of the agreement, not in a probe the runtime can perform.

The masking layer

PII is masked before any outbound LLM call.

Mode A and Mode B already keep the prompt inside the perimeter. The masking layer is what protects Mode C, and what keeps every other mode's audit log free of model-side exposure.

Detect before the call

Every request body is scanned for PII before the LLM adapter is invoked. Detected values are replaced with opaque tokens, and a mapping between tokens and real values is stashed in the per-request context.

Restore on the way out

When the LLM response comes back, an outbound layer rehydrates the tokens before the response leaves the process. The model sees masked text; the calling user sees the real values.

Tools see real values

Tool calls bypass the masking layer by design. A tool that needs a real customer ID gets the real customer ID; only the LLM boundary is masked. This is a deliberate contract — masking the tool surface would corrupt the workflow.

Your perimeter keeps the record

The conversation is retained inside your perimeter, encrypted at rest and governed by your retention rules. The cryptographically signed audit trail records tamper-evident metadata of every privileged action — who, what, when — never the prompt or response text. The LLM only ever sees the masked form.

Shipped, rolling out, and not claimed

Honest about which parts are operational.

The trust center's voice across every page on this site. We will tell you what is in production today, what is in flight, and what we deliberately do not claim.

What is shipped

The five-tier privacy taxonomy, the per-endpoint tier derivation, the PII-masking layer that runs before every outbound LLM call, the outbound-restoration layer, and the cryptographically signed, tamper-evident audit trail of every privileged action. These are operational today across all three deployment modes.

What is rolling out

A configurable minimum-tier policy floor enforced at boot. The taxonomy is in place; the boot-time refusal-to-start-on-misconfiguration gate is being wired across all the call sites that read endpoint config. Customers who want this enforced today can have it as a contractual requirement.

What we do not claim

We do not claim that data never leaves your perimeter in Mode C. It does — in masked form. We do not claim ZDR by default; ZDR is a contract the customer holds with their provider, and Bolt reflects that label without verifying it from the request side. The runtime is conservative in both directions: when in doubt, it labels an endpoint as less private than it might actually be.

What this means for your edge AI inspection

The AI workload no longer routes through it.

Inline AI gateways exist to inspect outbound prompts on their way to a public LLM. In Mode A and Mode B, the call never reaches the public internet, so the inline gateway has no traffic to inspect for AI use. In Mode C, the traffic that does leave carries the masked form; the proxy can still log the connection, but the masking has already done the work the DLP layer of an inline gateway exists to do.

The AI Protect line item of an edge gateway can be retired once Bolt becomes the sanctioned AI surface. The general SWG and ZTNA spend stays, because that protects web browsing and SaaS access — workloads Sparcle does not replace. We are the AI runtime, not the network.

Architecture review.

If your security or network team wants to walk this in detail against your current edge configuration, a 30-minute architecture review covers the deployment mode, the masking layer, the audit-log surface, and the endpoint privacy floor your policy team wants enforced.