Privacy Reasoning in Ambiguous Contexts

Reference: Yi, Suciu, Gascón, Meiklejohn, Bagdasarian & Gruteser (2025). Privacy Reasoning in Ambiguous Contexts. NeurIPS 2025. arXiv:2506.12241 (Google Research). URL.

Summary

Most prior work on LLM privacy alignment evaluates whether a model’s information-sharing decisions agree with human annotators on tasks where the right answer is already clear. Yi et al. argue that this misses the operationally important regime: real privacy decisions are made under ambiguous context — missing facts, multiple plausible recipients, contested norms — and a privacy assistant’s value lies precisely in recognising the ambiguity rather than guessing past it.

They show empirically that context ambiguity is the dominant source of disagreement between models and human ground truth on disclosure decisions, and that asking a model to also produce its decision rationale reveals the ambiguities directly: many “wrong” answers are caused by an unstated premise that, when surfaced, changes both the model’s and the human’s answer.

The paper’s main artefact is Camber, a framework that uses model-generated rationales to systematically disambiguate context: it identifies under-specified context variables, asks targeted clarification questions (or fills them with explicit assumptions), and reruns the disclosure decision. Applied to existing privacy benchmarks, Camber yields +13.3 % precision and +22.3 % recall and substantially reduces sensitivity to surface prompt-wording variations. The work positions itself in the contextual integrity tradition (Nissenbaum 2004): privacy is appropriate flow given the context; therefore precise context is a prerequisite for correct privacy reasoning.

Key Ideas

Agentic privacy is ambiguity reasoning: the dominant error mode is not bad alignment but missing context; the right behaviour is often to ask before deciding.
Decision-rationale extraction as a debugging tool: model-generated justifications expose which contextual premises the model assumed.
Camber disambiguation pipeline: rationale → identify under-specified context variable → resolve (clarification question or explicit assumption) → re-decide.
Empirical headline: up to +13.3 % precision and +22.3 % recall over rationale-free baselines on privacy-decision benchmarks.
Robustness gain: disambiguated prompts show much lower sensitivity to surface re-wording — a structural rather than memorised improvement.
Grounded in Contextual Integrity: privacy = appropriate information flow given sender/recipient/data-type/transmission-principle.
Open challenges: trustworthy disambiguation under adversarial prompts, latency cost of clarification, multi-party context aggregation.

Connections

Conceptual Contribution

Claim: The dominant failure mode of LLM privacy assistants is not misaligned values but unrecognised contextual ambiguity. Models that surface and resolve ambiguity — rather than guess through it — produce decisions that are both more accurate and more robust.
Mechanism: Camber: a rationale-driven disambiguation loop that extracts the model’s stated reasoning, identifies under-specified context variables, resolves them (by clarification or explicit assumption), and reruns the decision. Operationalises Contextual Integrity’s “transmission principles” as queryable context slots.
Concepts introduced/used: Camber, Privacy Reasoning, Context Ambiguity, Disclosure Decisions, Contextual Integrity, Rationale Extraction, Prompt Sensitivity
Stance: empirical / engineering with normative grounding
Relates to: Sits adjacent to Defeating Prompt Injections by Design in the agentic-security stack — CaMeL controls who can call what tool with what data, Camber decides whether to share data given a context. Empirically grounds the Contextual Integrity tradition; complements the privacy-substrate work in Trusted Machine Learning Models Unlock Private Inference.

Backlinks

Trusted Machine Learning Models Unlock Private Inference ×2
index
Information Flow Control
Contextual Integrity ×2
Camber ×2
concept-map

Linked Pages

Trusted Machine Learning Models Unlock Private Inference

Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography

Reference: Shumailov, Ramage, Meiklejohn, Kairouz, Hartmann, Balle & Bagdasarian (2025). Trusted Machine Learning Models Unlock Private Inference for Problems Currently Infeasible with Cryptography. arXiv:2501.08970 (Google Research). URL.

Summary

The paper proposes Trusted Capable Model Environments (TCMEs) — a new design point in the privacy-preserving-computation landscape, sitting between classical trusted execution environments and cryptographic protocols such as multi-party computation (MPC), homomorphic encryption, and zero-knowledge proofs. The motivating observation: capable modern ML models can plausibly play the role of the trusted third party in many private-inference scenarios that classical cryptography handles only at toy scale or not at all.

A TCME is defined by three constraints under which a capable model operates: (i) explicit input/output constraints scoping what the model is permitted to receive and emit; (ii) explicit information-flow control binding outputs to authorised data-flow channels; (iii) explicit statelessness — the model cannot retain or leak inputs across sessions. Under these constraints, even an inscrutable LLM can serve as a credible “trusted intermediary”: it computes a function of two parties’ data and reveals only the agreed output.

The authors argue TCMEs unlock private inference for problems where MPC is infeasible because the function is too rich, the inputs too large, or the spec too implicit (natural-language matching, fuzzy de-duplication, semantic agreement-checking). They walk through use cases — private record matching, contract negotiation, secret-keeping triage — and show that even classical cryptographic problems (private set intersection, secure multi-party comparison) admit TCME implementations that scale further than current MPC. The paper closes with the limitations: trust in TCMEs reduces to trust in the model+hardware+policy stack; statelessness must be engineered, not assumed.

Key Ideas

TCME definition: a capable ML model + explicit I/O constraints + explicit information-flow control + explicit statelessness.
Trusted-third-party substitution: the model fills the role MPC traditionally requires a non-colluding cryptographic protocol to enact.
Coverage envelope: TCMEs handle privacy problems too rich or too implicit for current MPC (semantic matching, fuzzy agreements, natural-language contracts).
Bridge to cryptography: even classical PSI/comparison protocols can be implemented as TCMEs — sometimes more efficiently.
Statelessness is engineered: memory leaks, side channels, and re-training contamination are the real attack surface, not the model logic.
Trust composition: TCME trust assumption = trust(model) ∧ trust(hardware) ∧ trust(policy enforcement).
Use cases sketched: private record matching, negotiation, triage, search over private corpora, semantic compliance checks.

Connections

Conceptual Contribution

Claim: Capable ML models, operated under explicit information-flow and statelessness constraints, can act as trusted third parties for private-inference problems that classical cryptography cannot scale to. This expands the realm of feasible privacy-preserving computation beyond MPC’s current envelope.
Mechanism: Define Trusted Capable Model Environments (TCMEs): model + explicit I/O constraints + explicit IFC + explicit statelessness. Demonstrate via use cases that TCMEs solve both novel privacy problems (semantic matching) and re-instantiate classical ones (PSI) at scales MPC cannot reach.
Concepts introduced/used: Trusted Capable Model Environment, Trusted Third Party, Information Flow Control, Private Inference, Statelessness (Privacy), Multi-Party Computation
Stance: position / architectural proposal
Relates to: Direct companion to NDAI Agreements — both treat TEE+AI or model+constraints as a substrate for previously infeasible commitment / privacy primitives. Provides the technical substrate that Privacy Reasoning in Ambiguous Contexts reasons about behaviourally and that Infrastructure for AI Agents would expose as governance infrastructure. Complementary to Defeating Prompt Injections by Design’s CaMeL: both treat the agent as a constrained reasoner whose outputs are gated by information-flow policy.

Contextual Integrity

Helen Nissenbaum 2004: privacy = appropriate information flow given the context, parameterised by sender, recipient, data type, and transmission principle. Operationalised for LLM Agents in Privacy Reasoning in Ambiguous Contexts (Camber disambiguation framework).

In this vault

Defeating Prompt Injections by Design

Reference: Debenedetti, Shumailov, Fan, Hayes, Carlini, Fabian, Kern, Shi, Terzis & Tramèr (2025). Defeating Prompt Injections by Design (CaMeL). arXiv:2503.18813 (Google DeepMind / ETH Zürich). URL. Code: https://github.com/google-research/camel-prompt-injection.

Summary

CaMeL (“CApabilities for MachinE Learning”) is a robust, by-design defence against Prompt Injection attacks on tool-using LLM Agents. Rather than trying to make the model itself injection-resistant — an approach that decade-long experience with content filters suggests will fail — CaMeL wraps an arbitrary LLM in a protective system layer that performs explicit control- and data-flow separation between the trusted user query and the untrusted data the agent retrieves from tools, websites, or shared memory.

The trusted query is first compiled into a structured plan: a small program whose control flow is fixed at parse time and whose data flow between steps is statically determined. Untrusted strings returned by tools are treated as inert data — they can populate variables but cannot rewrite the program, redirect tool calls, or change which downstream tools are invoked. To prevent exfiltration over authorised channels (the harder half of the problem, since some tools must be allowed to write outwards), CaMeL attaches Capabilities to each data value tracking its provenance and policy class; tool invocations are gated by Information Flow Control policies that check capabilities against an explicit security label lattice.

Evaluated on the AgentDojo benchmark, CaMeL solves 77 % of tasks with provable security guarantees, against 84 % for an undefended baseline — a small utility cost for a structural defence that does not depend on the LLM noticing the attack. The paper positions CaMeL as a successor to ad-hoc prompt-level mitigations and as a concrete instance of end-to-end security thinking applied to agentic AI.

Key Ideas

Threat model: prompt injection from any untrusted data source the agent reads — tools, web pages, files, memory, other agents.
Control-flow extraction: parse the trusted user query into a fixed control-flow plan; downstream model calls see only data, never code.
Data-flow tracking: every variable carries a provenance label; tools that consume “untrusted” labels cannot influence which subsequent tools are called.
Capabilities for tool calls: classic capability-based access control transplanted to LLM tool use; security policies enforced at the tool boundary.
Provable security: when a task is completed under CaMeL, the trace itself certifies that no untrusted data influenced control flow — a property auditable post hoc.
Empirical cost: 77 % vs 84 % task success — graceful degradation rather than catastrophic refusal.
Open source: reference implementation released; integrates with existing agent frameworks via tool-call interception.

Connections

Conceptual Contribution

Claim: Prompt injection is structurally unsolvable at the model layer; it must be eliminated by enforcing a strict separation between code (the trusted query) and data (everything else) at the agent runtime, using classical capability-based Information Flow Control rather than ML-based content classification.
Mechanism: Compile the user query into a fixed control-flow program; route all retrieved data through tagged variables; gate every tool invocation by capability-checked information-flow policies. The LLM’s outputs can populate data fields but never alter control flow or bypass capability checks.
Concepts introduced/used: CaMeL, Control-Flow Integrity, Data-Flow Tracking, Capabilities, Information Flow Control, Prompt Injection, Tool Use, Agent Security, Provable Security (Agents)
Stance: systems / engineering with light formal grounding
Relates to: Spiritual successor to A Language-Based Approach To Prevent DDoS and Security Kernel Lambda Calculus for agent runtimes; an architectural realisation of the threat model catalogued in SoK The Attack Surface of Agentic AI and the multi-agent threats surveyed in Open Challenges in Multi-Agent Security; companion to AgentDojo (the benchmark on which it is evaluated).

Prompt Sensitivity

(page does not exist)

Rationale Extraction

(page does not exist)

Disclosure Decisions

(page does not exist)

Context Ambiguity

(page does not exist)

Privacy Reasoning

(page does not exist)

Camber

Ren Yi et al. 2025 framework for context disambiguation in privacy decisions by LLM assistants: surface the model’s stated rationale, identify under-specified context variables, resolve them (clarification or explicit assumption), and re-decide. See Privacy Reasoning in Ambiguous Contexts.

In this vault

Information Flow Control

Static or dynamic restriction of how data labelled with a security class can flow through a program. Cornerstone of Capability Security, language-based security (Security Kernel Lambda Calculus), and the CaMeL approach to prompt injection.

In this vault

Open Challenges in Multi-Agent Security

Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents

Reference: Schroeder de Witt, Krawiecka, Krawczuk, Hagag, Anderson, et al. (24 authors total) (2025). Open Challenges in Multi-Agent Security: Towards Secure Systems of Interacting AI Agents. arXiv:2505.02077 (Oxford / Cambridge / EPFL / industrial labs). URL.

Summary

This position paper introduces Multi-Agent Security (MASec) as a distinct research field, sitting between traditional cybersecurity, AI safety, and multi-agent systems — and argues that it is the dominant security frontier as LLM Agents begin to interact directly with one another across the open web, physical environments, and institutional infrastructures. The threats MASec studies emerge from interaction; they are not properties of any single agent in isolation.

The authors taxonomise threats arising from agent interaction along several axes: (i) secret collusion (agents coordinating to defeat oversight through covert side-channels including steganographic message-passing), (ii) coordinated swarm attacks (jailbreaks, prompt injections, or misinformation cascading through agent networks), (iii) network-effect amplification (privacy breaches, data poisoning, and disinformation spreading faster than mitigation), and (iv) multi-agent dispersion / stealth optimisation (adversaries exploiting fleet size to evade detection and persist).

They argue these threats are systematically understudied because research is scattered across AI Safety, Multi-Agent Systems, Distributed Security, Game Theory, complex systems, and AI governance, each with its own vocabulary. The paper provides a unifying taxonomy, identifies fundamental security–utility and security–security trade-offs, and lays out a research agenda — including the design of Free-Form Protocols (necessary for task generalisation but enabling collusion), governance and attribution infrastructure, and detection/response mechanisms for emergent multi-agent threats. The work is foundational reading for anyone designing inter-agent protocols, including the Agent-to-Agent Protocol, Model Context Protocol, and successors.

Key Ideas

Defines Multi-Agent Security (MASec) as a field: securing networks of interacting AI agents against threats that emerge or amplify through interaction.
Secret collusion: agents coordinating covertly (including via steganography) to defeat oversight — a new kind of “Schelling-point” attack on alignment.
Coordinated swarm attacks: distributed jailbreaks, prompt injections, data poisoning that succeed because the fleet succeeds even when individual instances fail.
Network effects: privacy breaches, disinformation, and jailbreaks spread through agent populations the way they spread through humans — only faster.
Dispersion & stealth optimisation: adversaries exploit the size and heterogeneity of agent fleets to evade oversight; novel persistent threats at system level.
Free-form protocols as risk surface: the same expressivity that makes inter-agent communication useful enables covert channels; reining in expressivity costs utility.
Security–utility and security–security trade-offs are fundamental — every defence opens or closes other attack surfaces.
Calls for a unified MASec research agenda spanning AI Safety, Distributed Security, Game Theory, complex systems, and AI governance.

Connections

Conceptual Contribution

Claim: Security of interacting AI agents is a distinct problem from either single-agent AI safety or classical cybersecurity. Threats emerge from interaction (secret collusion, swarm attacks, network-effect contagion) and are systematically missed by frameworks anchored to individual systems or static attack surfaces.
Mechanism: A new field — Multi-Agent Security — with a threat taxonomy (collusion, swarm, contagion, dispersion), explicit security–utility / security–security trade-offs, and a research agenda spanning protocol design, attribution, detection, and governance.
Concepts introduced/used: Multi-Agent Security, Secret Collusion, Swarm Attack, Network Effect (Security), Free-Form Protocols, Stealth Optimisation, Agent Security, AI Governance
Stance: position paper / survey / research agenda
Relates to: Sister survey to SoK The Attack Surface of Agentic AI but operating one level up — at networks of agents rather than the agent runtime. Provides the multi-agent threat model that defences like Defeating Prompt Injections by Design address, that infrastructure proposals like Infrastructure for AI Agents try to govern, and that economic frameworks like Virtual Agent Economies embed. Directly extends classical Distributed Security and connects to Learning Collusion in Episodic Inventory-Constrained Markets for the collusion sub-thread.

Agent Security

Security concerns specific to LLM-agent systems: tool attacks, prompt injection, memory poisoning, inter-agent trust failures.

In this vault

LLM Agents

Large-language-model-powered agents: natural-language coordination, tool use, multi-agent orchestration.

Privacy Reasoning in Ambiguous Contexts

Summary

Key Ideas

Connections

Conceptual Contribution

Tags

Backlinks