Language Models are Few-Shot Learners

Reference: Brown, Mann, Ryder, Subbiah, Kaplan, Dhariwal, Neelakantan, Shyam, Sastry, Askell et al. (2020). NeurIPS 2020. Source file: 2005.14165.pdf. URL

Summary

Introduces GPT-3, a 175-billion-parameter autoregressive Transformer language model, and shows that scaling up enables task-agnostic few-shot learning purely through in-context demonstrations — no gradient updates or fine-tuning required. The paper establishes the empirical basis for the “prompt as interface” paradigm that underpins modern LLM Agents.

GPT-3 is evaluated across dozens of NLP benchmarks (translation, QA, cloze, Winograd, arithmetic, word unscrambling, SuperGLUE, NLI) in zero-, one-, and few-shot regimes, often matching or exceeding state-of-the-art fine-tuned systems. The authors also examine broader impacts: misuse potential, bias, fairness, and energy cost — topics that later crystallise into the threat surfaces surveyed in AI Agents Under Threat.

Key Ideas

Scaling laws: performance on downstream tasks improves smoothly with model size, compute, and data.
In-context learning: a “meta-learning” inner loop where the model adapts to a task from examples in its prompt window, without weight updates.
Few-shot prompting as a general interface: the prompt becomes the programmable surface for LLM behaviour — the same surface later exploited by Prompt Injection and Jailbreak attacks.
Emergent capabilities (arithmetic, novel-word use) appearing only at scale.
Early catalogue of misuse risks (disinformation, generated news indistinguishable from human-written) foreshadowing agent-era threats.

Connections

Conceptual Contribution

Claim: Sufficiently large autoregressive language models become few-shot learners, performing new tasks from prompt demonstrations alone — establishing the prompt as the universal programming surface for LLM systems.
Mechanism: Train a 175B-parameter Transformer on ~300B tokens of filtered Common Crawl, WebText2, Books, and Wikipedia; evaluate on 40+ benchmarks in zero/one/few-shot settings without gradient updates.
Concepts introduced/used: in-context learning, few-shot prompting, scaling laws, emergent capabilities, prompt-as-interface — all prerequisites for LLM Agents, Tool Use, Prompt Injection, and the threat taxonomy of AI Agents Under Threat.
Stance: empirical/position
Relates to: Foundational substrate cited throughout AI Agents Under Threat; the prompt interface it popularises is the attack surface studied in Prompt Injection, Jailbreak, and ClawWorm Self-Propagating Attacks Across LLM Agent Ecosystems.

Tags

#llm #foundational #few-shot-learning #scaling #in-context-learning

Summary

Presents ClawWorm, the first demonstrated self-replicating, worm-style attack on a production-scale autonomous LLM-agent ecosystem. The target is OpenClaw, an open-source personal AI-agent framework with over 40,000 active instances, a persistent Markdown workspace (SOUL.md, AGENTS.md, SKILL.md), tool-execution privileges, and cross-platform messaging (Telegram, Discord, WhatsApp, Slack, Signal, Moltbook). A single crafted message triggers the victim to write a malicious payload into its highest-privilege configuration file, which then auto-fires at every session restart and autonomously propagates to every newly encountered peer — all without further attacker intervention.

The worm implements a dual-anchor persistence mechanism: one anchor injects the payload into the Session Startup section of AGENTS.md (guaranteeing execution on reboot), the other injects a global interaction rule (guaranteeing propagation during routine replies). Three attack vectors are studied (A: web injection, B: skill-supply-chain via ClawHub, C: direct fenced-code replication with word-by-word handshake) and three payloads (P1 recon, P2 resource exhaustion, P3 command-and-control via URL retrieval). Across 1,800 trials on four frontier LLM backends (Minimax-M2.5, DeepSeek-V3.2, GLM-5, Kimi-K2.5) the aggregate attack success rate is 64.5%, with Vector B (skill supply chain) reaching 81% and sustained multi-hop propagation up to 5 hops. An epidemiological projection with basic reproduction number R0 = k × ASR shows inevitable ecosystem-wide saturation even for security-conscious models.

The root cause is identified as the flat context trust model: the LLM cannot distinguish instructions from its owner, the system layer, or an arbitrary channel participant, so architectural patterns (unconditional workspace loading, LLM-mediated tool authorisation, unreviewed skill packages) amount to structural — not idiosyncratic — vulnerabilities shared by any agent ecosystem of similar design.

Key Ideas

Single-message, fully autonomous worm against a production agent framework

Dual-anchor persistence: Session Startup + global interaction rule

Three attack vectors (web URL, skill supply chain, direct instruction replication)

Multi-turn autonomous-retry social engineering boosts ASR by up to +24 pp

Epidemiological SI model with R0 = k × ASR predicts ecosystem saturation

Execution-layer guardrails alone cannot halt propagation (dormant payloads persist)

Flat context trust model as structural root cause

Conceptual Contribution

Claim: Production-scale autonomous LLM-agent ecosystems are vulnerable to single-message, self-replicating worms whose root cause is architectural (flat context trust, unconditional config loading, unreviewed skill supply chains), not model-specific.

Mechanism: Empirical red-team against unmodified OpenClaw v2026.3.12 across four LLM backends, three vectors, three payloads (1,800 trials). A dual-anchor persistence pattern writes the payload to AGENTS.md and installs a global propagation rule; session-restart loading re-injects the payload into the system prompt; routine replies carry the payload to peers. Evaluated with per-phase metrics (persistence, execution, propagation) and a mean-field R0 epidemiological projection.

Concepts introduced/used: Self-Replicating Agent, Dual-Anchor Persistence, Flat Context Trust Model, Skill Supply Chain Attack, Indirect Prompt Injection, Agent Worm, Configuration Integrity, Multi-Turn Social Engineering, Epidemiological Projection R0

Stance: empirical / critique

Relates to: Concrete multi-agent instantiation of the threat surface catalogued in SoK The Attack Surface of Agentic AI. The flat-trust critique complements the trust-model taxonomy in Inter-Agent Trust Models - A Comparative Study and the safety failures observed in Agents of Chaos. Motivates verifiable specifications of the kind proposed in Intent Formalization - A Grand Challenge for Reliable Coding.

Summary

This survey organizes the emerging threat landscape of LLM-powered AI agents around four knowledge gaps: unpredictability of multi-step user inputs, complexity of internal execution, variability of operational environments, and interactions with untrusted external entities. It unifies single-agent and multi-agent attack surfaces within a perception/brain/action + agent2agent/agent2env/agent2memory taxonomy.

Concrete threats reviewed include adversarial prompts, prompt injection, jailbreaks, backdoor attacks, hallucination and misalignment, tool-use risks, indirect prompt injection, reinforcement-learning environment attacks, cooperative and competitive inter-agent risks, and long/short-term memory attacks. The authors tabulate defenses (prevention- and detection-based), rate their efficacy, and highlight open directions for robust and trustworthy agents.

Key Ideas

Four knowledge gaps framing agent security.

Taxonomy: perception / brain / action / agent2agent / agent2env / agent2memory threats.

Six categories of prompt-injection attack engineering (naive, escape, context-ignore, fake-completion, multimodal, combined).

Jailbreak domino effect in multi-agent populations.

Memory poisoning and indirect prompt injection as underexplored surfaces.

Conceptual Contribution

Claim: LLM Agents security should be organised around four knowledge gaps (input unpredictability, internal complexity, environmental variability, untrusted interactions) mapped onto a perception/brain/action + agent2{agent,env,memory} taxonomy.

Mechanism: Surveys adversarial prompts, prompt injection, jailbreaks, backdoors, hallucination, tool-use risks, indirect injection, RL environment attacks, inter-agent cooperative/competitive risks, memory poisoning; tabulates prevention- vs detection-based defences and rates their efficacy.

Concepts introduced/used: Prompt Injection, Jailbreak, Backdoor Attacks, Tool Use, Memory Poisoning, Hallucination, Model Context Protocol, LLM Agents, Multi-Agent Systems, Trust and Reputation, Distributed Security, Agent Security

Stance: survey

Relates to: Provides the threat scaffolding that MalTool Malicious Tool Attacks deepens at the tool layer; complements lifecycle threats in Survey Of Agent Interoperability Protocols; motivates static-analysis defences like A Language-Based Approach To Prevent DDoS.

Language Models are Few-Shot Learners

Summary

Key Ideas

Connections

Conceptual Contribution

Tags

Backlinks