Byzantine Agreement

The classical problem (Pease, Shostak & Lamport 1980; Lamport, Shostak & Pease 1982 Byzantine Generals Problem): get n processes to agree on a value when up to f of them may be Byzantine (arbitrarily faulty, including coordinating malicious behaviour). The standard bound is n ≥ 3f+1 for deterministic agreement with authenticated messages, n ≥ 3f+1 is also necessary in the partially-synchronous model. Practical algorithms: PBFT (Castro & Liskov 1999), HotStuff (Yin et al. 2019), Tendermint, Honeybadger.

In this vault

Backlinks

Linked Pages

Distributed Consensus

The problem of getting a set of distributed processes to agree on a single value despite some of them failing or behaving adversarially. The negative bound is FLP: in a fully asynchronous network with even one crash failure, no deterministic protocol can guarantee both safety and liveness. Practical algorithms (Paxos, Raft, PBFT, HotStuff) circumvent FLP by making partial-synchrony or randomisation assumptions. Crash-fault tolerance requires 2f+1 nodes to tolerate f failures; Byzantine tolerance requires 3f+1.

In this vault

Byzantine Fault Tolerance

The property of a distributed protocol to reach correct consensus despite arbitrary, including malicious, failures of up to f of 3f+1 participants. BFT underlies replicated coordination kernels (e.g., DepSpace/EDS) and motivates constraints on server-side extensions to preserve determinism.

In this vault

HotStuff

Yin et al.’s (2019) BFT consensus protocol with linear communication complexity (O(n) per decision in both the common case and view change) and responsiveness (commits at network speed, not max-timeout speed) — both improvements over PBFT. Achieves linearity via threshold-signature quorum certificates and uniform three-chain commit; chained variant pipelines three views into one message per leader. The consensus core of Diem, Aptos, Sui’s Mysticeti, and many recent BFT-PoS blockchains.

In this vault

PBFT

Castro & Liskov’s (1999) Practical Byzantine Fault Tolerance protocol — 3f+1 replicas tolerating up to f Byzantine failures via a three-phase primary-backup protocol (pre-prepare / prepare / commit) plus view-change recovery. The first BFT protocol fast enough for production deployment; demonstrated on Byzantine NFS within 3% of unreplicated performance. Direct ancestor of Tendermint, HotStuff, Diem/Aptos consensus.

In this vault

Practical Byzantine Fault Tolerance

Reference: Castro, M. & Liskov, B. (1999). Practical Byzantine Fault Tolerance. In 3rd Symposium on Operating Systems Design and Implementation (OSDI ’99), pp. 173–186, USENIX. Companion: Castro, M. (2001). Practical Byzantine Fault Tolerance. PhD thesis, MIT. Journal version: ACM Transactions on Computer Systems 20(4), pp. 398–461, 2002. Open access PDF (MIT CSAIL) · USENIX OSDI ’99 page · Journal version (BFT-TOCS)

Summary

Castro and Liskov demonstrate that Byzantine fault tolerance — agreement among 3f+1 replicas in the presence of up to f arbitrarily faulty (malicious, buggy, compromised) nodes — can be made practical: their protocol, PBFT, achieves throughput within a small factor of unreplicated service for realistic workloads, where prior BFT protocols had been orders of magnitude slower. The protocol assumes the partial-synchrony model (eventual upper bound on message delay) for liveness; safety holds in fully asynchronous networks. The core protocol is a three-phase primary-backup scheme — pre-prepare, prepare, commit — driven by a designated primary (replica p such that p ≡ v mod n for view number v). The primary orders client requests; the prepare phase ensures 2f+1 replicas agree on the order in the current view; the commit phase ensures persistence across view changes. A view change protocol replaces the primary if it is suspected of failure: backups time out, exchange certified message logs, and elect the next primary; the new primary reconstructs the longest committed prefix from the received certificates. The two key engineering moves that make BFT practical are (1) MAC vectors (one symmetric MAC per recipient) instead of public-key signatures on every message — public-key crypto is reserved for view changes — and (2) a careful checkpoint-and-garbage-collect mechanism that bounds memory and accelerates recovery. The paper applies PBFT to a Byzantine-fault-tolerant NFS implementation; performance is within 3% of unreplicated NFS for realistic file-system workloads. PBFT inaugurated 25+ years of practical BFT research and is the direct ancestor of modern blockchain consensus protocols including Tendermint, HotStuff, and Diem/Aptos’s BFT family.

Key Ideas

3f+1 replicas tolerate f Byzantine failures: the standard BFT bound (Lamport, Shostak & Pease 1982) — needed because Byzantine replicas can equivocate, so a 2f+1 quorum (sufficient against crash failures) can be split if f Byzantine replicas vote opposite ways to two halves of the honest set.
Three-phase commit driven by a primary: pre-prepare (primary assigns sequence number n to a request, broadcasts), prepare (each replica that accepts the pre-prepare broadcasts a prepare message; once 2f+1 prepare messages agree, the request is prepared), commit (each prepared replica broadcasts commit; once 2f+1 commit messages agree, the request is committed and executed).
View changes for primary failure: when a backup times out without progress, it broadcasts a view-change message containing certified prepared / committed certificates from the previous view; the new primary (next in round-robin) collects 2f+1 view-change messages and constructs a new-view containing the prepared requests that must be re-executed.
MAC vectors instead of signatures: every message carries a MAC for each recipient (computed under a pairwise symmetric key). One MAC is two orders of magnitude faster than a public-key signature; pairwise MACs are sufficient because Byzantine replicas cannot forge a MAC under a key they don’t know.
Checkpoints and garbage collection: every K requests, replicas take a checkpoint of state and broadcast a checkpoint message; once 2f+1 matching checkpoints exist (a stable checkpoint), older log entries can be discarded. Stable checkpoints also accelerate state transfer for recovering or lagging replicas.
Byzantine-fault-tolerant NFS: end-to-end demonstration that BFT can be deployed in production-style systems with manageable performance overhead — essential evidence that BFT was practical, not just theoretically interesting.
Safety always, liveness under partial synchrony: PBFT preserves safety (no two committed values disagree) under fully asynchronous networks; liveness requires partial synchrony (timeouts must eventually become accurate). FLP-compatible: the asynchronous gap is in liveness, not safety.

Connections

Conceptual Contribution

Claim: Byzantine fault tolerance can be made practical (within a small factor of unreplicated performance) by combining a three-phase primary-backup protocol with view-change recovery, MAC vectors instead of public-key signatures on the common path, and checkpoint-driven garbage collection. Safety holds in asynchronous networks; liveness requires partial synchrony.
Mechanism: Primary-backup protocol with 3f+1 replicas; three phases (pre-prepare, prepare, commit) each requiring 2f+1-strong quorum certificates; view-change protocol triggered by backup timeouts, electing the next primary by round-robin; MAC vectors with public-key crypto reserved for view changes; periodic stable checkpoints for log truncation and state transfer.
Concepts introduced/used: PBFT, Byzantine Agreement, View Change, MAC Vector, Stable Checkpoint, 3f+1 Quorum, Partial Synchrony.
Stance: systems-engineering paper / dissertation summary.
Relates to: Implements Byzantine agreement (Lamport, Shostak & Pease 1982 / Pease, Shostak & Lamport 1980) practically. Subject to the same FLP impossibility as crash-fault consensus, with the same partial-synchrony resolution. Pre-blockchain, BFT was largely a theoretical curiosity; PBFT proved deployment viable, but it took the explicit blockchain-as-economic-system framing of Bitcoin (Nakamoto 2008, not in vault) and especially of Tendermint (Buchman 2016) to drive industrial BFT adoption. Direct ancestor of HotStuff (Yin et al. 2019), which inherits PBFT’s three-phase structure but achieves linear communication complexity (vs PBFT’s quadratic) and responsiveness (no waiting for max network delay during normal operation). PBFT-style three-phase protocols underlie the consensus layers of Diem / Aptos, much of Hyperledger Fabric, and (with adaptations) Cosmos SDK chains. Compared to Raft / multi-Paxos: PBFT tolerates malicious nodes at the cost of 3f+1 replicas (vs 2f+1), three communication phases (vs two), and quadratic message complexity (vs linear); for crash-only environments Raft is simpler and faster.