Extensible Distributed Coordination

Reference: Tobias Distler, Christopher Bahn, Alysson Bessani, Frank Fischer, Flavio Junqueira (2015). EuroSys ’15. Source file: eurosys15-edc.pdf. URL

Summary

The paper argues that coordination services such as ZooKeeper and DepSpace, which centralize primitives like locks, leader election, and queues for datacenter services, are held back by their fixed, limited interfaces: clients must compose multi-RPC workarounds that are slow and race-prone. The authors propose extensibility. Clients dynamically install small server-side “recipes” that execute atomically on the coordination kernel’s state, collapsing a multi-step RPC pattern into a single operation.

They describe a model in which extensions are sandboxed for safety (bounded resource use, determinism, no API changes) and present two implementations, Extensible ZooKeeper (EZK) and Extensible DepSpace (EDS). The evaluation shows that extensibility increases the throughput of a distributed queue by more than an order of magnitude (17x for ZooKeeper, 24x for DepSpace) without compromising fault tolerance (Byzantine in DepSpace’s case) or adding significant kernel complexity.
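To make the multi-RPC problem concrete, here is a minimal sketch, not the paper's implementation. `Kernel` is a hypothetical in-memory stand-in for a coordination kernel (its method names are assumptions and do not mirror the ZooKeeper or DepSpace APIs), and `rpc_count` models client-visible round trips. A client-side dequeue needs at least three calls and must retry when it loses a race; the same logic registered as an extension runs in one call under the kernel's lock.

```python
import threading


class Kernel:
    """Toy coordination kernel (hypothetical; not a real ZooKeeper/DepSpace API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = {}        # path -> data
        self._next_seq = 0      # suffix for sequential nodes
        self._extensions = {}   # name -> callable(nodes, *args)
        self.rpc_count = 0      # each public data call models one round trip

    # --- fixed kernel interface: every call costs one RPC ---
    def create_sequential(self, prefix, data):
        with self._lock:
            self.rpc_count += 1
            path = f"{prefix}{self._next_seq:010d}"
            self._next_seq += 1
            self._nodes[path] = data
            return path

    def get_children(self, prefix):
        with self._lock:
            self.rpc_count += 1
            return sorted(p for p in self._nodes if p.startswith(prefix))

    def get_data(self, path):
        with self._lock:
            self.rpc_count += 1
            return self._nodes.get(path)

    def delete(self, path):
        with self._lock:
            self.rpc_count += 1
            return self._nodes.pop(path, None) is not None

    # --- extensibility: install once, then invoke with a single RPC ---
    def register(self, name, fn):
        with self._lock:
            self._extensions[name] = fn

    def invoke(self, name, *args):
        with self._lock:  # the extension runs atomically on kernel state
            self.rpc_count += 1
            return self._extensions[name](self._nodes, *args)


def dequeue_client_side(kernel, prefix):
    """Dequeue using only the fixed interface: list, read, delete, retry on a lost race."""
    while True:
        children = kernel.get_children(prefix)
        if not children:
            return None
        head = children[0]
        data = kernel.get_data(head)
        if kernel.delete(head):  # False means another consumer removed it first
            return data


def dequeue_extension(nodes, prefix):
    """The same logic as a server-side extension: no retries, no races."""
    heads = sorted(p for p in nodes if p.startswith(prefix))
    if not heads:
        return None
    return nodes.pop(heads[0])
```

With two items enqueued via `create_sequential("/queue/item-", ...)`, one call to `dequeue_client_side` costs three round trips in the uncontended case, while `kernel.invoke("dequeue", "/queue/item-")` costs one regardless of contention.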

Key Ideas

  • Limited coordination-kernel APIs force inefficient RPC chains.
  • Custom extensions execute atomically server-side, like stored procedures for coordination state.
  • Sandboxing requirements: no API changes, security, determinism, bounded resource use.
  • Two extension types: operation extensions, triggered by client operations, and event extensions, triggered by kernel events.
  • Distributed-queue throughput improves by more than an order of magnitude: 17x for EZK over ZooKeeper, 24x for EDS over DepSpace.
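The bounded-resource requirement in the list above can be illustrated with a small sketch. This is one possible mechanism, assumed for illustration and not taken from the paper: the extension sees only a metered proxy over a working copy of the state, and its changes are committed only if it finishes within its budget. All names here (`SandboxedState`, `run_extension`, `max_ops`) are hypothetical.

```python
class BudgetExceeded(Exception):
    """Raised when an extension uses more state operations than allowed."""


class SandboxedState:
    """Metered access to a working copy of kernel state (hypothetical design)."""

    def __init__(self, nodes, max_ops):
        self._working = dict(nodes)  # extension never touches the real state
        self._remaining = max_ops

    def _charge(self):
        if self._remaining <= 0:
            raise BudgetExceeded("extension exceeded its operation budget")
        self._remaining -= 1

    def read(self, path):
        self._charge()
        return self._working.get(path)

    def write(self, path, data):
        self._charge()
        self._working[path] = data

    def delete(self, path):
        self._charge()
        self._working.pop(path, None)

    def snapshot(self):
        return dict(self._working)


def run_extension(nodes, extension, max_ops=8):
    """Run an extension; commit its writes only if it stays within budget."""
    sandbox = SandboxedState(nodes, max_ops)
    try:
        result = extension(sandbox)
    except BudgetExceeded:
        return False, None  # aborted: real kernel state is left untouched
    nodes.clear()
    nodes.update(sandbox.snapshot())
    return True, result


def increment(state):
    """Well-behaved extension: atomic read-modify-write of a counter."""
    value = (state.read("/counter") or 0) + 1
    state.write("/counter", value)
    return value


def runaway(state):
    """Misbehaving extension: writes forever until the budget stops it."""
    i = 0
    while True:
        state.write(f"/spam/{i}", i)
        i += 1
```

This sketch only meters state accesses. It does not stop a pure CPU loop that never touches state, and it does not enforce determinism; a real sandbox would need additional checks for both.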

Connections

Conceptual Contribution

Tags

#coordination #distributed-systems #zookeeper #extensibility

Backlinks