Technical · November 2024

CAP Theorem and Real Systems


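The CAP theorem says a distributed system can guarantee at most two of three properties: consistency (every read sees the latest write), availability (every request to a working node gets a response), and partition tolerance (the system keeps working when the network drops messages between nodes).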

Since network partitions happen in the real world and you can't opt out, the real tradeoff is: when a partition occurs, do you want the system to return potentially stale data (AP) or refuse to respond until it's sure it has fresh data (CP)?
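To make the choice concrete, here is a minimal sketch of a single replica answering a read while it is cut off from its peers. The names (Mode, Replica, Unavailable) are made up for illustration and do not come from any real database.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    AP = "AP"  # stay available, possibly return stale data
    CP = "CP"  # stay consistent, possibly refuse to answer


class Unavailable(Exception):
    """Raised when a CP replica refuses to answer during a partition."""


@dataclass
class Replica:
    mode: Mode
    value: str                 # last value this replica has seen
    partitioned: bool = False  # True when cut off from the rest of the cluster

    def read(self) -> str:
        if self.partitioned and self.mode is Mode.CP:
            # The local copy may be out of date and there is no way to check,
            # so a CP replica refuses rather than risk a stale answer.
            raise Unavailable("cannot confirm the latest value")
        # Healthy replica, or AP replica during a partition: answer locally.
        return self.value


ap = Replica(Mode.AP, value="v1", partitioned=True)
cp = Replica(Mode.CP, value="v1", partitioned=True)

print(ap.read())  # answers from its local copy, which may be stale
try:
    cp.read()
except Unavailable as err:
    print("CP replica refused:", err)
```

Real systems decide this per request using quorums and timeouts rather than a single flag, but the shape of the decision is the same.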

CP systems (ZooKeeper, HBase): consistent when things go wrong, but some nodes may become unavailable. Better for coordination, locks, leader election.

AP systems (Cassandra, CouchDB): available when things go wrong, but different nodes may temporarily see different data. Better for user-facing reads where eventual consistency is fine.

The thing CAP doesn't capture: latency. A CP system that returns in 100 ms vs. 10 seconds is a different product. PACELC is a more complete model: if there is a Partition, you trade Availability against Consistency; Else, in normal operation, you trade Latency against Consistency.
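The latency side of that tradeoff is easy to see with a little arithmetic. In this rough sketch a read is sent to all replicas in parallel and completes once enough of them have answered. The function name and the response times are invented for illustration.

```python
def read_latency_ms(replica_latencies_ms: list[float], required_acks: int) -> float:
    """Latency of a read that waits for `required_acks` replicas.

    Requests go out in parallel, so the read finishes when the
    `required_acks`-th fastest replica has answered.
    """
    if not 1 <= required_acks <= len(replica_latencies_ms):
        raise ValueError("required_acks must be between 1 and the replica count")
    return sorted(replica_latencies_ms)[required_acks - 1]


# Made-up response times for three healthy replicas (no partition);
# the slow one might sit in another region.
latencies = [5.0, 20.0, 300.0]

one = read_latency_ms(latencies, 1)          # fastest replica only
quorum = read_latency_ms(latencies, 2)       # majority of three
all_replicas = read_latency_ms(latencies, 3)  # every replica

print(one, quorum, all_replicas)
```

Nothing has failed here. Waiting for more replicas gives a stronger guarantee, but the read can only be as fast as the slowest replica it waits for.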

In practice, most systems are configurable: Cassandra lets you tune the consistency level per query. The interesting decisions are not "pick one" but "pick the right tradeoff for each operation."
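One way to reason about per-operation choices is the quorum overlap rule: with N replicas, a read that waits for R of them is guaranteed to include at least one replica from the last successful write to W of them whenever R + W > N. The sketch below models that rule in plain Python. ONE, QUORUM and ALL are real Cassandra consistency level names, but the functions and the policy table are hypothetical and this is not the driver API.

```python
def replicas_required(level: str, n: int) -> int:
    """Replicas that must acknowledge an operation at a given level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return n // 2 + 1  # strict majority
    if level == "ALL":
        return n
    raise ValueError(f"unknown level: {level}")


def read_overlaps_write(read_level: str, write_level: str, n: int) -> bool:
    """True when every read set must intersect every write set (R + W > N)."""
    r = replicas_required(read_level, n)
    w = replicas_required(write_level, n)
    return r + w > n


# Hypothetical per-operation policy, as (read level, write level),
# for a cluster with replication factor 3.
POLICY = {
    "update_account_balance": ("QUORUM", "QUORUM"),
    "record_page_view": ("ONE", "ONE"),
}

for operation, (read_level, write_level) in POLICY.items():
    overlap = read_overlaps_write(read_level, write_level, n=3)
    print(f"{operation}: read={read_level} write={write_level} overlap={overlap}")
```

With these made-up operations, the balance update pays for quorum on both sides so that reads overlap the latest write, while the page view counter accepts possible staleness for lower latency.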