Paxos
CS 505 Spring 2021
Leader Election
● Allows the system to skip phase 1 (prepare) when there is a stable leader
● 2 possible designs (maybe more?)
○ Proposers immediately start first phase of Paxos when they are “preempted”
(Design outlined in PMMC, but we don’t recommend doing this)
■ Can be inefficient - no bound on the number of preemptions that can occur
■ Proposers can keep preempting each other with ever-higher proposal numbers, so no value gets chosen (livelock)
○ Alternative: assume that the preempting node (higher proposal number) is the new
“leader”. Wait for that node to drive the Paxos instance to a chosen value (or time out); a rough sketch follows at the end of this list.
■ Still need to handle multiple simultaneous requests!
○ Alternative 2: read the lab3 spec
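A rough sketch of the second design in Java, with assumed names (onPreempted, checkLeader, and the field layout are all illustrative, not the lab framework’s API): on preemption the proposer stands down and watches the presumed leader, and only re-enters phase 1 if that leader goes quiet.

    import java.util.Timer;
    import java.util.TimerTask;

    // Illustrative only: a proposer that yields when preempted and retries
    // phase 1 with a higher ballot only if the preempting node goes quiet.
    class YieldingProposer {
        static final long CHECK_PERIOD_MILLIS = 500;

        final int myId;
        int[] leaderBallot = {0, 0};          // (round, leaderId) of the presumed leader
        int[] myBallot = {0, 0};              // ballot we propose with when active
        boolean heardFromLeader = false;
        final Timer timer = new Timer(true);  // stand-in for the framework's timers

        YieldingProposer(int myId) {
            this.myId = myId;
            timer.schedule(new TimerTask() {
                public void run() { checkLeader(); }
            }, CHECK_PERIOD_MILLIS, CHECK_PERIOD_MILLIS);
        }

        // Someone used a ballot higher than ours: treat them as the new leader.
        void onPreempted(int round, int leaderId) {
            leaderBallot = new int[]{round, leaderId};
            heardFromLeader = true;
        }

        // Any message from the presumed leader counts as a sign of life.
        void onLeaderMessage() { heardFromLeader = true; }

        void checkLeader() {
            if (!heardFromLeader) {
                // Leader went quiet: take over with a strictly larger ballot.
                myBallot = new int[]{leaderBallot[0] + 1, myId};
                startPhase1();
            }
            heardFromLeader = false;          // must hear again before the next check
        }

        void startPhase1() { /* send P1A ("scout") messages to all acceptors */ }
    }

The same timeout machinery can double as the heartbeat check that Lab 3 uses to decide when to begin leader election.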
Lab 3 Tips
● We recommend using PMMC as a starting place, but…
○ Don’t code up PMMC exactly (lab3 spec provides guidance)
○ Try to figure out what each part/role does and why it’s there
● Every PaxosServer will be playing acceptor and learner roles
○ If a server believes itself to be leader (or is trying to become leader), it will also play the proposer
role
○ When adapting PMMC, try to avoid making inner classes for roles like Scout or
Commander, since that may blow up the state space; one possible field layout for a single class is sketched below
● Read the lab3 spec about “sending” a message from a server to itself
● If you receive a message with a higher ballot number than yours, stop being
leader (or stop trying to become leader)
○ Not required for correctness, but makes it easier to implement
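For intuition, a single-class, single-threaded server that plays all three roles might keep state roughly like the sketch below; every field name here is an assumption for illustration, not a required design.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: one class holding acceptor, learner, and leader state,
    // instead of separate Scout/Commander/Replica classes.
    class PaxosServerSketch {
        // --- acceptor state (every server) ---
        long promisedBallot;                                   // highest ballot promised
        Map<Integer, Long> acceptedBallot = new HashMap<>();   // slot -> ballot accepted under
        Map<Integer, String> acceptedValue = new HashMap<>();  // slot -> accepted command

        // --- learner/replica state (every server) ---
        Map<Integer, String> decided = new HashMap<>();        // slot -> chosen command
        int firstUnexecutedSlot = 1;                           // next slot to execute on the app

        // --- leader/proposer state (used only while leading or trying to lead) ---
        boolean active;                                        // believe we are the leader
        long myBallot;                                         // ballot we propose with
        Map<Integer, String> proposals = new HashMap<>();      // slot -> command we proposed
        int nextFreeSlot = 1;                                  // next slot to assign a command to
    }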
General Tips/What previous students said
● Start early (applies to life too!)
● According to a majority of students in previous offerings/quarters @ UW:
Paxos is non-trivial
● Students from UW also say to expect this to take more time than you think it
will; they wish they had started earlier and that we had emphasized starting
earlier, hence this slide
● Please start
Other Resources
● This Google tech talk: https://youtu.be/d7nAGI_NZPk
● Stanford lecture: https://youtu.be/JEpsBg0AO6o
● PMMC in a website: http://paxos.systems/
Problem: Contention
● If many proposers are sending values for the same slot at the same time,
problems can occur
○ Acceptors may reject accept requests if the proposal number isn’t high enough
● Don’t want to deal with contention for every log slot!
● Solutions:
○ Random timeouts
○ Exponential backoff (sketched below)
○ Stable leader
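As a concrete example of the backoff option, a preempted proposer might compute its retry delay roughly as below (class name and constants are assumptions):

    import java.util.Random;

    // Illustrative only: exponential backoff with random jitter, capped.
    class Backoff {
        static final long BASE_MILLIS = 50;
        static final long MAX_MILLIS = 2000;
        final Random rand = new Random();
        int attempts = 0;

        // Called each time our proposal is preempted; returns how long to wait
        // before retrying phase 1 with a higher ballot.
        long nextDelay() {
            long exp = Math.min(MAX_MILLIS, BASE_MILLIS << Math.min(attempts, 20));
            attempts++;
            // Jitter: two proposers that collided once should not wake up and
            // collide again at exactly the same moment.
            return exp / 2 + (long) (rand.nextDouble() * (exp / 2));
        }

        void reset() { attempts = 0; }   // e.g., once a value is chosen
    }

The jitter matters as much as the exponential growth; without it, competing proposers tend to retry in lockstep and keep preempting each other.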
PMMC vs. Lab 3
● PMMC:
○ Split into roles (replicas, acceptors, leaders, etc…)
○ Multi-threaded
○ Handles reconfiguration
○ Retries leader election when preempted
● Lab 3:
○ One class (PaxosServer) handles all the roles
○ Single-threaded
○ Ignores reconfiguration
○ Uses heartbeat messages to determine when to begin leader election
Phase 2: Leader
● Upon receiving a P2A, an acceptor should only accept it if the ballot number in the message
matches the acceptor’s ballot number
○ This means the acceptor considers the sender to be the current leader
○ Prevents split brain problems
○ Can always reply with a P2B, just don’t update state if ballot numbers don’t match! (see the sketch at the end of this slide)
● Need a way to sync logs between the leader and all the acceptors
○ One way: when the leader sends an accept message, attach the leader’s log to the request; when the acceptor replies, attach the acceptor’s log to the response; then update/merge each side’s log
○ Can also sync up with heartbeat messages
○ Lots of different possible protocols and implementations to consider!
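A minimal sketch of the acceptance rule above, using assumed message shapes (String commands, a long-encoded ballot) rather than the lab’s actual types:

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: accept a P2A only when its ballot matches what we have
    // promised, but always send a P2B so a preempted leader finds out.
    class AcceptorSketch {
        long promisedBallot;                                   // ballot we are promised to
        Map<Integer, Long> acceptedBallot = new HashMap<>();   // slot -> ballot accepted under
        Map<Integer, String> acceptedValue = new HashMap<>();  // slot -> accepted command

        // Returns the ballot to report back in the P2B for this slot.
        long handleP2A(long ballot, int slot, String command) {
            if (ballot == promisedBallot) {
                // The sender is the leader we promised to follow; accept the value.
                acceptedBallot.put(slot, ballot);
                acceptedValue.put(slot, command);
            }
            // Reply either way; if promisedBallot > ballot, the sender learns it
            // was preempted and can stop acting as leader.
            return promisedBallot;
        }
    }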
Roles
● As leader:
○ Act as a proposer and acceptor from Basic Paxos
○ Propose requests from clients
■ First, check if the command has already been proposed, decided, or executed (see the sketch at the end of this slide)
○ Keep replicas up to date
○ Send heartbeats to servers so they know the leader is alive
■ Can include garbage collection information in these messages (more info soon)
● As anyone else (not leader):
○ Drop client requests
○ Act only as an acceptor, not a proposer
■ That is, until the server believes the leader died. Then it should start phase 1 (scout
phase)
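A small sketch of the leader-side request handling described above; names are assumptions, and a real implementation should deduplicate on a client id and sequence number rather than on command equality.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: drop requests if not leader, skip duplicates, otherwise
    // assign the next free slot and start phase 2 for it.
    class LeaderSketch {
        boolean amLeader;
        int nextFreeSlot = 1;
        Map<Integer, String> proposals = new HashMap<>();   // slot -> proposed command
        Map<Integer, String> decisions = new HashMap<>();   // slot -> decided command

        void handleClientRequest(String command) {
            if (!amLeader) {
                return;   // non-leaders drop client requests
            }
            if (proposals.containsValue(command) || decisions.containsValue(command)) {
                return;   // already proposed or decided; don't occupy a second slot
            }
            int slot = nextFreeSlot++;
            proposals.put(slot, command);
            sendP2AToAllAcceptors(slot, command);
        }

        void sendP2AToAllAcceptors(int slot, String command) { /* broadcast P2A */ }
    }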
Garbage Collection
● Problem: log (and thus memory usage) can grow indefinitely
● Solution: garbage collect (delete) log entries that are no longer needed
● Can discard log slots that everyone has already executed
● Need a way to determine what slots other nodes are done with
○ One solution: piggyback this information on heartbeat replies
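One way this might look, assuming each heartbeat reply carries the sender’s latest executed slot (all names are illustrative):

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: we may only discard slots that *every* server has executed.
    class GarbageCollectorSketch {
        final Map<String, Integer> latestExecuted = new HashMap<>();   // server -> slot
        final int numServers;

        GarbageCollectorSketch(int numServers) { this.numServers = numServers; }

        void onHeartbeatReply(String server, int latestExecutedSlot) {
            // Keep the max in case replies arrive out of order.
            latestExecuted.merge(server, latestExecutedSlot, Math::max);
        }

        // Highest slot that is safe to delete, or 0 if some server hasn't reported yet.
        int safeToDeleteThrough() {
            if (latestExecuted.size() < numServers) {
                return 0;   // an unheard-from server might still need everything
            }
            int min = Integer.MAX_VALUE;
            for (int slot : latestExecuted.values()) {
                min = Math.min(min, slot);
            }
            return min;
        }
    }

With the numbers on the next two slides, safeToDeleteThrough() would return 3 in the 3-server example (slots 1-3 can be discarded) and 0 in the 5-server example (nothing can be discarded yet).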
Garbage Collection: 3 servers in the system
● Server1 (leader): latest executed slot: 4
● Server2: latest executed slot: 4
● Server3: latest executed slot: 3 (hasn’t heard about slot 4)
● Can garbage collect slots 1-3
Garbage Collection: 5 servers in the system
● Server1 (leader): latest executed slot: 4
● Server2: latest executed slot: 4
● Server3: latest executed slot: 3 (hasn’t heard about slot 4)
● Server4: latest executed slot: N/A
● Server5: latest executed slot: N/A
● Cannot garbage collect anything until all servers have executed a slot
Lab 3 Questions, q1
Question: What’s the difference between a ballot number and a log slot?
Answer:
● Ballots are tuples of (round, leader) and are used during phase 1 of Multi-Paxos
(analogous to proposal numbers from Basic Paxos); a minimal Ballot class is sketched below.
● Log slots are more general – namely, each log slot corresponds to an
instance of Paxos.
○ Basic Paxos has no notion of log slots because it is only used to decide on a single value
○ A single ballot can be used to decide multiple log slots
■ This means the leader did not change
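A minimal Ballot sketch, as referenced above; using an int in place of the leader’s address is an assumption made for brevity.

    // Illustrative only: a ballot is a (round, leader) pair, ordered first by
    // round and then by leader id, so any two distinct ballots are comparable.
    class Ballot implements Comparable<Ballot> {
        final int round;
        final int leaderId;   // stand-in for the leader's address

        Ballot(int round, int leaderId) {
            this.round = round;
            this.leaderId = leaderId;
        }

        @Override
        public int compareTo(Ballot other) {
            if (round != other.round) {
                return Integer.compare(round, other.round);
            }
            return Integer.compare(leaderId, other.leaderId);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Ballot && compareTo((Ballot) o) == 0;
        }

        @Override
        public int hashCode() { return 31 * round + leaderId; }
    }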
Lab 3 Questions, q2
Question: When are you done with a log slot? How can you make sure that this
is done efficiently?
Answer: You are done when all servers in the Paxos group have executed the
request. You can piggyback this information on existing messages to let everyone
know not only what you are done with, but also what you think everyone else is done with.
● Note: Some log slots may not have been decided because the proposer may have died, stopping the
push to agreement on that log slot. You can resolve this by proposing a value, which will either drive
any already-accepted value to be chosen for that log slot, or will place a no-op into that log slot.
Dealing with Log “Holes”/Proposing no-ops
If we have in our log:
put(a, 1), ______, ______, append(a, 2)
Do we know those empty log slots are decided? How can we push these log
slots to completion?
Your implementation needs to be able to handle "holes" in the Paxos log. That is,
at certain points, a server might see agreement being reached on a slot but not
previous slots. Your implementation should still make progress in this case.
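One way a newly elected leader might fill holes after phase 1, sketched with assumed names: any slot for which some acceptor already accepted a value must be re-proposed with that value, and every other missing slot gets a no-op.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative only: compute what to propose for every slot up to the highest
    // slot we know about, so execution is never stuck behind an undecided slot.
    class HoleFiller {
        static final String NO_OP = "no-op";
        final Map<Integer, String> highestAccepted = new HashMap<>();   // slot -> value learned from P1Bs

        // Called after phase 1 completes with the merged P1B information.
        Map<Integer, String> proposalsForAllSlots(int firstUndecided, int maxKnownSlot) {
            Map<Integer, String> proposals = new HashMap<>();
            for (int slot = firstUndecided; slot <= maxKnownSlot; slot++) {
                // Re-propose a previously accepted value if one exists; otherwise
                // propose a no-op to fill the hole.
                proposals.put(slot, highestAccepted.getOrDefault(slot, NO_OP));
            }
            return proposals;
        }
    }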