(article originally posted on Medium)
The Byzantine Generals Problem is a metaphor used in computer science to illustrate the difficulty of achieving and maintaining security on a distributed network. In order to solve this problem, ‘honest’ nodes (e.g. computers or other physical devices) must be able to reach consensus despite the presence of ‘dishonest’ nodes. This means that a majority of nodes must establish a set of rules and come to an agreement on how to enforce those rules on the network.
In this article, we will delve deeper into what the Byzantine Generals Problem is and why it is so important to solve. We will examine some of the best solutions that have been proposed and implemented over the years. Finally, we will explore how various blockchain networks have used consensus protocols to overcome the Byzantine Generals Problem and facilitate secure transactions.
What constitutes the Byzantine General ’s Problem ?
The Byzantine Generals Problem is a common challenge that decentralized computer systems must overcome. Let’s examine this analogy and how it relates to modern data security.
The Byzantine General’ s Problem Release Paper
In 1982, computer scientists Leslie Lamport, Robert Shostak, and Marshall Pease published a research paper titled “The Byzantine Generals Problem.”
The significance of this concept is evident from the first page, which states that the research was funded by NASA, the Ballistic Missile Defense Systems Command, and the Army Research Office.
Although the Byzantine Generals Problem had existed in computer science long before 1982, this was one of the first attempts to put this issue into a relatable analogy and propose solutions for solving it.
The comparison used for the Byzantine Generals Problem is of the following :
Different divisions of the Byzantine army are stationed outside an opponent castle and are preparing for invasion. The army generals can only communicate with each other via human messenger.
They are forced to agree upon a common course of action otherwise battle will be lost. However, a fair assumption would be that some of the generals are traitors who wish to prevent the loyal generals from agreeing on a common course of action.
It is paramount to ensure that a small group of malicious generals is not able to disrupt communication.
To solve the Byzantine Generals Problem, the loyal generals need a secure way to reach agreement on a plan (aka consensus) and pursue with their chosen plan (aka coordination).
While actually solving the Byzantine Generals Problem is quite complex, we now understand the fundamental challenge. It is important to acknowledge that the concept can be applied strictly to military communications, as the analogy does. However, this issue also applies to all types of computer systems.
Whenever a distributed group of nodes are required to achieve reliable communication, the network must tackle the Byzantine Generals Problem.
What are Byzantine Failures ?
It exists several potential causes for a distributed computer system to crash.
These are referred to as Byzantine failures.
Using the analogy described formerly, Byzantine failures are equivalent to the traitorous generals who try to mess up communication between the loyal ones.
In practical terms, Byzantine failures can be software bugs, hardware malfunctions, or malicious attacks. In other words, Byzantine failures do not necessarily have to be a deliberately orchestrated attack from a malicious actor. They can simply be issues that prevent nodes from reaching agreement on decisions for a distributed network.
What is Byzantine Failure Tolerance ?
Byzantine failures are practically unavoidable within any distributed computer system.
For example, consider a power outage that causes some nodes to go awry in a data center for example. Is the network still able to operate normally and maintain reliable communication? Or does the whole system out-of-the-blue stop working or become susceptible to attacks?
In a moderately secure network, a simple event like a few nodes going offline would not have a significant impact on the network. The ability to withstand such scenarios is known as Byzantine Fault Tolerance. Networks handling more Byzantine failures are considered to have a superior tolerance, meaning they are more secure than networks that cannot handle Byzantine failures.
To prevent against such scenarios, networks must have a high degree of Byzantine Fault Tolerance. This means that they are able to withstand a certain number of failures without collapsing or becoming vulnerable.
For example, if the case of a power outage (nodes going offline) , a network with high Byzantine Fault Tolerance will still be able to function and maintain reliable communication. On the other hand, a network with low Byzantine Fault Tolerance may stop working .
The Byzantine Fault Tolerant Consensus algorithms are similar to previously created consensus algorithms, except that the replicated databases (aka the participants in the protocol) do not blindly trust one another anymore.
The first real-world BFT consensus was named “Practical Byzantine Fault Tolerant” .
Practical Byzantine Fault Tolerance (pBFT)
At the end of the 20th century, Miguel Castro and Barbara Liskov published a paper that introduced a new algorithm, called Practical Byzantine Fault Tolerance (aka PBFT), that allowed asynchronous networks to process thousands of requests per second with only a slight increase in latency.
PBFT is a consensus algorithm where one leader is in charge of making proposals while the rest of the nodes attempts to agree on the proposals.
However, PBFT had two main issues that limited its adoption: it becomes more expensive to use as the number of participants increases, has bad scalability beyond a dozen participants and it is vulnerable to Sybil attacks.
BFT in the context of distributed ledgers:
Bitcoin was the first network to utilize a process called cryptocurrency mining with a consensus coined as ‘proof-of-work’.
In this system, the Byzantine Generals Problem is solved by rewarding honest participants who solve complex math problems using powerful computers. Once the correct solution is verified, the miner receives a reward in the blockchain network’s native coin.
While the process involves a few additional steps, the key idea to understand is that large PoW-based blockchain networks like Bitcoin are highly secure. This is because the expenditure would cost gazillion of dollar in hardware equipments to attack the network and provoke a Byzantine failure.
First introduced in 2012, Proof of Stake (PoS) is another blockchain consensus protocol whose objective is the solving of the Byzantine Generals Problem. Contrary to PoW-based networks, PoS-based systems do not rely on cryptocurrency mining. Instead, a procedure called staking is used. In this system, participants (aka validators) stake funds, and validators who stake a larger portion of a blockchain’s coins are able to validate more blocks and earn more rewards.
Users who try to validate invalid transactions or blatantly act maliciously may also loose their staked funds through a slashing process.