Stage 1 of 6 · Estimated 4–6 weeks
This stage constructs the cognitive and technical foundation upon which every subsequent blockchain concept rests. You will develop fluency in networking, cryptography, data structures, distributed systems theory, and game theory — not as abstract academic exercises, but as the exact engineering primitives that make blockchains possible.
Every blockchain failure — every exploit, every broken consensus, every flawed protocol — traces back to a misunderstanding of fundamentals. Engineers who skip this stage build fragile systems. They copy patterns without understanding invariants. They write smart contracts without understanding the execution environment beneath them. This stage ensures you are not that engineer.
You can explain how data moves across the internet at a packet level. You can implement basic cryptographic primitives from scratch. You can construct and verify Merkle trees. You can reason about distributed consensus failures using formal models. You can articulate why game-theoretic incentives are not optional in open networks — they are the architecture.
Internet & Networking
├── How data moves (TCP/IP, HTTP, DNS)
├── Client-server vs peer-to-peer models
└── NAT traversal, gossip protocols
│
▼
Cryptography
├── Hash functions (SHA-256, Keccak)
├── Asymmetric cryptography (ECDSA, EdDSA)
├── Digital signatures
└── Commitment schemes
│
▼
Data Structures
├── Hash-linked lists (blockchain as a structure)
├── Merkle trees
├── Patricia/Radix tries
└── Bloom filters
│
▼
Distributed Systems
├── CAP theorem
├── Failure models (crash, Byzantine)
├── State machine replication
├── Byzantine Fault Tolerance
└── Practical BFT algorithms
│
▼
Game Theory
├── Nash equilibrium
├── Mechanism design
├── Incentive compatibility
└── Schelling points
The internet is a layered packet-switching network. Data is broken into packets, each independently routed across interconnected autonomous systems (ASes). The architecture follows the OSI/TCP-IP model:
DNS resolves human-readable names to IP addresses via a hierarchical, cached lookup system (recursive resolvers → root servers → TLD servers → authoritative servers).
Blockchain nodes communicate over the internet. Understanding packet routing, NAT traversal, firewall behavior, and TCP reliability directly impacts your ability to reason about node connectivity, network partitions, eclipse attacks, and propagation delays — all of which affect consensus safety and liveness.
| Tool | Purpose |
|---|---|
Wireshark |
Packet capture and protocol analysis |
tcpdump |
Command-line packet analyzer |
curl / httpie |
HTTP request inspection |
dig / nslookup |
DNS resolution debugging |
traceroute / mtr |
Network path analysis |
netcat |
Raw TCP/UDP socket communication |
dig +trace example.com to trace the full DNS resolution
path from root servers to the authoritative answer.
tcpdump.
mtr to trace the
route to several Ethereum RPC endpoints (infura.io,
alchemy.com). Note hop counts and latency.
Build a simple peer-to-peer chat application using raw TCP sockets in Python. Two peers should be able to discover each other on a local network (via a hardcoded list or a simple discovery mechanism), establish a TCP connection, and exchange text messages bidirectionally. This exercise forces you to handle connection setup, message framing, and concurrent I/O — the same problems that blockchain networking layers solve.
In client-server architecture, a central server mediates all communication. In peer-to-peer (P2P) networks, every participant is both client and server. This eliminates single points of failure but introduces coordination complexity.
Key P2P concepts:
Every blockchain is a P2P network. The security guarantees of a blockchain are only as strong as the networking layer's ability to ensure that all honest nodes receive all valid transactions and blocks in a timely manner. Understanding gossip efficiency, eclipse attack vectors, and network topology is essential for protocol design.
discv5) for peer discovery
and a structured gossip protocol (GossipSub in the
consensus layer via libp2p).
| Tool | Purpose |
|---|---|
libp2p |
Modular P2P networking stack (used by Ethereum 2.0, IPFS, Polkadot) |
devp2p |
Ethereum's P2P networking protocol suite |
| Kademlia | DHT algorithm for structured P2P routing |
GossipSub |
Pub/sub gossip protocol in libp2p |
FIND_NODE and STORE RPCs.
Build a gossip-based message dissemination simulator. Create a network of 50+ simulated nodes. When one node introduces a message, it propagates via gossip. Track and visualize: time to full propagation, redundant messages received, and the effect of node failures (randomly drop 10% of nodes mid-propagation). Output a report showing convergence time and message overhead.
Cryptography provides the mathematical guarantees that make trustless systems possible.
Hash Functions
A cryptographic hash function H maps arbitrary-length input
to a fixed-length output with three properties:
h, it is
computationally infeasible to find m such that
H(m) = h.
m₁, it
is infeasible to find m₂ ≠ m₁ such that
H(m₁) = H(m₂).
m₁ ≠ m₂ such that H(m₁) = H(m₂).
Bitcoin uses SHA-256 (double hashing: SHA256(SHA256(x))).
Ethereum uses Keccak-256 (the original SHA-3 submission, not the
NIST-standardized version).
Asymmetric Cryptography
A key pair (sk, pk) where sk is secret and
pk is public. The core primitive: given pk,
deriving sk is computationally infeasible.
Blockchain systems primarily use elliptic curve cryptography (ECC):
Digital Signatures
A signature scheme consists of three algorithms:
KeyGen() → (sk, pk): Generate a key pair.Sign(sk, m) → σ: Produce a signature.Verify(pk, m, σ) → {0, 1}: Verify the signature.
ECDSA (Elliptic Curve Digital Signature Algorithm) is
used in Bitcoin and Ethereum. The signature (r, s) is
computed as:
k, compute
R = k·G (elliptic curve point multiplication), set
r = R.x mod n.
s = k⁻¹(hash(m) + sk·r) mod n.Critical: Nonce reuse in ECDSA is catastrophic — it leaks the private key. This is not theoretical: the PlayStation 3's ECDSA implementation was broken because Sony used a fixed nonce.
Commitment Schemes
A two-phase protocol:
C = H(value || nonce).
(value, nonce). Anyone
can verify H(value || nonce) = C.
Used in commit-reveal voting, randomness generation, and submarine sends.
Every transaction on every blockchain is authenticated via digital signatures. Address derivation, transaction validation, block hashing, Merkle root computation — cryptography is not a module of the blockchain; it is the substrate.
Keccak-256(public_key).
000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
demonstrates SHA-256 output.
| Tool | Purpose |
|---|---|
openssl |
CLI for hashing, key generation, and signature operations |
hashlib (Python) |
Compute SHA-256, Keccak-256 |
eth-keys (Python) |
Ethereum key and signature operations |
noble-secp256k1 (JS) |
Pure JS secp256k1 implementation |
libsodium |
Modern cryptographic library (Ed25519, X25519) |
| CyberChef | Web-based tool for encoding/hashing experiments |
SHA256(SHA256(x)))?
What specific attack does this mitigate?
"blockchain". Change one character and compute
again. Observe the avalanche effect.
eth-keys library. Derive the Ethereum address
from the public key.
k (use synthetic data), recover the
private key algebraically.
Build a simple digital identity system. A user
generates a key pair, signs a "profile" (name, timestamp), and publishes
the signature and public key. A verifier takes the published data and
confirms: (a) the profile was signed by the holder of the corresponding
private key, and (b) the profile has not been tampered with. Implement
in Python using eth_keys and eth_abi for
encoding.
Hash-Linked Lists (Blockchain as a Data Structure)
The blockchain is, at its most fundamental, a singly-linked list where each node contains the hash of the previous node:
Block N: { data*N, hash(Block N-1) }
Block N-1: { data*{N-1}, hash(Block N-2) }
This creates tamper evidence: modifying any historical block changes its hash, which invalidates every subsequent block's back-pointer. Integrity verification requires only the latest block's hash.
Merkle Trees
A Merkle tree (hash tree) is a binary tree where:
Key property: Merkle proofs allow O(log n) verification that a specific data element is included in the dataset, given only the root hash and a logarithmic number of sibling hashes (the "proof path").
Patricia/Merkle Patricia Tries (MPT)
Ethereum's state is stored in a Modified Merkle Patricia Trie — a combination of:
This allows: (1) O(log n) key-value lookups, (2) a single hash (state root) that commits to the entire global state, (3) efficient state proofs.
Bloom Filters
A probabilistic data structure that answers set membership queries with:
Ethereum uses Bloom filters in transaction receipts to enable efficient log filtering without scanning all logs.
Merkle trees enable light clients — nodes that verify blockchain data without downloading the full chain. Patricia tries enable Ethereum's stateful model. Without these structures, blockchains would require every participant to store and process the complete history, making the system impractical.
| Tool | Purpose |
|---|---|
merkletreejs (JS) |
Merkle tree construction and proof generation |
py-merkle-tree |
Python Merkle tree library |
ethereum/go-ethereum (Geth) |
Reference implementation of Ethereum's MPT |
rlp (Python) |
RLP encoding used in Ethereum's trie |
0x00/0x01 byte to leaf/internal nodes
prevent it?
Build a verifiable file integrity system. Given a directory of files, compute a Merkle tree over the file hashes. Store the root hash. Later, given a single file and a Merkle proof, verify that the file was part of the original directory without needing the other files. This is exactly how SPV works in Bitcoin.
Distributed Systems Fundamentals
A distributed system is a collection of independent computers that appear as a single system to users. The fundamental challenges:
The CAP Theorem (Brewer's Theorem)
For any distributed data store, you can guarantee at most two of:
Since network partitions are inevitable in real systems, the practical choice is between CP (consistent under partition, but some requests may be rejected) and AP (available under partition, but data may be inconsistent).
Blockchains make a nuanced choice: they provide eventual consistency (AP-leaning) with strong guarantees after sufficient confirmations.
Failure Models
2f + 1 total nodes to tolerate
f crash failures.
3f + 1 total nodes to tolerate f Byzantine
failures.
State Machine Replication (SMR)
The core abstraction: model each node as a deterministic state machine. If all nodes start in the same state and process the same sequence of inputs, they arrive at the same state. The consensus problem reduces to: agree on a total ordering of inputs.
Byzantine Fault Tolerance (BFT)
n ≥ 3f + 1 nodes and
f + 1 communication rounds.
n = 3f + 1 with O(n²) message complexity. Uses a
three-phase protocol: pre-prepare → prepare → commit. Includes
view-change protocol for leader failure.
FLP Impossibility
Fischer, Lynch, and Paterson (1985) proved that no deterministic consensus protocol can guarantee both safety and liveness in an asynchronous system with even one crash failure. Practical systems work around this via:
Blockchain consensus is a specific instance of Byzantine fault-tolerant state machine replication. Without understanding SMR, BFT bounds, and the CAP theorem, you cannot reason about why blockchains make the design trade-offs they do — why Bitcoin has 10-minute blocks, why Ethereum needs finality gadgets, or why "fast finality" chains sacrifice some liveness guarantees.
| Tool | Purpose |
|---|---|
| Tendermint Core | BFT consensus engine |
| CometBFT | Fork of Tendermint, used in Cosmos SDK |
| Raft (etcd) | Crash fault-tolerant consensus (non-Byzantine, for comparison) |
| Jepsen | Distributed systems testing framework |
3f + 1 nodes,
while crash fault tolerance requires only 2f + 1? Derive
this from first principles.
Build a simplified Byzantine fault-tolerant replicated log. Implement a 4-node system where:
Why Game Theory in Blockchain
In traditional systems, you trust the server (or the operator). In open blockchain networks, participants are anonymous and potentially adversarial. You cannot rely on trust — you must design systems where rational self-interest produces correct system behavior.
Core Concepts
Nash Equilibrium: A strategy profile where no player can improve their payoff by unilaterally changing their strategy. In blockchain: the protocol rules should form a Nash equilibrium where following the protocol is the best response to others following the protocol.
Dominant Strategy: A strategy that is optimal regardless of what others do. Example: in Bitcoin, mining on the longest chain is (approximately) a dominant strategy — it maximizes expected reward regardless of what other miners do.
Mechanism Design (Reverse Game Theory): You design the rules of the game so that self-interested agents produce a desired outcome. Blockchain protocol design is mechanism design: you choose the reward structure, penalty structure, and information environment to align incentives.
Incentive Compatibility: A mechanism is incentive-compatible if every participant's best strategy is to act truthfully (reveal true preferences, follow the protocol). Example: Ethereum staking is designed to be incentive-compatible — attesting honestly yields higher expected returns than any deviation.
Schelling Points (Focal Points): In a coordination game with multiple equilibria, a Schelling point is the equilibrium that players converge on without communication. Example: in a hard fork, the "legitimate" chain is a Schelling point — participants coordinate on it because they expect others to.
Tragedy of the Commons: When individuals acting in self-interest deplete a shared resource. Relevant to: transaction fee markets (block space as a shared resource), validator set management.
Griefing Factor: The ratio of cost inflicted on a victim to the cost borne by the attacker. In good mechanism design, the griefing factor should be low (attacking is expensive relative to the harm caused).
Every design decision in a blockchain protocol is a game-theoretic statement. Transaction fee mechanisms, validator rewards, slashing conditions, governance voting — all are games played by rational (and sometimes irrational) agents. If the incentives are wrong, the protocol breaks — not because of a bug in code, but because of a bug in the game.
| Tool | Purpose |
|---|---|
| Gambit | Game theory analysis software |
| Nashpy (Python) | Compute Nash equilibria |
| Agent-based modeling (Mesa, Python) | Simulate multi-agent strategic behavior |
| Cadence (cadCAD) | Python framework for crypto-economic modeling |
Build a crypto-economic simulation of the "nothing at stake" problem. Simulate a proof-of-stake chain where validators can vote on multiple forks at zero cost. Show that rational validators always vote on every fork (since it costs nothing and increases expected reward). Then introduce slashing conditions: validators who vote on conflicting forks lose their stake. Show that slashing changes the Nash equilibrium so that honest single-chain voting becomes the dominant strategy.
Build a system where users can register data entries (e.g., document hashes) into a Merkle tree. The system publishes the Merkle root. Users can later prove that their data was registered by presenting a Merkle proof. Add signature-based access control: only the original registrant (verified by ECDSA signature) can update their entry.
Build a simple P2P file sharing system where nodes can advertise files they have and request files from peers. Use gossip-based discovery and direct TCP transfers. Handle: peer joining, peer leaving, file chunk verification (using hashes from a Merkle tree of file chunks).
Build an interactive simulator that visualizes the Byzantine generals
problem. Allow the user to set: number of generals, number of traitors,
and message strategy for traitors. Show step-by-step message passing and
final agreement. Demonstrate the 3f + 1 threshold
empirically.
Build a simplified blockchain that implements the data structure and cryptographic foundations — without consensus or networking:
This is not yet a working blockchain (no consensus, no networking) — it is the data layer that all subsequent stages will build upon.
| Mistake | Reality |
|---|---|
| "SHA-256 encrypts data" | Hash functions are one-way; they do not encrypt. There is no key and no decryption. |
| "Blockchain is immutable" | Blockchain is tamper-evident, not tamper-proof. If an attacker gains sufficient consensus power, they can rewrite history. |
| "More nodes = more secure" | Security depends on the distribution of stake/hashrate, not the raw number of nodes. 10,000 nodes run by one entity provide less security than 100 independent nodes. |
| "Consensus means everyone agrees" | Consensus means a sufficient supermajority agrees. Minority nodes may disagree (leading to forks). |
| "P2P means no structure" | Most P2P networks have significant structure: routing tables, peer scoring, topology management. |
| "CAP theorem means you pick two and lose one" | The trade-off is nuanced and depends on the type of partition and the required consistency model. |
At this stage, develop the following security mindset:
os.urandom() or /dev/urandom, never
random.random().
Books
Papers
Documentation & Online Resources
Technical Blogs