Stanford CS221 | Autumn 2025 | Lecture 10: Games I
By Stanford Online
Key Concepts
- Game Tree: A tree structure representing the state space of a game, where nodes are states and edges are actions.
- Two-Player Zero-Sum Game: A game where the agent’s utility is the negative of the opponent’s utility (total utility = 0).
- Minimax: A decision rule for two-player games where the agent maximizes utility and the opponent minimizes it.
- Expectimax: A variation in which the agent maximizes utility against an opponent following a known, fixed (possibly stochastic) policy.
- Alpha-Beta Pruning: An optimization technique for Minimax that skips branches that cannot influence the final decision.
- Evaluation Function: A heuristic used in depth-limited search to estimate the value of a non-terminal state.
- Perfect Play: A condition in which the outcome of a game is known under optimal play by both sides (e.g., "solved" games).
1. Modeling Games
Games are modeled as state-based systems similar to Markov Decision Processes (MDPs), but with the addition of an opponent.
- Formal Definition: A game consists of a start state, a terminal test (`isEnd`), a `player` function (identifying whose turn it is), and a `successors` function (mapping actions to states); a minimal code sketch follows this list.
- Sparse Rewards: Unlike some MDPs, utility in games is typically concentrated at the end of the trajectory, making it difficult to evaluate intermediate states.
- Policies: A policy $\pi(s)$ maps states to actions. In stochastic policies, this is a probability distribution over actions.
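A minimal sketch of this interface in Python, using a toy "halving game" (start with a number $N$; a move either decrements the number or halves it, rounding down; whoever reaches 0 wins). The class and method names are assumptions chosen to match the interface above, not the lecture's exact code.

```python
class HalvingGame:
    """Toy two-player game: decrement or halve; reaching 0 wins."""

    def __init__(self, N):
        self.N = N

    def startState(self):
        # State = (player to move, current number); +1 is the agent, -1 the opponent.
        return (+1, self.N)

    def isEnd(self, state):
        _, number = state
        return number == 0

    def utility(self, state):
        # The player who moved TO 0 wins; that mover is the opposite of the
        # player whose turn it now is, so the agent's utility is -player.
        player, number = state
        assert number == 0
        return -player

    def player(self, state):
        return state[0]

    def actions(self, state):
        return ['-', '/']  # decrement or halve

    def succ(self, state, action):
        player, number = state
        if action == '-':
            return (-player, number - 1)
        else:  # '/'
            return (-player, number // 2)
```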
2. Game Evaluation and Recurrences
The lecture introduces several recursive frameworks to compute the value of a game (sketched in code after this list):
- Game Evaluation: Computes the expected utility given fixed policies for both players. It is analogous to policy evaluation in MDPs.
- Expectimax: Finds the optimal agent policy against a fixed opponent policy. It uses a `max` node for the agent and an `average` (expectation) node for the opponent.
- Minimax: Assumes the opponent plays optimally to minimize the agent's utility. It uses `max` nodes for the agent and `min` nodes for the opponent.
- Expected Minimax: A hybrid approach incorporating probabilistic events (e.g., coin flips) alongside min/max nodes.
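Under the same assumed interface, the Minimax and Expectimax recurrences can be sketched as plain recursion; `opponent_policy(state)` is a hypothetical helper returning a `{action: probability}` dict for the fixed opponent.

```python
def minimax_value(game, state):
    # Utility at terminal states; max over actions for the agent (+1),
    # min over actions for the opponent (-1).
    if game.isEnd(state):
        return game.utility(state)
    values = [minimax_value(game, game.succ(state, a))
              for a in game.actions(state)]
    return max(values) if game.player(state) == +1 else min(values)

def expectimax_value(game, state, opponent_policy):
    # Same max node for the agent, but an expectation (average) node
    # for the opponent's fixed, possibly stochastic policy.
    if game.isEnd(state):
        return game.utility(state)
    if game.player(state) == +1:
        return max(expectimax_value(game, game.succ(state, a), opponent_policy)
                   for a in game.actions(state))
    return sum(prob * expectimax_value(game, game.succ(state, a), opponent_policy)
               for a, prob in opponent_policy(state).items())
```

For example, `minimax_value(HalvingGame(10), (+1, 10))` returns the game's value under optimal play by both sides.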
3. Optimization Techniques
Because exact Minimax search is exponential ($O(b^m)$, where $b$ is the branching factor and $m$ is the tree depth), two methods are used to speed up computation:
Alpha-Beta Pruning (Exact)
- Mechanism: Maintains an interval $[\alpha, \beta]$ for each node: $\alpha$ is the best value the maximizing agent can already guarantee along the current path, and $\beta$ is the best value the minimizing opponent can guarantee.
- Pruning Rule: If the interval becomes empty ($\alpha \ge \beta$), the remaining children of that node cannot influence the final decision, and the subtree is pruned.
- Ordering: The efficiency of pruning depends heavily on the order of child exploration. Exploring "better" moves first (highest-value first at max nodes, lowest-value first at min nodes) leads to more aggressive pruning; see the sketch below.
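A minimal alpha-beta sketch over the same assumed interface; this is the standard algorithm, not necessarily the lecture's exact presentation. `alpha` tracks the best value the maximizer can guarantee so far, `beta` the best the minimizer can guarantee.

```python
def alphabeta(game, state, alpha=float('-inf'), beta=float('inf')):
    if game.isEnd(state):
        return game.utility(state)
    if game.player(state) == +1:  # max node
        value = float('-inf')
        for a in game.actions(state):
            value = max(value, alphabeta(game, game.succ(state, a), alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:  # interval empty: remaining children cannot matter
                break
        return value
    else:  # min node
        value = float('inf')
        for a in game.actions(state):
            value = min(value, alphabeta(game, game.succ(state, a), alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value
```

Sorting `game.actions(state)` by a cheap heuristic before each loop is what makes the move-ordering point above matter in practice.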
Evaluation Functions (Approximate)
- Depth-Limited Search: Instead of searching to the terminal state, the search stops at a fixed depth $d$.
- Heuristics: At depth $d$, an evaluation function (e.g., material count in chess) estimates the state's value.
- Trade-off: Deeper searches reduce sensitivity to poor evaluation functions but increase computational cost (a sketch follows below).
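A depth-limited variant is a small change to the Minimax recurrence: stop at depth 0 and fall back to a heuristic. The `evaluate` function here is a placeholder (e.g., a material count in chess), and depth counts plies (one per move by either player); conventions vary.

```python
def depth_limited_value(game, state, depth, evaluate):
    if game.isEnd(state):
        return game.utility(state)
    if depth == 0:
        return evaluate(state)  # heuristic estimate of the true value
    values = [depth_limited_value(game, game.succ(state, a), depth - 1, evaluate)
              for a in game.actions(state)]
    return max(values) if game.player(state) == +1 else min(values)
```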
4. Key Arguments and Relationships
- Optimality Context: An "optimal" policy is always relative to the opponent's strategy. Minimax provides a "worst-case" guarantee (a lower bound against any opponent), whereas Expectimax is only optimal against a specific, known opponent model.
- Relationship Summary:
- $V(\pi_{max}, \pi_{min})$ is the Minimax value.
- If the opponent plays a weaker (suboptimal) policy rather than $\pi_{min}$, the agent can achieve higher utility than the Minimax value, particularly by using Expectimax against a model of that opponent.
- If the agent knows the opponent's policy ($\pi_7$ in the lecture's example), it can outperform the Minimax policy by using an Expectimax policy tailored to $\pi_7$.
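In symbols (a standard formulation of these properties; any notation beyond the document's $V(\pi_{max}, \pi_{min})$ is introduced here for illustration):

$$V(\pi_{max}, \pi) \;\ge\; V(\pi_{max}, \pi_{min}) \quad \text{for any opponent policy } \pi,$$

$$V(\pi_{expt(\pi)}, \pi) \;\ge\; V(\pi_{max}, \pi) \quad \text{when } \pi \text{ is known},$$

where $\pi_{expt(\pi)}$ denotes the Expectimax policy computed against the known opponent $\pi$. The first line is the worst-case guarantee; the second says that exploiting a known opponent can only help.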
5. Synthesis and Conclusion
The lecture establishes that games are a natural extension of MDPs and search problems. While Minimax provides a robust, conservative strategy for unknown opponents, it is often suboptimal if the opponent's behavior can be modeled or predicted. The transition from exact search (Minimax) to approximate search (Depth-limited with evaluation functions) mirrors the transition from search to reinforcement learning, setting the stage for future discussions on learning these evaluation functions directly from data.