Stanford CS221 | Autumn 2025 | Lecture 18: AI & Society

Key Concepts

Dual-Use Technology: Technologies (like AI, encryption, or nuclear energy) that can be used for both beneficial and harmful purposes.
Reward Hacking: A phenomenon in reinforcement learning where an agent optimizes for a flawed reward function, achieving high scores without actually performing the intended task.
Spurious Correlations: Patterns in training data that do not generalize, leading models to rely on features that merely co-occur with the label rather than cause it (e.g., detecting a chest drain, a sign of prior treatment, rather than the collapsed lung itself).
Scalable Oversight: Methods to supervise AI systems that are becoming too complex for humans to evaluate directly.
Foundation Model Transparency Index: A framework for evaluating AI developers based on 100 indicators across the upstream (data/compute), model, and downstream (deployment) lifecycle.
Openness Spectrum: A range of AI accessibility, from "Closed" (API-only) to "Open Weights" (model parameters released) to "Open Source" (code, data, and weights released).
Marginal Risk: The incremental risk introduced by a specific technology release compared to the existing baseline of risks already present in the world.
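
As a minimal sketch of reward hacking (a toy constructed for these notes, not the environment shown in lecture), consider a one-dimensional track where the designer rewards both finishing the race and touching a bonus target that respawns. Every constant, name, and policy below is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical toy setup: all values are illustrative assumptions.
TRACK_LENGTH = 5      # position of the finish line
BONUS_POSITION = 2    # a target that respawns every time it is touched
FINISH_REWARD = 10    # reward for completing the intended task
BONUS_REWARD = 3      # reward for touching the bonus target
HORIZON = 20          # maximum number of steps per episode


@dataclass
class Result:
    proxy_reward: int  # what the reward function measures
    finished: bool     # what the designer actually wanted


def run(policy) -> Result:
    """Roll out a policy (position -> +1 or -1) for one episode."""
    position, total = 0, 0
    for _ in range(HORIZON):
        position = max(0, position + policy(position))
        if position == BONUS_POSITION:
            total += BONUS_REWARD
        if position == TRACK_LENGTH:
            return Result(total + FINISH_REWARD, True)
    return Result(total, False)


def race_policy(position: int) -> int:
    """Intended behavior: always drive toward the finish line."""
    return 1


def loop_policy(position: int) -> int:
    """Reward-hacking behavior: shuttle back and forth over the bonus."""
    return 1 if position < BONUS_POSITION else -1


print(run(race_policy))  # Result(proxy_reward=13, finished=True)
print(run(loop_policy))  # Result(proxy_reward=30, finished=False)
```

An optimizer that sees only `proxy_reward` would prefer `loop_policy`, even though it never completes the race. The flaw is in the reward function, not in the optimizer.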

1. The Societal Impact of AI

The lecture argues that computer scientists must consider societal implications because they hold the power to shape technology through design choices (e.g., which languages to support, whether to release model weights). The speaker emphasizes that AI is the fastest-growing technology in history, with platforms like ChatGPT reaching 800 million weekly active users, necessitating a proactive approach to ethics.

2. Frameworks for Ethical AI

To operationalize ethics, the speaker references:

The Belmont Report: Published in 1979 in response to the Tuskegee syphilis study (1932-1972), it emphasizes respect for persons (informed consent), beneficence (maximizing benefits/minimizing harms), and justice (fair distribution of burdens).
ACM Code of Ethics: Focuses on contributing to human well-being and respecting privacy.
The Intent-Impact Matrix: A 2x2 grid categorizing AI outcomes based on intent (good/bad) and impact (positive/negative). The most common challenge is the "Good Intent/Negative Impact" quadrant, where unintended consequences occur despite positive goals.

3. The AI Ecosystem View

The speaker advocates for moving beyond the "model-centric" view to an "ecosystem" view:

Upstream: Involves data sourcing (often human labor), compute resources (energy/materials), and environmental costs.
Downstream: Involves deployment, user interaction, and societal consequences like job displacement, cultural homogenization, and over-reliance.

4. Key Challenges and Case Studies

Inequality: The Gender Shades Project demonstrated that facial recognition systems had significantly lower accuracy for darker-skinned females. This highlighted the necessity of third-party auditing to force companies to fix disparities.
Alignment: The speaker uses the CoastRunners boat-racing game to illustrate reward hacking: the agent learned to circle and repeatedly hit point-scoring targets rather than finish the race.
Copyright: The speaker notes that while training is often argued to be "transformative" (and therefore fair use), memorization complicates this legal defense (e.g., Llama 3.1 70B reproducing Harry Potter text verbatim). If memorized text can be extracted from a model, its outputs may themselves infringe copyright.
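
One rough way to probe for the verbatim memorization mentioned above is to measure the longest run of consecutive tokens that a model's output shares with a reference text. The sketch below assumes whitespace tokenization and uses placeholder strings. `longest_shared_run` is a hypothetical helper, not the method used in any study cited in lecture, and a long shared run is a signal worth investigating, not a legal test.

```python
def longest_shared_run(reference: str, generated: str) -> int:
    """Length (in whitespace tokens) of the longest verbatim run
    that appears in both texts (longest common substring over tokens)."""
    ref = reference.split()
    gen = generated.split()
    best = 0
    # prev[j] = length of the shared run ending at the previous reference
    # token and at gen[j - 1]
    prev = [0] * (len(gen) + 1)
    for r_tok in ref:
        curr = [0] * (len(gen) + 1)
        for j, g_tok in enumerate(gen, start=1):
            if r_tok == g_tok:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Placeholder strings standing in for a protected text and two model outputs.
reference = "the quick brown fox jumps over the lazy dog"
copied = "he said the quick brown fox jumps and left"
paraphrased = "a fast brown fox leaps"

print(longest_shared_run(reference, copied))       # 5
print(longest_shared_run(reference, paraphrased))  # 2
```

In practice one would use the model's own tokenizer and a threshold chosen for the domain; both are left open here.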

5. Transparency and Openness

Transparency: Viewed as a prerequisite for improvement. The speaker’s "Transparency Index" shows that public reporting incentivizes companies to improve their disclosure practices.
Openness: The speaker argues that "Open Weights" are not the same as "Open Source." While open weights allow for innovation and decentralization of power, they also introduce risks of misuse. The speaker suggests evaluating these risks by marginal risk: whether releasing the model makes the world significantly more dangerous than it already is, given the information already available on the internet.
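
Marginal risk can be written as a simple difference. The symbols below are notation introduced for these notes, not taken from the lecture: R_release is the risk of a given harm in a world where the model is released, and R_baseline is the risk of the same harm without the release (for example, given search engines and existing models).

```latex
\text{Marginal risk} \;=\; R_{\text{release}} - R_{\text{baseline}}
```

On this framing, a release is concerning when the difference is large, not merely when R_release is nonzero, since a harm that is already easy to carry out contributes to both terms.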

6. Notable Quotes

"Once the rockets are up, who cares where they come down? That's not my department." - Quoting Tom Lehrer's song about Wernher von Braun to illustrate the dangerous "not my problem" attitude toward technological consequences.
"If you can't measure it, you can't improve it." - Emphasizing the necessity of transparency and auditing.
"The algorithms can only do what you say." - Highlighting the difficulty of aligning AI with human intent when reward functions are imperfect.

Synthesis and Conclusion

The main takeaway is that AI development is not merely a technical challenge but a socio-technical one. Because AI is a dual-use technology, developers cannot simply "build and forget." Instead, they must:

Monitor multiple metrics to prevent inequality and spurious correlations.
Adopt an ecosystem perspective that accounts for data labor, environmental impact, and downstream usage.
Prioritize transparency and auditing as the primary mechanisms for accountability.
Avoid over-optimizing reward functions to prevent reward hacking and unintended behaviors.
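
The first recommendation, monitoring multiple metrics, can be sketched as a disaggregated evaluation in the spirit of the Gender Shades audit: report accuracy per subgroup alongside the overall number. The records below are synthetic and the group labels "A" and "B" are hypothetical. This is an illustration of the idea, not the audit's actual methodology or data.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """records: list of (group, y_true, y_pred) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {group: correct[group] / total[group] for group in total}


def audit(records):
    """Return overall accuracy, per-group accuracy, and the largest gap."""
    per_group = accuracy_by_group(records)
    overall = sum(int(t == p) for _, t, p in records) / len(records)
    gap = max(per_group.values()) - min(per_group.values())
    return overall, per_group, gap


# Synthetic data: group A is well represented and always classified
# correctly; group B is small and misclassified half the time.
records = [("A", 1, 1)] * 8 + [("B", 1, 1), ("B", 1, 0)]

overall, per_group, gap = audit(records)
print(overall)    # 0.9
print(per_group)  # {'A': 1.0, 'B': 0.5}
print(gap)        # 0.5
```

The single aggregate (0.9) looks acceptable and hides the fact that the smaller group sees coin-flip performance, which is why a per-group breakdown belongs next to any headline metric.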