Stanford AA228V | Validation of Safety-Critical Systems | System Modeling

System Modeling: Lecture 2 Summary

Key Concepts:

  • System Modeling
  • Model Class (Probabilistic Models, Discrete Distributions, Continuous Distributions)
  • Probability Mass Function (PMF)
  • Probability Density Function (PDF)
  • Model Parameters (Theta)
  • Expressiveness of a Model
  • Gaussian Mixture Model
  • Functional Transformation of Distributions
  • Normalizing Flows
  • Joint Distributions (Multivariate Distributions)
  • Conditional Distributions
  • Maximum Likelihood Estimation (MLE)
  • Least Squares Objective
  • Bayesian Parameter Learning
  • Prior Distribution
  • Posterior Distribution

1. Introduction and Context

The lecture focuses on system modeling, a crucial input to validation, as outlined in Chapter 2 of the textbook. The goal is to model systems for simulation and offline validation, allowing potential issues to be detected before real-world deployment. The lecture is self-contained, surveying various ways to model a system; the rest of the course treats the system to be validated as given.

2. General Thoughts on Models

  • Models are diverse: They can range from physical models like the Neutral Buoyancy Lab at NASA's Johnson Space Center (a giant swimming pool simulating spacewalk conditions) to computational models.
  • Complexity varies: Models can be simple (e.g., two linear equations for aircraft dynamics) or highly complex (e.g., X-Plane flight simulator). The key is to choose a model that is "just complex enough to capture what matters."
  • White Box vs. Black Box: White box models expose the equations and processes, while black box models only provide input-output relationships. Models can also exist in between.
  • Key Challenges in Model Creation:
    • Expressiveness vs. Simplicity: The model must capture all possible scenarios but avoid unnecessary complexity. Capturing all possible scenarios is difficult, especially for complex systems like self-driving cars. Runtime monitoring can help detect scenarios not captured in the model.
    • Data/Knowledge Acquisition: Models rely on real-world data or expert knowledge, which can be difficult to obtain.
    • Model Creation: Often involves optimizing for some objective.
  • "All models are wrong, but some are useful." - George Box: Models are simplifications of reality, but they can still provide valuable insights for decision-making.

3. Selecting a Model Class: Probabilistic Models

  • Focus on Probabilistic Models: The lecture emphasizes probabilistic models, where probability quantifies the likelihood of an outcome relative to all other possible outcomes.
  • Probability Distributions: Used to describe probabilities, serving as different model classes.
    • Discrete Distributions: Defined over variables with a discrete set of outcomes (e.g., rolling a die). Represented by a Probability Mass Function (PMF). PMF values must be between 0 and 1, and the probabilities of all possible outcomes must sum to 1.
    • Continuous Distributions: Defined over variables with a continuous set of possible outcomes (e.g., the height of a student). Represented by a Probability Density Function (PDF). PDF values must be greater than or equal to 0, and the integral over all possible values must equal 1. The probability of any single exact value is 0; instead, we consider the probability that the variable lies within a continuous range. Both cases are illustrated in the sketch after this list.
  • Model Parameters (Theta): Probability distributions are often represented using a set of parameters (theta). For example, a Gaussian distribution is defined by its mean (mu) and standard deviation (sigma).
  • Expressiveness of Model Class: The model class must be expressive enough to fit the data well. A Gaussian distribution may not be suitable for multimodal data.
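
To make the PMF/PDF distinction concrete, here is a minimal sketch using Julia's Distributions.jl package (an assumed choice, consistent with the lecture's Julia code examples; the numbers are illustrative):

```julia
using Distributions

# Discrete distribution: a fair six-sided die, represented by a PMF.
die = Categorical(6)                 # uniform over outcomes 1..6
pdf(die, 3)                          # PMF value: 1/6
sum(pdf.(die, 1:6))                  # PMF values sum to 1

# Continuous distribution: student height, represented by a PDF.
# The parameters theta = (mu, sigma) define the Gaussian.
height = Normal(170.0, 8.0)          # mean 170 cm, std dev 8 cm
pdf(height, 170.0)                   # a density, not a probability
cdf(height, 180.0) - cdf(height, 160.0)  # P(160 cm <= height <= 180 cm)
```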

4. Increasing Model Complexity

  • Gaussian Mixture Model: A weighted combination of simpler distributions (e.g., two Gaussian distributions) used to model more complex, multimodal data. The weights must be nonnegative and sum to 1 so that the mixture remains a valid probability distribution.
  • Functional Transformation of Distributions: Applying a function to a simple distribution to create a more complex one. If the function is invertible and differentiable, the density of the transformed random variable follows from the change-of-variables formula: the source density evaluated at the inverse, scaled by the absolute value of the inverse's derivative. See the sketch after this list.
  • Normalizing Flows: Use a series of parameterized, differentiable, and invertible functions to transform simple distributions into complex ones; the parameters of the transformation are learned from data.
  • Generative Models: Models that take samples from a known distribution and transform them into a more complex distribution (e.g., Generative Adversarial Networks, Variational Autoencoders, Diffusion Models).
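
A sketch of the first two ideas above (illustrative code, not the lecture's; it assumes Distributions.jl):

```julia
using Distributions

# Gaussian mixture model: a weighted combination of two Gaussians.
# The weights [0.3, 0.7] are nonnegative and sum to 1.
gmm = MixtureModel([Normal(-2.0, 0.5), Normal(3.0, 1.0)], [0.3, 0.7])
pdf(gmm, 0.0)    # density of the bimodal mixture at 0
rand(gmm)        # sample: pick a component by weight, then sample from it

# Functional transformation: Y = exp(X) with X ~ Normal(0, 1).
# For invertible, differentiable f, the change-of-variables formula is
#   p_Y(y) = p_X(f⁻¹(y)) * |d f⁻¹(y) / dy|
# Here f⁻¹(y) = log(y), with derivative 1/y.
px = Normal(0.0, 1.0)
py(y) = pdf(px, log(y)) / y
py(2.0) ≈ pdf(LogNormal(0.0, 1.0), 2.0)  # matches the known log-normal density
```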

5. Joint and Conditional Distributions

  • Joint Distributions (Multivariate Distributions): Probability distributions over multiple variables, representing the likelihood of multiple outcomes occurring simultaneously.
  • Multivariate Gaussian Distribution: An extension of the Gaussian distribution to multiple dimensions, defined by a mean vector and a covariance matrix. The diagonal entries of the covariance matrix give each variable's variance; the off-diagonal entries give the covariance between pairs of variables.
  • Independent Variables: If two variables are independent, their joint distribution is the product of their marginal distributions.
  • Conditional Distributions: Distributions over a single variable or set of variables given the values of one or more other variables.
  • Conditional Gaussian Distribution: A normal distribution whose parameters (e.g., the mean) depend on the value of another variable; see the sketch after this list.
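
A sketch of these ideas (again assuming Distributions.jl, with illustrative numbers):

```julia
using Distributions, LinearAlgebra

# Multivariate (joint) Gaussian: mean vector plus covariance matrix.
mu    = [0.0, 1.0]
Sigma = [1.0  0.8;   # diagonal: how much each variable varies
         0.8  2.0]   # off-diagonal: how the variables vary together
joint = MvNormal(mu, Sigma)
pdf(joint, [0.5, 1.5])            # joint density at a point

# Independence: with zero covariance, the joint density factors
# into the product of the marginal densities.
ind = MvNormal(mu, Diagonal([1.0, 2.0]))
pdf(ind, [0.5, 1.5]) ≈ pdf(Normal(0.0, 1.0), 0.5) *
                       pdf(Normal(1.0, sqrt(2.0)), 1.5)

# Conditional Gaussian: the mean of y depends on the value of x.
y_given_x(x) = Normal(2.0x + 1.0, 0.5)   # y | x ~ N(2x + 1, 0.5²)
pdf(y_given_x(1.0), 3.0)
```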

6. Parameter Estimation: Maximum Likelihood Estimation (MLE)

  • Goal: Find the parameters (theta) that maximize the likelihood of observing the given data set.
  • Maximum Likelihood Estimation (MLE):
    • Find the parameters that make the observed data most likely.
    • Assume independent and identically distributed (i.i.d.) data points.
    • Maximize the product of the probabilities of each observation given the parameters.
    • Take the logarithm to convert the product into a sum for numerical stability.
  • Optimization Algorithms: Used to find the optimal parameters (e.g., gradient descent, Adam, genetic algorithms).
  • Code Example: The lecture demonstrates how to implement MLE using Julia and the Optim package; a sketch in that spirit follows this list.
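
The sketch below may differ from the actual lecture code; it fits a Gaussian's parameters by maximizing the log-likelihood with Optim.jl:

```julia
using Distributions, Optim

# Synthetic i.i.d. data from a "true" Gaussian (illustration only).
data = rand(Normal(5.0, 2.0), 1000)

# Negative log-likelihood of theta = (mu, log sigma). Optimizing
# log(sigma) keeps sigma positive without an explicit constraint,
# and the log turns the product of probabilities into a sum.
function nll(theta)
    mu, sigma = theta[1], exp(theta[2])
    return -sum(logpdf.(Normal(mu, sigma), data))
end

result = optimize(nll, [0.0, 0.0], NelderMead())  # gradient-free optimizer
theta_hat = Optim.minimizer(result)
mu_hat, sigma_hat = theta_hat[1], exp(theta_hat[2])  # ≈ (5, 2)
```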

7. Derivation of Least Squares Objective

  • Assumptions: The data comes from a conditional Gaussian model: the inputs (the x values) are passed through some function to produce the mean of the outputs (the y values), with zero-mean Gaussian noise of constant variance added on top.
  • Derivation: Starting from the MLE objective for a conditional Gaussian with constant variance, maximizing the likelihood turns out to be equivalent to minimizing the least squares objective (spelled out below this list).
  • Implication: Applying least squares implicitly assumes that the data follows a conditional Gaussian distribution.
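
Spelling out the derivation summarized above, with $f_\theta$ the model and $\sigma$ a fixed noise standard deviation (notation assumed, since the summary states the result in words):

```latex
\hat{\theta}
  = \arg\max_{\theta} \sum_{i=1}^{n} \log \mathcal{N}\!\left(y_i \mid f_\theta(x_i),\, \sigma^2\right)
  = \arg\max_{\theta} \sum_{i=1}^{n} \left[ -\log\!\left(\sigma\sqrt{2\pi}\right)
      - \frac{\left(y_i - f_\theta(x_i)\right)^2}{2\sigma^2} \right]
  = \arg\min_{\theta} \sum_{i=1}^{n} \left(y_i - f_\theta(x_i)\right)^2
```

The $-\log(\sigma\sqrt{2\pi})$ term and the factor $1/(2\sigma^2)$ do not depend on $\theta$, so dropping them and flipping the sign leaves exactly the least squares objective.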

8. Parameter Estimation: Bayesian Parameter Learning

  • Motivation: Instead of committing to a single parameter estimate, maintain a distribution over all possible parameters.
  • Bayes' Rule: Used to calculate the posterior distribution over parameters given the data.
  • Prior Distribution (P(theta)): Represents our prior belief about the parameters before observing any data.
  • Posterior Distribution (P(theta|D)): Represents our updated belief about the parameters after observing the data.
  • Challenges: The summation or integral in the denominator of Bayes' rule (the evidence, or normalizing constant) is often difficult or impossible to compute analytically.
  • Probabilistic Programming: Allows sampling from the posterior distribution even when the denominator cannot be computed directly, as long as the numerator can be computed; see the sketch after this list.
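
In symbols, Bayes' rule for the parameter posterior is:

```latex
p(\theta \mid D) \;=\; \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta')\, p(\theta')\, d\theta'}
```

The denominator (the evidence) is the often-intractable part. Below is a minimal probabilistic-programming sketch using the Turing.jl package (an assumed tool choice; the summary does not name one), which samples from the posterior using only the numerator:

```julia
using Turing

# Prior over the unknown mean, likelihood with known noise sigma;
# MCMC sampling needs only the unnormalized numerator of Bayes' rule.
@model function gaussian_mean(data)
    mu ~ Normal(0.0, 10.0)           # prior p(theta)
    for i in eachindex(data)
        data[i] ~ Normal(mu, 2.0)    # likelihood p(D | theta)
    end
end

data  = [4.1, 5.3, 4.8, 5.9, 5.0]
chain = sample(gaussian_mean(data), NUTS(), 1000)  # posterior samples of mu
```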

9. Conclusion

The lecture provides a comprehensive overview of system modeling, focusing on probabilistic models and parameter estimation techniques. It covers various model classes, methods for increasing model complexity, and the derivation of the least squares objective, and it introduces Bayesian parameter learning as an alternative to MLE, highlighting its advantages and challenges. This sets the stage for the rest of the course, which focuses on validation algorithms.
