Stanford AA228V | Validation of Safety Critical Systems | Failure Distribution
Key Concepts
- Failure Distribution: The probability distribution over all possible failure trajectories.
- Trajectory Distribution: The probability distribution over all possible trajectories.
- Unnormalized Probability Density: A function that has the same shape as a probability density function but does not integrate to 1.
- Rejection Sampling: A method for sampling from an unnormalized probability density by drawing candidates from an easy-to-sample envelope and rejecting those that fall above the density curve.
- Proposal Distribution: A distribution that is easy to sample from and is used to generate candidate samples in rejection sampling and Markov Chain Monte Carlo.
- Markov Chain Monte Carlo (MCMC): A method for sampling from a probability distribution by constructing a Markov chain whose stationary distribution is the target distribution.
- Kernel: A conditional probability distribution that defines the transition probabilities of a Markov chain in MCMC.
- Burn-in Period: The initial period of MCMC sampling that is discarded to allow the chain to converge to the target distribution.
- Thinning: A technique used in MCMC to reduce the correlation between samples by only keeping every nth sample.
- Smoothing: A technique used in MCMC to improve mixing between different modes of the target distribution by adding a small amount of probability to regions of low probability.
- Robustness: A measure of how far a system is from failure.
Failure Distribution
- The goal is to sample from the failure distribution, which represents the probability of different failure trajectories.
- The failure distribution is defined as the probability of a trajectory (τ) given that it is a failure (τ ∈ Ψ₀).
- Mathematically, the failure distribution is expressed as: P(τ | τ ∈ Ψ₀) = [1{τ ∈ Ψ₀} * P(τ)] / ∫ [1{τ ∈ Ψ₀} * P(τ)] dτ
- 1{τ ∈ Ψ₀} is the indicator function, which returns 1 if τ is a failure and 0 otherwise.
- P(τ) is the nominal trajectory distribution.
- The numerator gives the shape of the failure distribution.
- The denominator is a normalizing constant that ensures the distribution integrates to 1.
- Computing the denominator (the integral) is often difficult or impossible.
Visual Example
- A simple system where trajectories consist of a single state sampled from a Gaussian distribution.
- Failure is defined as the state being less than -1.
- The numerator of the failure distribution is the Gaussian distribution truncated at -1.
- The denominator is the area under the truncated Gaussian curve.
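The one-dimensional example above can be written out directly. The sketch below (in Python, chosen for illustration; the course's own tooling may differ) evaluates the unnormalized failure density and computes the normalizing constant, which for this example is just the standard-normal CDF at the failure threshold:

```python
import math

def nominal_pdf(x):
    """Standard Gaussian density p(x) for the single-state trajectory."""
    return math.exp(-x**2 / 2) / math.sqrt(2 * math.pi)

def unnormalized_failure_density(x, threshold=-1.0):
    """p-bar(x) = 1{x < threshold} * p(x): the Gaussian truncated at the threshold."""
    return nominal_pdf(x) if x < threshold else 0.0

# The normalizing constant is the area under the truncated Gaussian,
# i.e. the failure probability P(x < -1) = Phi(-1).
normalizer = 0.5 * (1 + math.erf(-1.0 / math.sqrt(2)))
print(round(normalizer, 4))  # ≈ 0.1587
```

Here the integral happens to be available in closed form; for realistic trajectory distributions it is not, which is what motivates the sampling methods that follow.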
Unnormalized Probability Density
- Since the denominator of the failure distribution is hard to compute, we often work with the unnormalized probability density, denoted as P̄(τ).
- P̄(τ) = 1{τ ∈ Ψ₀} * P(τ)
- Algorithms exist to draw samples from an unnormalized probability density.
- By sampling from the unnormalized density, we can implicitly represent the failure distribution.
Rejection Sampling
- Rejection sampling is a method for sampling from an unnormalized probability density.
- It uses the analogy of a dartboard:
- Draw the shape of the unnormalized target density on the dartboard.
- Throw darts uniformly at the board.
- Reject darts that land above the target density.
- The remaining darts are distributed according to the target density.
Mathematical Formulation
- Sample τ from a proposal distribution q(τ).
- Sample h from a uniform distribution between 0 and c * q(τ), where c is a constant chosen such that c * q(τ) is always greater than or equal to the target density P̄(τ).
- Reject the sample if h > P̄(τ).
- Keep the τ value of the accepted samples.
- q(τ) is a proposal distribution that we can easily sample from (e.g., a uniform distribution or a Gaussian distribution).
- c is a constant that scales the proposal distribution to be above the target density.
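The steps above can be sketched as follows (a minimal Python illustration, not the lecture's implementation; the uniform proposal on [-5, -1] and the constant c = 1 are assumptions chosen for the running truncated-Gaussian example):

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(p_bar, q_sample, q_pdf, c, n_samples):
    """Draw n_samples from the density proportional to p_bar.

    Requires p_bar(x) <= c * q_pdf(x) for all x.
    """
    samples = []
    while len(samples) < n_samples:
        x = q_sample()                      # candidate from the proposal q
        h = rng.uniform(0.0, c * q_pdf(x))  # uniform height under c * q
        if h <= p_bar(x):                   # keep darts under the target curve
            samples.append(x)
    return np.array(samples)

# Running example: standard Gaussian truncated at -1, using a uniform
# proposal on [-5, -1] (covers essentially all of the failure mass).
p_bar = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi) if x < -1 else 0.0
q_pdf = lambda x: 0.25 if -5 <= x <= -1 else 0.0
q_sample = lambda: rng.uniform(-5, -1)
c = 1.0  # max of p_bar is p(-1) ≈ 0.242 <= c * 0.25
samples = rejection_sample(p_bar, q_sample, q_pdf, c, 1000)
```

Note how the envelope condition p̄(x) ≤ c·q(x) is what guarantees accepted samples follow the target shape.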
Algorithm Refinement
- The efficiency of rejection sampling depends on the choice of the proposal distribution q(τ) and the constant c.
- A good proposal distribution should be close to the target density to minimize the number of rejected samples.
- Selecting an appropriate proposal distribution and value for c can be challenging.
Rejection Sampling for Failure Distribution
- To apply rejection sampling to the failure distribution:
- Set the target density P̄(τ) to 1{τ ∈ Ψ₀} * P(τ).
- Choose a proposal distribution q(τ).
- Select a value for c such that 1{τ ∈ Ψ₀} * P(τ) ≤ c * q(τ) for all τ.
- If we choose q(τ) = P(τ) (the nominal trajectory distribution) and c = 1, the algorithm simplifies to:
- Sample τ from the nominal trajectory distribution.
- Accept the sample if it is a failure (τ ∈ Ψ₀).
- This is equivalent to direct sampling from the nominal distribution and keeping only the failures.
- This approach can be inefficient for systems with rare failure events.
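For the one-dimensional example, this special case reduces to a few lines (a Python sketch under the same assumed Gaussian setup as above), and it makes the inefficiency concrete: the acceptance rate equals the failure probability, so rare failures mean most rollouts are wasted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_failures(n_rollouts, threshold=-1.0):
    """Roll out the nominal system and keep only the failure trajectories.

    Equivalent to rejection sampling with q = p and c = 1.
    """
    trajectories = rng.standard_normal(n_rollouts)  # tau ~ p(tau)
    return trajectories[trajectories < threshold]   # keep only failures

failures = sample_failures(100_000)
# Acceptance rate ≈ P(x < -1) ≈ 0.159; if failures occurred at, say,
# one in a million rollouts, this approach would be hopeless.
print(len(failures) / 100_000)
```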
Markov Chain Monte Carlo (MCMC)
- MCMC is a method for sampling from a probability distribution by constructing a Markov chain whose stationary distribution is the target distribution.
- Instead of drawing samples independently, MCMC maintains a chain of samples where each sample depends on the previous one.
Algorithm
- Start with an initial sample τ₀.
- For each iteration:
- Sample a new proposed sample τ' from a kernel g(τ' | τ), where τ is the current sample.
- Accept the proposed sample with probability A(τ', τ) = min(1, [P̄(τ') / P̄(τ)] * [g(τ | τ') / g(τ' | τ)]).
- If the proposed sample is accepted, set τ = τ'. Otherwise, keep the current sample.
- The samples in the chain will be distributed according to the target distribution.
- The kernel g(τ' | τ) is a conditional probability distribution that defines the transition probabilities of the Markov chain. A common choice for the kernel is a Gaussian distribution centered at the current sample.
- The acceptance probability A(τ', τ) ensures that the chain converges to the target distribution.
Symmetric Kernel
- If the kernel is symmetric (g(τ' | τ) = g(τ | τ')), the acceptance probability simplifies to A(τ', τ) = min(1, P̄(τ') / P̄(τ)).
- If the proposed sample has a higher target density than the current sample, it is always accepted.
- If the proposed sample has a lower target density, it is accepted with a probability proportional to the ratio of the target densities.
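With a symmetric Gaussian kernel, the algorithm is the classic Metropolis sampler. A minimal Python sketch for the running truncated-Gaussian target (the step size and starting point are assumptions chosen for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis(p_bar, x0, n_steps, step_size=0.5):
    """Metropolis sampling with the symmetric kernel g(x'|x) = N(x, step_size^2).

    Symmetry reduces the acceptance probability to min(1, p_bar(x') / p_bar(x)).
    """
    x = x0
    chain = []
    for _ in range(n_steps):
        x_prime = x + step_size * rng.standard_normal()       # propose
        if rng.uniform() < min(1.0, p_bar(x_prime) / p_bar(x)):
            x = x_prime                                       # accept
        chain.append(x)                                       # else keep current
    return np.array(chain)

# Target: the unnormalized truncated Gaussian from the running example.
# Note x0 must start at a point with nonzero density.
p_bar = lambda x: np.exp(-x**2 / 2) if x < -1 else 0.0
chain = metropolis(p_bar, x0=-1.5, n_steps=20_000)
```

Because a proposal with zero target density is never accepted, every sample in the chain lies in the failure region.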
Practical Considerations
- MCMC is guaranteed to converge to the target distribution in the limit of infinite samples. However, in practice, we can only draw a finite number of samples.
- To improve the performance of MCMC with a finite number of samples, we can use the following techniques:
- Burn-in Period: Discard the initial samples of the chain to allow it to converge to the target distribution.
- Thinning: Reduce the correlation between samples by only keeping every nth sample.
- Smoothing: Improve mixing between different modes of the target distribution by adding a small amount of probability to regions of low probability.
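Burn-in and thinning are simple post-processing steps on the raw chain; in Python they amount to one slice (the placeholder array and the specific burn-in/thinning values below are illustrative assumptions):

```python
import numpy as np

# Assume `chain` holds the raw MCMC samples (placeholder array here).
chain = np.arange(10_000, dtype=float)

burn_in = 1_000   # discard the initial transient before convergence
thin = 10         # keep every 10th sample to reduce correlation
kept = chain[burn_in::thin]
print(len(kept))  # 900
```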
Smoothing
- Smoothing involves modifying the unnormalized density to allow the algorithm to move between different failure modes.
- Define a distance function δ(τ) that is 0 if τ is a failure and greater than 0 otherwise.
- Replace the indicator function in the unnormalized density with a smooth function that assigns non-zero probability to successes that are close to failure.
- One way to do this is to replace the indicator function with a Gaussian distribution centered at 0 with a small variance ε.
- The smoothed unnormalized density is then given by P̄_ε(τ) = [1 / √(2πε)] * exp(−δ(τ)² / (2ε)) * P(τ).
- The parameter ε controls the amount of smoothing. As ε approaches 0, the smoothed density approaches the original unnormalized density. As ε approaches infinity, the smoothed density approaches the nominal trajectory distribution.
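The smoothed density can be sketched directly from the formula above (a Python illustration; the distance function δ(x) = max(0, x + 1) for the running example and the value of ε are assumptions):

```python
import numpy as np

def smoothed_density(x, nominal_pdf, robustness, eps):
    """p-bar_eps(x) = N(delta(x); 0, eps) * p(x).

    Replaces the hard failure indicator with a Gaussian in the
    distance-to-failure delta(x), so near-failures get nonzero density.
    """
    d = robustness(x)
    return np.exp(-d**2 / (2 * eps)) / np.sqrt(2 * np.pi * eps) * nominal_pdf(x)

# Running example: failure when x < -1, so delta(x) = max(0, x + 1).
nominal_pdf = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
robustness = lambda x: max(0.0, x + 1.0)

# A success close to failure keeps non-negligible smoothed density;
# one far from failure is heavily down-weighted.
near = smoothed_density(-0.9, nominal_pdf, robustness, eps=0.1)
far = smoothed_density(2.0, nominal_pdf, robustness, eps=0.1)
```

This is what lets an MCMC chain cross low-probability regions between failure modes instead of getting stuck in one mode.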
Conclusion
The lecture discusses methods for sampling from the failure distribution, which is crucial for understanding and mitigating system failures. While directly computing the failure distribution's normalizing constant is often intractable, techniques like rejection sampling and Markov chain Monte Carlo (MCMC) provide ways to draw samples that implicitly represent the distribution. Rejection sampling draws candidates from a proposal distribution and rejects those that fall above the target density, while MCMC constructs a Markov chain whose stationary distribution is the target. Practical considerations like burn-in periods, thinning, and smoothing are essential for improving the efficiency and accuracy of these methods in real-world applications.