Discrete Probability Distributions
Expected value, mathematical expectation, and common discrete distributions: Binomial, Poisson, Geometric, Negative Binomial, and Hypergeometric.
When analyzing data, we often deal with variables whose outcomes are determined by chance. A random variable is a numerical description of the outcome of an experiment. A discrete random variable can take on a countable number of distinct values (e.g., the number of potholes on a 10km stretch of road, or the number of defective bricks in a pallet).
Probability Mass Functions and Mathematical Expectation
The foundational math behind discrete random variables.
Probability Mass Function (PMF), $p(x)$ or $P(X = x)$
A function that assigns a probability to each possible value of a discrete random variable. It must satisfy two conditions:
- $p(x) \ge 0$ for all $x$.
- $\sum_{x} p(x) = 1$.
Cumulative Distribution Function (CDF), $F(x)$
The probability that the random variable takes a value less than or equal to $x$:
$F(x) = P(X \le x) = \sum_{t \le x} p(t)$
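As a minimal sketch, the CDF can be accumulated directly from the PMF. The probabilities below are a made-up illustration (say, the number of defective bricks found in a small sample), not data from the text:

```python
# Hypothetical PMF for a discrete random variable
# (e.g., number of defective bricks in a sample; values are made up).
pmf = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over all values up to and including x."""
    return sum(p for value, p in pmf.items() if value <= x)

print(cdf(1))  # P(X <= 1) = p(0) + p(1)
```

Note that $F(x)$ is non-decreasing and reaches 1 at the largest value of the random variable.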
Mathematical Expectation
The expected value represents the theoretical mean of the random variable.
Expected Value (Mean), $E(X)$ or $\mu$
The long-run average value of the random variable over infinitely many trials. It is the center of the probability distribution:
$\mu = E(X) = \sum_{x} x\,p(x)$
Variance, $\mathrm{Var}(X)$ or $\sigma^2$
A measure of the dispersion or spread of the probability distribution around the mean:
$\sigma^2 = E[(X - \mu)^2] = \sum_{x} (x - \mu)^2\,p(x)$
Alternatively, it can be calculated more easily using the computational formula:
$\sigma^2 = E(X^2) - [E(X)]^2$
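Both variance formulas can be checked numerically. Here is a minimal sketch using a hypothetical PMF (the probabilities are made up for illustration):

```python
# Hypothetical PMF; the probabilities are made up for illustration.
pmf = {0: 0.5, 1: 0.3, 2: 0.15, 3: 0.05}
assert abs(sum(pmf.values()) - 1.0) < 1e-9  # PMF must sum to 1

mu = sum(x * p for x, p in pmf.items())                    # E(X)
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())   # definition
var_comp = sum(x ** 2 * p for x, p in pmf.items()) - mu ** 2  # E(X^2) - mu^2

print(mu, var_def, var_comp)  # the two variance formulas agree
```

The computational formula avoids subtracting the mean inside the sum, which is why it is usually quicker by hand.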
Common Discrete Distributions in Engineering
Specific models used to describe common engineering scenarios.
The Binomial Distribution
Models the number of successes in a fixed number of independent trials.
Binomial Distribution
Applicable when:
- There is a fixed number of trials ($n$).
- Each trial has only two possible outcomes (Success or Failure).
- The probability of success ($p$) remains constant for each trial.
- The trials are mutually independent.
The probability of exactly $x$ successes in $n$ trials is:
$P(X = x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, \dots, n$
- Mean: $\mu = np$
- Variance: $\sigma^2 = np(1 - p)$
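A short sketch of the binomial PMF, using only the standard library. The inspection scenario (10 welds, each independently defective with probability 0.1) is hypothetical:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Hypothetical scenario: 10 welds inspected, each independently
# defective with probability 0.1.
n, p = 10, 0.1
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * px for x, px in enumerate(probs))  # should equal n * p
print(binomial_pmf(2, n, p), mean)
```

Summing the PMF over $x = 0, \dots, n$ gives 1, and the numerically computed mean matches $np$.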
The Poisson Distribution
Models the number of events occurring in a fixed interval of time or space.
Poisson Distribution
Used for rare events where the exact number of trials is effectively infinite and $p$ is very small, but the average rate of occurrence ($\lambda$) is known. Examples include the number of traffic accidents per month at a given intersection, or the number of flaws in a 100m reel of fiber optic cable.
The probability of exactly $x$ events occurring in a given interval is:
$P(X = x) = \dfrac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$
- Mean: $\mu = \lambda$
- Variance: $\sigma^2 = \lambda$
The Geometric and Negative Binomial Distributions
Model the number of trials needed to achieve a specific number of successes.
Geometric Distribution
Models the number of independent trials needed to get the first success. (e.g., How many times must we test a newly designed joint until we observe the first failure, assuming a constant failure probability $p$?)
$P(X = x) = (1 - p)^{x - 1} p, \quad x = 1, 2, 3, \dots$
- Mean: $\mu = 1/p$
- Variance: $\sigma^2 = (1 - p)/p^2$
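A sketch of the geometric PMF; the per-trial failure probability $p = 0.2$ is an assumed value, not from the text:

```python
def geometric_pmf(x, p):
    """P(X = x): first success occurs on trial x (x = 1, 2, 3, ...)."""
    return (1 - p) ** (x - 1) * p

# Assumed value: each joint test "succeeds" (fails the joint) with p = 0.2.
p = 0.2
p_third_trial = geometric_pmf(3, p)  # two misses, then a success: 0.8^2 * 0.2
# Truncated sum; the geometric tail (0.8)^x is negligible well before x = 2000.
mean = sum(x * geometric_pmf(x, p) for x in range(1, 2000))
print(p_third_trial, mean)  # mean should be close to 1/p = 5
```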
Negative Binomial Distribution
A generalization of the geometric distribution. It models the number of independent trials $X$ needed to get exactly $r$ successes:
$P(X = x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}, \quad x = r, r + 1, r + 2, \dots$
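A sketch of the negative binomial PMF under the convention above (the $r$-th success occurs on trial $x$); the parameter values are assumed for illustration:

```python
from math import comb

def negbinom_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x (x = r, r+1, ...)."""
    return comb(x - 1, r - 1) * p ** r * (1 - p) ** (x - r)

# Assumed values: probability that the 3rd success arrives on trial 5
# when each trial succeeds with probability 0.5.
prob = negbinom_pmf(5, 3, 0.5)
print(prob)
```

Setting $r = 1$ recovers the geometric PMF, which is the sense in which this is a generalization.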
The Hypergeometric Distribution
Models sampling without replacement.
Hypergeometric Distribution
Unlike the Binomial distribution, where $p$ is constant (sampling with replacement), the Hypergeometric distribution is used when sampling $n$ items without replacement from a finite population of size $N$, containing exactly $K$ successes. (e.g., Selecting 5 concrete cylinders from a batch of 50, where 3 are known to be defective).
$P(X = x) = \dfrac{\binom{K}{x} \binom{N - K}{n - x}}{\binom{N}{n}}$
- Mean: $\mu = n\dfrac{K}{N}$
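The concrete-cylinder example from the text can be sketched directly from the counting formula:

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items containing K successes."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# The example from the text: 5 cylinders sampled from a batch of 50,
# of which 3 are defective.
N, K, n = 50, 3, 5
p_no_defects = hypergeom_pmf(0, N, K, n)
mean = sum(x * hypergeom_pmf(x, N, K, n) for x in range(min(K, n) + 1))
print(p_no_defects, mean)  # mean should equal n * K / N
```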
Key Takeaways
- Random Variables: Numerical values assigned to experimental outcomes.
- Expected Value ($E(X)$ or $\mu$): The long-run average of a discrete distribution.
- Binomial: Used for $n$ independent trials with exactly two outcomes (success/failure) and constant probability $p$.
- Poisson: Used for modeling the number of rare events occurring within a continuous interval (time, area, volume).
- Geometric/Negative Binomial: Focuses on the number of trials needed to achieve a specified number of successes.
- Hypergeometric: Used for finite populations when sampling without replacement (probability changes trial-to-trial).