Sampling Distributions
When engineers test a sample of materials (e.g., 5 concrete cylinders), the sample mean ($\bar{x}$) is only an estimate of the true population mean ($\mu$). If we took a different sample of 5 cylinders, we would get a slightly different $\bar{x}$.
Because the sample mean changes from sample to sample, the sample mean itself is a random variable and has its own probability distribution. This distribution is called a sampling distribution. Understanding sampling distributions is the bridge between probability theory and statistical inference.
Random Sampling Fundamentals
Simple Random Sample (SRS)
A sample of size $n$ drawn from a population such that every possible sample of size $n$ has an equal probability of being selected. When elements are drawn independently from a population, the resulting random variables $X_1, X_2, \dots, X_n$ are independent and identically distributed (i.i.d.).
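A minimal sketch of drawing a simple random sample, using only the Python standard library. The batch of 40 cylinder IDs and the function names `simple_random_sample` and `iid_draws` are hypothetical, chosen for illustration. `random.sample` draws without replacement, so every subset of size $n$ is equally likely; `random.choices` draws with replacement, which gives independent draws.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def iid_draws(population, n, seed=None):
    """Draw n items independently (with replacement) from the population."""
    rng = random.Random(seed)
    return rng.choices(population, k=n)


# Hypothetical batch of 40 cylinder IDs, of which 5 are selected for testing.
cylinder_ids = list(range(1, 41))
srs = simple_random_sample(cylinder_ids, n=5, seed=42)
iid = iid_draws(cylinder_ids, n=5, seed=42)
print("SRS (no repeats):", srs)
print("i.i.d. draws (repeats possible):", iid)
```

Fixing the seed makes the selection reproducible, which is convenient when a sampling plan has to be documented.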
The Distribution of the Sample Mean
Expected Value of the Sample Mean
If you draw infinitely many random samples of size $n$ from a population with mean $\mu$, the average of all those sample means will exactly equal the population mean: $E(\bar{x}) = \mu$. In statistical terms, $\bar{x}$ is an unbiased estimator of $\mu$.
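The unbiasedness property can be checked numerically. The sketch below assumes a normal population with $\mu = 30$ MPa and $\sigma = 4$ MPa (illustrative values, not taken from any data set) and a hypothetical helper `simulate_sample_means`. Because it draws a finite number of samples, the average of the sample means only comes close to $\mu$ rather than matching it exactly.

```python
import random
import statistics


def simulate_sample_means(mu, sigma, n, num_samples, seed=0):
    """Return the mean of each of num_samples random samples of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]


# Assumed population: strength ~ Normal(mu = 30 MPa, sigma = 4 MPa).
sample_means = simulate_sample_means(mu=30.0, sigma=4.0, n=5, num_samples=20000)
grand_mean = statistics.fmean(sample_means)

# Individual sample means differ from one another...
print("first three sample means:", [round(m, 2) for m in sample_means[:3]])
# ...but their long-run average settles near the population mean.
print("average of all sample means:", round(grand_mean, 3))
```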
Standard Error of the Mean
The standard deviation of the sampling distribution of $\bar{x}$, given by $\sigma_{\bar{x}} = \sigma / \sqrt{n}$. It measures how much the sample mean is expected to vary around the true population mean. As the sample size ($n$) increases, the standard error decreases, meaning our estimates become more precise.
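As a rough check of the relationship $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, the sketch below compares the formula with the standard deviation of simulated sample means. The population values ($\mu = 30$, $\sigma = 4$) and the function names are assumptions made for illustration.

```python
import math
import random
import statistics


def theoretical_se(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


def simulated_se(mu, sigma, n, num_samples=5000, seed=1):
    """Standard deviation of simulated sample means for samples of size n."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(num_samples)
    ]
    return statistics.stdev(means)


# Assumed population values: mu = 30 MPa, sigma = 4 MPa.
for n in (4, 16, 64):
    print(f"n = {n:2d}: formula = {theoretical_se(4.0, n):.3f}, "
          f"simulated = {simulated_se(30.0, 4.0, n):.3f}")
```

Because of the square root, quadrupling the sample size only halves the standard error, so each extra gain in precision costs more specimens than the last.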
The Central Limit Theorem (CLT)
What shape does the sampling distribution of $\bar{x}$ take?
Central Limit Theorem
If you draw random samples of size $n$ from any population distribution (even one that is highly skewed or non-normal) with mean $\mu$ and finite standard deviation $\sigma$, the sampling distribution of the sample mean ($\bar{x}$) approaches a normal distribution with mean $\mu$ and standard deviation $\sigma / \sqrt{n}$ as the sample size $n$ becomes large.
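- If the original population is already normal, the sampling distribution of $\bar{x}$ is exactly normal for any sample size $n$.
- If the original population is not normal, the sampling distribution is usually treated as approximately normal when $n \geq 30$. This is a rule of thumb; strongly skewed populations may need larger samples.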
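The two cases above can be explored with a small simulation. This sketch assumes an exponential population (strongly right-skewed) and measures how the skewness of the sample means shrinks as $n$ grows; the helper names and the choice of population are illustrative assumptions, not part of the original material.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def sample_means_exponential(n, num_samples=10000, seed=2):
    """Means of num_samples samples of size n from an Exponential(rate=1) population."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(num_samples)
    ]


# n = 1 reproduces the skewed population itself; larger n averages the skew away.
skew_by_n = {n: skewness(sample_means_exponential(n)) for n in (1, 5, 30)}
for n, g in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {g:.2f}")
```

The skewness at $n = 30$ is much smaller than that of the population but is not zero, which is why the $n \geq 30$ guideline is an approximation rather than a guarantee.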
The Distribution of the Sample Variance
Distribution of the Sample Variance
If $s^2$ is the variance of a random sample of size $n$ drawn from a normal population with variance $\sigma^2$, the quantity $\chi^2 = (n-1)s^2 / \sigma^2$ follows a Chi-square ($\chi^2$) distribution with $\nu = n - 1$ degrees of freedom. This forms the basis for confidence intervals and tests concerning the population variance.
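One way to sanity-check this result is to simulate the quantity $(n-1)s^2/\sigma^2$ and compare its mean and variance with the known values for a Chi-square distribution with $\nu$ degrees of freedom (mean $\nu$, variance $2\nu$). The sketch below assumes $n = 5$ and $\sigma = 4$; the function name `scaled_variances` is hypothetical.

```python
import random
import statistics


def scaled_variances(n, sigma, num_samples=20000, seed=3):
    """Simulate (n - 1) * s^2 / sigma^2 for samples from a normal population."""
    rng = random.Random(seed)
    results = []
    for _ in range(num_samples):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        s2 = statistics.variance(sample)  # sample variance, divisor n - 1
        results.append((n - 1) * s2 / sigma**2)
    return results


n, sigma = 5, 4.0  # assumed values for illustration
values = scaled_variances(n, sigma)
sim_mean = statistics.fmean(values)
sim_var = statistics.variance(values)
print(f"simulated mean     = {sim_mean:.2f} (Chi-square mean, df = {n - 1})")
print(f"simulated variance = {sim_var:.2f} (Chi-square variance, 2*df = {2 * (n - 1)})")
```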
Distributions for Statistical Inference
1. Student's t-Distribution
In practice, if we don't know the true mean $\mu$, we almost certainly don't know the true standard deviation $\sigma$. Instead, we must estimate $\sigma$ using the sample standard deviation ($s$). This introduces extra uncertainty.
The t-Distribution
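When standardizing a sample mean using $s$ instead of $\sigma$, the resulting variable $t = (\bar{x} - \mu) / (s / \sqrt{n})$ follows a t-distribution, not a standard normal ($Z$) distribution. This result assumes the sample is drawn from a normal population.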
- It is bell-shaped and symmetric around 0, like the Z-distribution.
- It has heavier (fatter) tails than the Z-distribution, reflecting the added uncertainty of estimating $\sigma$ with $s$. This means a critical t-value must be larger than the corresponding critical Z-value to achieve the same level of confidence.
- Its exact shape depends on the degrees of freedom ($\nu = n - 1$). As sample size ($n$) increases, $s$ becomes a better estimate of $\sigma$, and the t-distribution approaches the standard normal distribution.
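The heavier tails can be seen by simulating the statistic $t = (\bar{x} - \mu)/(s/\sqrt{n})$ and counting how often it falls beyond $\pm 1.96$, the cutoff that leaves about 5% in the two tails of a standard normal distribution. This is a sketch under assumed population values ($\mu = 30$, $\sigma = 4$), with hypothetical helper names.

```python
import math
import random
import statistics


def t_statistics(n, num_samples=10000, mu=30.0, sigma=4.0, seed=4):
    """Simulate t = (xbar - mu) / (s / sqrt(n)) for normal samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
        out.append((xbar - mu) / (s / math.sqrt(n)))
    return out


def tail_fraction(values, cutoff=1.96):
    """Fraction of values whose magnitude exceeds the cutoff."""
    return sum(abs(v) > cutoff for v in values) / len(values)


frac_n5 = tail_fraction(t_statistics(5))    # 4 degrees of freedom
frac_n30 = tail_fraction(t_statistics(30))  # 29 degrees of freedom
print(f"n = 5:  fraction beyond +/-1.96 = {frac_n5:.3f}")
print(f"n = 30: fraction beyond +/-1.96 = {frac_n30:.3f}")
```

With only 5 specimens, noticeably more than 5% of the simulated t-values fall beyond the normal cutoff, which is why small-sample intervals built with Z-values are too narrow.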
2. The Chi-Square ($\chi^2$) Distribution
Engineers are often just as concerned with variability as they are with the mean (e.g., ensuring consistent concrete strength). The sample variance ($s^2$) has its own sampling distribution.
The Chi-Square Distribution
If random samples of size $n$ are drawn from a normal population with variance $\sigma^2$, the statistic $\chi^2 = (n-1)s^2 / \sigma^2$, which relates the sample variance to the population variance, follows a Chi-square distribution with $\nu = n - 1$ degrees of freedom.
- Unlike the normal or t-distributions, the Chi-square distribution takes only non-negative values (the statistic is built from squared deviations) and is heavily right-skewed when the degrees of freedom are small.
- As degrees of freedom increase, it becomes more symmetric.
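The shape properties listed above can be illustrated by simulating the statistic $(n-1)s^2/\sigma^2$ for several sample sizes and measuring its skewness. The sample sizes, the value $\sigma = 4$, and the helper names below are assumptions made for this sketch.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: the average cubed z-score (0 for a symmetric shape)."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


def chi_square_statistics(n, sigma=4.0, num_samples=5000, seed=5):
    """Simulate (n - 1) * s^2 / sigma^2 for normal samples of size n."""
    rng = random.Random(seed)
    return [
        (n - 1) * statistics.variance([rng.gauss(0.0, sigma) for _ in range(n)]) / sigma**2
        for _ in range(num_samples)
    ]


# Map degrees of freedom (n - 1) to (smallest simulated value, skewness).
shape_by_df = {}
for n in (3, 11, 51):
    values = chi_square_statistics(n)
    shape_by_df[n - 1] = (min(values), skewness(values))

for df, (smallest, skew) in shape_by_df.items():
    print(f"df = {df:2d}: min = {smallest:.3f}, skewness = {skew:.2f}")
```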
3. The F-Distribution
If an engineer wants to determine whether a new concrete mixing method produces more consistent results than the old method, they must compare two sample variances ($s_1^2$ and $s_2^2$).
The F-Distribution
If independent random samples are drawn from two normal populations with variances $\sigma_1^2$ and $\sigma_2^2$, the ratio $F = (s_1^2 / \sigma_1^2) / (s_2^2 / \sigma_2^2)$ follows an F-distribution.
If we hypothesize that the two population variances are equal ($\sigma_1^2 = \sigma_2^2$), the formula simplifies to the ratio of the sample variances: $F = s_1^2 / s_2^2$.
- The F-distribution is right-skewed and defined only for positive values.
- It depends on two sets of degrees of freedom: the numerator ($\nu_1 = n_1 - 1$) and the denominator ($\nu_2 = n_2 - 1$).
- It is the foundational distribution for Analysis of Variance (ANOVA).
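To see what the variance ratio looks like when the two population variances really are equal, the sketch below simulates $F = s_1^2 / s_2^2$ for two independent normal samples of 6 specimens each. The sample sizes, $\sigma = 4$, and the function name `variance_ratios` are illustrative assumptions.

```python
import random
import statistics


def variance_ratios(n1, n2, sigma=4.0, num_samples=10000, seed=6):
    """Simulate s1^2 / s2^2 for two independent normal samples with equal variance."""
    rng = random.Random(seed)
    ratios = []
    for _ in range(num_samples):
        s1_sq = statistics.variance([rng.gauss(0.0, sigma) for _ in range(n1)])
        s2_sq = statistics.variance([rng.gauss(0.0, sigma) for _ in range(n2)])
        ratios.append(s1_sq / s2_sq)
    return ratios


ratios = variance_ratios(n1=6, n2=6)
f_min = min(ratios)
f_median = statistics.median(ratios)
f_mean = statistics.fmean(ratios)

# With equal sample sizes, swapping the samples inverts the ratio,
# so the median sits near 1 while the long right tail pulls the mean higher.
print(f"min = {f_min:.3f}, median = {f_median:.3f}, mean = {f_mean:.3f}")
```

A mean well above the median is the signature of right skew: even with identical populations, an observed ratio of 2 or 3 is not unusual for small samples, so a formal F critical value is needed before concluding that one method is more consistent.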
Summary
- Sampling Distribution: The probability distribution of a statistic (like $\bar{x}$ or $s^2$) across many samples.
- Standard Error ($\sigma_{\bar{x}} = \sigma / \sqrt{n}$): Measures the variability of the sample mean. Larger samples yield smaller standard errors (more precision).
- Central Limit Theorem: The sampling distribution of the sample mean becomes approximately normal as $n$ gets large ($n \geq 30$ as a rule of thumb), regardless of the population's shape.
- t-Distribution: Used for inferences about the mean ($\mu$) when the population standard deviation ($\sigma$) is unknown. Heavy-tailed.
- Chi-Square Distribution: Used for inferences about a single population variance ($\sigma^2$). Right-skewed.
- F-Distribution: Used for comparing two population variances or conducting ANOVA. Right-skewed, with two degrees-of-freedom parameters.