Tests of Hypotheses

Null and alternative hypotheses, Type I/II errors, P-values, statistical power, and tests for means, proportions, variances, and Goodness-of-Fit.
While estimation focuses on finding the value of a population parameter, hypothesis testing focuses on making decisions about a population parameter based on sample data. An engineer might ask: "Does this new steel alloy have a mean tensile strength greater than 400 MPa?" or "Is the variance in asphalt thickness less than 5 mm²?" Hypothesis testing provides a formal, objective framework to answer these yes-or-no questions.

The Framework of Hypothesis Testing

The formal steps required to set up and evaluate a statistical test.

1. Null Hypothesis ($H_0$)

The statement of the status quo, no effect, or no difference. It always contains an equality sign ($=$, $\le$, or $\ge$). We assume $H_0$ is true until the sample data provide sufficiently strong evidence to the contrary.

Example: $H_0: \mu \le 400$ MPa (The new alloy is no stronger than the old one).

2. Alternative Hypothesis ($H_1$ or $H_a$)

The claim we are seeking evidence for. It contradicts $H_0$ and never contains an equality sign (it uses $<$, $>$, or $\neq$). If the sample data strongly support $H_1$, we "reject $H_0$."

Example: $H_1: \mu > 400$ MPa (The new alloy is stronger).

3. Test Statistic

A standardized value calculated from the sample data (e.g., a Z-score, t-score, or $\chi^2$ value) assuming $H_0$ is true. It measures how far the sample result is from the null hypothesis value; for Z and t statistics, that distance is expressed in units of standard error.

4. P-Value

The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample, assuming the null hypothesis is true.
  • A very small P-value (typically $\le 0.05$) indicates the observed data is highly unlikely under $H_0$, leading us to reject $H_0$.
  • A large P-value indicates the data is consistent with $H_0$, so we "fail to reject $H_0$."

5. Significance Level ($\alpha$)

The predetermined threshold for rejecting $H_0$. It is the maximum allowable probability of making a Type I Error. Common values are 0.05 (5%), 0.01 (1%), or 0.10 (10%).

Decision Rule: If P-value $\le \alpha$, reject $H_0$. If P-value $> \alpha$, fail to reject $H_0$.
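The decision rule can be sketched in a few lines of Python. This is a minimal illustration for a Z statistic only, using the standard library's `statistics.NormalDist`; the function names `z_p_value` and `decide` and the value z = 2.10 are made up for the example.

```python
from statistics import NormalDist


def z_p_value(z, alternative="two-sided"):
    """P-value for a standard normal test statistic z."""
    cdf = NormalDist().cdf
    if alternative == "greater":    # H1: parameter > null value
        return 1.0 - cdf(z)
    if alternative == "less":       # H1: parameter < null value
        return cdf(z)
    if alternative == "two-sided":  # H1: parameter != null value
        return 2.0 * (1.0 - cdf(abs(z)))
    raise ValueError(f"unknown alternative: {alternative!r}")


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p_value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


# Hypothetical right-tailed test (H1: mu > 400) with z = 2.10
p = z_p_value(2.10, alternative="greater")
print(round(p, 4), decide(p, alpha=0.05))
```

Note that the same z can lead to different decisions depending on whether the alternative is one-sided or two-sided, which is why the alternative must be fixed before looking at the data.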

Errors in Decision Making and Statistical Power

The risks inherent in statistical inference.
Because we rely on partial information (a sample), we can make mistakes.

Type I Error ($\alpha$)

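Rejecting a true Null Hypothesis (a "false positive"). You conclude the new alloy is stronger when it actually isn't. The probability of a Type I error is the significance level $\alpha$; for a one-sided null such as $\mu \le 400$, $\alpha$ is the largest that probability can be, reached at the boundary $\mu = 400$.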

Type II Error ($\beta$)

Failing to reject a false Null Hypothesis (a "false negative"). You conclude the new alloy is no better, but it actually is stronger.

Statistical Power ($1 - \beta$)

The probability of correctly rejecting a false Null Hypothesis. A highly powerful test is very likely to detect a real difference if one exists.
  • Power increases as the true difference (effect size) increases.
  • Power increases as the significance level $\alpha$ increases (but this raises the risk of a Type I error).
  • Power increases as the sample size $n$ increases.

Sample Size Determination: Engineers often calculate the minimum sample size needed to achieve a specific power (e.g., 80%) before running an expensive test.
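As a rough sketch of how power and sample size relate, the code below handles the simplest case only: a right-tailed Z-test with known $\sigma$. It uses the standard relations power $= 1 - \Phi(z_\alpha - \delta\sqrt{n}/\sigma)$ and $n = ((z_\alpha + z_\beta)\sigma/\delta)^2$, where $\delta$ is the true shift above the null value. The function names and the numbers (a 10 MPa shift, $\sigma$ = 20 MPa) are assumptions for illustration, not values from the text.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution


def power_one_sided_z(delta, sigma, n, alpha=0.05):
    """Power of a right-tailed Z-test when the true mean exceeds mu0 by delta."""
    z_alpha = _STD.inv_cdf(1.0 - alpha)
    return 1.0 - _STD.cdf(z_alpha - delta * sqrt(n) / sigma)


def min_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n that reaches the target power for a right-tailed Z-test."""
    z_alpha = _STD.inv_cdf(1.0 - alpha)
    z_beta = _STD.inv_cdf(power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical planning problem: detect a 10 MPa gain when sigma = 20 MPa
n_needed = min_sample_size(delta=10.0, sigma=20.0, alpha=0.05, power=0.80)
print(n_needed, round(power_one_sided_z(10.0, 20.0, n_needed), 3))
```

When $\sigma$ must be estimated from the data, a t-based calculation gives a slightly larger sample size, so the result here is best read as a lower bound.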

Common Hypothesis Tests

1. Tests for a Single Population Mean ($\mu$)

Testing claims about the center of a population.
  • Z-Test (Variance Known): Rarely used in practice. Assumes population variance $\sigma^2$ is known. Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$.
  • t-Test (Variance Unknown): The standard test. Uses the sample standard deviation $s$. Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $df = n-1$.
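A short sketch of the t statistic for the alloy question, using only the standard library. The eight tensile-strength readings are invented for illustration, and `one_sample_t` is a hypothetical helper name. The standard library has no t distribution, so the code stops at the statistic and degrees of freedom; the P-value would come from a t table or a statistics package.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df) for testing a population mean against mu0."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    t = (mean(sample) - mu0) / (s / sqrt(n))
    return t, n - 1


# Hypothetical tensile strengths in MPa; H0: mu <= 400, H1: mu > 400
strengths = [412.0, 405.0, 398.0, 420.0, 409.0, 415.0, 402.0, 411.0]
t, df = one_sample_t(strengths, mu0=400.0)
print(round(t, 2), df)
```

For reference, standard t tables list about 1.895 as the one-sided 5% critical value at 7 degrees of freedom, so a statistic above that value would lead to rejecting $H_0$ at $\alpha = 0.05$.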

2. Tests for Two Population Means ($\mu_1 - \mu_2$)

Comparing two different groups (e.g., compressive strength of Mix A vs. Mix B).
  • Independent Samples (Pooled t-Test): Assumes the two populations have equal (but unknown) variances. The sample variances are pooled to estimate a single standard error.
  • Independent Samples (Welch's t-Test): Does not assume equal variances. More robust and generally preferred.
  • Paired t-Test (Dependent Samples): Used when observations are naturally paired or matched (e.g., measuring the stiffness of the exact same beam before and after a retrofitting procedure). The test is performed on the differences between paired values, treating them as a single sample.
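The Welch and paired variants can be sketched as follows. The helper names `welch_t` and `paired_t` and the data are illustrative assumptions. The degrees of freedom for Welch's test use the Welch-Satterthwaite approximation, which is the usual choice but is not spelled out in the text above.

```python
from math import sqrt
from statistics import mean, stdev, variance


def welch_t(a, b):
    """Welch's t statistic and approximate df for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the differences."""
    if len(before) != len(after):
        raise ValueError("paired samples must have equal length")
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical compressive strengths (MPa) for Mix A and Mix B
mix_a = [31.2, 29.8, 30.5, 32.1, 30.9]
mix_b = [28.4, 29.9, 27.6, 30.2, 28.8, 29.1]
print(welch_t(mix_a, mix_b))
```

The paired version discards the between-beam variation entirely, which is why it is usually far more sensitive than an independent-samples test when the pairing is real.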

3. Tests for a Single Proportion ($\pi$)

Testing categorical outcomes (e.g., percentage of defective items).
Uses the normal approximation (Z-test) if $n\pi_0 \ge 5$ and $n(1-\pi_0) \ge 5$. Test statistic: $Z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$, where $p$ is the sample proportion.
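A minimal sketch of the proportion test, including the sample-size check. The name `proportion_z_test` and the defect counts are hypothetical; only the right-tailed case is shown.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, pi0):
    """Right-tailed Z-test of H0: pi <= pi0 against H1: pi > pi0.

    Returns (z, p_value). Refuses to run when the normal
    approximation conditions n*pi0 >= 5 and n*(1 - pi0) >= 5 fail.
    """
    if n * pi0 < 5 or n * (1 - pi0) < 5:
        raise ValueError("sample too small for the normal approximation")
    p_hat = successes / n  # sample proportion
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return z, 1.0 - NormalDist().cdf(z)


# Hypothetical inspection: 18 defective items in 200, claimed rate 5%
z, p_value = proportion_z_test(successes=18, n=200, pi0=0.05)
print(round(z, 2), round(p_value, 4))
```

The standard error uses $\pi_0$ rather than the sample proportion because the statistic is computed under the assumption that $H_0$ is true.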

4. Tests for a Single Variance ($\sigma^2$)

Testing claims about the variability or consistency of a process.
Uses the Chi-square ($\chi^2$) distribution. Highly sensitive to departures from normality in the population. Test statistic: $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$ with $df = n-1$.
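The statistic itself is a one-liner, sketched here with the standard library. The helper name `variance_chi_square` and the thickness readings are assumptions for illustration. The chi-square distribution is not in the standard library, so the result would be compared against a table value or passed to a statistics package.

```python
from statistics import variance


def variance_chi_square(sample, sigma0_sq):
    """Return (chi2, df) for testing a population variance against sigma0_sq."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    chi2 = (n - 1) * variance(sample) / sigma0_sq
    return chi2, n - 1


# Hypothetical asphalt thickness readings in mm; H0: sigma^2 >= 5, H1: sigma^2 < 5
thickness = [50.1, 51.3, 49.2, 50.8, 48.9, 50.4, 51.0, 49.7, 50.2, 49.5]
chi2, df = variance_chi_square(thickness, sigma0_sq=5.0)
print(round(chi2, 2), df)
```

Because the alternative here is "less than", the rejection region is in the lower tail: a small statistic, not a large one, counts as evidence against $H_0$.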

5. Goodness-of-Fit Tests

Checking if data follows a specific theoretical distribution.

Chi-Square Goodness-of-Fit Test

Used to determine whether a sample follows an expected probability distribution (e.g., "Is the arrival of cars at this intersection truly Poisson distributed?" or "Are these soil samples normally distributed?").
  • $H_0$: The data follows the specified distribution.
  • $H_1$: The data does not follow the specified distribution.
  • Test Statistic: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$, where $O_i$ are observed frequencies and $E_i$ are expected frequencies under $H_0$.
  • A large $\chi^2$ value means the observed data deviates significantly from what was expected, leading to rejection of $H_0$.
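The statistic above can be sketched as follows. The example tests a six-sided die for fairness with invented counts, and `chi_square_gof` is a hypothetical helper name. The degrees of freedom use the usual rule of categories minus one, minus the number of distribution parameters estimated from the data, which the text above does not state explicitly.

```python
def chi_square_gof(observed, expected_probs, n_estimated=0):
    """Return (chi2, df) for a chi-square goodness-of-fit test.

    observed       -- observed counts per category
    expected_probs -- category probabilities under H0 (should sum to 1)
    n_estimated    -- number of parameters estimated from the data
    """
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must align")
    total = sum(observed)
    expected = [total * p for p in expected_probs]
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated
    return chi2, df


# Hypothetical die rolls: 60 throws, H0 says every face has probability 1/6
counts = [8, 9, 12, 11, 6, 14]
chi2, df = chi_square_gof(counts, [1 / 6] * 6)
print(round(chi2, 2), df)
```

For reference, chi-square tables give about 11.07 as the 5% critical value at 5 degrees of freedom, so a statistic below that value would not lead to rejecting $H_0$ at $\alpha = 0.05$.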

The Connection Between CIs and Hypothesis Tests

There is a direct, mathematical duality between Confidence Intervals and two-sided Hypothesis Tests. If a 95% CI for the mean $\mu$ is [390, 410], then a two-sided hypothesis test (with $\alpha = 0.05$) will:
  • Fail to reject $H_0: \mu = 400$ (because 400 is inside the interval).
  • Reject $H_0: \mu = 380$ (because 380 is outside the interval).
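The duality can be checked numerically. The sketch below assumes a Z-based interval with known $\sigma$; the sample values ($\bar{x}$ = 400, $\sigma$ = 51.02, $n$ = 100) are hypothetical and were chosen only so that the interval comes out close to [390, 410].

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def z_confidence_interval(xbar, sigma, n, conf=0.95):
    """Two-sided Z confidence interval for the mean (sigma known)."""
    half_width = _STD.inv_cdf(0.5 + conf / 2.0) * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


def two_sided_z_rejects(xbar, sigma, n, mu0, alpha=0.05):
    """True when a two-sided Z-test rejects H0: mu = mu0 at level alpha."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    p_value = 2.0 * (1.0 - _STD.cdf(abs(z)))
    return p_value <= alpha


lo, hi = z_confidence_interval(xbar=400.0, sigma=51.02, n=100)
for mu0 in (380.0, 400.0):
    inside = lo <= mu0 <= hi
    print(mu0, "inside CI" if inside else "outside CI",
          "rejected" if two_sided_z_rejects(400.0, 51.02, 100, mu0) else "not rejected")
```

Every hypothesized value inside the interval is not rejected and every value outside it is rejected, because both procedures compare the same standardized distance against the same critical value.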

Key Takeaways
  • $H_0$ and $H_1$: Formulate mutually exclusive hypotheses; $H_0$ contains equality.
  • P-value: The probability, assuming $H_0$ is true, of a test statistic at least as extreme as the one observed. P-values $\le \alpha$ trigger rejection of $H_0$.
  • Type I Error ($\alpha$): False positive (rejecting true $H_0$).
  • Type II Error ($\beta$): False negative (failing to reject false $H_0$).
  • Power ($1-\beta$): The probability of correctly identifying a real effect. Highly dependent on sample size.
  • Goodness-of-Fit ($\chi^2$): Tests whether observed frequencies match those expected under a specified distribution.
  • Duality: A 95% Confidence Interval contains all values of the parameter that would not be rejected by a two-sided test at $\alpha = 0.05$.