Tests of Hypotheses

Null and alternative hypotheses, Type I/II errors, P-values, statistical power, and tests for means, proportions, variances, and Goodness-of-Fit.

While estimation focuses on finding the value of a population parameter, hypothesis testing focuses on making decisions about a population parameter based on sample data. An engineer might ask: "Does this new steel alloy have a mean tensile strength greater than 400 MPa?" or "Is the variance in asphalt thickness less than 5 mm²?" Hypothesis testing provides a formal, objective framework to answer these yes-or-no questions.

The Framework of Hypothesis Testing

The formal steps required to set up and evaluate a statistical test.

1. Null Hypothesis ( $H_{0}$ )

The statement of the status quo, no effect, or no difference. It always contains an equality sign ( $=, \le, \ge$ ). We assume $H_0$ is true unless the sample data provides sufficiently strong evidence to the contrary.

Example: $H_0: \mu \le 400$ MPa (The new alloy is no stronger than the old one).

2. Alternative Hypothesis ( $H_{1}$ or $H_{a}$ )

The claim for which we seek evidence. It contradicts $H_0$ and never contains an equality sign ( $<, >, \neq$ ). If the sample data strongly supports $H_1$ , we "reject $H_0$ ."

Example: $H_1: \mu > 400$ MPa (The new alloy is stronger).

3. Test Statistic

A standardized value calculated from the sample data (e.g., a Z-score, t-score, or $\chi^2$ value) assuming $H_0$ is true. It measures how far our sample result is from the null hypothesis value, expressed in units of standard error.

4. P-Value

The probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample, assuming the null hypothesis is true.

A very small P-value (typically $\le 0.05$ ) indicates the observed data is highly unlikely under $H_0$ , leading us to reject $H_0$ .
A large P-value indicates the data is consistent with $H_0$ , so we "fail to reject $H_0$ ."

5. Significance Level ( $\alpha$ )

The predetermined threshold for rejecting $H_0$ . It is the maximum allowable probability of making a Type I Error. Common values are 0.05 (5%), 0.01 (1%), or 0.10 (10%).

Decision Rule: If P-value $\le \alpha$ , reject $H_0$ . If P-value $> \alpha$ , fail to reject $H_0$ .
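
The five steps above can be sketched for the alloy example. This is a minimal illustration rather than a prescribed procedure: the sample size, the sample mean, and a known population standard deviation of 20 MPa are all assumed values, chosen so that a Z statistic applies and only Python's standard library is needed.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inputs (not from the text): known sigma, sample size, sample mean.
mu_0 = 400.0     # null value in MPa; H0: mu <= 400 vs. H1: mu > 400
sigma = 20.0     # assumed known population standard deviation (MPa)
n = 25           # assumed sample size
x_bar = 408.0    # assumed sample mean (MPa)
alpha = 0.05     # significance level chosen before looking at the data

# Test statistic: distance of the sample mean from mu_0 in standard-error units.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Right-tailed P-value: P(Z >= z) computed under H0.
p_value = 1.0 - NormalDist().cdf(z)

# Decision rule: reject H0 when the P-value does not exceed alpha.
decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"z = {z:.3f}, P-value = {p_value:.4f}, decision: {decision}")
```

With these assumed numbers the statistic is 2 standard errors above the null value, so the P-value falls below 0.05 and $H_0$ is rejected.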

Errors in Decision Making and Statistical Power

The risks inherent in statistical inference.

Because we rely on partial information (a sample), we can make mistakes.

Type I Error ( $\alpha$ )

Rejecting a true Null Hypothesis (a "false positive"). You conclude the new alloy is stronger when it actually isn't. The probability of a Type I error equals the significance level $\alpha$ when $H_0$ holds with equality (e.g., $\mu = 400$ ) and is at most $\alpha$ elsewhere under $H_0$ .

Type II Error ( $\beta$ )

Failing to reject a false Null Hypothesis (a "false negative"). You fail to detect that the new alloy is stronger when it actually is. The probability of a Type II error is denoted $\beta$ .

Statistical Power ( $1 - \beta$ )

The probability of correctly rejecting a false Null Hypothesis. A highly powerful test is very likely to detect a real difference if one exists.

Power increases as the true difference (effect size) increases.
Power increases as the significance level $\alpha$ increases (but this raises the risk of a Type I error).
Power increases as the sample size $n$ increases.
Sample Size Determination: Engineers often calculate the minimum sample size needed to achieve a specific power (e.g., 80%) before running an expensive test.
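
The dependence of power on sample size can be sketched for a right-tailed Z-test with known $\sigma$ , where power is $1 - \Phi(z_\alpha - \delta\sqrt{n}/\sigma)$ for a true shift $\delta$ above the null value. The function names and the numbers below (an 8 MPa shift, $\sigma = 20$ MPa) are illustrative assumptions, not values from the text.

```python
from math import ceil, sqrt
from statistics import NormalDist

std_normal = NormalDist()

def power_one_sided_z(delta, sigma, n, alpha):
    """Power of a right-tailed Z-test when the true mean exceeds mu_0 by delta."""
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    return 1.0 - std_normal.cdf(z_alpha - delta * sqrt(n) / sigma)

def sample_size_one_sided_z(delta, sigma, alpha, target_power):
    """Smallest n that reaches the target power for a right-tailed Z-test."""
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_beta = std_normal.inv_cdf(target_power)
    return ceil(((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical scenario: detect an 8 MPa improvement when sigma = 20 MPa.
power_n25 = power_one_sided_z(delta=8.0, sigma=20.0, n=25, alpha=0.05)
power_n50 = power_one_sided_z(delta=8.0, sigma=20.0, n=50, alpha=0.05)
n_required = sample_size_one_sided_z(delta=8.0, sigma=20.0, alpha=0.05, target_power=0.80)

print(f"power at n=25: {power_n25:.3f}")
print(f"power at n=50: {power_n50:.3f}")
print(f"n needed for 80% power: {n_required}")
```

Under these assumptions, 25 specimens give a power of roughly 0.64, and 39 specimens are needed to reach 80%.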

Common Hypothesis Tests

1. Tests for a Single Population Mean ( $\mu$ )

Testing claims about the center of a population.

Z-Test (Variance Known): Rarely used in practice. Assumes population variance $\sigma^2$ is known. Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ .
t-Test (Variance Unknown): The standard test. Uses the sample standard deviation $s$ . Test statistic: $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ with $df = n-1$ .
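
As a rough illustration of the t statistic, the sketch below uses ten made-up tensile strength readings and tests $H_0: \mu \le 400$ against $H_1: \mu > 400$ . The data are hypothetical, and the critical value 1.833 is the one-sided 5% point for $df = 9$ copied from a standard t table, because Python's standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical tensile strength readings in MPa (invented for illustration).
strengths = [405.0, 398.0, 412.0, 407.0, 401.0, 409.0, 403.0, 410.0, 399.0, 406.0]
mu_0 = 400.0

n = len(strengths)
x_bar = mean(strengths)
s = stdev(strengths)            # sample standard deviation (divides by n - 1)
df = n - 1

# t statistic: same form as the Z statistic, with s in place of sigma.
t_stat = (x_bar - mu_0) / (s / sqrt(n))

# One-sided critical value t_{0.05, 9} from a standard t table (assumed input).
t_crit = 1.833
decision = "reject H0" if t_stat > t_crit else "fail to reject H0"
print(f"x_bar = {x_bar:.1f}, s = {s:.3f}, t = {t_stat:.3f}, df = {df}, {decision}")
```

When a statistics package is available, it would normally report an exact P-value instead of relying on a table lookup.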

2. Tests for Two Population Means ( $\mu_1 - \mu_2$ )

Comparing two different groups (e.g., compressive strength of Mix A vs. Mix B).

Independent Samples (Pooled t-Test): Assumes the two populations have equal (but unknown) variances. The sample variances are pooled to estimate a single standard error.
Independent Samples (Welch's t-Test): Does not assume equal variances. More robust and generally preferred.
Paired t-Test (Dependent Samples): Used when observations are naturally paired or matched (e.g., measuring the stiffness of the exact same beam before and after a retrofitting procedure). The test is performed on the differences between paired values, treating them as a single sample.
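
The difference between the Welch and paired statistics may be easier to see in code. The sketch below computes both from small invented data sets; the helper names `welch_t` and `paired_t` are hypothetical, and the Welch degrees of freedom use the usual Welch-Satterthwaite approximation.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_1, sample_2):
    """Welch t statistic and approximate (Welch-Satterthwaite) degrees of freedom."""
    n1, n2 = len(sample_1), len(sample_2)
    v1 = variance(sample_1) / n1    # squared standard error of the first mean
    v2 = variance(sample_2) / n2    # squared standard error of the second mean
    t = (mean(sample_1) - mean(sample_2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def paired_t(before, after):
    """One-sample t statistic computed on the paired differences (after - before)."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / sqrt(variance(diffs) / n)
    return t, n - 1

# Hypothetical compressive strengths (MPa) for two independent mixes.
mix_a = [30.0, 32.0, 34.0, 36.0, 38.0]
mix_b = [28.0, 29.0, 30.0, 31.0, 32.0]
t_w, df_w = welch_t(mix_a, mix_b)

# Hypothetical stiffness of the same four beams before and after a retrofit.
stiff_before = [10.0, 12.0, 11.0, 13.0]
stiff_after = [11.0, 14.0, 12.0, 16.0]
t_p, df_p = paired_t(stiff_before, stiff_after)

print(f"Welch:  t = {t_w:.3f}, df = {df_w:.2f}")
print(f"Paired: t = {t_p:.3f}, df = {df_p}")
```

Note that the paired version never uses the spread between beams, only the spread of the differences, which is why pairing can detect small changes that an independent-samples test would miss.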

3. Tests for a Single Proportion ( $\pi$ )

Testing categorical outcomes (e.g., percentage of defective items).

Uses the normal approximation (Z-test), which is reasonable when $n\pi_0 \ge 5$ and $n(1-\pi_0) \ge 5$ . Test statistic: $Z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$ , where $p$ is the sample proportion.
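
A short sketch of the proportion test follows, with an invented inspection result of 18 defective items out of 200 and a claimed defect rate of 5%. All numbers are assumptions for illustration; the code first checks the sample-size condition and then computes a right-tailed P-value.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical inspection data: H0: pi <= 0.05 vs. H1: pi > 0.05.
n = 200
defective = 18
pi_0 = 0.05
alpha = 0.05

# Check that the normal approximation is reasonable before using it.
approximation_ok = n * pi_0 >= 5 and n * (1 - pi_0) >= 5

p_hat = defective / n                                   # sample proportion
z = (p_hat - pi_0) / sqrt(pi_0 * (1 - pi_0) / n)        # standard error uses pi_0
p_value = 1.0 - NormalDist().cdf(z)                     # right-tailed P-value

decision = "reject H0" if p_value <= alpha else "fail to reject H0"
print(f"approximation ok: {approximation_ok}")
print(f"p_hat = {p_hat:.3f}, z = {z:.3f}, P-value = {p_value:.4f}, {decision}")
```

The standard error is built from $\pi_0$ rather than from the sample proportion because the test statistic is computed assuming $H_0$ is true.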

4. Tests for a Single Variance ( $\sigma^2$ )

Testing claims about the variability or consistency of a process.

Uses the Chi-square ( $\chi^2$ ) distribution. Highly sensitive to departures from normality in the population. Test statistic: $\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$ with $df = n-1$ .
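
The asphalt question from the introduction ("Is the variance in thickness less than 5 mm²?") can be sketched as a left-tailed variance test. The sample size and sample variance are invented, and the critical value 10.117 is the lower 5% point of the $\chi^2$ distribution with 19 degrees of freedom taken from a standard table, since the standard library does not provide this distribution.

```python
# Hypothetical summary data: H0: sigma^2 >= 5 vs. H1: sigma^2 < 5 (units: mm^2).
n = 20                # assumed number of thickness measurements
s_squared = 2.5       # assumed sample variance in mm^2
sigma_0_squared = 5.0
df = n - 1

# Test statistic: scaled ratio of the sample variance to the null variance.
chi2_stat = (n - 1) * s_squared / sigma_0_squared

# Lower-tail critical value for alpha = 0.05 and df = 19, from a chi-square table.
chi2_crit_lower = 10.117

# Left-tailed test: unusually small statistics are evidence against H0.
decision = "reject H0" if chi2_stat < chi2_crit_lower else "fail to reject H0"
print(f"chi2 = {chi2_stat:.3f}, df = {df}, critical = {chi2_crit_lower}, {decision}")
```

Because the alternative is "less than", the rejection region sits in the lower tail; a right-tailed or two-sided alternative would use different table values.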

5. Goodness-of-Fit Tests

Checking if data follows a specific theoretical distribution.

Chi-Square Goodness-of-Fit Test

Used to determine whether a sample follows an expected probability distribution (e.g., "Is the arrival of cars at this intersection truly Poisson distributed?" or "Are these soil samples normally distributed?").

$H_0$ : The data follows the specified distribution.
$H_1$ : The data does not follow the specified distribution.
Test Statistic: $\chi^2 = \sum \frac{(O_i - E_i)^2}{E_i}$ , where $O_i$ are observed frequencies and $E_i$ are expected frequencies under $H_0$ .
A large $\chi^2$ value means the observed data deviates significantly from what was expected, leading to rejection of $H_0$ .
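
To make the statistic concrete, the sketch below tests whether 60 defects are spread evenly across six production lines. The counts are invented, the uniform distribution is just one possible choice of $H_0$ , and the critical value 11.070 (5% upper tail, $df = 5$ ) is read from a standard $\chi^2$ table.

```python
# Hypothetical defect counts observed on six production lines.
observed = [8, 12, 9, 11, 10, 10]
total = sum(observed)

# Under H0 (uniform distribution) every line is expected to have the same count.
expected = [total / len(observed)] * len(observed)

# Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over all categories.
chi2_stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: categories minus one (no parameters estimated from the data).
df = len(observed) - 1

# Upper-tail critical value for alpha = 0.05 and df = 5, from a chi-square table.
chi2_crit = 11.070
decision = "reject H0" if chi2_stat > chi2_crit else "fail to reject H0"
print(f"chi2 = {chi2_stat:.3f}, df = {df}, critical = {chi2_crit}, {decision}")
```

If parameters of the hypothesized distribution were estimated from the same data (for example, a Poisson mean), the degrees of freedom would be reduced by one for each estimated parameter.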

The Connection Between CIs and Hypothesis Tests

There is a direct, mathematical duality between Confidence Intervals and two-sided Hypothesis Tests. If a 95% CI for the mean $\mu$ is [390, 410], then a two-sided hypothesis test (with $\alpha = 0.05$ ) will:

Fail to reject $H_0: \mu = 400$ (because 400 is inside the interval).
Reject $H_0: \mu = 380$ (because 380 is outside the interval).
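
The duality can be checked numerically. The sketch below assumes a Z-based interval and picks a hypothetical sample mean and standard error so that the 95% CI is [390, 410]; it then confirms that each candidate null value is rejected exactly when it lies outside the interval.

```python
from statistics import NormalDist

std_normal = NormalDist()
alpha = 0.05
z_crit = std_normal.inv_cdf(1.0 - alpha / 2.0)    # about 1.96

# Hypothetical sample summary chosen so the 95% CI is [390, 410].
x_bar = 400.0
std_error = 10.0 / z_crit

ci_low = x_bar - z_crit * std_error
ci_high = x_bar + z_crit * std_error

def two_sided_p_value(mu_0):
    """Two-sided P-value for H0: mu = mu_0 using the Z statistic."""
    z = (x_bar - mu_0) / std_error
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))

for mu_0 in (400.0, 380.0, 395.0, 412.0):
    inside = ci_low <= mu_0 <= ci_high
    rejected = two_sided_p_value(mu_0) <= alpha
    print(f"mu_0 = {mu_0}: inside CI = {inside}, rejected = {rejected}")
```

For every value tried, "inside the interval" and "rejected" are opposites, which is the duality stated above.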

Key Takeaways

$H_0$ and $H_1$ : Formulate mutually exclusive hypotheses; $H_0$ contains equality.
P-value: The probability, assuming $H_0$ is true, of observing a test statistic at least as extreme as the one calculated from the sample. Reject $H_0$ when the P-value is $\le \alpha$ .
Type I Error ( $\alpha$ ): False positive (rejecting true $H_0$ ).
Type II Error ( $\beta$ ): False negative (failing to reject false $H_0$ ).
Power ( $1-\beta$ ): The probability of correctly identifying a real effect. Highly dependent on sample size.
Goodness-of-Fit ( $\chi^2$ ): Tests whether observed categorical data matches an expected distribution.
Duality: A 95% Confidence Interval contains all values of the parameter that would not be rejected by a two-sided test at $\alpha = 0.05$ .
