Statistical Hydrology

Applying probability theory to hydrologic events to predict return periods, risk, and frequencies.

Introduction

Hydrologic events (floods, droughts, storms) are stochastic (random) in nature. Statistical Hydrology uses probability theory to analyze historical data and predict the likelihood of future extreme events.

Return Period ($T$)

Return Period (Recurrence Interval)

The average time interval between events equal to or exceeding a certain magnitude ($x_T$).

Return Period vs. Probability

$T = \frac{1}{P}$

Exceedance Probability ($P$)

The probability that an event of magnitude $\ge x$ will occur in any given year. For example:
  • 100-year flood: $T = 100$, so $P = 1/100 = 0.01$ (1% chance of occurring in any single year).
  • 50-year flood: $T = 50$, so $P = 1/50 = 0.02$ (2% chance).

Risk ($R$)

The probability that an event with return period $T$ will occur at least once in a project life of $n$ years.

Risk Equation

$R = 1 - (1 - P)^n = 1 - \left(1 - \frac{1}{T}\right)^n$

Reliability

The probability that the event will not occur in $n$ years.

Reliability Equation

$\text{Reliability} = 1 - R = (1 - P)^n$
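The risk and reliability equations above can be sketched in a few lines of Python. This is a minimal illustration, not a design tool; the function names are chosen here for readability and do not come from any particular library.

```python
def exceedance_probability(T: float) -> float:
    """Annual exceedance probability P = 1/T for a return period T (years)."""
    if T < 1:
        raise ValueError("return period must be at least 1 year")
    return 1.0 / T


def risk(T: float, n: int) -> float:
    """Probability that the T-year event occurs at least once in n years."""
    P = exceedance_probability(T)
    return 1.0 - (1.0 - P) ** n


def reliability(T: float, n: int) -> float:
    """Probability that the T-year event does not occur in n years."""
    return 1.0 - risk(T, n)


if __name__ == "__main__":
    # Illustrative combinations of return period and design life.
    for T, n in [(100, 50), (50, 25), (100, 100)]:
        print(f"T={T:>3} yr, n={n:>3} yr: "
              f"risk={risk(T, n):.4f}, reliability={reliability(T, n):.4f}")
```

Note that a structure designed for the 100-year event and kept in service for 100 years still has a risk well above 50%, which is why the design return period is usually chosen much longer than the design life.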

Risk and Reliability Calculator

The relationship between Return Period ($T$), Annual Exceedance Probability ($P$), and the total Risk over a project's design life ($n$) is easiest to see with numbers.

For $T = 100$: $P = 1/100 = 1.00\%$ chance of occurring in any single year.

Over a design life of $n = 50$ years, the total lifetime risk of the event occurring at least once is $R = 1 - (1 - 0.01)^{50} \approx 39.50\%$.

Frequency Analysis

Used to relate the magnitude of extreme events to their frequency of occurrence using probability distributions.

General Frequency Equation

$x_T = \bar{x} + K \cdot \sigma$

Variables

  • $x_T$: Value of variate with return period $T$ (e.g., peak discharge).
  • $\bar{x}$: Mean of the data series.
  • $\sigma$: Standard deviation of the data series.
  • $K$: Frequency factor (depends on the probability distribution and $T$).

  1. Gumbel's Extreme Value Distribution (Type I)

Commonly used for flood frequency analysis.

Gumbel's Frequency Factor

$K = \frac{y_T - \bar{y}_n}{S_n}$

Reduced Variate ($y_T$)

$y_T = -\ln \left[ -\ln \left( 1 - \frac{1}{T} \right) \right]$

Note

Where $\bar{y}_n$ and $S_n$ are the reduced mean and reduced standard deviation, which depend only on sample size $N$.
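As a rough sketch of how the Gumbel frequency factor feeds the general frequency equation, the Python below computes $y_T$, $\bar{y}_n$, $S_n$, $K$, and $x_T$. It assumes, as tabulated values are commonly derived, that $\bar{y}_n$ and $S_n$ are the mean and population standard deviation of the reduced variates evaluated at probabilities $i/(N+1)$. The annual peak values are made up for illustration.

```python
import math
import statistics


def reduced_variate(T: float) -> float:
    """Gumbel reduced variate y_T = -ln(-ln(1 - 1/T))."""
    return -math.log(-math.log(1.0 - 1.0 / T))


def reduced_mean_and_std(N: int) -> tuple[float, float]:
    """Reduced mean and standard deviation for a sample of size N.

    Assumption: computed from reduced variates at probabilities i/(N+1),
    using the population standard deviation (divide by N).
    """
    ys = [-math.log(-math.log(i / (N + 1))) for i in range(1, N + 1)]
    return statistics.fmean(ys), statistics.pstdev(ys)


def gumbel_frequency_factor(T: float, N: int) -> float:
    """K = (y_T - y_n_bar) / S_n."""
    y_bar, S_n = reduced_mean_and_std(N)
    return (reduced_variate(T) - y_bar) / S_n


def gumbel_quantile(peaks: list[float], T: float) -> float:
    """x_T = x_bar + K * sigma, with sigma taken as the sample std deviation."""
    x_bar = statistics.fmean(peaks)
    sigma = statistics.stdev(peaks)
    return x_bar + gumbel_frequency_factor(T, len(peaks)) * sigma


if __name__ == "__main__":
    # Hypothetical annual peak discharges (m^3/s), for illustration only.
    peaks = [312.0, 455.0, 287.0, 530.0, 398.0, 610.0, 342.0, 476.0, 365.0, 421.0]
    for T in (10, 50, 100):
        print(f"T={T:>3} yr: K={gumbel_frequency_factor(T, len(peaks)):.3f}, "
              f"x_T={gumbel_quantile(peaks, T):.1f}")
```

For short records the frequency factor is noticeably larger than the large-sample value, so using the $N$-dependent $\bar{y}_n$ and $S_n$ gives a more conservative estimate.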

  2. Log-Pearson Type III Distribution

The standard method for flood frequency analysis in the United States (USGS Bulletin 17B/17C). It applies the general frequency equation to the logarithms of the discharge values ($y = \log x$).

Log-Pearson III Equation

$\log x_T = \overline{\log x} + K_z \cdot \sigma_{\log x}$

Note

Where $K_z$ is a function of the return period $T$ and the skewness coefficient ($C_s$) of the log-transformed data.
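In practice $K_z$ is read from tables. As a hedged sketch, the Python below substitutes the Wilson-Hilferty approximation for the table lookup (a swapped-in technique, not something stated above) and uses the plain station skew of the base-10 logarithms. It omits the regional skew weighting and outlier tests that Bulletin 17B/17C procedures involve, and the function names are illustrative.

```python
import math
from statistics import NormalDist, fmean, stdev


def sample_skew(values: list[float]) -> float:
    """Bias-adjusted sample skewness coefficient C_s."""
    n = len(values)
    m, s = fmean(values), stdev(values)
    return n * sum((v - m) ** 3 for v in values) / ((n - 1) * (n - 2) * s ** 3)


def lp3_frequency_factor(T: float, Cs: float) -> float:
    """Approximate K_z via Wilson-Hilferty (assumption: moderate skew)."""
    z = NormalDist().inv_cdf(1.0 - 1.0 / T)  # standard normal deviate
    if abs(Cs) < 1e-6:
        return z  # zero skew reduces to the normal deviate
    k = Cs / 6.0
    return (2.0 / Cs) * ((1.0 + k * z - k * k) ** 3 - 1.0)


def lp3_quantile(peaks: list[float], T: float) -> float:
    """x_T from log10-transformed peaks: log x_T = mean + K_z * std."""
    logs = [math.log10(q) for q in peaks]
    K = lp3_frequency_factor(T, sample_skew(logs))
    return 10.0 ** (fmean(logs) + K * stdev(logs))


if __name__ == "__main__":
    # Hypothetical annual peak discharges (m^3/s), for illustration only.
    peaks = [312.0, 455.0, 287.0, 530.0, 398.0, 610.0, 342.0, 476.0, 365.0, 421.0]
    for T in (10, 50, 100):
        print(f"T={T:>3} yr: x_T={lp3_quantile(peaks, T):.1f}")
```

The approximation is generally regarded as adequate for small to moderate skew; for strongly skewed data, published frequency factor tables or an exact Pearson III quantile function are the safer choice.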

  3. Log-Normal Distribution

A special case of the Log-Pearson Type III distribution where the skewness coefficient of the logarithmic data is exactly zero ($C_s = 0$).

Log-Normal Equation

$y_T = \bar{y} + K_z \cdot S_y$

Note

Where $y = \ln x$, $\bar{y}$ is the mean of the logarithms, $S_y$ is the standard deviation of the logarithms, and $K_z$ is the standard normal deviate corresponding to return period $T$ (derived from normal probability tables). Finally, $x_T = e^{y_T}$.
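The log-normal procedure in the note above can be sketched as follows. Instead of a normal probability table, this sketch takes $K_z$ from the inverse CDF in Python's standard library; the function name is illustrative.

```python
import math
from statistics import NormalDist, fmean, stdev


def lognormal_quantile(values: list[float], T: float) -> float:
    """Estimate x_T assuming ln(x) is normally distributed."""
    y = [math.log(v) for v in values]        # y = ln x
    y_bar, S_y = fmean(y), stdev(y)          # mean and sample std of the logs
    K_z = NormalDist().inv_cdf(1.0 - 1.0 / T)  # standard normal deviate for T
    y_T = y_bar + K_z * S_y
    return math.exp(y_T)                     # back-transform: x_T = e^{y_T}


if __name__ == "__main__":
    # Hypothetical annual peak discharges (m^3/s), for illustration only.
    peaks = [312.0, 455.0, 287.0, 530.0, 398.0, 610.0, 342.0, 476.0, 365.0, 421.0]
    for T in (2, 10, 100):
        print(f"T={T:>3} yr: x_T={lognormal_quantile(peaks, T):.1f}")
```

For $T = 2$ the deviate $K_z$ is zero, so the estimate is the geometric mean of the data, which is a quick sanity check on any implementation.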

Plotting Positions

To graphically plot a probability distribution from empirical data, the data points (e.g., annual peak floods) must be ranked in descending order ($m = 1$ is the largest event). An empirical exceedance probability ($P$) is then assigned to each rank using a plotting position formula.

Weibull Plotting Position

$P = \frac{m}{N + 1}$

Note

The Weibull formula is the most widely used, where $N$ is the total number of years of record. The corresponding Return Period is $T = (N + 1)/m$. Other formulas include Gringorten and Cunnane.
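The ranking and Weibull assignment described above might look like this in Python. The sketch ignores tied values (each value simply receives its own rank), and the sample data are hypothetical.

```python
def weibull_plotting_positions(values: list[float]) -> list[tuple[int, float, float, float]]:
    """Return (rank m, value, exceedance probability P, return period T) rows.

    Values are ranked in descending order, so m = 1 is the largest event.
    """
    ranked = sorted(values, reverse=True)
    N = len(ranked)
    rows = []
    for m, x in enumerate(ranked, start=1):
        P = m / (N + 1)      # Weibull plotting position
        T = (N + 1) / m      # corresponding return period
        rows.append((m, x, P, T))
    return rows


if __name__ == "__main__":
    # Hypothetical annual peak discharges (m^3/s), for illustration only.
    peaks = [310.0, 520.0, 275.0, 440.0]
    for m, x, P, T in weibull_plotting_positions(peaks):
        print(f"m={m}  x={x:6.1f}  P={P:.3f}  T={T:.2f} yr")
```

With $N$ years of record, the largest observed event is assigned a return period of only $N + 1$ years, which is why extrapolation to rarer events needs a fitted distribution.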

Confidence Limits

Statistical estimates have inherent uncertainty because they are based on a finite sample of historical data. Confidence limits provide a range within which the true value is expected to lie with a specified probability (e.g., 95% confidence).

Standard Error

The standard error of estimate quantifies the uncertainty in the calculated magnitude $x_T$. The confidence interval is typically $x_T \pm z_c S_e$, where $z_c$ is the standard normal variate for the desired confidence level, and $S_e$ is the standard error.
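A minimal sketch of the interval $x_T \pm z_c S_e$ follows. The standard error $S_e$ is treated as a given input because its formula depends on the fitted distribution and is not specified here; the numbers in the demo are hypothetical.

```python
from statistics import NormalDist


def confidence_limits(x_T: float, S_e: float, confidence: float = 0.95) -> tuple[float, float]:
    """Two-sided limits x_T -/+ z_c * S_e for the given confidence level."""
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    # Two-sided interval: split the remaining probability between both tails.
    z_c = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return x_T - z_c * S_e, x_T + z_c * S_e


if __name__ == "__main__":
    # Hypothetical estimate and standard error (m^3/s), for illustration only.
    lower, upper = confidence_limits(x_T=1000.0, S_e=100.0, confidence=0.95)
    print(f"95% limits: {lower:.1f} to {upper:.1f}")
```

For 95% confidence $z_c \approx 1.96$, so the interval spans roughly two standard errors either side of the estimate.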

L-Moments in Hydrology

Traditional product moments (mean, variance, skewness) are highly sensitive to outliers in small datasets, and flood records are often short. L-moments are an advanced statistical tool used to estimate distribution parameters more robustly.

Advantages of L-Moments

L-moments are linear combinations of probability weighted moments (PWMs). Because they are linear, they do not square or cube the data values, making them far less susceptible to the influence of extreme outliers compared to traditional variance or skewness. They provide more reliable parameter estimates for distributions like the Generalized Extreme Value (GEV) distribution.
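To make the "linear combinations of PWMs" concrete, the sketch below computes the first three sample L-moments from the standard unbiased PWM estimators $b_0$, $b_1$, $b_2$ of the ascending-sorted data. The function name is illustrative, and fitting a distribution such as the GEV from these values is left out.

```python
def sample_l_moments(values: list[float]) -> tuple[float, float, float]:
    """First three sample L-moments (l1, l2, l3) via unbiased PWMs."""
    x = sorted(values)  # ascending order statistics
    n = len(x)
    if n < 3:
        raise ValueError("need at least 3 values")
    # Probability weighted moments; i is the zero-based ascending rank.
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(n)) / (n * (n - 1) * (n - 2))
    # L-moments are linear combinations of the PWMs.
    l1 = b0                       # L-location (the mean)
    l2 = 2.0 * b1 - b0            # L-scale
    l3 = 6.0 * b2 - 6.0 * b1 + b0  # third L-moment
    return l1, l2, l3


if __name__ == "__main__":
    # Hypothetical annual peaks (m^3/s) with one large outlier.
    peaks = [312.0, 455.0, 287.0, 530.0, 398.0, 1850.0, 342.0, 476.0]
    l1, l2, l3 = sample_l_moments(peaks)
    print(f"l1={l1:.1f}, l2={l2:.1f}, L-skewness t3={l3 / l2:.3f}")
```

Each data value enters only to the first power, weighted by its rank, so a single extreme observation shifts the estimates far less than it would shift a cubed deviation in the conventional skewness.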

Probable Maximum Flood (PMF)

The most severe flood considered physically possible in a particular drainage basin, based on comprehensive hydrometeorological analysis of maximum precipitation and hydrologic factors favorable for maximum runoff.

Unlike a 100-year or 500-year flood derived from statistical frequency analysis, the PMF is treated as a theoretical upper bound and has no return period assigned to it. It is generated by routing the Probable Maximum Precipitation (PMP) through the basin's hydrologic model, assuming worst-case antecedent soil moisture conditions and peak snowmelt (if applicable).

Design Application

The PMF is used mainly for designing the spillways of high-hazard dams, where structural failure would result in unacceptable loss of human life and catastrophic downstream damage. By designing for the PMF, engineers aim to keep the dam from overtopping under any foreseeable physical conditions, reducing the risk of hydrologic failure as far as practicable.

Risk and Reliability

When designing hydraulic structures, engineers must assess the probability that a design event will be exceeded over the lifetime of the structure.

Risk Equation

$R = 1 - (1 - P)^n$

Variables

  • $R$: Risk (probability that the event will occur at least once in $n$ years)
  • $P$: Probability of occurrence in any single year ($P = 1/T$)
  • $n$: Design life of the structure (years)
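In design work the risk equation is often used in reverse: given an acceptable risk $R$ and a design life $n$, solve for the return period. Rearranging $R = 1 - (1 - 1/T)^n$ gives $T = 1 / \left[ 1 - (1 - R)^{1/n} \right]$. A small sketch, with an illustrative function name:

```python
def required_return_period(acceptable_risk: float, n: int) -> float:
    """Return period T whose lifetime risk over n years equals acceptable_risk."""
    if not 0.0 < acceptable_risk < 1.0:
        raise ValueError("acceptable_risk must be between 0 and 1")
    if n < 1:
        raise ValueError("design life must be at least 1 year")
    # Annual exceedance probability that yields the target lifetime risk.
    P = 1.0 - (1.0 - acceptable_risk) ** (1.0 / n)
    return 1.0 / P


if __name__ == "__main__":
    # Illustrative targets: acceptable lifetime risk and design life in years.
    for R, n in [(0.10, 50), (0.05, 100), (0.01, 100)]:
        print(f"R={R:.0%}, n={n:>3} yr -> T = {required_return_period(R, n):.0f} yr")
```

Accepting a 10% risk over a 50-year life, for example, calls for a design event with a return period of roughly 475 years.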

Reliability

Reliability is the probability that the structure will not fail (i.e., the design event will not be exceeded) during its design life. It is simply $1 - R$.

Key Takeaways

  • Hydrologic events cannot be predicted with absolute certainty due to their inherent randomness.
  • Statistical Hydrology applies probability theory to historical data to estimate the likelihood and magnitude of future extreme events (floods, droughts).
  • Return Period ($T$) is the statistical average time interval between occurrences of an event of a specific magnitude.
  • It is the mathematical inverse of the Annual Exceedance Probability ($P$): $T = 1/P$.
  • Risk ($R$) is the probability that an event will occur at least once during a project's design life ($n$).
  • Even a 100-year flood has a 1% chance of occurring in any given year, meaning it could theoretically happen in consecutive years.
  • Frequency Analysis fits historical data to theoretical probability distributions to extrapolate extreme events beyond the recorded timeframe.
  • The General Frequency Equation ($x_T = \bar{x} + K \cdot \sigma$) offsets the mean $\bar{x}$ by $K$ standard deviations, where $K$ is the frequency factor and $\sigma$ is the standard deviation.
  • Gumbel's Extreme Value Type I is traditionally used for maximum annual flood series.
  • The Log-Pearson Type III distribution is the standard method for flood frequency analysis in the United States (USGS Bulletin 17B/17C).
  • Plotting Positions like the Weibull Formula ($P = m/(N+1)$) assign empirical probabilities to ranked historical data for graphical comparison against theoretical distributions.
  • Statistical estimates are uncertain because they rely on finite historical sample sizes.
  • Confidence Limits define a range within which the true magnitude of an event is expected to lie at a stated confidence level (e.g., 95%).
  • The width of the confidence interval depends on the Standard Error ($S_e$), which decreases as the length of the historical data record increases.
  • The Probable Maximum Flood (PMF) is treated as the physical upper limit of flooding for a basin, derived deterministically from the PMP rather than statistically.
  • High-hazard dam spillways are designed to safely pass the PMF so that the risk of catastrophic overtopping is reduced as far as practicable.