Numerical Differentiation
Numerical differentiation deals with approximating the derivative of a mathematical function or discrete data. It is fundamentally derived from Taylor series expansions.
Derivation from Taylor Series
The mathematical foundation for numerical differentiation is the Taylor series. By expanding $f(x_{i+1})$ around $x_i$ with step size $h = x_{i+1} - x_i$, we get:

$f(x_{i+1}) = f(x_i) + f'(x_i)h + \frac{f''(x_i)}{2!}h^2 + \frac{f'''(x_i)}{3!}h^3 + \cdots$

Solving for $f'(x_i)$ yields the finite divided difference formulas. The truncated terms represent the theoretical truncation error of the approximation.
Finite Divided Differences for the First Derivative
The simplest approximations of the first derivative are based on truncating the Taylor series after the first derivative term.
Common Finite Differences
- Forward difference: $f'(x_i) \approx \dfrac{f(x_{i+1}) - f(x_i)}{h}$ (truncation error $O(h)$)
- Backward difference: $f'(x_i) \approx \dfrac{f(x_i) - f(x_{i-1})}{h}$ (truncation error $O(h)$)
- Centered difference: $f'(x_i) \approx \dfrac{f(x_{i+1}) - f(x_{i-1})}{2h}$ (truncation error $O(h^2)$)
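The three formulas above can be sketched directly; this is a minimal Python illustration (the function names are ours, not from the source):

```python
import math

def forward_diff(f, x, h):
    # Forward difference: O(h) truncation error
    return (f(x + h) - f(x)) / h

def backward_diff(f, x, h):
    # Backward difference: O(h) truncation error
    return (f(x) - f(x - h)) / h

def centered_diff(f, x, h):
    # Centered difference: O(h^2) truncation error
    return (f(x + h) - f(x - h)) / (2 * h)
```

For example, approximating $\frac{d}{dx}\sin x$ at $x = 1$ with $h = 10^{-3}$, the centered estimate is several orders of magnitude closer to $\cos 1$ than the one-sided estimates.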
Note
The centered difference formula is more accurate, possessing a truncation error of $O(h^2)$, compared to $O(h)$ for the standard forward and backward differences. This is because the $f''(x_i)$ term in the Taylor series cancels exactly when the backward expansion is subtracted from the forward expansion.
Formulas for Higher-Order Derivatives
By manipulating Taylor series expansions for multiple points (e.g., $x_{i-2}$, $x_{i-1}$, $x_i$, $x_{i+1}$, $x_{i+2}$), we can derive formulas for higher-order derivatives. The centered difference formulas for the second and third derivatives are:
Higher-Order Centered Differences
- Second Derivative: $f''(x_i) \approx \dfrac{f(x_{i+1}) - 2f(x_i) + f(x_{i-1})}{h^2}$ with error $O(h^2)$
- Third Derivative: $f'''(x_i) \approx \dfrac{f(x_{i+2}) - 2f(x_{i+1}) + 2f(x_{i-1}) - f(x_{i-2})}{2h^3}$ with error $O(h^2)$
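A quick sketch of both centered formulas in Python (helper names are illustrative); for a cubic, both are exact up to round-off, which makes them easy to sanity-check:

```python
def second_deriv(f, x, h):
    # Centered second derivative, error O(h^2)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

def third_deriv(f, x, h):
    # Centered third derivative, error O(h^2)
    return (f(x + 2*h) - 2*f(x + h) + 2*f(x - h) - f(x - 2*h)) / (2 * h**3)
```

Applied to $f(x) = x^3$ at $x = 2$, these return essentially $12$ and $6$, the exact values of $f''$ and $f'''$.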
High-Accuracy Differentiation Formulas
Higher-accuracy formulas can be generated by including more terms from the Taylor series expansion. For example, a more accurate forward difference formula requires the points $x_i$, $x_{i+1}$, and $x_{i+2}$:

$f'(x_i) \approx \dfrac{-f(x_{i+2}) + 4f(x_{i+1}) - 3f(x_i)}{2h}$

achieving $O(h^2)$ accuracy without using a centered span.
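As a minimal sketch (function name is ours), the three-point forward formula is easy to compare against the plain two-point forward difference:

```python
import math

def forward_diff_h2(f, x, h):
    # Three-point forward difference with O(h^2) truncation error;
    # uses only points at and ahead of x (no centered span)
    return (-f(x + 2*h) + 4*f(x + h) - 3*f(x)) / (2 * h)

def forward_diff_h1(f, x, h):
    # Plain two-point forward difference, O(h), for comparison
    return (f(x + h) - f(x)) / h
```

At $x = 1$ with $h = 10^{-3}$ on $\sin x$, the three-point version is roughly three orders of magnitude more accurate.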
Richardson Extrapolation
Richardson extrapolation is an elegant method to improve the accuracy of a derivative estimate by combining two less accurate estimates computed with different step sizes.
Generalized Richardson Extrapolation
If $D(h)$ is an approximation with leading error of order $O(h^n)$, we can combine estimates using step sizes $h$ and $h/t$ (where $t > 1$, typically $t = 2$) to eliminate the leading error term and obtain a higher-order estimate $D$:

$D = \dfrac{t^n D(h/t) - D(h)}{t^n - 1}$
For centered differences ($n = 2$) and halving the step size ($t = 2$), this reduces to the classic formula:

$D = \dfrac{4}{3} D\left(\dfrac{h}{2}\right) - \dfrac{1}{3} D(h)$

which increases the accuracy from $O(h^2)$ to $O(h^4)$.
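The classic $t = 2$, $n = 2$ case can be sketched in a few lines of Python (helper names are illustrative):

```python
import math

def centered_diff(f, x, h):
    # O(h^2) centered difference, used as the base estimate D(h)
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h):
    # D = (4*D(h/2) - D(h)) / 3 : cancels the O(h^2) term, leaving O(h^4)
    return (4 * centered_diff(f, x, h / 2) - centered_diff(f, x, h)) / 3
```

With $f = \sin$, $x = 1$, and a fairly coarse $h = 0.1$, the extrapolated estimate is already orders of magnitude closer to $\cos 1$ than either centered estimate alone.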
Formulas for Unequally Spaced Data
In practice, experimental data points are often not evenly spaced. In this case, standard finite difference formulas cannot be applied. Instead, a Lagrange interpolating polynomial is fit through the adjacent points and then differentiated.
Unequally Spaced Differences
For three data points $(x_0, f(x_0))$, $(x_1, f(x_1))$, $(x_2, f(x_2))$, fit the second-order Lagrange polynomial and differentiate it. The derivative at any point $x$ is:

$f'(x) \approx f(x_0)\dfrac{2x - x_1 - x_2}{(x_0 - x_1)(x_0 - x_2)} + f(x_1)\dfrac{2x - x_0 - x_2}{(x_1 - x_0)(x_1 - x_2)} + f(x_2)\dfrac{2x - x_0 - x_1}{(x_2 - x_0)(x_2 - x_1)}$

When the points are equally spaced ($x_1 - x_0 = x_2 - x_1 = h$) and the derivative is evaluated at $x = x_1$, this formula correctly collapses to the standard centered difference.
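A direct Python transcription of the differentiated Lagrange polynomial (the function name is ours):

```python
def lagrange_deriv(pts, x):
    # pts: three (x, y) pairs, not necessarily equally spaced
    (x0, y0), (x1, y1), (x2, y2) = pts
    return (y0 * (2*x - x1 - x2) / ((x0 - x1) * (x0 - x2))
          + y1 * (2*x - x0 - x2) / ((x1 - x0) * (x1 - x2))
          + y2 * (2*x - x0 - x1) / ((x2 - x0) * (x2 - x1)))
```

Because the interpolant is quadratic, the result is exact for data sampled from any quadratic, regardless of spacing; with equally spaced points, evaluating at the middle point reproduces the centered difference.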
Errors in Numerical Differentiation and Condition Number
Numerical differentiation is inherently unstable and represents an ill-conditioned problem. While reducing the step size initially decreases the truncation error, it significantly magnifies the round-off error.
Condition Number and Loss of Significance
The numerical calculation of a derivative requires subtracting two nearly equal function values, $f(x + h) - f(x)$, and dividing by a very small number, $h$. This process is highly susceptible to loss of significance (subtractive cancellation).
The condition number for numerical differentiation scales proportionally to $1/h$. As $h \to 0$, the condition number approaches infinity, meaning the problem becomes infinitely sensitive to the finite precision limits of floating-point arithmetic. Thus, total error forms a "U-shaped" curve: decreasing $h$ beyond an optimal point causes the round-off error to dominate, drastically degrading accuracy.
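The U-shaped trade-off is easy to observe numerically; this small Python experiment evaluates the centered-difference error at a coarse, a moderate, and a tiny step size:

```python
import math

def centered_diff(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

# Error of d/dx sin(x) at x = 1 for three step sizes:
errors = {h: abs(centered_diff(math.sin, 1.0, h) - math.cos(1.0))
          for h in (1e-1, 1e-5, 1e-12)}
# h = 1e-1 : truncation error dominates
# h = 1e-12: round-off (cancellation) error dominates
# h = 1e-5 : near the bottom of the U-shaped total-error curve
```

The moderate step size beats both extremes: making $h$ smaller past the optimum makes the answer worse, not better.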
Caution
Because numerical differentiation amplifies error by a factor proportional to $1/h$, differentiating raw, scattered data directly often leads to useless results. The noise completely masks the true derivative.
Data Smoothing Before Differentiation
Because of the ill-conditioned nature of numerical differentiation, raw experimental data must almost always be smoothed before derivatives are taken.
Procedure
- Visual Inspection: Plot the data to identify the level of noise and potential outliers.
- Smoothing or Regression: Apply a low-pass filter (like a moving average) or fit a smooth curve (like a low-order polynomial regression or a smoothing spline) to the data.
- Differentiation: Differentiate the fitted curve analytically, or apply numerical differentiation to the smoothed data points.
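The procedure above can be sketched in pure Python on synthetic data (the noise level, window width, and variable names are all illustrative assumptions):

```python
import random

random.seed(0)
h = 0.02
ts = [i * h for i in range(101)]                      # t in [0, 2]
ys = [t**2 + random.gauss(0.0, 0.01) for t in ts]     # noisy samples of t^2

# Step 2: smooth with a centered moving average (a crude low-pass filter)
w = 5                                                  # window half-width
smooth = [sum(ys[i - w:i + w + 1]) / (2 * w + 1)
          for i in range(w, len(ys) - w)]

# Step 3: centered differences on the smoothed interior points
dy = [(smooth[i + 1] - smooth[i - 1]) / (2 * h)
      for i in range(1, len(smooth) - 1)]
# dy[k] estimates f'(t) = 2t at t = ts[w + 1 + k]
```

Differentiating `ys` directly would amplify the noise by $1/h$; smoothing first keeps the estimates near the true derivative $2t$. Fitting a low-order polynomial and differentiating it analytically is usually even better than a moving average.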
Partial Derivatives
For functions of multiple variables, partial derivatives are approximated by holding all other variables constant and applying the standard finite difference formulas to the variable of interest.
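A minimal sketch of this idea in Python, perturbing one coordinate at a time with a centered difference (the helper name is ours):

```python
def partial_diff(f, point, i, h=1e-6):
    # Centered difference in coordinate i, holding all others fixed
    up = list(point); up[i] += h
    down = list(point); down[i] -= h
    return (f(*up) - f(*down)) / (2 * h)

# Example: f(x, y) = x**2 * y, so df/dx = 2xy and df/dy = x**2
f = lambda x, y: x**2 * y
```

At $(2, 3)$ this recovers $\partial f/\partial x = 12$ and $\partial f/\partial y = 4$ to high accuracy.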
Advanced Differentiation Techniques
Beyond finite differences, modern engineering and machine learning heavily rely on computational differentiation techniques to avoid round-off errors and the need for analytical derivations.
Complex Step Differentiation
If $f$ is an analytic function, evaluating it at a complex argument can, remarkably, compute the real derivative without subtractive cancellation (round-off) errors. The formula uses a very small imaginary step $h$:

$f'(x) \approx \dfrac{\operatorname{Im}\left[f(x + ih)\right]}{h}$

Since there is no subtraction in the numerator, $h$ can be chosen far below machine epsilon (e.g., $h = 10^{-20}$) to achieve near-exact precision without catastrophic cancellation.
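The trick is a one-liner in Python using the standard `cmath` module (the wrapper name is ours):

```python
import cmath
import math

def complex_step(f, x, h=1e-20):
    # f'(x) ~ Im[f(x + i*h)] / h : the numerator involves no subtraction,
    # so h can be tiny without catastrophic cancellation
    return f(complex(x, h)).imag / h
```

Note that `f` must accept complex arguments and be analytic; `cmath.sin` qualifies. At $h = 10^{-20}$ a real forward difference returns exactly zero (since `1.0 + 1e-20 == 1.0` in double precision), while the complex step still recovers $\cos 1$ to machine precision.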
Automatic Differentiation (AD)
AD computes exact derivatives by systematically applying the chain rule to the elementary operations (addition, multiplication, trigonometric functions) that make up a computer program. It is neither symbolic differentiation (which can produce massive expressions) nor numerical differentiation (which suffers from truncation and round-off error). Reverse-mode AD evaluates exact gradients at a cost proportional to a single forward evaluation of the function, which is the foundational technology behind modern deep learning frameworks.
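Forward-mode AD can be demonstrated in a few lines with dual numbers; this sketch (class and function names are ours) overloads addition and multiplication so the chain rule is applied mechanically at each elementary operation:

```python
class Dual:
    """Dual number a + b*eps with eps**2 = 0; .dot carries the derivative."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def _lift(self, other):
        # Promote plain numbers to constants (derivative 0)
        return other if isinstance(other, Dual) else Dual(other)
    def __add__(self, other):
        o = self._lift(other)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, other):
        o = self._lift(other)
        # Product rule applied mechanically: (uv)' = u'v + uv'
        return Dual(self.val * o.val, self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def deriv(f, x):
    # Seed dx/dx = 1, run the program on dual numbers, read off f'(x)
    return f(Dual(x, 1.0)).dot
```

For a polynomial like $3x^2 + 2x + 1$, `deriv` returns $6x + 2$ exactly: no step size, no truncation error. Real AD frameworks extend this idea to every elementary function and, in reverse mode, to efficient gradients of many-variable functions.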
Key Takeaways
- Finite difference formulas are directly derived from the Taylor series expansion.
- Derivatives can be approximated using forward, backward, or centered finite divided differences. Truncation errors are derived from the unused Taylor series terms.
- For unequally spaced data, the derivative is derived from differentiating a Lagrange interpolating polynomial over adjacent points.
- Centered differences are generally more accurate ($O(h^2)$) than forward or backward differences ($O(h)$) due to the cancellation of the $f''(x_i)$ terms in the Taylor series.
- Formulas for higher-order derivatives (2nd, 3rd) and high-accuracy formulas involve more neighboring data points.
- Richardson extrapolation combines two estimates of lower accuracy to produce one of higher accuracy, with the generalized formula eliminating the leading error term.
- Numerical differentiation is an inherently ill-conditioned problem because the condition number scales with $1/h$. It suffers from severe loss of significance (subtractive cancellation) when $h$ becomes too small.
- Because of noise amplification, data smoothing or regression must almost always precede the numerical differentiation of experimental data.
- Complex step differentiation avoids subtractive cancellation entirely, allowing extremely small step sizes.
- Automatic Differentiation (AD) provides exact analytical derivatives numerically by programmatically applying the chain rule to atomic operations.