Systems of Linear Equations

Solving systems of simultaneous linear algebraic equations is essential in many fields of engineering. A general system can be represented in matrix form as $[A]\{x\} = \{B\}$.

Matrix Fundamentals and Cramer's Rule

Before employing numerical methods, it is crucial to understand basic matrix operations (addition, multiplication) and properties (determinant, trace). A system $[A]\{x\} = \{B\}$ has a unique solution if and only if the determinant of the coefficient matrix $[A]$ is non-zero ($|A| \neq 0$).

Cramer's Rule

A direct analytical method that expresses the solution in terms of determinants. The unknown $x_k$ is found by $x_k = \frac{|A_k|}{|A|}$, where $|A_k|$ is the determinant of the matrix formed by replacing the $k$-th column of $[A]$ with the constant vector $\{B\}$. While conceptually simple, Cramer's rule is computationally impractical for systems larger than $3 \times 3$ because evaluating determinants by cofactor expansion grows factorially ($O(n!)$ operations).
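As a sketch of the rule (the function name and singularity tolerance are illustrative), each $x_k$ is obtained by swapping the $k$-th column of $[A]$ for $\{B\}$ and taking a ratio of determinants:

```python
import numpy as np

def cramer_solve(A, B):
    """Solve A x = B via Cramer's rule (practical only for tiny systems)."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    det_A = np.linalg.det(A)
    if abs(det_A) < 1e-12:          # |A| = 0 means no unique solution
        raise ValueError("coefficient matrix is singular or nearly singular")
    x = np.empty(len(B))
    for k in range(len(B)):
        A_k = A.copy()
        A_k[:, k] = B               # replace k-th column with the constant vector
        x[k] = np.linalg.det(A_k) / det_A
    return x
```

Each solve costs $n + 1$ determinant evaluations, which is why the rule is reserved for hand calculation on small systems.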

Gauss Elimination

Gauss elimination is a direct method that involves two phases: forward elimination and back substitution.

Procedure

  1. Forward Elimination: Manipulate the equations to eliminate unknowns and create an upper triangular system.
  2. Back Substitution: Solve for the unknowns starting from the last equation and working backward to the first.
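The two phases above can be sketched in NumPy as follows (names are illustrative; this naive version has no pivoting, so it fails on a zero pivot):

```python
import numpy as np

def gauss_eliminate(A, b):
    """Naive Gauss elimination: forward elimination, then back substitution."""
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n = len(b)
    # Phase 1: forward elimination -> upper triangular system
    for k in range(n - 1):
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]      # assumes a nonzero pivot a_kk
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    # Phase 2: back substitution, starting from the last equation
    x = np.empty(n)
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```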

Computational Complexity (FLOPs)

To compare algorithms, we count the floating-point operations (FLOPs), primarily multiplications and divisions.
  • Forward Elimination: Requires approximately $\frac{2n^3}{3}$ FLOPs.
  • Back Substitution: Requires approximately $n^2$ FLOPs.
  • Total: Thus, Gauss elimination is an $O(n^3)$ operation, dominated by the roughly $n^3/3$ multiplications/divisions of forward elimination. This cubic scaling means doubling the size of a system multiplies the execution time by about eight.

Caution

Gauss elimination is susceptible to division by zero and round-off errors. Pivoting (swapping rows so that the largest available element is used as the pivot) is often necessary to avoid these issues.

Gauss Elimination Process

Step-by-step visualization of solving a 3x3 system of linear equations using naive Gauss elimination. The initial augmented matrix $[A|B]$ is:

[ 3.0000   -0.1000   -0.2000 |   7.8500 ]
[ 0.1000    7.0000   -0.3000 | -19.3000 ]
[ 0.3000   -0.2000   10.0000 |  71.4000 ]

The first action identifies the pivot element $a_{11} = 3$; the entries of the solution vector $\{x\}$ ($x_1$, $x_2$, $x_3$) are filled in once back substitution completes.

Pivoting Strategies

In Gauss elimination, division by zero or a very small number can lead to severe round-off errors. Pivoting is a technique to rearrange the rows or columns of the coefficient matrix to place a larger absolute value in the diagonal pivot position.

Types of Pivoting

  • Partial Pivoting: The most common approach. Before eliminating variables in column $k$, it searches the remaining rows $i \ge k$ for the largest absolute element $|a_{ik}|$ and swaps row $k$ with that row. This ensures the magnitude of each multiplier is $\le 1$, minimizing error growth.
  • Complete Pivoting: Searches both the remaining rows and columns for the largest absolute element to use as the pivot. This involves swapping both rows and columns. While numerically more stable, the extensive search and bookkeeping make it computationally expensive and rarely used compared to partial pivoting.
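Partial pivoting can be grafted onto the elimination loop with one search and one row swap per step (a sketch; names are illustrative, and only rows are exchanged):

```python
import numpy as np

def gauss_partial_pivot(A, b):
    """Gauss elimination with partial pivoting: before each elimination step,
    move the row with the largest |a_ik| (i >= k) into the pivot position."""
    A = np.asarray(A, dtype=float).copy()
    b = np.asarray(b, dtype=float).copy()
    n = len(b)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(A[k:, k]))   # row of largest absolute element
        if p != k:
            A[[k, p]] = A[[p, k]]             # swap rows in A and in b
            b[[k, p]] = b[[p, k]]
        for i in range(k + 1, n):
            factor = A[i, k] / A[k, k]        # |factor| <= 1 after the swap
            A[i, k:] -= factor * A[k, k:]
            b[i] -= factor * b[k]
    x = np.empty(n)
    for i in range(n - 1, -1, -1):            # back substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x
```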

Gauss-Jordan

A variation of Gauss elimination where, when an unknown is eliminated, it is eliminated from all other equations rather than just the subsequent ones. The result is an identity matrix, making back substitution unnecessary. However, the procedure requires roughly $50\%$ more operations (approximately $n^3$ multiplications) than standard Gauss elimination, making it less efficient for solving standard systems.

Matrix Inversion

The inverse of a square matrix $[A]$, denoted as $[A]^{-1}$, is a matrix such that $[A][A]^{-1} = [I]$, where $[I]$ is the identity matrix. If the inverse is known, the solution to the system is simply $\{x\} = [A]^{-1}\{B\}$.

Note

Gauss-Jordan elimination can be used to determine the inverse of a matrix by augmenting the original matrix with the identity matrix and performing the elimination steps.
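The augment-and-reduce idea in the note can be sketched as follows (function name illustrative): reduce $[A\,|\,I]$ to $[I\,|\,A^{-1}]$, eliminating each column from all other rows.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by augmenting with the identity and reducing [A | I] to
    [I | A^-1]; partial pivoting is included for stability."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A.copy(), np.eye(n)])
    for k in range(n):
        p = k + np.argmax(np.abs(aug[k:, k]))   # partial pivoting
        aug[[k, p]] = aug[[p, k]]
        aug[k] /= aug[k, k]                     # normalize the pivot row
        for i in range(n):                      # eliminate column k everywhere
            if i != k:
                aug[i] -= aug[i, k] * aug[k]
    return aug[:, n:]                           # right half is the inverse
```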

LU Decomposition

LU decomposition separates the time-consuming elimination of the matrix $[A]$ from the manipulations of the right-hand side $\{B\}$. This is enormously advantageous when solving the same system for many different load vectors $\{B\}$ because the expensive $O(n^3)$ elimination step only happens once.

LU Decomposition Process

The matrix $[A]$ is decomposed into a lower triangular matrix $[L]$ and an upper triangular matrix $[U]$: $[A] = [L][U]$. The system $[A]\{x\} = \{B\}$ becomes $[L][U]\{x\} = \{B\}$. We can then solve for $\{d\}$ in $[L]\{d\} = \{B\}$ using forward substitution, and then solve for $\{x\}$ in $[U]\{x\} = \{d\}$ using back substitution. There are specific algorithms for computing the LU factors:
  • Doolittle's Method: Specifies that all diagonal elements of the lower triangular matrix $[L]$ must be $1$ (i.e., $l_{ii} = 1$).
  • Crout's Method: Specifies that all diagonal elements of the upper triangular matrix $[U]$ must be $1$ (i.e., $u_{ii} = 1$).
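A sketch of Doolittle's method together with the two substitution solves (names illustrative; no pivoting, so nonzero pivots are assumed):

```python
import numpy as np

def doolittle_lu(A):
    """Doolittle LU factorization: A = L U with unit diagonal in L."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    for k in range(n):
        U[k, k:] = A[k, k:] - L[k, :k] @ U[:k, k:]
        L[k + 1:, k] = (A[k + 1:, k] - L[k + 1:, :k] @ U[:k, k]) / U[k, k]
    return L, U

def lu_solve(L, U, b):
    """Solve L d = b (forward substitution), then U x = d (back substitution)."""
    n = len(b)
    d = np.empty(n)
    for i in range(n):
        d[i] = b[i] - L[i, :i] @ d[:i]          # l_ii = 1, so no division
    x = np.empty(n)
    for i in range(n - 1, -1, -1):
        x[i] = (d[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x
```

Once `doolittle_lu` has run, `lu_solve` can be called repeatedly for new right-hand sides at only $O(n^2)$ cost each.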

Cholesky Decomposition

If the matrix $[A]$ is symmetric and positive definite, it can be decomposed into a lower triangular matrix and its transpose: $[A] = [L][L]^T$. This method is highly efficient, requiring roughly $\frac{n^3}{6}$ operations, about half the execution time and storage of standard LU decomposition.
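A minimal sketch of the factorization (function name illustrative; assumes $[A]$ is symmetric positive definite, so the square roots are real):

```python
import numpy as np

def cholesky(A):
    """Cholesky factorization of a symmetric positive definite A: A = L L^T."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    L = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            L[i, j] = (A[i, j] - L[i, :j] @ L[j, :j]) / L[j, j]
        L[i, i] = np.sqrt(A[i, i] - L[i, :i] @ L[i, :i])
    return L
```

Only the lower triangle of $[A]$ is ever needed, which is where the factor-of-two savings in storage comes from.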

Singular Value Decomposition (SVD)

SVD is one of the most powerful matrix factorization techniques, particularly useful for non-square or ill-conditioned matrices where LU or Cholesky decomposition fails.

SVD Formula

Any $m \times n$ matrix $[A]$ can be factored as $[A] = [U][\Sigma][V]^T$, where:
  • $[U]$ is an $m \times m$ orthogonal matrix of left singular vectors.
  • $[\Sigma]$ is an $m \times n$ diagonal matrix containing the singular values $\sigma_i$, which are non-negative and listed in descending order.
  • $[V]^T$ is the transpose of an $n \times n$ orthogonal matrix $[V]$ of right singular vectors.
SVD can easily compute the pseudoinverse of a matrix to solve least-squares problems and explicitly determine the rank, null space, and condition number of $[A]$ (where $\text{Cond}(A) = \sigma_{\max}/\sigma_{\min}$).
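With NumPy's built-in SVD, these quantities can be read off directly (the matrix entries below are illustrative):

```python
import numpy as np

# SVD of a non-square (3 x 2) matrix.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
U, s, Vt = np.linalg.svd(A)        # s: singular values, descending order
cond = s[0] / s[-1]                # Cond(A) = sigma_max / sigma_min
rank = int(np.sum(s > 1e-12))      # rank = number of non-zero singular values
A_pinv = np.linalg.pinv(A)         # pseudoinverse, built from the SVD
x_ls = A_pinv @ np.array([1.0, 2.0, 3.0])   # least-squares solution of A x ~ b
```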

Thomas Algorithm for Tridiagonal Systems

Many engineering problems, particularly those involving 1D partial differential equations (like the heat equation) or cubic splines, result in tridiagonal systems of linear equations. A tridiagonal matrix has non-zero elements only on the main diagonal, the first diagonal below this, and the first diagonal above this.

Thomas Algorithm

The Thomas algorithm is a highly efficient, simplified form of Gaussian elimination specifically designed for tridiagonal systems. It operates in two passes:
  • Forward Elimination: Eliminates the lower diagonal elements.
  • Backward Substitution: Solves for the unknowns starting from the last equation.
Because it avoids operations on known zero elements, the computational cost is proportional to the number of equations ($O(n)$) rather than the $O(n^3)$ of standard Gaussian elimination, making it feasible to solve millions of equations in milliseconds.
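The two passes can be sketched by storing only the three diagonals (the array naming below is one common convention, not the only one):

```python
import numpy as np

def thomas(a, b, c, d):
    """Thomas algorithm for a tridiagonal system.
    a: sub-diagonal (a[0] unused), b: main diagonal,
    c: super-diagonal (c[-1] unused), d: right-hand side. O(n) time."""
    n = len(b)
    b = np.asarray(b, dtype=float).copy()
    d = np.asarray(d, dtype=float).copy()
    for i in range(1, n):                 # forward elimination of sub-diagonal
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        d[i] -= m * d[i - 1]
    x = np.empty(n)
    x[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):        # back substitution
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i]
    return x
```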

Vector Norms

To quantify the "size" of a matrix, we first must define the "length" or "magnitude" of a vector in higher dimensions using a vector norm.

Common Vector Norms

A vector norm $||v||$ must satisfy basic scaling and triangle inequality rules. Common examples include:
  • Euclidean Norm ($L_2$ norm): The standard physical length of a vector, defined as $\sqrt{\sum v_i^2}$.
  • Sum Norm ($L_1$ norm): The sum of the absolute values of the elements, $\sum |v_i|$.
  • Max Norm ($L_\infty$ norm): The single largest absolute value element in the vector, $\max |v_i|$.
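The three norms, evaluated with NumPy for a small example vector:

```python
import numpy as np

v = np.array([3.0, -4.0, 0.0])
euclidean = np.linalg.norm(v)           # L2: sqrt(3^2 + (-4)^2) = 5.0
sum_norm  = np.linalg.norm(v, 1)        # L1: |3| + |-4| + |0| = 7.0
max_norm  = np.linalg.norm(v, np.inf)   # L-infinity: max |v_i| = 4.0
```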

Matrix Norms

Building on vector norms, matrix norms quantify how much a matrix $[A]$ can stretch or amplify a vector. A matrix norm $||A||$ is a real number that provides a measure of the "size" or magnitude of the matrix.

Common Matrix Norms

  • Row-Sum Norm ($||A||_\infty$): The maximum of the sums of the absolute values of the elements in each row.
  • Column-Sum Norm ($||A||_1$): The maximum of the sums of the absolute values of the elements in each column.
  • Spectral Norm ($||A||_2$): The square root of the largest eigenvalue of $A^T A$.
The formal Condition Number is defined as $\text{Cond}(A) = ||A|| \cdot ||A^{-1}||$. If $\text{Cond}(A) \approx 1$, the matrix is well-conditioned. If $\text{Cond}(A) \gg 1$, it is ill-conditioned.
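The same quantities for a small example matrix, using NumPy's norm and condition-number routines (entries illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
row_sum  = np.linalg.norm(A, np.inf)    # max absolute row sum: |3| + |4| = 7
col_sum  = np.linalg.norm(A, 1)         # max absolute column sum: |2| + |4| = 6
spectral = np.linalg.norm(A, 2)         # sqrt of largest eigenvalue of A^T A
cond     = np.linalg.cond(A)            # ||A||_2 * ||A^-1||_2 (2-norm default)
```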

Ill-Conditioning

A system is ill-conditioned if small changes in the coefficients (e.g., due to round-off errors) result in large changes in the solution. This implies the matrix is close to being singular (determinant near zero). The condition number quantifies this: a large condition number indicates an ill-conditioned system. The condition number is formally connected to matrix norms as $\text{Cond}(A) = ||A|| \cdot ||A^{-1}||$, and in SVD terms, as the ratio of the largest to smallest singular value. Techniques like pivoting and using higher precision arithmetic are required to mitigate these effects.

Sparse Matrices

A matrix is sparse if a vast majority of its elements are zero. Typical examples arise in the finite difference and finite element methods (like tridiagonal and banded matrices).

Sparse Matrix Techniques

Instead of storing $n^2$ elements, sparse matrix representations (like Compressed Sparse Row, CSR) store only the non-zero elements and their indices. This drastically reduces memory usage from $O(n^2)$ to $O(k)$, where $k$ is the number of non-zeros. Iterative methods (like Conjugate Gradient or Gauss-Seidel) are particularly well-suited for sparse systems because they only require matrix-vector multiplication, preserving the sparsity structure.
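A sketch with SciPy: a tridiagonal system stored in CSR format and solved by Conjugate Gradient (sizes are illustrative; CG assumes the matrix is symmetric positive definite, which this one is):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import cg

# Sparse tridiagonal (1D Laplacian-style) system in CSR format.
n = 200
A = diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# CSR keeps only the 3n - 2 non-zeros instead of n^2 entries.
nnz = A.nnz

# Conjugate Gradient needs only matrix-vector products A @ v,
# so the sparsity structure is never destroyed by fill-in.
x, info = cg(A, b)   # info == 0 signals convergence
```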

Iterative Refinement

Iterative refinement is a technique used to improve the accuracy of solutions obtained by direct methods (like LU decomposition) when dealing with ill-conditioned systems, essentially using double precision to polish a single-precision result.

Procedure

  1. Calculate an initial approximate solution $\{\tilde{x}\}$ using a direct method.
  2. Compute the residual vector $\{R\} = \{B\} - [A]\{\tilde{x}\}$ using higher precision arithmetic.
  3. Solve the correction system $[A]\{\Delta x\} = \{R\}$ using the already computed LU factors.
  4. Update the solution: $\{x_{\text{new}}\} = \{\tilde{x}\} + \{\Delta x\}$.
This process can be repeated until the correction $\{\Delta x\}$ becomes negligible.
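The steps above can be sketched with SciPy's LU routines (in this simplified version the residual is computed in ordinary double precision rather than extended precision):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def refine(A, b, iters=3):
    """Direct LU solve followed by iterative refinement.
    The O(n^3) factorization happens once; each correction is O(n^2)."""
    lu, piv = lu_factor(A)              # step 1: factor A, get x-tilde
    x = lu_solve((lu, piv), b)
    for _ in range(iters):
        r = b - A @ x                   # step 2: residual R = B - A x
        dx = lu_solve((lu, piv), r)     # step 3: A (dx) = R, reusing LU factors
        x = x + dx                      # step 4: update the solution
    return x
```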

Iterative Methods

Unlike direct methods, iterative methods start with a guess and continually refine the estimate. They are particularly useful for large systems containing many zero elements (sparse matrices).

Jacobi and Gauss-Seidel Methods

  • Jacobi Method: Computes all new values $x_i^{k+1}$ using only the old values $x^k$ from the previous iteration.
  • Gauss-Seidel Method: Uses the most recently computed values $x_j^{k+1}$ as soon as they are available, generally converging faster than the Jacobi method:
$x_i^{k+1} = \dfrac{b_i - \sum_{j=1}^{i-1} a_{ij} x_j^{k+1} - \sum_{j=i+1}^{n} a_{ij} x_j^k}{a_{ii}}$
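The Gauss-Seidel update can be sketched as below (names and stopping rule are illustrative; convergence is assumed, e.g. for a diagonally dominant $[A]$):

```python
import numpy as np

def gauss_seidel(A, b, tol=1e-8, max_iter=200):
    """Gauss-Seidel iteration: each x_i is updated in place, so the newest
    values are used immediately within the same sweep."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(len(b))
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(len(b)):
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:   # stop when updates stall
            break
    return x
```

Replacing `x[:i]` with `x_old[:i]` in the sum would turn this into the Jacobi method, which uses only the previous iteration's values.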

Note

For both methods to converge reliably, the system must typically be diagonally dominant (the absolute value of the diagonal element is greater than the sum of the absolute values of the other elements in the row). Furthermore, the rigorous mathematical condition for the convergence of any linear stationary iterative method (like Jacobi or Gauss-Seidel) is that the spectral radius $\rho$ (the maximum absolute eigenvalue) of the iteration matrix must be strictly less than $1$ ($\rho < 1$).

Successive Over-Relaxation (SOR)

To accelerate the convergence of the Gauss-Seidel method, relaxation techniques apply a weighting factor $\omega$: $x_i^{\text{new}} = \omega x_i^{\text{calculated}} + (1 - \omega) x_i^{\text{old}}$. If $1 < \omega < 2$, the method is called over-relaxation and accelerates convergence.
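SOR is a one-line modification of the Gauss-Seidel sweep (a sketch; the choice $\omega = 1.2$ is illustrative, and convergence is assumed):

```python
import numpy as np

def sor(A, b, omega=1.2, tol=1e-8, max_iter=500):
    """Successive over-relaxation: blend the freshly computed Gauss-Seidel
    value with the previous value using the weighting factor omega."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros(len(b))
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(len(b)):
            gs = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
            x[i] = omega * gs + (1 - omega) * x[i]    # relaxation step
        if np.linalg.norm(x - x_old, np.inf) < tol:
            break
    return x
```

Setting `omega=1.0` recovers plain Gauss-Seidel; values below 1 (under-relaxation) are sometimes used to stabilize non-convergent iterations.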
Key Takeaways
  • Cramer's Rule provides an exact analytical solution but is computationally inefficient for large systems ($O(n!)$ operations).
  • Gauss elimination and LU decomposition are direct methods that require on the order of $n^3/3$ operations. They yield exact solutions barring round-off errors.
  • Partial Pivoting swaps rows to place the largest element on the diagonal, which is essential in direct methods to prevent division by zero and minimize round-off errors.
  • LU decomposition is highly efficient for multiple right-hand sides since the elimination is only performed once. Cholesky decomposition optimizes this for symmetric positive-definite matrices (about $n^3/6$ operations).
  • Singular Value Decomposition (SVD) provides a robust factorization for ill-conditioned or non-square matrices and directly yields the pseudoinverse.
  • Thomas Algorithm solves tridiagonal systems efficiently in $O(n)$ time.
  • Sparse Matrices store only non-zero elements, dramatically reducing memory usage and making iterative solvers highly attractive.
  • Ill-conditioned systems are highly sensitive to round-off errors and have large condition numbers, often quantified using Vector and Matrix Norms ($L_1$, $L_2$, $L_\infty$).
  • Iterative Refinement can polish solutions from direct methods by using higher precision on the residual.
  • Jacobi and Gauss-Seidel are iterative methods effective for large, sparse, diagonally dominant systems. Convergence requires the spectral radius of the iteration matrix to be less than 1.
  • Successive Over-Relaxation (SOR) accelerates the convergence of iterative methods.