Perspective Projection

Perspective projection is the map from the 3D view frustum to the normalized device coordinate cube. It is not a linear map — it involves a rational function of depth — and understanding its construction explains several otherwise puzzling properties of the rendering pipeline: why depth precision degrades at distance, why shadow maps need a bias, and why the NDC morph must be constructed in frustum space rather than clip space.

The Frustum

The view frustum is the region of space visible to the camera. It is defined by six planes: near, far, left, right, top, bottom. In camera eye space — with the camera at the origin looking along $-z$ — the frustum is a truncated pyramid:

\text{near plane: } z = -n \qquad \text{far plane: } z = -f

\text{left: } x = \frac{l \cdot z}{-n} \qquad \text{right: } x = \frac{r \cdot z}{-n}

\text{bottom: } y = \frac{b \cdot z}{-n} \qquad \text{top: } y = \frac{t \cdot z}{-n}

where $n, f > 0$ and $l, r, b, t$ are the extents at the near plane.
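As a quick check on these plane equations, the sketch below tests whether an eye-space point lies inside the frustum. The function name `inside_frustum` and its argument order are illustrative assumptions, not part of any library.

```python
def inside_frustum(p, l, r, b, t, n, f):
    """Return True if the eye-space point p = (x, y, z) lies inside the frustum.

    l, r, b, t are the extents at the near plane; n, f > 0 are the
    near and far distances. The camera looks along -z.
    """
    x, y, z = p
    if not (-f <= z <= -n):
        return False
    s = z / -n  # scale of the cross-section: 1 at the near plane, f/n at the far plane
    return l * s <= x <= r * s and b * s <= y <= t * s


print(inside_frustum((0.0, 0.0, -5.0), -1, 1, -1, 1, 1, 10))   # True
print(inside_frustum((6.0, 0.0, -5.0), -1, 1, -1, 1, 1, 10))   # False: x exceeds r*z/(-n) = 5
print(inside_frustum((0.0, 0.0, -0.5), -1, 1, -1, 1, 1, 10))   # False: in front of the near plane
```

The side-plane tests reuse a single scale factor because all four side planes pass through the origin.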

Similar Triangles

The core of perspective projection is the observation that a point $(x, y, z)$ in eye space projects onto the near plane at:

x' = \frac{-n \cdot x}{z} \qquad y' = \frac{-n \cdot y}{z}

This follows from similar triangles: the ratio of $x'$ to $n$ equals the ratio of $x$ to $-z$ (the sign arises because $z < 0$ in eye space). The division by $z$ is the perspective divide — the non-linear step that shrinks far objects and expands near ones.
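The similar-triangles relation can be sketched in a few lines. The helper name `project_to_near_plane` is hypothetical; it only evaluates the two formulas above.

```python
def project_to_near_plane(x, y, z, n):
    """Project the eye-space point (x, y, z) onto the near plane z = -n.

    Assumes the point is in front of the camera, i.e. z < 0.
    """
    if z >= 0:
        raise ValueError("point must be in front of the camera (z < 0)")
    return (-n * x / z, -n * y / z)


# The same (x, y) offset lands closer to the center when the point is farther away.
print(project_to_near_plane(2.0, 1.0, -4.0, 1.0))  # (0.5, 0.25)
print(project_to_near_plane(2.0, 1.0, -8.0, 1.0))  # (0.25, 0.125)
```

Doubling the distance halves the projected offset, which is the shrinking of far objects described above.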

The Projection Matrix

The GPU performs the perspective divide implicitly via homogeneous coordinates. The projection matrix $P$ encodes the transform so that:

\mathbf{p}_{\text{clip}} = P \cdot \mathbf{p}_{\text{eye}} = \begin{pmatrix} x_c \\ y_c \\ z_c \\ w_c \end{pmatrix}

and the perspective divide is applied after:

\mathbf{p}_{\text{ndc}} = \frac{\mathbf{p}_{\text{clip}}}{w_c} = \begin{pmatrix} x_c / w_c \\ y_c / w_c \\ z_c / w_c \end{pmatrix}

The matrix is derived by requiring that:

1. $x_{\text{ndc}} = \frac{2x'}{r - l} - \frac{r + l}{r - l}$ maps $[l, r]$ at the near plane to $[-1, 1]$
2. $y_{\text{ndc}} = \frac{2y'}{t - b} - \frac{t + b}{t - b}$ maps $[b, t]$ at the near plane to $[-1, 1]$
3. $z_{\text{ndc}}$ maps $z \in [-n, -f]$ to $[-1, 1]$ (OpenGL / WebGL convention)
4. $w_c = -z$ so that dividing by $w_c$ performs the perspective divide

Substituting $x' = -nx/z$ into condition 1 and multiplying through by $w_c = -z$ (since $x_c = w_c \cdot x_{\text{ndc}}$) gives the clip coordinate:

x_c = \frac{2n}{r-l} x + \frac{r+l}{r-l} z

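The $y$ row follows in the same way, the $z$ row is fixed by condition 3, and the last row encodes $w_c = -z$. Written in standard mathematical notation, where each row produces one clip coordinate, the full matrix is: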

P = \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}

In column-major storage (OpenGL / GLSL convention), the flat array reads column by column — see Foundations / Matrices for the layout derivation.
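The sketch below builds this matrix as a list of rows, applies it to two frustum corners, and performs the perspective divide. The helper names (`frustum`, `to_clip`, `to_ndc`, `column_major`) and the sample extents are assumptions chosen for illustration.

```python
def frustum(l, r, b, t, n, f):
    """General OpenGL-style frustum matrix, stored as a list of rows."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def to_clip(m, p_eye):
    """Multiply the 4x4 matrix m by the eye-space point (x, y, z) with w = 1."""
    p = [p_eye[0], p_eye[1], p_eye[2], 1.0]
    return [sum(m[i][j] * p[j] for j in range(4)) for i in range(4)]


def to_ndc(clip):
    """Perspective divide: divide x, y, z by w and drop w."""
    x, y, z, w = clip
    return (x / w, y / w, z / w)


def column_major(m):
    """Flatten the rows into a column-by-column array of 16 values."""
    return [m[i][j] for j in range(4) for i in range(4)]


l, r, b, t, n, f = -2.0, 1.0, -1.0, 1.0, 1.0, 10.0
P = frustum(l, r, b, t, n, f)

near_corner = (r, t, -n)                  # top-right corner of the near plane
far_corner = (l * f / n, b * f / n, -f)   # bottom-left corner of the far plane
print(to_ndc(to_clip(P, near_corner)))    # approximately (1, 1, -1)
print(to_ndc(to_clip(P, far_corner)))     # approximately (-1, -1, 1)
print(column_major(P)[11])                # -1.0, the entry that yields w_c = -z
```

The asymmetric extents ($l \neq -r$) exercise the third-column terms, which vanish for a symmetric frustum.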

For a symmetric frustum ($r = -l$, $t = -b$) parameterized by vertical field of view $\phi$ and aspect ratio $a$, the near-plane extents are $t = n \tan(\phi/2)$ and $r = a\,t$, so the matrix simplifies to:

P = \begin{pmatrix} \frac{1}{a \tan(\phi/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\phi/2)} & 0 & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}

This is the matrix p5’s perspective(fov, aspect, near, far) produces.
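A minimal sketch of the field-of-view form, checked against the general frustum matrix with $t = n \tan(\phi/2)$ and $r = a\,t$. The functions here are standalone illustrations that happen to share the name `perspective`; they are not p5's implementation, and the sample parameters are arbitrary.

```python
import math


def frustum(l, r, b, t, n, f):
    """General frustum matrix as a list of rows."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def perspective(fov_y, aspect, n, f):
    """Symmetric-frustum matrix from a vertical field of view in radians."""
    c = 1.0 / math.tan(fov_y / 2.0)
    return [
        [c / aspect, 0.0, 0.0, 0.0],
        [0.0, c, 0.0, 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


fov_y, aspect, n, f = math.radians(60.0), 16.0 / 9.0, 0.1, 100.0
t = n * math.tan(fov_y / 2.0)   # top extent at the near plane
r = aspect * t                  # right extent at the near plane

A = perspective(fov_y, aspect, n, f)
B = frustum(-r, r, -t, t, n, f)
same = all(math.isclose(A[i][j], B[i][j], abs_tol=1e-12)
           for i in range(4) for j in range(4))
print(same)  # True
```

The near distance cancels out of the $x$ and $y$ scale factors, which is why the field-of-view form needs $n$ only in the depth row.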

Depth Non-Linearity

The $z$ mapping is the critical one. From the matrix:

z_{\text{ndc}} = \frac{z_c}{w_c} = \frac{-\frac{f+n}{f-n} z - \frac{2fn}{f-n}}{-z} = \frac{f+n}{f-n} + \frac{2fn}{(f-n)z}

This is a rational function of $z$: it is affine in $1/z$, not in $z$. As a result, most of the NDC depth range is consumed by geometry near the camera, leaving very little precision for distant geometry.

Concretely, for $n = 1$, $f = 1000$:

Eye z	NDC z
$-1$ (near)	$-1.000$
$-10$	$+0.802$
$-100$	$+0.982$
$-500$	$+0.998$
$-1000$ (far)	$+1.000$

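Half the NDC range ($-1$ to $0$) is used up by eye-space depths between $1$ and about $2$, because $z_{\text{ndc}} = 0$ at $z = -2fn/(f+n) \approx -1.998$. That is roughly the first 0.1% of the frustum depth. This is why depth buffer precision matters, and why a large $f/n$ ratio causes z-fighting at distance.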
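The depth formula can be evaluated directly. The sketch below assumes $n = 1$, $f = 1000$ and uses a hypothetical helper `z_ndc`. It also computes the eye-space depth at which NDC depth crosses zero.

```python
def z_ndc(z_eye, n, f):
    """NDC depth for an eye-space depth z_eye < 0 (OpenGL [-1, 1] convention)."""
    return (f + n) / (f - n) + 2 * f * n / ((f - n) * z_eye)


n, f = 1.0, 1000.0
for z in (-1.0, -2.0, -10.0, -100.0, -500.0, -1000.0):
    print(f"{z:8.1f}  {z_ndc(z, n, f):+.3f}")
# Prints, one row per depth:
#   -1.0 -> -1.000     -2.0 -> +0.001    -10.0 -> +0.802
# -100.0 -> +0.982   -500.0 -> +0.998  -1000.0 -> +1.000

z_half = -2 * f * n / (f + n)            # eye depth where z_ndc == 0
fraction = (-z_half - n) / (f - n)       # share of the frustum depth in front of it
print(round(z_half, 3), round(fraction * 100, 2))  # -1.998 0.1
```

Moving the near plane outward is the most direct way to reduce this imbalance, since the zero crossing sits at roughly $2n$ whenever $f \gg n$.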

The Perspective Divide is Irreversible Without $w$

After the divide $\mathbf{p}_{\text{ndc}} = \mathbf{p}_{\text{clip}} / w_c$, the original $z$ is no longer stored directly: $z_{\text{ndc}}$ encodes it non-linearly, and recovering $z$ requires the original $n$ and $f$. This is why the NDC morph in Spaces / Perspective to NDC is constructed in frustum eye space rather than clip space: clip space positions have already been scaled by $w_c$, and the interpolation $\texttt{mix}(p, p', d)$ would be in the wrong space.
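Inverting the depth mapping makes the dependence on $n$ and $f$ explicit. The sketch below uses hypothetical helper names and illustrates the general point with depth alone: the midpoint of the NDC depth range is nowhere near the midpoint of the eye-space depth range.

```python
def z_ndc_from_eye(z_eye, n, f):
    """Forward depth mapping (OpenGL [-1, 1] convention)."""
    return (f + n) / (f - n) + 2 * f * n / ((f - n) * z_eye)


def z_eye_from_ndc(z_ndc, n, f):
    """Inverse mapping; only valid with the same n and f used to project."""
    return 2 * f * n / ((f - n) * z_ndc - (f + n))


n, f = 1.0, 1000.0
d = z_ndc_from_eye(-42.0, n, f)
print(z_eye_from_ndc(d, n, f))        # approximately -42.0
print(z_eye_from_ndc(d, n, 100.0))    # a different depth: wrong f gives a wrong answer

# Linear interpolation halfway between near and far, done in each space.
mid_eye = 0.5 * (-n) + 0.5 * (-f)             # -500.5
mid_ndc = z_eye_from_ndc(0.0, n, f)           # approximately -1.998
print(mid_eye, round(mid_ndc, 3))
```

The two midpoints differ by a factor of about 250, which is why the space in which an interpolation runs has to be chosen deliberately.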

Connection to the Bias Matrix

The depth map in shadow mapping is stored in $[0, 1]$, not $[-1, 1]$. The bias matrix remaps NDC to texture coordinates:

B = \begin{pmatrix} 0.5 & 0 & 0 & 0.5 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0 & 1 \end{pmatrix}

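Here $P_L$ and $V_L$ are the light's projection and view matrices. The composition $W = B \cdot P_L \cdot V_L$ maps world positions to homogeneous shadow map coordinates in a single matrix multiply; dividing by $w$ then yields UV coordinates and depth in $[0, 1]$. See Shading / Shadow Mapping for the full pipeline.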
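A minimal sketch of the composition, assuming a light at the world origin looking along $-z$ (so its view matrix is the identity) with a perspective frustum. The helper names and the light parameters are illustrative assumptions.

```python
BIAS = [
    [0.5, 0.0, 0.0, 0.5],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.0, 1.0],
]
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def frustum(l, r, b, t, n, f):
    """General frustum matrix as a list of rows."""
    return [
        [2 * n / (r - l), 0.0, (r + l) / (r - l), 0.0],
        [0.0, 2 * n / (t - b), (t + b) / (t - b), 0.0],
        [0.0, 0.0, -(f + n) / (f - n), -2 * f * n / (f - n)],
        [0.0, 0.0, -1.0, 0.0],
    ]


def matmul(a, b):
    """Product of two 4x4 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def shadow_coords(W, world_pos):
    """Apply W to a world position, then divide by w to get (u, v, depth)."""
    p = [world_pos[0], world_pos[1], world_pos[2], 1.0]
    x, y, z, w = (sum(W[i][j] * p[j] for j in range(4)) for i in range(4))
    return (x / w, y / w, z / w)


P_L = frustum(-1.0, 1.0, -1.0, 1.0, 1.0, 10.0)   # assumed light projection
V_L = IDENTITY                                   # assumed light view matrix
W = matmul(BIAS, matmul(P_L, V_L))

print(shadow_coords(W, (0.0, 0.0, -1.0)))     # approximately (0.5, 0.5, 0.0): near-plane center
print(shadow_coords(W, (10.0, 10.0, -10.0)))  # approximately (1.0, 1.0, 1.0): far top-right corner
```

Because the last row of `BIAS` leaves $w$ unchanged, the bias can be folded into the matrix before the divide rather than applied per fragment afterwards.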

Historical Note

The perspective projection matrix in its homogeneous form was established in the early 1960s alongside the development of computer graphics as a field. Ivan Sutherland’s Sketchpad (1963) operated in a projective framework; the specific $4 \times 4$ form with $w_c = -z$ became standard with the OpenGL specification (Silicon Graphics, 1992), which fixed the column-major, right-handed, $z \in [-1, 1]$ convention that WebGL inherits. The non-linearity of the depth mapping was known from the beginning — it is an unavoidable consequence of the rational nature of the perspective transform.

Proof: The $z$ Mapping is Rational

Claim: No affine map $z_{\text{ndc}} = az + b$ can simultaneously satisfy $z_{\text{ndc}}(-n) = -1$ and $z_{\text{ndc}}(-f) = 1$ while being derived from a matrix that produces $w_c = -z$.

Proof: The $z$ row of $P$ must produce $z_c = Az + B$ for some constants $A, B$ (since $P$ is linear, the eye-space point has $w = 1$, and depth must not depend on $x$ or $y$). After dividing by $w_c = -z$:

z_{\text{ndc}} = \frac{Az + B}{-z} = -A - \frac{B}{z}

This is affine in $1/z$, not in $z$. The only way to make it affine in $z$ would require $B = 0$, which gives $z_{\text{ndc}} = -A$: a constant, independent of $z$, that cannot equal both $-1$ and $1$. No depth information would be preserved. The non-linearity is therefore unavoidable. $\square$
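The two endpoint conditions also determine $A$ and $B$ uniquely. The sketch below solves $-A + B/n = -1$ and $-A + B/f = 1$ with exact rational arithmetic and compares the result with the depth row of $P$. The function name `solve_depth_row` is hypothetical.

```python
from fractions import Fraction


def solve_depth_row(n, f):
    """Solve -A + B/n = -1 and -A + B/f = 1 exactly for (A, B)."""
    n, f = Fraction(n), Fraction(f)
    # Subtracting the second equation from the first gives B * (1/n - 1/f) = -2.
    B = Fraction(-2) / (1 / n - 1 / f)
    A = 1 + B / n
    return A, B


n, f = 1, 1000
A, B = solve_depth_row(n, f)
print(A, B)  # -1001/999 -2000/999

# These match the third row of P: A = -(f+n)/(f-n), B = -2fn/(f-n).
print(A == Fraction(-(f + n), f - n), B == Fraction(-2 * f * n, f - n))  # True True


def z_ndc(z):
    """Depth after the divide by w_c = -z, using the solved constants."""
    return -A - B / Fraction(z)


print(z_ndc(-n), z_ndc(-f))  # -1 1
```

Since $B \neq 0$ whenever $f \neq n$, the $1/z$ term is always present in a usable depth mapping.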

References

Ahn, S. H. OpenGL Projection Matrix. Detailed derivation with diagrams.
Shirley, P. & Marschner, S. Fundamentals of Computer Graphics. CRC Press. §7.3.
Lengyel, E. Mathematics for 3D Game Programming and Computer Graphics. Course Technology. Chapter 4.
Sutherland, I. Sketchpad: A Man-Machine Graphical Communication System. MIT PhD thesis, 1963.