Skip to content
Perspective Projection

Perspective Projection

Perspective projection is the map from the 3D view frustum to the normalized device coordinate cube. It is not a linear map — it involves a rational function of depth — and understanding its construction explains several otherwise puzzling properties of the rendering pipeline: why depth precision degrades at distance, why shadow maps need a bias, and why the NDC morph must be constructed in frustum space rather than clip space.

The Frustum

The view frustum is the region of space visible to the camera. It is defined by six planes: near, far, left, right, top, bottom. In camera eye space — with the camera at the origin looking along z-z — the frustum is a truncated pyramid:

near plane: z=nfar plane: z=f\text{near plane: } z = -n \qquad \text{far plane: } z = -f

left: x=lznright: x=rzn\text{left: } x = \frac{l \cdot z}{-n} \qquad \text{right: } x = \frac{r \cdot z}{-n}

bottom: y=bzntop: y=tzn\text{bottom: } y = \frac{b \cdot z}{-n} \qquad \text{top: } y = \frac{t \cdot z}{-n}

where n,f>0n, f > 0 and l,r,b,tl, r, b, t are the extents at the near plane.

Similar Triangles

The core of perspective projection is the observation that a point (x,y,z)(x, y, z) in eye space projects onto the near plane at:

x=nxzy=nyzx' = \frac{-n \cdot x}{z} \qquad y' = \frac{-n \cdot y}{z}

This follows from similar triangles: the ratio of xx' to nn equals the ratio of xx to z-z (the sign arises because z<0z < 0 in eye space). The division by zz is the perspective divide — the non-linear step that shrinks far objects and expands near ones.

The Projection Matrix

The GPU performs the perspective divide implicitly via homogeneous coordinates. The projection matrix PP encodes the transform so that:

pclip=Ppeye=(xcyczcwc)\mathbf{p}_{\text{clip}} = P \cdot \mathbf{p}_{\text{eye}} = \begin{pmatrix} x_c \\ y_c \\ z_c \\ w_c \end{pmatrix}

and the perspective divide is applied after:

pndc=pclipwc=(xc/wcyc/wczc/wc)\mathbf{p}_{\text{ndc}} = \frac{\mathbf{p}_{\text{clip}}}{w_c} = \begin{pmatrix} x_c / w_c \\ y_c / w_c \\ z_c / w_c \end{pmatrix}

The matrix is derived by requiring that:

  1. xndc=2xrlr+lrlx_{\text{ndc}} = \frac{2x'}{r - l} - \frac{r + l}{r - l} maps [l,r][l, r] at the near plane to [1,1][-1, 1]
  2. yndc=2ytbt+btby_{\text{ndc}} = \frac{2y'}{t - b} - \frac{t + b}{t - b} maps [b,t][b, t] to [1,1][-1, 1]
  3. zndcz_{\text{ndc}} maps z[n,f]z \in [-n, -f] to [1,1][-1, 1] (OpenGL / WebGL convention)
  4. wc=zw_c = -z so that dividing by wcw_c performs the perspective divide

Substituting x=nx/zx' = -nx/z into condition 1 and setting wc=zw_c = -z:

xc=2nrlx+r+lrlzx_c = \frac{2n}{r-l} x + \frac{r+l}{r-l} z

Since we need xc=wcxndcx_c = w_c \cdot x_{\text{ndc}} and wc=zw_c = -z, the matrix column-major form is:

P=(2nrl0r+lrl002ntbt+btb000f+nfn2fnfn0010)P = \begin{pmatrix} \frac{2n}{r-l} & 0 & \frac{r+l}{r-l} & 0 \\ 0 & \frac{2n}{t-b} & \frac{t+b}{t-b} & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}

In column-major storage (OpenGL / GLSL convention), the flat array reads column by column — see Foundations / Matrices for the layout derivation.

For a symmetric frustum (r=lr = -l, t=bt = -b) parameterized by vertical field of view ϕ\phi and aspect ratio aa:

P=(1atan(ϕ/2)00001tan(ϕ/2)0000f+nfn2fnfn0010)P = \begin{pmatrix} \frac{1}{a \tan(\phi/2)} & 0 & 0 & 0 \\ 0 & \frac{1}{\tan(\phi/2)} & 0 & 0 \\ 0 & 0 & -\frac{f+n}{f-n} & -\frac{2fn}{f-n} \\ 0 & 0 & -1 & 0 \end{pmatrix}

This is the matrix p5’s perspective(fov, aspect, near, far) produces.

Depth Non-Linearity

The zz mapping is the critical one. From the matrix:

zndc=zcwc=f+nfnz2fnfnz=f+nfn+2fn(fn)zz_{\text{ndc}} = \frac{z_c}{w_c} = \frac{-\frac{f+n}{f-n} z - \frac{2fn}{f-n}}{-z} = \frac{f+n}{f-n} + \frac{2fn}{(f-n)z}

This is a rational function of zz. It is not linear — most of the NDC depth range is consumed by geometry near the camera, leaving very little precision for distant geometry.

Concretely, for n=1n = 1, f=1000f = 1000:

Eye zNDC z
1-1 (near)1.000-1.000
10-100.978-0.978
100-1000.820-0.820
500-5000.601-0.601
1000-1000 (far)+1.000+1.000

Half the NDC range (1-1 to 00) covers the first 1% of the frustum depth. This is why depth buffer precision matters, and why a large f/nf/n ratio causes z-fighting at distance.

The Perspective Divide is Irreversible Without ww

After the divide pndc=pclip/wc\mathbf{p}_{\text{ndc}} = \mathbf{p}_{\text{clip}} / w_c, the original zz is gone — zndcz_{\text{ndc}} encodes it non-linearly, and recovering zz requires the original nn and ff. This is why the NDC morph in Spaces / Perspective to NDC is constructed in frustum eye space rather than clip space: clip space positions have already been scaled by wcw_c, and the interpolation mix(p,p,d)\texttt{mix}(p, p', d) would be in the wrong space.

Connection to the Bias Matrix

The depth map in shadow mapping is stored in [0,1][0, 1], not [1,1][-1, 1]. The bias matrix remaps NDC to texture coordinates:

B=(0.5000.500.500.5000.50.50001)B = \begin{pmatrix} 0.5 & 0 & 0 & 0.5 \\ 0 & 0.5 & 0 & 0.5 \\ 0 & 0 & 0.5 & 0.5 \\ 0 & 0 & 0 & 1 \end{pmatrix}

The composition W=BPLVLW = B \cdot P_L \cdot V_L maps world positions to shadow map UV coordinates in a single matrix multiply. See Shading / Shadow Mapping for the full pipeline.

Historical Note

The perspective projection matrix in its homogeneous form was established in the early 1960s alongside the development of computer graphics as a field. Ivan Sutherland’s Sketchpad (1963) operated in a projective framework; the specific 4×44 \times 4 form with wc=zw_c = -z became standard with the OpenGL specification (Silicon Graphics, 1992), which fixed the column-major, right-handed, z[1,1]z \in [-1, 1] convention that WebGL inherits. The non-linearity of the depth mapping was known from the beginning — it is an unavoidable consequence of the rational nature of the perspective transform.

Proof: The zz Mapping is Rational

Claim: No affine map zndc=az+bz_{\text{ndc}} = az + b can simultaneously satisfy zndc(n)=1z_{\text{ndc}}(-n) = -1 and zndc(f)=1z_{\text{ndc}}(-f) = 1 while being derived from a matrix that produces wc=zw_c = -z.

Proof: The zz row of PP must produce zc=Az+Bz_c = Az + B for some constants A,BA, B (since PP is linear). After dividing by wc=zw_c = -z:

zndc=Az+Bz=ABzz_{\text{ndc}} = \frac{Az + B}{-z} = -A - \frac{B}{z}

This is affine in 1/z1/z, not in zz. The only way to make it affine in zz would require B=0B = 0, which gives zndc=Az_{\text{ndc}} = -A — a constant, independent of zz. No depth information would be preserved. The non-linearity is therefore unavoidable. \square

References

  1. Ahn, S. H. OpenGL Projection Matrix. Detailed derivation with diagrams.
  2. Shirley, P. & Marschner, S. Fundamentals of Computer Graphics. CRC Press. §7.3.
  3. Lengyel, E. Mathematics for 3D Game Programming and Computer Graphics. Course Technology. Chapter 4.
  4. Sutherland, I. Sketchpad: A Man-Machine Graphical Communication System. MIT PhD thesis, 1963.