Chain Rule - Several Variables

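The chain rule is the main tool for differentiating a composition. It expresses the derivative of a composite function in terms of the derivatives of the functions that make it up.

When the inputs of a function themselves depend on other variables, a change in those variables reaches the output along every path in the dependency graph. The chain rule adds up the contributions from all of these paths.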


1. Chain rule for $f(x,y)$ with $x=x(t)$ and $y=y(t)$

Let $f:\mathbb{R}^2\to\mathbb{R}$ be differentiable, and let $x=x(t)$ and $y=y(t)$ be differentiable functions of a single variable $t$. Then the composite function

$$F(t)=f(x(t),y(t))$$

is differentiable, and the chain rule says

$$\frac{d}{dt}f(x(t),y(t)) = f_x(x(t),y(t))\,x'(t) + f_y(x(t),y(t))\,y'(t).$$

This is the multivariable version of the one-variable chain rule from Calculus of the Curves, where a curve is differentiated by following how the parameter moves the point.

If $\nabla f=(f_x,f_y)$ denotes the gradient of $f$ and $\mathbf r(t)=(x(t),y(t))$ denotes the path, then the chain rule can be written compactly as a dot product:

$$\frac{d}{dt}f(\mathbf r(t)) = \nabla f(\mathbf r(t)) \cdot \mathbf r'(t).$$

This notation is common in courses that use linear algebra, but it says exactly the same thing as the formula above.
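To see the dot-product form in action, here is a minimal Python sketch. It evaluates $\nabla f(\mathbf r(t))\cdot\mathbf r'(t)$ and compares the result with a central-difference approximation of $\frac{d}{dt}f(\mathbf r(t))$. The helper names and the test function $f(x,y)=xy$ along $x=t^2$, $y=t^3$ are illustrative assumptions, not part of these notes.

```python
def chain_rule_along_path(fx, fy, x, y, dx, dy, t):
    """Return d/dt f(x(t), y(t)) = f_x * x'(t) + f_y * y'(t).

    fx, fy: partial derivatives of f, as functions of (x, y).
    x, y:   path components, as functions of t.
    dx, dy: their derivatives x'(t) and y'(t).
    """
    px, py = x(t), y(t)                      # point r(t) on the path
    grad = (fx(px, py), fy(px, py))          # gradient of f at r(t)
    velocity = (dx(t), dy(t))                # r'(t)
    return grad[0] * velocity[0] + grad[1] * velocity[1]  # dot product


def central_difference(g, t, h=1e-6):
    """Numerical derivative of a one-variable function g at t."""
    return (g(t + h) - g(t - h)) / (2 * h)


# Illustrative example: f(x, y) = x*y along x = t^2, y = t^3, so f(r(t)) = t^5.
def f(x, y):
    return x * y


t0 = 1.5
exact = chain_rule_along_path(
    fx=lambda x, y: y, fy=lambda x, y: x,
    x=lambda t: t**2, y=lambda t: t**3,
    dx=lambda t: 2 * t, dy=lambda t: 3 * t**2,
    t=t0,
)
approx = central_difference(lambda t: f(t**2, t**3), t0)
print(exact, approx, 5 * t0**4)   # the three numbers should agree closely
```

Because $f(\mathbf r(t)) = t^5$ here, the chain-rule value can also be compared with $5t^4$ directly.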


2. Example: a quadratic surface along the unit circle

Consider

$$z = 3x^2 + 4xy + 5y^2, \qquad x=\cos t, \qquad y=\sin t.$$

Substituting the path into the surface gives the composed function

$$f(x(t),y(t)) = 3\cos^2 t + 4\cos t\sin t + 5\sin^2 t.$$

The graph below shows the surface together with the red curve traced by the path $t \mapsto (\cos t, \sin t)$.

[Figure: surface $z = 3x^2 + 4xy + 5y^2$ with the red curve $x = \cos t$, $y = \sin t$]

Differentiate the composition with respect to $t$:

$$\frac{d}{dt}f(x(t),y(t)) = f_x(x(t),y(t))(-\sin t) + f_y(x(t),y(t))(\cos t).$$

For this specific surface,

$$f_x = 6x + 4y, \qquad f_y = 4x + 10y,$$

so along the curve

$$\frac{d}{dt}f(x(t),y(t)) = (6\cos t + 4\sin t)(-\sin t) + (4\cos t + 10\sin t)(\cos t).$$
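As a sanity check on this formula, the following Python sketch evaluates the chain-rule expression and compares it with a central difference of the composed function $3\cos^2 t + 4\cos t\sin t + 5\sin^2 t$.

It also compares against $2\sin 2t + 4\cos 2t$. That form comes from expanding the expression and using $\sin 2t = 2\sin t\cos t$ and $\cos 2t = \cos^2 t - \sin^2 t$. The function names are illustrative only.

```python
import math

def composed(t):
    """f(x(t), y(t)) for z = 3x^2 + 4xy + 5y^2 along x = cos t, y = sin t."""
    x, y = math.cos(t), math.sin(t)
    return 3 * x**2 + 4 * x * y + 5 * y**2

def df_dt_chain(t):
    """Chain rule: f_x * x'(t) + f_y * y'(t)."""
    x, y = math.cos(t), math.sin(t)
    fx = 6 * x + 4 * y                 # partial derivative in x
    fy = 4 * x + 10 * y                # partial derivative in y
    return fx * (-math.sin(t)) + fy * math.cos(t)

def df_dt_simplified(t):
    """The same derivative after double-angle simplification."""
    return 2 * math.sin(2 * t) + 4 * math.cos(2 * t)

def numeric_derivative(t, h=1e-6):
    """Central-difference approximation of d/dt composed(t)."""
    return (composed(t + h) - composed(t - h)) / (2 * h)

for t in [0.0, 0.7, math.pi / 4, 2.0]:
    print(f"t={t:.4f}  chain={df_dt_chain(t):.6f}  "
          f"simplified={df_dt_simplified(t):.6f}  numeric={numeric_derivative(t):.6f}")
```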

3. Chain rule for $f(x,y)$ with $x=x(s,t)$ and $y=y(s,t)$

Now let $x=x(s,t)$ and $y=y(s,t)$ be differentiable functions of two variables, and define

$$F(s,t)=f(x(s,t),y(s,t)).$$

Then the partial derivatives of the composite function are

$$F_s(s,t) = f_x(x,y)\,x_s(s,t) + f_y(x,y)\,y_s(s,t),$$
$$F_t(s,t) = f_x(x,y)\,x_t(s,t) + f_y(x,y)\,y_t(s,t).$$

The important pattern is unchanged: each output derivative is a sum over all the paths that feed into it.

The two formulas above can be written as a single matrix equation. If we arrange the partial derivatives into a row vector on the left and a matrix on the right, we get the Jacobian form:

$$\begin{pmatrix} F_s & F_t \end{pmatrix} = \begin{pmatrix} f_x & f_y \end{pmatrix} \begin{pmatrix} x_s & x_t \\ y_s & y_t \end{pmatrix}.$$

Each column of this matrix product reproduces one of the scalar chain rule formulas above.
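In code, the Jacobian form is just a row vector times a matrix. The Python sketch below uses plain lists and a made-up example, $f(x,y)=x^2y$ with $x=st$ and $y=s+t$, chosen only for illustration. It checks the product $(f_x\ f_y)$ times the Jacobian of $(x,y)$ against central differences of $F(s,t)=(st)^2(s+t)$.

```python
def row_times_matrix(row, matrix):
    """Multiply a 1 x n row vector by an n x m matrix (both as Python lists)."""
    n, m = len(matrix), len(matrix[0])
    return [sum(row[k] * matrix[k][j] for k in range(n)) for j in range(m)]

# Illustrative choice: f(x, y) = x^2 y, x = s t, y = s + t.
def grad_f(x, y):
    return [2 * x * y, x**2]                 # (f_x, f_y)

def jacobian_xy(s, t):
    return [[t, s],                          # (x_s, x_t)
            [1.0, 1.0]]                      # (y_s, y_t)

def partials_F(s, t):
    """Return [F_s, F_t] = (f_x f_y) times the Jacobian of (x, y) w.r.t. (s, t)."""
    x, y = s * t, s + t
    return row_times_matrix(grad_f(x, y), jacobian_xy(s, t))

def F(s, t):
    return (s * t) ** 2 * (s + t)

h, s0, t0 = 1e-6, 1.0, 2.0
numeric = [(F(s0 + h, t0) - F(s0 - h, t0)) / (2 * h),
           (F(s0, t0 + h) - F(s0, t0 - h)) / (2 * h)]
print(partials_F(s0, t0), numeric)
```

Each entry of the returned list is one of the two scalar formulas, computed as a single row-times-column sum.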


4. Practical example: change from Cartesian to polar coordinates

Set

$$x = r\cos\theta, \qquad y = r\sin\theta.$$

If $f(x,y)$ is a function of two variables, then the chain rule gives

$$\frac{\partial f}{\partial r} = f_x\cos\theta + f_y\sin\theta,$$
$$\frac{\partial f}{\partial\theta} = -r f_x\sin\theta + r f_y\cos\theta.$$

This is a standard worked example of the two-variable chain rule, because the polar change of coordinates appears repeatedly later in multivariable calculus.

The two formulas above are the two entries of a single row-vector-times-matrix product:

$$\begin{pmatrix} \frac{\partial f}{\partial r} & \frac{\partial f}{\partial\theta} \end{pmatrix} = \begin{pmatrix} f_x & f_y \end{pmatrix} \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$

The $2\times 2$ matrix on the right is the Jacobian matrix of the transformation $(r,\theta)\mapsto(x,y)$.
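The polar formulas translate directly into a small helper. In the Python sketch below, `polar_partials` is a hypothetical name that takes the Cartesian partials $f_x, f_y$ as functions. It is tested on the illustrative function $f(x,y)=xy$, whose polar form $\tfrac{r^2}{2}\sin 2\theta$ gives the known answers $f_r = r\sin 2\theta$ and $f_\theta = r^2\cos 2\theta$.

```python
import math

def polar_partials(fx, fy, r, theta):
    """Return (f_r, f_theta) for x = r cos(theta), y = r sin(theta) via the chain rule.

    fx, fy are the Cartesian partial derivatives of f, as functions of (x, y).
    """
    c, s = math.cos(theta), math.sin(theta)
    gx, gy = fx(r * c, r * s), fy(r * c, r * s)   # f_x, f_y at the point (x, y)
    # Row vector (f_x, f_y) times the Jacobian [[cos, -r sin], [sin, r cos]].
    f_r = gx * c + gy * s
    f_theta = gx * (-r * s) + gy * (r * c)
    return f_r, f_theta

# Illustrative test function f(x, y) = x*y, so f_x = y and f_y = x.
# In polar form f = (r^2 / 2) sin(2 theta): f_r = r sin(2 theta), f_theta = r^2 cos(2 theta).
r0, th0 = 2.0, 0.3
f_r, f_theta = polar_partials(lambda x, y: y, lambda x, y: x, r0, th0)
print(f_r, r0 * math.sin(2 * th0))
print(f_theta, r0**2 * math.cos(2 * th0))
```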


5. Example: $z = e^x\sin y$

Now substitute the polar coordinate change into

$$z = e^x\sin y.$$

Then

$$z(r,\theta) = e^{r\cos\theta}\sin(r\sin\theta).$$

Differentiate with respect to $r$:

$$z_r = e^{r\cos\theta}\sin(r\sin\theta)\cos\theta + e^{r\cos\theta}\cos(r\sin\theta)\sin\theta.$$

By the angle-addition formula $\sin A\cos\theta + \cos A\sin\theta = \sin(A+\theta)$ with $A = r\sin\theta$, you can also write that as

$$z_r = e^{r\cos\theta}\sin(r\sin\theta + \theta).$$

Differentiate with respect to $\theta$:

$$z_\theta = -r e^{r\cos\theta}\sin\theta\,\sin(r\sin\theta) + r e^{r\cos\theta}\cos\theta\,\cos(r\sin\theta).$$

Equivalently, using $\cos A\cos\theta - \sin A\sin\theta = \cos(A+\theta)$,

$$z_\theta = r e^{r\cos\theta}\cos(r\sin\theta + \theta).$$

This is a clean example of how the chain rule turns a simple expression in xx and yy into a more complicated but still manageable expression in rr and θ\theta.
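The simplified forms are easy to get wrong by a sign, so a quick numerical comparison is worthwhile. This Python sketch compares the closed forms for $z_r$ and $z_\theta$ with central differences of $z(r,\theta)=e^{r\cos\theta}\sin(r\sin\theta)$ at a few sample points. The helper names and sample points are illustrative choices.

```python
import math

def z_polar(r, theta):
    """z = e^x sin y after substituting x = r cos(theta), y = r sin(theta)."""
    return math.exp(r * math.cos(theta)) * math.sin(r * math.sin(theta))

def z_r_closed(r, theta):
    return math.exp(r * math.cos(theta)) * math.sin(r * math.sin(theta) + theta)

def z_theta_closed(r, theta):
    return r * math.exp(r * math.cos(theta)) * math.cos(r * math.sin(theta) + theta)

def partial(g, r, theta, wrt, h=1e-6):
    """Central-difference partial of g(r, theta) in r (wrt="r") or in theta (otherwise)."""
    if wrt == "r":
        return (g(r + h, theta) - g(r - h, theta)) / (2 * h)
    return (g(r, theta + h) - g(r, theta - h)) / (2 * h)

samples = [(0.5, 0.2), (1.0, 1.0), (2.0, -0.7)]
for r, theta in samples:
    err_r = z_r_closed(r, theta) - partial(z_polar, r, theta, "r")
    err_theta = z_theta_closed(r, theta) - partial(z_polar, r, theta, "theta")
    print(f"r={r}, theta={theta}: errors {err_r:.2e}, {err_theta:.2e}")
```

Errors on the order of the finite-difference accuracy (far below $10^{-6}$) indicate that the closed forms are consistent with the composed function.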


6. Most general version of the chain rule

Suppose $u$ is a differentiable function of $n$ variables $x_1, x_2, \ldots, x_n$, and each $x_j$ is a differentiable function of $m$ variables $t_1, t_2, \ldots, t_m$.

Then $u$ is a function of $t_1, t_2, \ldots, t_m$ and

$$\frac{\partial u}{\partial t_i} = \frac{\partial u}{\partial x_1}\frac{\partial x_1}{\partial t_i} + \frac{\partial u}{\partial x_2}\frac{\partial x_2}{\partial t_i} + \cdots + \frac{\partial u}{\partial x_n}\frac{\partial x_n}{\partial t_i}$$

for each $i = 1, 2, \ldots, m$.

In other words: to find the partial derivative of $u$ with respect to any one parameter $t_i$, multiply the partial derivative of $u$ with respect to each intermediate variable $x_j$ by the partial derivative of that $x_j$ with respect to $t_i$, then add all those products together.

All earlier versions of the chain rule are special cases. For $n=2$ and $m=1$ (a single parameter $t$) this becomes the formula in Section 1. For $n=2$ and $m=2$ it becomes the formulas in Section 3.

The same formula can be expressed as a single matrix multiplication. Let $g:\mathbb{R}^m\to\mathbb{R}^n$ and $f:\mathbb{R}^n\to\mathbb{R}^k$ be differentiable, and let $D$ denote the matrix of all partial derivatives (the Jacobian). Then, at each point $\mathbf t\in\mathbb{R}^m$,

$$D(f\circ g)(\mathbf t) = Df(g(\mathbf t))\,Dg(\mathbf t).$$

Here $Df(g(\mathbf t))$ is a $k\times n$ matrix and $Dg(\mathbf t)$ is an $n\times m$ matrix, so the product is $k\times m$. For scalar-valued $f$ ($k=1$) this is a row vector times a matrix; for vector-valued $f$ it is a general matrix product. Written out component by component:

$$\frac{\partial (f_i\circ g)}{\partial t_j}(\mathbf t) = \sum_{\ell=1}^n \frac{\partial f_i}{\partial x_\ell}(g(\mathbf t))\,\frac{\partial g_\ell}{\partial t_j}(\mathbf t).$$

With $k=1$, $f=u$ and $g_\ell = x_\ell$, this is the scalar sum formula above with the index renamed.
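The matrix form can be checked numerically in any dimension. The Python sketch below uses two hypothetical maps:

- $g:\mathbb{R}^2\to\mathbb{R}^3$, $g(t_1,t_2)=(t_1t_2,\ \sin t_1,\ t_1+t_2^2)$
- $f:\mathbb{R}^3\to\mathbb{R}^2$, $f(x_1,x_2,x_3)=(x_1x_3,\ e^{x_2}+x_3)$

It multiplies the hand-computed Jacobians ($Df$ evaluated at $g$ of the point, times $Dg$). It then compares the $2\times 2$ result with a finite-difference Jacobian of $f\circ g$. All function names are illustrative.

```python
import math

def g(t):
    """Hypothetical inner map R^2 -> R^3."""
    t1, t2 = t
    return [t1 * t2, math.sin(t1), t1 + t2**2]

def Dg(t):
    """Jacobian of g: 3 x 2, entry [l][j] = d g_l / d t_j."""
    t1, t2 = t
    return [[t2, t1],
            [math.cos(t1), 0.0],
            [1.0, 2 * t2]]

def f(x):
    """Hypothetical outer map R^3 -> R^2."""
    x1, x2, x3 = x
    return [x1 * x3, math.exp(x2) + x3]

def Df(x):
    """Jacobian of f: 2 x 3, entry [i][l] = d f_i / d x_l."""
    x1, x2, x3 = x
    return [[x3, 0.0, x1],
            [0.0, math.exp(x2), 1.0]]

def matmul(A, B):
    """Plain-Python product of a k x n matrix A and an n x m matrix B."""
    n, m = len(B), len(B[0])
    return [[sum(row[p] * B[p][j] for p in range(n)) for j in range(m)] for row in A]

def numerical_jacobian(func, point, h=1e-6):
    """Central-difference Jacobian of func at point; entry [i][j] = d func_i / d point_j."""
    columns = []
    for j in range(len(point)):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(func(plus), func(minus))])
    return [list(row) for row in zip(*columns)]     # transpose columns into rows

t0 = [0.4, 1.3]
chain = matmul(Df(g(t0)), Dg(t0))                   # Df(g(t0)) Dg(t0), a 2 x 2 matrix
direct = numerical_jacobian(lambda t: f(g(t)), t0)  # D(f o g)(t0), approximated
print(chain)
print(direct)
```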


7. Implicit differentiation of $F(x,y)$

Suppose $F$ is continuously differentiable and the equation

$$F(x,y)=0$$

defines $y$ as a function of $x$ near a point where $F_y\neq 0$. Write that local solution as $y=y(x)$.

Now differentiate the identity

$$F(x,y(x)) = 0$$

with respect to $x$.

By the chain rule, applied along the path $x \mapsto (x, y(x))$ (whose first component has derivative $1$),

$$\frac{d}{dx}F(x,y(x)) = F_x(x,y(x)) + F_y(x,y(x))\frac{dy}{dx}.$$

Since the left side is the derivative of the constant function $0$, it must equal $0$. Therefore

$$F_x + F_y\frac{dy}{dx} = 0,$$

and so

$$\boxed{\frac{dy}{dx} = -\frac{F_x}{F_y}.}$$

This formula is not separate from the chain rule. It is the chain rule applied to an implicit relation.
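Here is a small Python sketch of the formula $dy/dx=-F_x/F_y$. It is tested on a hypothetical curve, $y^3 + y - x = 0$, which is awkward to solve for $y$ explicitly. At $(2,1)$ the formula gives $-(-1)/(3\cdot 1^2+1) = 1/4$.

The sketch confirms this by solving for $y$ numerically and differencing. The solver is Newton's method, which is an assumption of this sketch and not something these notes prescribe.

```python
def implicit_slope(Fx, Fy, x, y):
    """dy/dx = -F_x / F_y on the curve F(x, y) = 0; the formula needs F_y != 0."""
    fy = Fy(x, y)
    if fy == 0:
        raise ValueError("F_y = 0 at this point, so the formula -F_x/F_y does not apply")
    return -Fx(x, y) / fy

# Hypothetical curve F(x, y) = y^3 + y - x = 0, so F_x = -1 and F_y = 3y^2 + 1.
def Fx(x, y):
    return -1.0

def Fy(x, y):
    return 3 * y**2 + 1

def solve_y(x, y=1.0, steps=50):
    """Solve y^3 + y - x = 0 by Newton's method; y^3 + y is increasing, so the root is unique."""
    for _ in range(steps):
        y -= (y**3 + y - x) / (3 * y**2 + 1)
    return y

slope = implicit_slope(Fx, Fy, 2.0, solve_y(2.0))      # the point (2, 1)
h = 1e-5
numeric = (solve_y(2.0 + h) - solve_y(2.0 - h)) / (2 * h)
print(slope, numeric)
```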


8. Example: $x^2 + y^2 = 1$

Let

$$F(x,y) = x^2 + y^2 - 1.$$

Then

$$F_x = 2x, \qquad F_y = 2y,$$

so the implicit derivative is

$$\frac{dy}{dx} = -\frac{F_x}{F_y} = -\frac{x}{y}.$$

On the upper-right part of the circle, use the point

$$\left(\frac{\sqrt2}{2},\frac{\sqrt2}{2}\right).$$

At that point the slope is

$$\frac{dy}{dx} = -1.$$

So the tangent line is

$$y - \frac{\sqrt2}{2} = -\left(x - \frac{\sqrt2}{2}\right),$$

or equivalently

$$y = -x + \sqrt2.$$

[Figure: unit circle with the tangent line $y = -x + \sqrt2$ at $t = \pi/4$]

Now compare this with the parametric curve

$$x=\cos t, \qquad y=\sin t.$$

The derivative from Calculus of the Curves is

$$\frac{dy}{dx} = \frac{dy/dt}{dx/dt} = \frac{\cos t}{-\sin t} = -\cot t.$$

At $t=\pi/4$, this gives

$$\frac{dy}{dx} = -1,$$

which matches the implicit derivative. The two viewpoints are the same calculation written in different languages.
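The agreement between the two viewpoints can also be checked numerically. The Python sketch below works at $t=\pi/4$ and does three things:

- computes the slope both ways,
- recovers the tangent line's intercept from point-slope form,
- confirms that the line's distance from the origin, $|b|/\sqrt{1+m^2}$, is $1$, as it must be for a line tangent to the unit circle.

The function name is illustrative.

```python
import math

def tangent_line_circle(t):
    """Slopes and intercept of the tangent to x^2 + y^2 = 1 at (cos t, sin t); needs sin t != 0."""
    x0, y0 = math.cos(t), math.sin(t)
    slope_implicit = -x0 / y0                      # -F_x / F_y with F = x^2 + y^2 - 1
    slope_param = math.cos(t) / (-math.sin(t))     # (dy/dt) / (dx/dt)
    intercept = y0 - slope_implicit * x0           # from y - y0 = m (x - x0)
    return slope_implicit, slope_param, intercept

m_imp, m_par, b = tangent_line_circle(math.pi / 4)
print(m_imp, m_par, b, math.sqrt(2))
# A tangent line to the unit circle lies at distance 1 from the origin.
print(abs(b) / math.sqrt(1 + m_imp**2))
```

Floating-point rounding makes the slopes differ from $-1$ only in the last few digits.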


9. Exercises

Let

$$f(x,y) = x^2y + y^3, \qquad x=t^2, \qquad y=\sin t.$$

Compute $\dfrac{d}{dt}f(x(t),y(t))$.

Answer:

First compute the partial derivatives:

$$f_x = 2xy, \qquad f_y = x^2 + 3y^2.$$

Also,

$$x'(t)=2t, \qquad y'(t)=\cos t.$$

So

$$\frac{d}{dt}f(x(t),y(t)) = 2xy(2t) + (x^2 + 3y^2)\cos t,$$

and substituting $x=t^2$, $y=\sin t$ gives

$$4t^3\sin t + (t^4 + 3\sin^2 t)\cos t.$$
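To check this answer, a short Python sketch can compare the closed form with a central difference of the composed function $f(x(t),y(t)) = t^4\sin t + \sin^3 t$. The helper names and sample values of $t$ are illustrative.

```python
import math

def composed(t):
    """f(x(t), y(t)) for f = x^2 y + y^3 with x = t^2, y = sin t."""
    x, y = t**2, math.sin(t)
    return x**2 * y + y**3

def df_dt_formula(t):
    """The answer above: 4 t^3 sin t + (t^4 + 3 sin^2 t) cos t."""
    return 4 * t**3 * math.sin(t) + (t**4 + 3 * math.sin(t) ** 2) * math.cos(t)

def numeric_derivative(t, h=1e-6):
    """Central-difference approximation of d/dt composed(t)."""
    return (composed(t + h) - composed(t - h)) / (2 * h)

for t in [0.5, 1.0, 2.0]:
    print(t, df_dt_formula(t), numeric_derivative(t))
```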

Let

$$f(x,y)=x^2+y^2.$$

Find $f_r$ and $f_\theta$ after the substitution $x=r\cos\theta$, $y=r\sin\theta$.

Answer:

Using $f_x=2x$ and $f_y=2y$,

$$f_r = 2x\cos\theta + 2y\sin\theta = 2r,$$

and

$$f_\theta = -2rx\sin\theta + 2ry\cos\theta = 0.$$

So this function depends only on $r$, not on $\theta$.

Differentiate the curve

$$x^2 + xy + y^2 = 3$$

implicitly and solve for $\dfrac{dy}{dx}$.

Answer:

Let $F(x,y)=x^2+xy+y^2-3$. Then

$$F_x = 2x+y, \qquad F_y = x+2y.$$

Therefore, wherever $x+2y\neq 0$,

$$\frac{dy}{dx} = -\frac{2x+y}{x+2y}.$$
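One way to check this answer starts from the point $(1,1)$, which lies on the curve since $1+1+1=3$. There the formula gives $dy/dx = -3/3 = -1$. Solving $y^2 + xy + (x^2-3) = 0$ with the quadratic formula gives the branch $y = \tfrac{1}{2}\left(-x + \sqrt{12-3x^2}\right)$ through that point. The Python sketch below differentiates that branch numerically for comparison; the function names are illustrative.

```python
import math

def slope_implicit(x, y):
    """dy/dx = -(2x + y) / (x + 2y) on x^2 + xy + y^2 = 3 (needs x + 2y != 0)."""
    return -(2 * x + y) / (x + 2 * y)

def upper_branch(x):
    """Root of y^2 + x y + (x^2 - 3) = 0 taken with the + sign; real for |x| <= 2."""
    return (-x + math.sqrt(12 - 3 * x**2)) / 2

x0 = 1.0
y0 = upper_branch(x0)                  # (1, 1) is on the curve
h = 1e-6
numeric = (upper_branch(x0 + h) - upper_branch(x0 - h)) / (2 * h)
print(y0, slope_implicit(x0, y0), numeric)
```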

Use the parameterization $x=\cos t$, $y=\sin t$ to find the tangent line to the unit circle at $t=\pi/4$.

Answer:

At $t=\pi/4$ the point is $(\sqrt2/2,\sqrt2/2)$. The slope is

$$\frac{dy}{dx} = \frac{\cos t}{-\sin t} = -1.$$

Thus the tangent line is

$$y = -x + \sqrt2.$$

Note: This exercise uses Jacobian matrices and requires a linear algebra background.

Let $g:\mathbb R^2\to\mathbb R^2$ and $f:\mathbb R^2\to\mathbb R$ be differentiable. Explain why the derivative of $f\circ g$ is a product of a gradient row vector and a Jacobian matrix.

Answer:

If $g(s,t)=(x(s,t),y(s,t))$, then

$$Dg = \begin{pmatrix}x_s & x_t \\ y_s & y_t\end{pmatrix}.$$

If $f=f(x,y)$, then

$$Df = \begin{pmatrix}f_x & f_y\end{pmatrix}.$$

So the derivative of the composition is

$$D(f\circ g)=Df(g)\,Dg,$$

which expands to the two partial chain rule formulas from Section 3.