152. Differentiation of functions of several variables.

So far we have been concerned exclusively with functions of a single variable \(x\), but there is nothing to prevent us applying the notion of differentiation to functions of several variables \(x\), \(y\), ….

Suppose then that \(f(x, y)\) is a function of two\(^{1}\) real variables \(x\) and \(y\), and that the limits \[\lim_{h\to 0}\frac{f(x + h, y) - f(x, y)}{h},\quad \lim_{k\to 0}\frac{f(x, y + k) - f(x, y)}{k}\] exist for all values of \(x\) and \(y\) in question, that is to say that \(f(x, y)\) possesses a derivative \(df/dx\) or \(D_{x}f(x, y)\) with respect to \(x\) and a derivative \(df/dy\) or \(D_{y}f(x, y)\) with respect to \(y\). It is usual to call these derivatives the partial differential coefficients of \(f\), and to denote them by \[\frac{\partial f}{\partial x},\quad \frac{\partial f}{\partial y}\] or \[f_{x}'(x, y),\quad f_{y}'(x, y)\] or simply \(f_{x}'\), \(f_{y}'\) or \(f_{x}\), \(f_{y}\). The reader must not suppose, however, that these new notations imply any essential novelty of idea: ‘partial differentiation’ with respect to \(x\) is exactly the same process as ordinary differentiation, the only novelty lying in the presence in \(f\) of a second variable \(y\) independent of \(x\).

In what precedes we have supposed \(x\) and \(y\) to be two real variables entirely independent of one another. If \(x\) and \(y\) were connected by a relation the state of affairs would be very different. In this case our definition of \(f_{x}'\) would fail entirely, as we could not change \(x\) into \(x + h\) without at the same time changing \(y\). But then \(f(x, y)\) would not really be a function of two variables at all. A function of two variables, as we defined it in Ch. II, is essentially a function of two independent variables. If \(y\) depends on \(x\), \(y\) is a function of \(x\), say \(y = \phi(x)\); and then \[f(x, y) = f\{x, \phi(x)\}\] is really a function of the single variable \(x\). Of course we may also represent it as a function of the single variable \(y\). Or, as is often most convenient, we may regard \(x\) and \(y\) as functions of a third variable \(t\), and then \(f(x, y)\), which is of the form \(f\{\phi(t), \psi(t)\}\), is a function of the single variable \(t\).

Example LX

1. Prove that if \(x = r\cos\theta\), \(y = r\sin\theta\), so that \(r = \sqrt{x^{2} + y^{2}}\), \(\theta = \arctan(y/x)\), then \[\begin{aligned} \frac{\partial r}{\partial x} &= \frac{x}{\sqrt{x^{2} + y^{2}}}, &\frac{\partial r}{\partial y} &= \frac{y}{\sqrt{x^{2} + y^{2}}}, &\frac{\partial \theta}{\partial x} &= -\frac{y}{x^{2} + y^{2}}, &\frac{\partial \theta}{\partial y} &= \frac{x}{x^{2} + y^{2}},\\  \frac{\partial x}{\partial r} &= \cos\theta, &\frac{\partial y}{\partial r} &= \sin\theta, &\frac{\partial x}{\partial \theta} &= -r\sin\theta, &\frac{\partial y}{\partial \theta} &= r\cos\theta.\end{aligned}\]

2. Account for the fact that \(\dfrac{\partial r}{\partial x}\neq 1\bigg/\biggl(\dfrac{\partial x}{\partial r}\biggr)\) and \(\dfrac{\partial \theta}{\partial x}\neq 1\bigg/\biggl(\dfrac{\partial x}{\partial \theta}\biggr)\). [When we were considering a function \(y\) of one variable \(x\) it followed from the definitions that \(dy/dx\) and \(dx/dy\) were reciprocals. This is no longer the case when we are dealing with functions of two variables. Let \(P\) (Fig. 46) be the point \((x, y)\) or \((r, \theta)\). To find \(\partial r/\partial x\) we must increase \(x\), say by an increment \(MM_{1} = \delta x\), while keeping \(y\) constant. This brings \(P\) to \(P_{1}\). If along \(OP_{1}\) we take \(OP' = OP\), the increment of \(r\) is \(P'P_{1} = \delta r\), say; and \(\partial r/\partial x = \lim(\delta r/\delta x)\). If on the other hand we want to calculate \(\partial x/\partial r\), \(x\) and \(y\) being now regarded as functions of \(r\) and \(\theta\), we must increase \(r\) by \(\Delta r\), say, keeping \(\theta\) constant. This brings \(P\) to \(P_{2}\), where \(PP_{2} = \Delta r\): the corresponding increment of \(x\) is \(MM_{1} = \Delta x\), say; and \[\partial x/\partial r = \lim(\Delta x/\Delta r).\] Now \(\Delta x = \delta x\):\(^{2}\) but \(\Delta r \neq \delta r\). Indeed it is easy to see from the figure that \[\lim (\delta r/\delta x) = \lim (P'P_{1}/PP_{1}) = \cos\theta,\] but \[\lim (\Delta r/\Delta x) = \lim (PP_{2}/PP_{1}) = \sec\theta,\] so that \[\lim (\delta r/\Delta r) = \cos^{2}\theta.\]

The fact is of course that \(\partial x/\partial r\) and \(\partial r/\partial x\) are not formed upon the same hypothesis as to the variation of \(P\).]
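
[The same conclusion may be checked directly from the formulae of Ex. 1; the following is offered only as an illustration. We have \[\frac{\partial r}{\partial x}\, \frac{\partial x}{\partial r} = \frac{x}{\sqrt{x^{2} + y^{2}}}\cos\theta = \cos^{2}\theta,\quad \frac{\partial \theta}{\partial x}\, \frac{\partial x}{\partial \theta} = \left(-\frac{y}{x^{2} + y^{2}}\right)(-r\sin\theta) = \sin^{2}\theta,\] so that neither product is equal to \(1\) in general, although their sum is.]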

3. Prove that if \(z = f(ax + by)\) then \(b(\partial z/\partial x) = a(\partial z/\partial y)\).
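
[A sketch of one possible argument, assuming that \(f\) possesses a derivative \(f'\) and writing \(u = ax + by\) (the letter \(u\) is introduced here merely for convenience). Differentiation with respect to \(x\), with \(y\) kept constant, is ordinary differentiation of a function of a function, and similarly for \(y\); so that \[\frac{\partial z}{\partial x} = af'(u),\quad \frac{\partial z}{\partial y} = bf'(u),\] and therefore \(b(\partial z/\partial x) = abf'(u) = a(\partial z/\partial y)\).]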

4. Find \(\partial X/\partial x\), \(\partial X/\partial y\), … when \(X + Y = x\), \(Y = xy\). Express \(x\), \(y\) as functions of \(X\), \(Y\) and find \(\partial x/\partial X\), \(\partial x/\partial Y\), ….

5. Find \(\partial X/\partial x\), … when \(X + Y + Z = x\), \(Y + Z = xy\), \(Z = xyz\); express \(x\), \(y\), \(z\) in terms of \(X\), \(Y\), \(Z\) and find \(\partial x/\partial X\), ….

[There is of course no difficulty in extending the ideas of the last section to functions of any number of variables. But the reader must be careful to impress on his mind that the notion of the partial derivative of a function of several variables is only determinate when all the independent variables are specified. Thus if \(u = x + y + z\), \(x\), \(y\), and \(z\) being the independent variables, then \(\partial u/\partial x = 1\). But if we regard \(u\) as a function of the variables \(x\), \(x + y = \eta\), and \(x + y + z = \zeta\), so that \(u = \zeta\), then \(\partial u/\partial x = 0\).]

 

153. Differentiation of a function of two functions.

There is a theorem concerning the differentiation of a function of one variable, known generally as the Theorem of the Total Differential Coefficient, which is of very great importance and depends on the notions explained in the preceding section regarding functions of two variables. This theorem gives us a rule for differentiating \[f\{\phi(t), \psi(t)\},\] with respect to \(t\).

Let us suppose, in the first instance, that \(f(x, y)\) is a function of the two variables \(x\) and \(y\), and that \(f_{x}'\), \(f_{y}'\) are continuous functions of both variables (§ 107) for all of their values which come in question. And now let us suppose that the variation of \(x\) and \(y\) is restricted in that \((x, y)\) lies on a curve \[x = \phi(t),\quad y = \psi(t),\] where \(\phi\) and \(\psi\) are functions of \(t\) with continuous differential coefficients \(\phi'(t)\), \(\psi'(t)\). Then \(f(x, y)\) reduces to a function of the single variable \(t\), say \(F(t)\). The problem is to determine \(F'(t)\).

Suppose that, when \(t\) changes to \(t + \tau\), \(x\) and \(y\) change to \(x + \xi\) and \(y + \eta\). Then by definition \[\begin{aligned} \frac{dF(t)}{dt} &= \lim_{\tau\to 0} \frac{1}{\tau}[f\{\phi(t + \tau), \psi(t + \tau)\} - f\{\phi(t), \psi(t)\}]\\ &= \lim \frac{1}{\tau}\{f(x + \xi, y + \eta) - f(x, y)\} \\ &= \lim \left[ \frac{f(x + \xi, y + \eta) - f(x, y + \eta)}{\xi}\, \frac{\xi}{\tau} + \frac{f(x, y + \eta) - f(x, y)}{\eta}\, \frac{\eta}{\tau} \right].\end{aligned}\]

But, by the Mean Value Theorem, \[\begin{aligned} \{f(x + \xi, y + \eta) - f(x, y + \eta)\}/\xi &= f_{x}'(x + \theta\xi, y + \eta),\\ \{f(x, y + \eta) - f(x, y)\}/\eta &= f_{y}'(x, y + \theta'\eta),\end{aligned}\] where \(\theta\) and \(\theta'\) each lie between \(0\) and \(1\). As \(\tau \to 0\), \(\xi \to 0\) and \(\eta \to 0\), and \(\xi/\tau \to \phi'(t)\), \(\eta/\tau \to \psi'(t)\): also \[f_{x}'(x + \theta\xi, y + \eta) \to f_{x}'(x, y),\quad f_{y}'(x, y + \theta'\eta) \to f_{y}'(x, y).\] Hence \[F'(t) = D_{t}f \{\phi(t), \psi(t)\} = f_{x}'(x, y)\phi'(t) + f_{y}'(x, y)\psi'(t),\] where we are to put \(x = \phi(t)\), \(y = \psi(t)\) after carrying out the differentiations with respect to \(x\) and \(y\). This result may also be expressed in the form \[\frac{df}{dt} = \frac{\partial f}{\partial x}\, \frac{dx}{dt} + \frac{\partial f}{\partial y}\, \frac{dy}{dt}.\]

Example LXI

1. Suppose \(\phi(t) = (1 - t^{2})/(1 + t^{2})\), \(\psi(t) = 2t/(1 + t^{2})\), so that the locus of \((x, y)\) is the circle \(x^{2} + y^{2} = 1\). Then \[\begin{aligned} \phi'(t) &= -4t/(1 + t^{2})^{2},\quad \psi'(t) = 2(1 - t^{2})/(1 + t^{2})^{2},\\ F'(t) &= \{-4t/(1 + t^{2})^{2}\}f_{x}' + \{2(1 - t^{2})/(1 + t^{2})^{2}\}f_{y}',\end{aligned}\] where \(x\) and \(y\) are to be put equal to \((1 - t^{2})/(1 + t^{2})\) and \(2t/(1 + t^{2})\) after carrying out the differentiations.

We can easily verify this formula in particular cases. Suppose, e.g., that \(f(x, y) = x^{2} + y^{2}\). Then \(f_{x}' = 2x\), \(f_{y}' = 2y\), and it is easily verified that \(F'(t) = 2x\phi'(t) + 2y\psi'(t) = 0\), which is obviously correct, since \(F(t) = 1\).
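
[Written out in full, as a check on the arithmetic, the verification runs \[2x\phi'(t) + 2y\psi'(t) = \frac{2(1 - t^{2})}{1 + t^{2}}\cdot\frac{-4t}{(1 + t^{2})^{2}} + \frac{4t}{1 + t^{2}}\cdot\frac{2(1 - t^{2})}{(1 + t^{2})^{2}} = \frac{-8t(1 - t^{2}) + 8t(1 - t^{2})}{(1 + t^{2})^{3}} = 0.\]]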

2. Verify the theorem in the same way when (a) \(x = t^{m}\), \(y = 1 - t^{m}\), \(f(x, y) = x + y\); (b) \(x = a\cos t\), \(y = a\sin t\), \(f(x, y) = x^{2} + y^{2}\).
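
[A sketch of the two verifications, given as an illustration of the rule of § 153 and not as the only way of proceeding. In case (a) the partial derivatives of \(f\) with respect to \(x\) and \(y\) are both equal to \(1\), and the derivatives of \(x\) and \(y\) with respect to \(t\) are \(mt^{m-1}\) and \(-mt^{m-1}\), so that the rule gives \[F'(t) = mt^{m-1} - mt^{m-1} = 0,\] in agreement with \(F(t) = t^{m} + (1 - t^{m}) = 1\). In case (b) the rule gives \[F'(t) = 2x(-a\sin t) + 2y(a\cos t) = -2a^{2}\cos t\sin t + 2a^{2}\sin t\cos t = 0,\] in agreement with \(F(t) = a^{2}\cos^{2} t + a^{2}\sin^{2} t = a^{2}\).]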

3. One of the most important cases is that in which \(t\) is \(x\) itself. We then obtain \[D_{x}f\{x, \psi(x)\} = D_{x}f(x, y) + D_{y}f(x, y)\psi'(x),\] where \(y\) is to be replaced by \(\psi(x)\) after differentiation.

It was this case which led to the introduction of the notation \(\partial f/\partial x\), \(\partial f/\partial y\). For it would seem natural to use the notation \(df/dx\) for either of the functions \(D_{x}f\{x, \psi(x)\}\) and \(D_{x}f(x, y)\), in one of which \(y\) is put equal to \(\psi(x)\) before and in the other after differentiation. Suppose for example that \(y = 1 - x\) and \(f(x, y) = x + y\). Then \(D_{x}f(x, 1 - x) = D_{x}1 = 0\), but \(D_{x}f(x, y) = 1\).

The distinction between the two functions is adequately shown by denoting the first by \(df/dx\) and the second by \(\partial f/\partial x\), in which case the theorem takes the form \[\frac{df}{dx} = \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\, \frac{dy}{dx};\] though this notation is also open to objection, in that it is a little misleading to denote the functions \(f\{x, \psi(x)\}\) and \(f(x, y)\), whose forms as functions of \(x\) are quite different from one another, by the same letter \(f\) in \(df/dx\) and \(\partial f/\partial x\).

4. If the result of eliminating \(t\) between \(x = \phi(t)\), \(y = \psi(t)\) is \(f(x, y) = 0\), then \[\frac{\partial f}{\partial x}\, \frac{dx}{dt} + \frac{\partial f}{\partial y}\, \frac{dy}{dt} = 0.\]

5. If \(x\) and \(y\) are functions of \(t\), and \(r\) and \(\theta\) are the polar coordinates of \((x, y)\), then \(r' = (xx' + yy')/r\), \(\theta' = (xy' - yx')/r^{2}\), dashes denoting differentiations with respect to \(t\).
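
[A sketch of the derivation, assuming that \((x, y)\) is not the origin and that \(r\) and \(\theta\) are given by \(r = \sqrt{x^{2} + y^{2}}\), \(\theta = \arctan(y/x)\), as in Ex. LX. 1. Applying the theorem of this section to \(r\) and to \(\theta\), regarded as functions of \(x\) and \(y\), and using the partial derivatives found in that example, we obtain \[\frac{dr}{dt} = \frac{\partial r}{\partial x}\, \frac{dx}{dt} + \frac{\partial r}{\partial y}\, \frac{dy}{dt} = \frac{x}{r}\, \frac{dx}{dt} + \frac{y}{r}\, \frac{dy}{dt},\quad \frac{d\theta}{dt} = \frac{\partial \theta}{\partial x}\, \frac{dx}{dt} + \frac{\partial \theta}{\partial y}\, \frac{dy}{dt} = -\frac{y}{r^{2}}\, \frac{dx}{dt} + \frac{x}{r^{2}}\, \frac{dy}{dt},\] which are the formulae stated.]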

 

154. The Mean Value Theorem for functions of two variables.

Many of the results of the last chapter depended upon the Mean Value Theorem, expressed by the equation \[\phi(x + h) - \phi(x) = h\phi'(x + \theta h),\] or as it may be written, if \(y = \phi(x)\), \[\delta y = \phi'(x + \theta\, \delta x)\, \delta x.\]

Now suppose that \(z = f(x, y)\) is a function of the two independent variables \(x\) and \(y\), and that \(x\) and \(y\) receive increments \(h\), \(k\) or \(\delta x\), \(\delta y\) respectively: and let us attempt to express the corresponding increment of \(z\), viz. \[\delta z = f(x + h, y + k) - f(x, y),\] in terms of \(h\), \(k\) and the derivatives of \(z\) with respect to \(x\) and \(y\).

Let \(f(x + ht, y + kt) = F(t)\). Then \[f(x + h, y + k) - f(x, y) = F(1) - F(0) = F'(\theta),\] where \(0 < \theta < 1\). But, by § 153, \[\begin{aligned} F'(t) &= D_{t} f(x + ht, y + kt)\\ &= hf_{x}'(x + ht, y + kt) + kf_{y}'(x + ht, y + kt).\end{aligned}\] Hence finally \[\delta z = f(x + h, y + k) - f(x, y) = hf_{x}'(x + \theta h, y + \theta k) + kf_{y}'(x + \theta h, y + \theta k),\] which is the formula desired. Since \(f_{x}'\), \(f_{y}'\) are supposed to be continuous functions of \(x\) and \(y\), we have \[\begin{aligned} f_{x}'(x + \theta h, y + \theta k) &= f_{x}'(x, y) + \epsilon_{h, k},\\ f_{y}'(x + \theta h, y + \theta k) &= f_{y}'(x, y) + \eta_{h, k},\end{aligned}\] where \(\epsilon_{h, k}\) and \(\eta_{h, k}\) tend to zero as \(h\) and \(k\) tend to zero. Hence the theorem may be written in the form \[\delta z = (f_{x}' + \epsilon)\, \delta x + (f_{y}' + \eta)\, \delta y, \tag{1}\] where \(\epsilon\) and \(\eta\) are small when \(\delta x\) and \(\delta y\) are small.

The result embodied in (1) may be expressed by saying that the equation \[\delta z = f_{x}'\, \delta x + f_{y}'\, \delta y\] is approximately true; i.e. that the difference between the two sides of the equation is small in comparison with the larger of \(\delta x\) and \(\delta y\).\(^{3}\) We must say ‘the larger of \(\delta x\) and \(\delta y\)’ because one of them might be small in comparison with the other; we might indeed have \(\delta x = 0\) or \(\delta y = 0\).
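
[A simple illustration, chosen here merely as an example: let \(z = xy\), so that the partial derivatives of \(z\) with respect to \(x\) and \(y\) are \(y\) and \(x\). Then \[\delta z = (x + \delta x)(y + \delta y) - xy = y\, \delta x + x\, \delta y + \delta x\, \delta y,\] so that the difference between the two sides of the approximate equation is \(\delta x\, \delta y\), which is numerically not greater than the square of the larger of \(|\delta x|\) and \(|\delta y|\), and is therefore small in comparison with that larger increment. In this case the formula of the Mean Value Theorem holds with \(\theta = \frac{1}{2}\).]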

It should be observed that if any equation of the form \(\delta z = \lambda\, \delta x + \mu\, \delta y\) is ‘approximately true’ in this sense, we must have \(\lambda = f_{x}'\), \(\mu = f_{y}'\). For we have \[\delta z - f_{x}'\, \delta x - f_{y}'\, \delta y = \epsilon\, \delta x + \eta\, \delta y,\quad \delta z - \lambda\, \delta x - \mu\, \delta y = \epsilon'\, \delta x + \eta'\, \delta y\] where \(\epsilon\), \(\eta\), \(\epsilon'\), \(\eta'\) all tend to zero as \(\delta x\) and \(\delta y\) tend to zero; and so \[(\lambda - f_{x}')\, \delta x + (\mu - f_{y}')\, \delta y = \rho\, \delta x + \rho'\, \delta y\] where \(\rho\) and \(\rho'\) tend to zero. Hence, if \(\zeta\) is any assigned positive number, we can choose \(\sigma\) so that \[|(\lambda - f_{x}')\, \delta x + (\mu - f_{y}')\, \delta y| < \zeta(|\delta x| + |\delta y|)\] for all values of \(\delta x\) and \(\delta y\) numerically less than \(\sigma\). Taking \(\delta y = 0\) we obtain \(|(\lambda - f_{x}')\, \delta x| < \zeta|\delta x|\), or \(|\lambda - f_{x}'| < \zeta\), and, as \(\zeta\) may be as small as we please, this can only be the case if \(\lambda = f_{x}'\). Similarly \(\mu = f_{y}'\).


  1. The new points which arise when we consider functions of several variables are illustrated sufficiently when there are two variables only. The generalisations of our theorems for three or more variables are in general of an obvious character.
  2. Of course the fact that \(\Delta x = \delta x\) is due merely to the particular value of \(\Delta r\) that we have chosen (viz. \(PP_{2}\)). Any other choice would give us values of \(\Delta x\), \(\Delta r\) proportional to those used here.
  3. Or with \(|\delta x| + |\delta y|\) or \(\sqrt{\delta x^{2} + \delta y^{2}}\).
