152. Differentiation of functions of several variables.
So far we have been concerned exclusively with functions of a single variable \(x\), but there is nothing to prevent us applying the notion of differentiation to functions of several variables \(x\), \(y\), ….
Suppose then that \(f(x, y)\) is a function of two¹ real variables \(x\) and \(y\), and that the limits \[\lim_{h\to 0}\frac{f(x + h, y) - f(x, y)}{h},\quad \lim_{k\to 0}\frac{f(x, y + k) - f(x, y)}{k}\] exist for all values of \(x\) and \(y\) in question, that is to say that \(f(x, y)\) possesses a derivative \(df/dx\) or \(D_{x}f(x, y)\) with respect to \(x\) and a derivative \(df/dy\) or \(D_{y}f(x, y)\) with respect to \(y\). It is usual to call these derivatives the partial differential coefficients of \(f\), and to denote them by \[\frac{\partial f}{\partial x},\quad \frac{\partial f}{\partial y}\] or \[f_{x}'(x, y),\quad f_{y}'(x, y)\] or simply \(f_{x}'\), \(f_{y}'\) or \(f_{x}\), \(f_{y}\). The reader must not suppose, however, that these new notations imply any essential novelty of idea: ‘partial differentiation’ with respect to \(x\) is exactly the same process as ordinary differentiation, the only novelty lying in the presence in \(f\) of a second variable \(y\) independent of \(x\).
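Thus, to take a simple example, if \(f(x, y) = x^{2}y + \sin y\), then differentiation with respect to \(x\), \(y\) being treated as a constant, and with respect to \(y\), \(x\) being treated as a constant, gives \[f_{x}' = 2xy,\quad f_{y}' = x^{2} + \cos y.\]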
In what precedes we have supposed \(x\) and \(y\) to be two real variables entirely independent of one another. If \(x\) and \(y\) were connected by a relation the state of affairs would be very different. In this case our definition of \(f_{x}'\) would fail entirely, as we could not change \(x\) into \(x + h\) without at the same time changing \(y\). But then \(f(x, y)\) would not really be a function of two variables at all. A function of two variables, as we defined it in Ch. II, is essentially a function of two independent variables. If \(y\) depends on \(x\), \(y\) is a function of \(x\), say \(y = \phi(x)\); and then \[f(x, y) = f\{x, \phi(x)\}\] is really a function of the single variable \(x\). Of course we may also represent it as a function of the single variable \(y\). Or, as is often most convenient, we may regard \(x\) and \(y\) as functions of a third variable \(t\), and then \(f(x, y)\), which is of the form \(f\{\phi(t), \psi(t)\}\), is a function of the single variable \(t\).
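Thus, by way of illustration, if \(f(x, y) = xy\) and \(y = x^{2}\), then \(f(x, y) = x^{3}\), a function of \(x\) alone; or, putting \(x = t\), \(y = t^{2}\), we may regard it as the function \(t^{3}\) of the single variable \(t\).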
153. Differentiation of a function of two functions.
There is a theorem concerning the differentiation of a function of one variable, known generally as the Theorem of the Total Differential Coefficient, which is of very great importance and depends on the notions explained in the preceding section regarding functions of two variables. This theorem gives us a rule for differentiating \[f\{\phi(t), \psi(t)\}\] with respect to \(t\).
Let us suppose, in the first instance, that \(f(x, y)\) is a function of the two variables \(x\) and \(y\), and that \(f_{x}'\), \(f_{y}'\) are continuous functions of both variables (§ 107) for all of their values which come in question. And now let us suppose that the variation of \(x\) and \(y\) is restricted in that \((x, y)\) lies on a curve \[x = \phi(t),\quad y = \psi(t),\] where \(\phi\) and \(\psi\) are functions of \(t\) with continuous differential coefficients \(\phi'(t)\), \(\psi'(t)\). Then \(f(x, y)\) reduces to a function of the single variable \(t\), say \(F(t)\). The problem is to determine \(F'(t)\).
Suppose that, when \(t\) changes to \(t + \tau\), \(x\) and \(y\) change to \(x + \xi\) and \(y + \eta\). Then by definition \[\begin{aligned} \frac{dF(t)}{dt} &= \lim_{\tau\to 0} \frac{1}{\tau}[f\{\phi(t + \tau), \psi(t + \tau)\} - f\{\phi(t), \psi(t)\}]\\ &= \lim \frac{1}{\tau}\{f(x + \xi, y + \eta) - f(x, y)\} \\ &= \lim \left[ \frac{f(x + \xi, y + \eta) - f(x, y + \eta)}{\xi}\, \frac{\xi}{\tau} + \frac{f(x, y + \eta) - f(x, y)}{\eta}\, \frac{\eta}{\tau} \right].\end{aligned}\]
But, by the Mean Value Theorem, \[\begin{aligned} \{f(x + \xi, y + \eta) - f(x, y + \eta)\}/\xi &= f_{x}'(x + \theta\xi, y + \eta),\\ \{f(x, y + \eta) - f(x, y)\}/\eta &= f_{y}'(x, y + \theta'\eta),\end{aligned}\] where \(\theta\) and \(\theta'\) each lie between \(0\) and \(1\). As \(\tau \to 0\), \(\xi \to 0\) and \(\eta \to 0\), and \(\xi/\tau \to \phi'(t)\), \(\eta/\tau \to \psi'(t)\): also \[f_{x}'(x + \theta\xi, y + \eta) \to f_{x}'(x, y),\quad f_{y}'(x, y + \theta'\eta) \to f_{y}'(x, y).\] Hence \[F'(t) = D_{t}f \{\phi(t), \psi(t)\} = f_{x}'(x, y)\phi'(t) + f_{y}'(x, y)\psi'(t),\] where we are to put \(x = \phi(t)\), \(y = \psi(t)\) after carrying out the differentiations with respect to \(x\) and \(y\). This result may also be expressed in the form \[\frac{df}{dt} = \frac{\partial f}{\partial x}\, \frac{dx}{dt} + \frac{\partial f}{\partial y}\, \frac{dy}{dt}.\]
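Thus, for example, if \(f(x, y) = x^{2} + y^{2}\) and \(x = \cos t\), \(y = \sin t\), the formula gives \[\frac{df}{dt} = 2x(-\sin t) + 2y\cos t = -2\cos t \sin t + 2\sin t \cos t = 0,\] as it should, since here \(f(x, y) = \cos^{2} t + \sin^{2} t = 1\) for all values of \(t\).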
154. The Mean Value Theorem for functions of two variables.
Many of the results of the last chapter depended upon the Mean Value Theorem, expressed by the equation \[\phi(x + h) - \phi(x) = h\phi'(x + \theta h),\] or as it may be written, if \(y = \phi(x)\), \[\delta y = \phi'(x + \theta\, \delta x)\, \delta x.\]
Now suppose that \(z = f(x, y)\) is a function of the two independent variables \(x\) and \(y\), and that \(x\) and \(y\) receive increments \(h\), \(k\) or \(\delta x\), \(\delta y\) respectively: and let us attempt to express the corresponding increment of \(z\), viz. \[\delta z = f(x + h, y + k) - f(x, y),\] in terms of \(h\), \(k\) and the derivatives of \(z\) with respect to \(x\) and \(y\).
Let \(f(x + ht, y + kt) = F(t)\). Then \[f(x + h, y + k) - f(x, y) = F(1) - F(0) = F'(\theta),\] where \(0 < \theta < 1\). But, by § 153, \[\begin{aligned} F'(t) &= D_{t} f(x + ht, y + kt)\\ &= hf_{x}'(x + ht, y + kt) + kf_{y}'(x + ht, y + kt).\end{aligned}\] Hence finally \[\delta z = f(x + h, y + k) - f(x, y) = hf_{x}'(x + \theta h, y + \theta k) + kf_{y}'(x + \theta h, y + \theta k),\] which is the formula desired. Since \(f_{x}'\), \(f_{y}'\) are supposed to be continuous functions of \(x\) and \(y\), we have \[\begin{aligned} f_{x}'(x + \theta h, y + \theta k) &= f_{x}'(x, y) + \epsilon_{h, k},\\ f_{y}'(x + \theta h, y + \theta k) &= f_{y}'(x, y) + \eta_{h, k},\end{aligned}\] where \(\epsilon_{h, k}\) and \(\eta_{h, k}\) tend to zero as \(h\) and \(k\) tend to zero. Hence the theorem may be written in the form \[\begin{equation*} \delta z = (f_{x}' + \epsilon)\, \delta x + (f_{y}' + \eta)\, \delta y, \tag{1} \end{equation*}\] where \(\epsilon\) and \(\eta\) are small when \(\delta x\) and \(\delta y\) are small.
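The reader may verify the theorem directly in the simple case \(f(x, y) = x^{2} + y^{2}\). Here \[\delta z = 2xh + 2yk + h^{2} + k^{2},\] while \[hf_{x}'(x + \theta h, y + \theta k) + kf_{y}'(x + \theta h, y + \theta k) = 2h(x + \theta h) + 2k(y + \theta k) = 2xh + 2yk + 2\theta(h^{2} + k^{2});\] and the two expressions agree when \(\theta = \frac{1}{2}\), which does indeed lie between \(0\) and \(1\).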
The result embodied in (1) may be expressed by saying that the equation \[\delta z = f_{x}'\, \delta x + f_{y}'\, \delta y\] is approximately true; i.e. that the difference between the two sides of the equation is small in comparison with the larger of \(\delta x\) and \(\delta y\).² We must say ‘the larger of \(\delta x\) and \(\delta y\)’ because one of them might be small in comparison with the other; we might indeed have \(\delta x = 0\) or \(\delta y = 0\).
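As a numerical illustration, take \(z = xy\), \(x = 2\), \(y = 3\), \(\delta x = .01\), \(\delta y = -.02\). The approximate formula gives \[\delta z = y\, \delta x + x\, \delta y = 3(.01) + 2(-.02) = -.01,\] while the true increment is \((2.01)(2.98) - 6 = -.0102\). The error \(-.0002\) is precisely \(\delta x\, \delta y\), which is small in comparison with either \(\delta x\) or \(\delta y\).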
It should be observed that if any equation of the form \(\delta z = \lambda\, \delta x + \mu\, \delta y\) is ‘approximately true’ in this sense, we must have \(\lambda = f_{x}'\), \(\mu = f_{y}'\). For we have \[\delta z - f_{x}'\, \delta x - f_{y}'\, \delta y = \epsilon\, \delta x + \eta\, \delta y,\quad \delta z - \lambda\, \delta x - \mu\, \delta y = \epsilon'\, \delta x + \eta'\, \delta y\] where \(\epsilon\), \(\eta\), \(\epsilon'\), \(\eta'\) all tend to zero as \(\delta x\) and \(\delta y\) tend to zero; and so \[(\lambda - f_{x}')\, \delta x + (\mu - f_{y}')\, \delta y = \rho\, \delta x + \rho'\, \delta y\] where \(\rho\) and \(\rho'\) tend to zero. Hence, if \(\zeta\) is any assigned positive number, we can choose \(\sigma\) so that \[|(\lambda - f_{x}')\, \delta x + (\mu - f_{y}')\, \delta y| < \zeta(|\delta x| + |\delta y|)\] for all values of \(\delta x\) and \(\delta y\) numerically less than \(\sigma\). Taking \(\delta y = 0\) we obtain \(|(\lambda - f_{x}')\, \delta x| < \zeta|\delta x|\), or \(|\lambda - f_{x}'| < \zeta\), and, as \(\zeta\) may be as small as we please, this can only be the case if \(\lambda = f_{x}'\). Similarly \(\mu = f_{y}'\).
1. The new points which arise when we consider functions of several variables are illustrated sufficiently when there are two variables only. The generalisations of our theorems for three or more variables are in general of an obvious character.
2. Or with \(|\delta x| + |\delta y|\) or \(\sqrt{\delta x^{2} + \delta y^{2}}\).