Directional Derivatives

We have learned that the rates of change of a function \(f\) along the coordinate axes are its first partial derivatives. Now we want to generalize this concept and find the rate of change of \(f\) along any arbitrary direction.

Consider a function \(z=f(x,y)\) and a unit vector \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}\). Let’s approach the point \((x_0,y_0)\) along a line that is parallel to \(\mathbf{v}\). Then the rate of change of \(f\) at the point \((x_0,y_0)\) with respect to distance is called the directional derivative of \(f\) at \((x_0,y_0)\) in the direction \(\mathbf{v}\).

For a geometrical interpretation, consider the point \(P=(x_0,y_0,f(x_0,y_0))\) on the surface \(z=f(x,y)\) and the points \(Q=(x_0,y_0,0)\) and \(R=(x_0+tv_1,y_0+tv_2,0)\) in the \(xy\)-plane, as illustrated in Fig. 1. Note that \(\overrightarrow{QR}\) is parallel to \(\mathbf{v}\). The plane \(\Omega\) through \(Q\) and \(R\) and perpendicular to the \(xy\)-plane[1] intersects the surface \(z=f(x,y)\) in the curve \(C\). The slope of the tangent line to the curve \(C\) at the point \(P\) in the plane \(\Omega\) is the directional derivative of \(f\) in the direction \(\mathbf{v}\), and is denoted by \(D_{\mathbf{v}} f(x_0,y_0)\).

Figure 1.

 

Definition 1. The directional derivative of a function \(f\) at \((x_0,y_0)\) in the direction of \(\mathbf{v}\), denoted by \(D_{\mathbf{v}} f(x_0,y_0)\), is given by:
\[D_{\mathbf{v}}f(x_0,y_0)=\lim_{t\to 0}\frac{f(x_0+t v_1,y_0+t v_2)-f(x_0,y_0)}{t},\]
whenever the limit on the right hand side exists.

  • The above definition is also meaningful when \(\mathbf{v}\) is not a unit vector. However, when \(|\mathbf{v}|\neq 1\), \(D_{\mathbf{v}}f(x_0,y_0)\) is NOT equal to the rate of change of \(f\) in the direction \(\mathbf{v}\). The rate of change of \(f\) in the direction \(\mathbf{v}\), or the slope of the tangent line to the curve \(C\) at \(P\), is equal to \(D_{\mathbf{v}/|\mathbf{v}|}f(x_0,y_0)\).

The rate of change of \(f\) at \((x_0,y_0)\) along an arbitrary line parallel to \(\mathbf{v}\) (\(\mathbf{v}\neq\mathbf{0}\)) is equal to: \[D_{\mathbf{v}/|\mathbf{v}|}f(x_0,y_0)\]

  • If \(\mathbf{v}=\mathbf{0}\), Definition 1 gives:
    \[D_{\mathbf{0}}f(x,y)=\lim_{t\to 0}\frac{f(x+t\times 0 ,y+t\times 0)-f(x,y)}{t}=0,\]
    for every \((x,y)\) in the domain of \(f\).
  • If \(\mathbf{v}=\mathbf{i}\), we have from the above definition
    \[D_{\mathbf{i}}f(x_0,y_0)=\lim_{t\to 0}\frac{f(x_0+t ,y_0)-f(x_0,y_0)}{t},\]
    which is clearly the partial derivative of \(f\) with respect to \(x\), \(f_x(x_0,y_0)=\frac{\partial f}{\partial x}(x_0,y_0)\). Similarly, if \(\mathbf{v}=\mathbf{j}\), the directional derivative of \(f\) in the direction of \(\mathbf{j}\) is the partial derivative of \(f\) with respect to \(y\). So
    \[\begin{align} &D_{\mathbf{i}} f=f_x=\frac{\partial f}{\partial x},\\ &D_{\mathbf{j}} f=f_y=\frac{\partial f}{\partial y}.\end{align}\]
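
The remarks above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the helper name `directional_derivative`, the sample function, the sample point, and the step size `t` are all our own assumptions. It replaces the limit in Definition 1 with a symmetric difference quotient for a small fixed `t`, so it only approximates the derivative.

```python
def directional_derivative(f, x0, y0, v1, v2, t=1e-6):
    """Approximate the limit in Definition 1 with a symmetric difference quotient."""
    return (f(x0 + t * v1, y0 + t * v2) - f(x0 - t * v1, y0 - t * v2)) / (2 * t)


def f(x, y):
    # Sample function chosen only for illustration.
    return x**2 - 2 * y**2


x0, y0 = 1.0, 1.0
v = (3.0, 4.0)                       # not a unit vector: |v| = 5
length = (v[0]**2 + v[1]**2) ** 0.5
u = (v[0] / length, v[1] / length)   # the unit vector v/|v|

d_v = directional_derivative(f, x0, y0, *v)            # D_v f
rate = directional_derivative(f, x0, y0, *u)           # D_{v/|v|} f, the rate of change
d_zero = directional_derivative(f, x0, y0, 0.0, 0.0)   # D_0 f
d_i = directional_derivative(f, x0, y0, 1.0, 0.0)      # D_i f, should match f_x
d_j = directional_derivative(f, x0, y0, 0.0, 1.0)      # D_j f, should match f_y

print(d_v, rate, d_v / length)   # d_v is |v| times the rate of change
print(d_zero, d_i, d_j)
```

For this sample function \(f_x(1,1)=2\) and \(f_y(1,1)=-4\), so the last line should print values close to \(0\), \(2\) and \(-4\), and `d_v` should be about five times `rate`.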
Example 1

Given \(f(x,y)=x^2-2y^2\) and \(\mathbf{v}=\mathbf{i}-3\mathbf{j}\), find \(D_{\mathbf{v}}f(x,y)\).

Solution

It follows from Definition 1 that
\[\begin{align} D_{\mathbf{v}}f(x,y)&=\lim_{t\to 0}\frac{f(x+t,y-3t)-f(x,y)}{t}\\ &=\lim_{t\to 0}\frac{(x+t)^2-2(y-3t)^2-(x^2-2y^2)}{t}\\ &=\lim_{t\to 0}\frac{x^2+2xt+t^2-2(y^2-6yt+9t^2)-x^2+2y^2}{t}\\ &=\lim_{t\to 0}\frac{2xt+12yt-17t^2}{t}=2x+12y.\end{align}\]

  • If \(g(t)=f(x_0+tv_1,y_0+tv_2)\), then \(g'(0)=D_{\mathbf{v}}f(x_0,y_0)\).
    To prove this, we use the definition of \(g'(0)\):
    \[\begin{align} g'(0)&=\lim_{t\to 0}\frac{g(t)-g(0)}{t}\\
    &=\lim_{t\to 0}\frac{f(x_0+tv_1,y_0+tv_2)-f(x_0,y_0)}{t}\\
    &=D_{\mathbf{v}}f(x_0,y_0).\end{align}\]
    Also we note \(g(t)=f(x(t),y(t))\) where \[x(t)=x_0+t\,v_1,\quad\text{and}\quad y(t)=y_0+t\ v_2.\] To find \(g'(t)\) we can use the chain rule:
    \[\begin{align} \frac{dg}{dt}&=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}\\ &=\frac{\partial f}{\partial x}\ v_1+\frac{\partial f}{\partial y}\ v_2.\end{align}\]
    When \(t=0\), we have \(x(0)=x_0\) and \(y(0)=y_0\). Thus:
    \[\begin{align} g'(0)&=\left.\frac{dg}{dt}\right|_{t=0}=\left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)} v_1+\left.\frac{\partial f}{\partial y}\right|_{(x_0,y_0)}v_2\\ &=\frac{\partial f}{\partial x}(x_0,y_0)\ v_1+\frac{\partial f}{\partial y}(x_0,y_0)\ v_2.\end{align}\]
    Therefore, we have proved the following theorem.

Theorem 1. If \(f\) is a differentiable function at \((x_0,y_0)\), and \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}\), then
\[D_{\mathbf{v}}f(x_0,y_0)=\frac{\partial f}{\partial x}(x_0,y_0)\ v_1+\frac{\partial f}{\partial y}(x_0,y_0) v_2.\]

Example 2

Given \(f(x,y)=x^2-2y^2\) and \(\mathbf{v}=\mathbf{i}-3\mathbf{j}\), find \(D_{\mathbf{v}}f(x,y)\) using Theorem 1.

Solution

First we need to calculate \(\frac{\partial f}{\partial x}\) and \(\frac{\partial f}{\partial y}\).
\[\frac{\partial f}{\partial x}=2x,\quad \frac{\partial f}{\partial y}=-4y.\]
Therefore:
\[\begin{align}D_{\mathbf{v}}f(x,y)&=\underbrace{\frac{\partial f}{\partial x}}_{=2x} \underbrace{v_1}_{=1}+\underbrace{\frac{\partial f}{\partial y}}_{=-4y} \underbrace{v_2}_{=-3}\\
&=(2x)(1)+(-4y)(-3)\\
&=2x+12y.\end{align}\]
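
To see the limit definition and Theorem 1 agree, here is a small Python sketch (our own illustration; the function names and the sample point \((2,1)\) are assumptions). It evaluates the difference quotient of Definition 1 for shrinking values of \(t\) and compares it with the value \(2x+12y\) obtained above.

```python
def f(x, y):
    return x**2 - 2 * y**2


def difference_quotient(x, y, t, v=(1.0, -3.0)):
    # The quotient inside the limit of Definition 1, for a fixed nonzero t.
    return (f(x + t * v[0], y + t * v[1]) - f(x, y)) / t


def theorem1_value(x, y, v=(1.0, -3.0)):
    # f_x v_1 + f_y v_2 with f_x = 2x and f_y = -4y.
    fx, fy = 2 * x, -4 * y
    return fx * v[0] + fy * v[1]


x, y = 2.0, 1.0   # sample point chosen for illustration
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    print(t, difference_quotient(x, y, t), theorem1_value(x, y))
```

For this function the quotient equals \(2x+12y-17t\) exactly, so the printed values approach \(2x+12y\) linearly in \(t\).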

Does the Existence of Directional Derivatives in All Directions Guarantee Differentiability?

We learned that if the first partial derivatives of a function are continuous in a neighborhood of a point, the function is differentiable at that point. However, the mere existence of the first partial derivatives does not imply that the function is differentiable. This raises a question: does the stronger condition that the directional derivatives of \(f\) exist in all directions (not just along the coordinate axes) guarantee the differentiability of \(f\)? The answer is still “no.” Consider the following example.

Example 3

Let
\[z=f(x,y)=\left\{\begin{array}{ll} \dfrac{xy^2}{x^2+y^2} & \text{if }(x,y)\neq(0,0),\\ 0 & \text{if }(x,y)=(0,0). \end{array}\right.\]
Find \(D_{\mathbf{v}}f(0,0)\) for every unit vector \(\mathbf{v}\). Is \(f\) differentiable at the origin?

Graph of $z=\dfrac{xy^2}{x^2+y^2}$
Solution

Let \(\mathbf{v}=\cos\theta\mathbf{i}+\sin\theta\mathbf{j}\) be any unit vector. If we use polar coordinates, we have \[z=\frac{r \cos\theta\ r^2\sin^2\theta}{r^2}=r\cos\theta\sin^2\theta.\] This means that along any ray from the origin making an angle \(\theta\) with the positive \(x\)-axis, the graph is a straight line of slope \(\cos\theta\sin^2\theta\). Therefore, \(D_{\mathbf{v}}f(0,0)=\cos\theta\sin^2\theta\). If this argument is not convincing, let’s calculate \(D_{\mathbf{v}}f(0,0)\) directly from the definition:
\[\begin{align} D_{(\cos\theta\mathbf{i}+\sin\theta\mathbf{j})}f(0,0)&=\lim_{t\to 0}\frac{f(0+t\cos\theta,0+t\sin\theta)-\overbrace{f(0,0)}^{=0}}{t}\\ &=\lim_{t\to 0}\frac{1}{t}\frac{t\cos\theta\ t^2 \sin^2\theta}{t^2\underbrace{\left(\cos^2\theta+\sin^2\theta\right)}_{=1}}\\
&=\cos\theta\sin^2\theta\end{align}\]

Because the function is constant (\(z=0\)) on the \(x\)- and \(y\)-axes, we conclude \(f_x(0,0)=f_y(0,0)=0\). Therefore, if the function were differentiable, according to Theorem 1 we should have \(D_{\mathbf{v}}f(0,0)=f_x(0,0)\cos\theta+f_y(0,0)\sin\theta=0.\) However, we showed that \(D_{\mathbf{v}}f(0,0)=\cos\theta\sin^2\theta\), which is not zero if \(\theta\neq k\frac{\pi}{2}\) (for \(k\in\mathbb{Z}\)). Therefore we conclude the function cannot be differentiable at the origin.
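
The conclusion of Example 3 can also be observed numerically. The sketch below is our own illustration (the names and the step size are assumptions): it evaluates the difference quotient at the origin for several angles and prints it next to \(\cos\theta\sin^2\theta\) and next to the value \(0\) that Theorem 1 would give if \(f\) were differentiable.

```python
import math


def f(x, y):
    # The function of Example 3.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y**2 / (x**2 + y**2)


def quotient_at_origin(theta, t=1e-6):
    # Difference quotient of Definition 1 at (0, 0) in the direction (cos theta, sin theta).
    return (f(t * math.cos(theta), t * math.sin(theta)) - f(0.0, 0.0)) / t


for k in range(8):
    theta = k * math.pi / 4
    numeric = quotient_at_origin(theta)
    predicted = math.cos(theta) * math.sin(theta) ** 2
    theorem1 = 0.0 * math.cos(theta) + 0.0 * math.sin(theta)   # f_x(0,0) = f_y(0,0) = 0
    print(f"theta = {theta:.3f}  quotient = {numeric:+.6f}  "
          f"cos*sin^2 = {predicted:+.6f}  Theorem 1 would give {theorem1:+.1f}")
```

The quotient matches \(\cos\theta\sin^2\theta\) in every direction, and it differs from \(0\) whenever \(\theta\) is not a multiple of \(\pi/2\).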

Even if a function has finite directional derivatives in all directions, it may fail to be continuous, let alone differentiable. The following example illustrates such a situation.

Example 4

Let
\[z=f(x,y)=\left\{\begin{array}{ll} \dfrac{xy^2}{x^2+y^4} & \text{if }x\neq 0,\\ 0 & \text{ if }x=0. \end{array}\right.\]
Find \(D_{\mathbf{v}}f(0,0)\) for every unit vector \(\mathbf{v}\).

Solution

Let \(\mathbf{v}=\cos\theta\mathbf{i}+\sin\theta\mathbf{j}\) be any unit vector. Because we don’t know whether or not \(f\) is differentiable, to find \(D_{\mathbf{v}}f(0,0)\) we have to use the definition of the directional derivative:
\[\begin{align}
D_{\mathbf{v}}f(0,0)&=\lim_{t\to 0}\frac{f(0+t\cos\theta,0+t\sin\theta)-\overbrace{f(0,0)}^{=0}}{t}\\[6pt] &=\lim_{t\to 0}\frac{1}{t}\frac{t\cos\theta\ t^2 \sin^2\theta}{t^2\left(\cos^2\theta+t^2\sin^4\theta\right)}\\[6pt] &=\lim_{t\to 0}\frac{\cos\theta\ \sin^2\theta}{\left(\cos^2\theta+t^2\sin^4\theta\right)}\\[6pt] &=\left\{\begin{array}{ll} \dfrac{\sin^2\theta}{\cos\theta} & \text{if }\cos\theta\neq 0,\\
\\ 0 & \text{if }\cos\theta=0. \end{array}\right.\end{align}\]

In fact, if we approach the origin along the line \(y=mx\), we have:
\[\lim_{x\to 0}f(x,mx)=\lim_{x\to 0}\frac{x (m^2 x^2)}{x^2(1+m^4 x^2)}=\lim_{x\to 0}\frac{m^2 x}{1+m^4 x^2}=0,\]
and if \((x,y)\) tends to \((0,0)\) along \(y=\sqrt{x}\) (with \(x>0\)), we have:
\[\lim_{x\to 0^+}f(x,\sqrt{x})=\lim_{x\to 0^+}\frac{x (\sqrt{x})^2}{x^2+(\sqrt{x})^4}=\lim_{x\to 0^+}\frac{x^2}{2x^2}=\frac{1}{2}.\]
Therefore \(\lim_{(x,y)\to(0,0)}f(x,y)\) does not exist and the function is not even continuous at the origin, let alone differentiable.
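
A short Python sketch (ours, with assumed names and sample values) makes the behavior in Example 4 visible: the directional derivative at the origin is finite, the values along a line through the origin shrink to \(0\), and the values along \(y=\sqrt{x}\) stay at \(1/2\).

```python
import math


def f(x, y):
    # The function of Example 4.
    if x == 0.0:
        return 0.0
    return x * y**2 / (x**2 + y**4)


def quotient_at_origin(theta, t=1e-6):
    # Difference quotient of Definition 1 at (0, 0); note f(0, 0) = 0.
    return f(t * math.cos(theta), t * math.sin(theta)) / t


theta = math.pi / 4
print(quotient_at_origin(theta), math.sin(theta) ** 2 / math.cos(theta))

for x in (1e-1, 1e-2, 1e-3, 1e-4):
    on_line = f(x, 2.0 * x)             # approaching along y = 2x
    on_parabola = f(x, math.sqrt(x))    # approaching along y = sqrt(x)
    print(x, on_line, on_parabola)
```

The slope \(m=2\) and the angle \(\theta=\pi/4\) are arbitrary choices; any other line through the origin shows the same decay.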

Gradients

Now let’s go back to Theorem 1. We learned if \(f\) is a differentiable function at \((x,y)\), and \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}\), then
\[D_{\mathbf{v}}f(x,y)=\frac{\partial f}{\partial x}(x,y)\ v_1+\frac{\partial f}{\partial y}(x,y) v_2.\]
The right hand side of the above expression can be written as the dot product of two vectors:
\[\frac{\partial f}{\partial x}(x,y)\ v_1+\frac{\partial f}{\partial y}(x,y) v_2=\left(\frac{\partial f}{\partial x}(x,y)\ \mathbf{i}+\frac{\partial f}{\partial y}(x,y)\ \mathbf{j}\right)\boldsymbol{\cdot}\underbrace{\left(v_1\mathbf{i}+v_2\mathbf{j}\right)}_{\mathbf{v}}.\]
Therefore:
\[D_{\mathbf{v}}f(x,y)=\left(\frac{\partial f}{\partial x}(x,y)\ \mathbf{i}+\frac{\partial f}{\partial y}(x,y)\ \mathbf{j}\right)\boldsymbol{\cdot} \mathbf{v}.\]
The first vector on the right hand side is called the “gradient of \(f\)” and is denoted by “\(\overrightarrow{\nabla} f\)” or “\({\rm grad} f\).” The symbol “\(\overrightarrow{\nabla}\)” is an inverted capital delta, \(\Delta\), and is read “del” or “nabla.” We can also write: \(\overrightarrow{\nabla} f=(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})\). Gradients have many applications that we will discuss in this chapter.

Definition 2. If \(f\) is a function of two variables \(x\) and \(y\) and if \(\frac{\partial f}{\partial x}(x,y)\) and \(\frac{\partial f}{\partial y}(x,y)\) exist, the gradient of \(f\), denoted by \(\overrightarrow{\nabla} f\) or \({\rm grad} f\), is defined by
\[\overrightarrow{\nabla} f(x,y)=\frac{\partial f}{\partial x}(x,y)\ \mathbf{i}+\frac{\partial f}{\partial y}(x,y)\ \mathbf{j}.\]

 If \(f\) is a differentiable function at \((x,y)\), then
\[D_{\mathbf{v}}f(x,y)=\mathbf{v}\boldsymbol{\cdot}\overrightarrow{\nabla} f(x,y).\]

    • Recall that the dot product is commutative. That is, for two vectors \(\mathbf{a}\) and \(\mathbf{b}\), we have: \(\mathbf{a}\boldsymbol{\cdot}\mathbf{b}=\mathbf{b}\boldsymbol{\cdot}\mathbf{a}\).
Example 5

Let \(f(x,y)=x^2-y^2+xy.\) Find \(\overrightarrow{\nabla} f(1,-2)\).

Solution

\[f_x(x,y)=2x+y\Rightarrow f_x(1,-2)=2\times 1-2=0\]
\[f_y(x,y)=-2y+x\Rightarrow f_y(1,-2)=-2\times (-2)+1=5\] Thus:
\[\overrightarrow{\nabla} f(1,-2)=5\mathbf{j}=(0,5)\]
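
As a sanity check on Example 5, the gradient can be estimated with central differences. This is only a sketch under our own assumptions (the helper name `gradient_2d` and the step `h` are not from the text).

```python
def gradient_2d(f, x, y, h=1e-6):
    """Estimate (f_x, f_y) at (x, y) with central differences."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return (fx, fy)


def f(x, y):
    return x**2 - y**2 + x * y


grad = gradient_2d(f, 1.0, -2.0)
print(grad)   # should be close to (0, 5)
```

Central differences are exact for quadratic functions apart from rounding error, so the estimate here agrees with the hand computation to many digits.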

Example 6

If \(f\) is a function of \(x\) and \(y\), find \(\overrightarrow{\nabla} f\) in polar coordinates.

Solution

We know \(\overrightarrow{\nabla} f=f_x \mathbf{i}+f_y \mathbf{j}\). Not only should we write \(f_x\) and \(f_y\) in terms of \(f_r\) and \(f_\theta\) but also we need to write \(\mathbf{i}\) and \(\mathbf{j}\) in terms of the unit vectors for polar coordinates \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\) (the subscripts for these two unit vectors do not mean differentiation).

From the chain rule, we know:
\[\frac{\partial f}{\partial x}=\frac{\partial f}{\partial r}\frac{\partial r}{\partial x}+\frac{\partial f}{\partial \theta}\frac{\partial \theta}{\partial x},\quad \frac{\partial f}{\partial y}=\frac{\partial f}{\partial r}\frac{\partial r}{\partial y}+\frac{\partial f}{\partial \theta}\frac{\partial \theta}{\partial y}\]
In an earlier example, we showed:
\[\begin{bmatrix} \dfrac{\partial r}{\partial x} & \dfrac{\partial r}{\partial y}\\ \\ \dfrac{\partial \theta}{\partial x} & \dfrac{\partial \theta}{\partial y} \end{bmatrix}=\begin{bmatrix} \dfrac{x}{r} & \dfrac{y}{r}\\ \\ -\dfrac{y}{r^2} & \dfrac{x}{r^2} \end{bmatrix}\]
Therefore:
\[\frac{\partial f}{\partial x}=\frac{\partial f}{\partial r}\underbrace{\frac{x}{r}}_{=\cos\theta}+\frac{\partial f}{\partial \theta}\underbrace{\frac{-y}{r^2}}_{=\frac{-\sin\theta}{r}}\]
\[\frac{\partial f}{\partial y}=\frac{\partial f}{\partial r}\underbrace{\frac{y}{r}}_{=\sin\theta}+\frac{\partial f}{\partial \theta}\underbrace{\frac{x}{r^2}}_{=\frac{\cos\theta}{r}}\]

On the other hand, \(\mathbf{i}\) and \(\mathbf{j}\) should be written in terms of \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\). Using geometry we have:
\[\mathbf{i}=\mathbf{e}_r \cos\theta-\mathbf{e}_\theta \sin\theta\]
\[\mathbf{j}=\mathbf{e}_r \sin\theta+\mathbf{e}_\theta \cos\theta\]

\[\begin{align}
\overrightarrow{\nabla} f&=\frac{\partial f}{\partial x}\mathbf{i}+\frac{\partial f}{\partial y}\mathbf{j}\\
&=\left(\cos\theta \frac{\partial f}{\partial r}-\frac{\sin\theta}{r}\frac{\partial f}{\partial \theta}\right)\left(\mathbf{e}_r \cos\theta-\mathbf{e}_\theta \sin\theta\right)\\
&\quad+\left(\sin\theta\frac{\partial f}{\partial r}+\frac{\cos\theta}{r}\frac{\partial f}{\partial \theta}\right)\left(\mathbf{e}_r \sin\theta+\mathbf{e}_\theta \cos\theta\right)\\ &=\Bigg[\frac{\partial f}{\partial r}\underbrace{\left(\cos^2\theta+\sin^2\theta\right)}_{=1}+\frac{1}{r}\frac{\partial f}{\partial \theta}\underbrace{\left(-\sin\theta\cos\theta+\sin\theta\cos\theta\right)}_{=0}\Bigg]\mathbf{e}_r\\ &\quad +\Bigg[\frac{\partial f}{\partial r}\underbrace{\left(-\sin\theta\cos\theta+\sin\theta\cos\theta\right)}_{=0}+\frac{\partial f}{\partial \theta}\underbrace{\left(\frac{\sin^2\theta}{r}+\frac{\cos^2\theta}{r}\right)}_{=\frac{1}{r}}\Bigg]\mathbf{e}_\theta\end{align}\]

Thus the gradient of \(f\) in polar coordinates is:

\[\overrightarrow{\nabla} f=\frac{\partial f}{\partial r}\mathbf{e}_r+\frac{1}{r}\frac{\partial
f}{\partial \theta}\mathbf{e}_\theta\]

  • In the above example, we showed that the gradient in polar coordinates is\[\bbox[#F2F2F2,5px,border:2px solid black] {\begin{align}\overrightarrow{\nabla} f&=\frac{\partial f}{\partial r}\mathbf{e}_r+\frac{1}{r}\frac{\partial f}{\partial \theta}\mathbf{e}_\theta\\ \text{or}\qquad &\\
    \overrightarrow{\nabla} f&=\left(\mathbf{e}_r\frac{\partial}{\partial r}+\mathbf{e}_\theta\frac{1}{r}\frac{\partial }{\partial \theta}\right)f \end{align}}\]
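
The polar formula can be tested against the Cartesian gradient for a concrete function. The following Python sketch is our own illustration: the sample function, the point \((r,\theta)=(2,0.7)\), and the helper names are assumptions. It estimates \(f_r\) and \(f_\theta\) by central differences and uses \(\mathbf{e}_r=\cos\theta\,\mathbf{i}+\sin\theta\,\mathbf{j}\) and \(\mathbf{e}_\theta=-\sin\theta\,\mathbf{i}+\cos\theta\,\mathbf{j}\), which follow from the expressions for \(\mathbf{i}\) and \(\mathbf{j}\) in Example 6.

```python
import math


def f(x, y):
    # Hypothetical sample function, chosen only for illustration.
    return x**2 * y + y


def grad_cartesian(x, y):
    # Exact partial derivatives (f_x, f_y) of the sample function.
    return (2 * x * y, x**2 + 1)


def grad_polar(r, theta, h=1e-6):
    """Build f_r e_r + (1/r) f_theta e_theta, returned in i, j components."""
    def F(rr, tt):
        return f(rr * math.cos(tt), rr * math.sin(tt))

    f_r = (F(r + h, theta) - F(r - h, theta)) / (2 * h)
    f_theta = (F(r, theta + h) - F(r, theta - h)) / (2 * h)
    e_r = (math.cos(theta), math.sin(theta))
    e_theta = (-math.sin(theta), math.cos(theta))
    return (f_r * e_r[0] + f_theta / r * e_theta[0],
            f_r * e_r[1] + f_theta / r * e_theta[1])


r, theta = 2.0, 0.7
polar = grad_polar(r, theta)
cartesian = grad_cartesian(r * math.cos(theta), r * math.sin(theta))
print(polar)
print(cartesian)
```

The two printed vectors should agree up to the finite-difference error. The formula needs \(r\neq 0\), so the check is not meaningful at the origin.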

Directional Derivatives and Gradients in 3- and n-Space

The extension of the concept of the directional derivative and the gradient to the case where
\(f\) is a function of three or more variables is easy. For example, if \(f:U\subseteq\mathbb{R}^3\to\mathbb{R}\), its directional derivative in the direction of a vector \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}+v_3\mathbf{k}\) is:
\[\begin{align}
D_{\mathbf{v}}f(x,y,z)&=\lim_{t\to 0}\frac{f(x+tv_1,y+tv_2,z+tv_3)-f(x,y,z)}{t}\\
&=\frac{d}{dt}f(x+tv_1,y+tv_2,z+tv_3)\Bigg|_{t=0}
\end{align}\] Note that \((x+tv_1,y+tv_2,z+tv_3)=(x,y,z)+t (v_1,v_2,v_3)=(x,y,z)+t\mathbf{v}\). The general definition of the directional derivative is as follows.

Definition 3. Consider a function \(f:U\subseteq\mathbb{R}^n\to\mathbb{R}\). The directional derivative of \(f\) at \(\mathbf{x}\in U\) in the direction \(\mathbf{v}\in\mathbb{R}^n\), denoted by \(D_{\mathbf{v}} f(\mathbf{x})\), is defined by:
\[D_{\mathbf{v}} f(\mathbf{x})=\lim_{t\to 0}\frac{ f(\mathbf{x}+t\mathbf{v})-f(\mathbf{x})}{t}\]
whenever the limit on the right hand side exists.

The gradient of \(f(x,y,z)\) is
\[\overrightarrow{\nabla} f(x,y,z)=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right)=\frac{\partial f}{\partial x}\mathbf{i}+\frac{\partial f}{\partial y}\mathbf{j}+\frac{\partial f}{\partial z}\mathbf{k}\]
and for the general case \(f(x_1,\dots,x_n)\) we have:

Definition 4. Consider a function \(f:U\subseteq\mathbb{R}^n\to\mathbb{R}\) such that \(\frac{\partial f}{\partial x_1}(\mathbf{x}),\dots,\frac{\partial f}{\partial x_n}(\mathbf{x})\) exist. Then the gradient of \(f\), denoted by \(\overrightarrow{\nabla} f\) or \({\rm grad} f\), is the vector
\[\overrightarrow{\nabla} f(\mathbf{x})=\left(\frac{\partial f}{\partial x_1}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x})\right).\]

Theorem 1 extends to \(n\) variables as follows.

Theorem 2. If \(f\) is a differentiable function at \(\mathbf{x}=(x_1,\dots,x_n)\), then
\[D_{\mathbf{v}}f(\mathbf{x})=\mathbf{v}\boldsymbol{\cdot}\overrightarrow{\nabla} f(\mathbf{x}),\]
for a vector \(\mathbf{v}\in\mathbb{R}^n\).

Example 7

Let \(f(x,y,z)=\rho=\sqrt{x^2+y^2+z^2}\) be a function that gives the distance from \(\mathbf{0}\) to \((x,y,z)\). Find \(\overrightarrow{\nabla} f(1,-2,-1)\).

Solution

\[\begin{align} \overrightarrow{\nabla} f(x,y,z)&=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right)\\ &=\left(\frac{x}{\sqrt{x^2+y^2+z^2}},\frac{y}{\sqrt{x^2+y^2+z^2}},\frac{z}{\sqrt{x^2+y^2+z^2}}\right)\\
&=\left(\frac{x}{\rho},\frac{y}{\rho},\frac{z}{\rho}\right)\\ &=\frac{1}{\rho}(x,y,z)\end{align}\]

This means \(\overrightarrow{\nabla} f\) is a unit vector in the direction of \((x,y,z)\) at every point other than the origin. It is a unit vector because we divide the vector \((x,y,z)\) by its length \(\rho\).

To find \(\overrightarrow{\nabla} f(1,-2,-1)\), we find the length of the vector \((1,-2,-1)\): \[\rho=|(1,-2,-1)|=\sqrt{1^2+(-2)^2+(-1)^2}=\sqrt{6},\] and therefore:
\[\overrightarrow{\nabla} f(1,-2,-1)=\frac{1}{\sqrt{6}}(1,-2,-1)=\frac{1}{\sqrt{6}}\mathbf{i}-\frac{2}{\sqrt{6}}\mathbf{j}-\frac{1}{\sqrt{6}}\mathbf{k}.\]
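
Definition 4 translates directly into a numerical routine that works for any number of variables. The sketch below is ours (the name `gradient` and the step `h` are assumptions); it applies central differences coordinate by coordinate and checks Example 7.

```python
import math


def gradient(f, x, h=1e-6):
    """Estimate the gradient of f at the point x (a list of n coordinates)."""
    grad = []
    for i in range(len(x)):
        forward = list(x)
        backward = list(x)
        forward[i] += h
        backward[i] -= h
        grad.append((f(forward) - f(backward)) / (2 * h))
    return grad


def distance(p):
    # rho = sqrt(x^2 + y^2 + z^2), the function of Example 7.
    return math.sqrt(sum(c * c for c in p))


g = gradient(distance, [1.0, -2.0, -1.0])
norm = math.sqrt(sum(c * c for c in g))
print(g)      # compare with (1, -2, -1) / sqrt(6)
print(norm)   # should be close to 1
```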

Example 8

Let \(f(x,y,z)=x e^{yz}+3x^2-yz\). Find the rate of change of \(f\) at \((1,0,-3)\) in the direction of the vector \((2,0,-1)\).

Solution

We know the rate of change of \(f\) at a point in the direction of a vector \(\mathbf{v}\) is \(D_{\mathbf{v}/|\mathbf{v}|}f\) evaluated at that point (for this example at \((1,0,-3)\)). On the other hand we know:
\[D_{\mathbf{v}/|\mathbf{v}|}f=\overrightarrow{\nabla} f \boldsymbol{\cdot} \frac{\mathbf{v}}{|\mathbf{v}|}.\]
[Actually we have replaced \(\mathbf{v}\) in Theorem 2 by the vector \(\frac{\mathbf{v}}{|\mathbf{v}|}\)]

So we need to (1) find \(\overrightarrow{\nabla} f(1,0,-3)\), (2) normalize \(\mathbf{v}\) by dividing it by its length, and (3) take the dot product of the two vectors obtained in steps 1 and 2.

Step 1:
\[f(x,y,z)=x e^{yz}+3x^2-yz\]\[\Rightarrow \overrightarrow{\nabla} f(x,y,z)=(f_x,f_y,f_z)=(e^{yz}+6x,xz e^{yz}-z,xy e^{yz}-y)\]
\[\Rightarrow \overrightarrow{\nabla} f(1,0,-3)=(e^0+6,-3 e^0+3,0)=(7,0,0)\]

Step 2: Here \(\mathbf{v}=(2,0,-1)\) and \(|\mathbf{v}|=\sqrt{2^2+0^2+(-1)^2}=\sqrt{5}\).
\[\frac{\mathbf{v}}{|\mathbf{v}|}=\left(\frac{2}{\sqrt{5}},0,\frac{-1}{\sqrt{5}}\right)\]

Step 3:
\[D_{\mathbf{v}/|\mathbf{v}|}f(1,0,-3)=(7,0,0)\boldsymbol{\cdot}\left(\frac{2}{\sqrt{5}},0,\frac{-1}{\sqrt{5}}\right)=\frac{14}{\sqrt{5}}.\]

Therefore, the rate of change of \(f\) at \((1,0,-3)\) in the direction of the vector \((2,0,-1)\) is \({14}/{\sqrt{5}}\).
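
A hand computation like the one in Example 8 can be cross-checked without using the gradient at all, by applying Definition 3 directly. The Python sketch below is our own (the names and the step size `t` are assumptions): it normalizes \(\mathbf{v}\) and evaluates a symmetric difference quotient of \(f\) along \(\mathbf{v}/|\mathbf{v}|\).

```python
import math


def f(x, y, z):
    # The function of Example 8.
    return x * math.exp(y * z) + 3 * x**2 - y * z


def rate_of_change(f, point, v, t=1e-6):
    """Symmetric difference quotient of f at `point` along the unit vector v/|v|."""
    length = math.sqrt(sum(c * c for c in v))
    u = [c / length for c in v]
    ahead = [p + t * c for p, c in zip(point, u)]
    behind = [p - t * c for p, c in zip(point, u)]
    return (f(*ahead) - f(*behind)) / (2 * t)


rate = rate_of_change(f, (1.0, 0.0, -3.0), (2.0, 0.0, -1.0))
print(rate)   # compare with the value obtained by hand
```

Because this estimate does not rely on any partial derivative computed by hand, a disagreement with the hand result usually points to a slip in one of the partial derivatives.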

Properties of the Gradient

Properties of the gradient of a function are similar to the properties of the ordinary derivative of a function of a single variable. If \(f\) and \(g\) are differentiable functions from an open set \(U\subseteq\mathbb{R}^n\) to \(\mathbb{R}\), then:

  1. \(\overrightarrow{\nabla} f(\mathbf{x})=\mathbf{0}\) for every $\mathbf{x}$ in $U$ if and only if $f(\mathbf{x})=$ constant on $U$ (assuming $U$ is connected).
  2. \(\overrightarrow{\nabla} (f+g)(\mathbf{x})=\overrightarrow{\nabla} f(\mathbf{x})+\overrightarrow{\nabla} g(\mathbf{x}).\)
  3. \(\overrightarrow{\nabla} (cf)(\mathbf{x})=c\overrightarrow{\nabla} f(\mathbf{x}),\quad c\text{ is a constant}.\)
  4. \(\overrightarrow{\nabla} (fg)(\mathbf{x})=g(\mathbf{x})\,\overrightarrow{\nabla} f(\mathbf{x})+f(\mathbf{x})\,\overrightarrow{\nabla} g(\mathbf{x}).\)
  5. \(\overrightarrow{\nabla}\left(\dfrac{f}{g}\right)(\mathbf{x})=\dfrac{g(\mathbf{x}) \overrightarrow{\nabla} f(\mathbf{x})-f(\mathbf{x}) \overrightarrow{\nabla} g(\mathbf{x})}{g^2(\mathbf{x})},\quad \text{at points at which }g(\mathbf{x})\neq 0.\)
  6. Let \(\mathbf{r}:I\subseteq\mathbb{R}\to\mathbb{R}^n\) be a function that maps an interval \(I\) into the domain of \(f\). Assume \(\mathbf{r}'(t)\) exists and \(f\) is differentiable at \(\mathbf{r}(t)\). If we define \(\phi:I\subseteq\mathbb{R}\to\mathbb{R}\) such that \(\phi(t)=f(\mathbf{r}(t))\), then using the chain rule we conclude \(\phi'(t)\) exists and is equal to:
    \[\phi'(t)=\overrightarrow{\nabla} f(\mathbf{r}(t))\boldsymbol{\cdot} \mathbf{r}'(t). \label{GradProp-6}\tag{$\dagger$}\]
    Note that (\(\dagger\)) is not something new. It is just a new way of writing what we saw before. For example, when \(n=3\) and \(\mathbf{r}(t)=\left(x(t),y(t),z(t)\right)\), (\(\dagger\)) is the same as the following:
    \[\phi'(t)=\frac{d\phi(t)}{dt}=\frac{\partial f}{\partial x}\frac{dx(t)}{dt}+\frac{\partial f}{\partial y}\frac{dy(t)}{dt}+\frac{\partial f}{\partial z}\frac{dz(t)}{dt}.\]
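
Property 6 can be checked numerically for a concrete curve. In the sketch below, which is our own illustration, the function \(f\), the curve \(\mathbf{r}(t)\), and the value \(t=0.8\) are assumed samples. We compare a central-difference estimate of \(\phi'(t)\) with \(\overrightarrow{\nabla} f(\mathbf{r}(t))\boldsymbol{\cdot}\mathbf{r}'(t)\).

```python
import math


def f(x, y, z):
    # Sample function chosen for illustration.
    return x * y + z**2


def grad_f(x, y, z):
    # Exact gradient of the sample function.
    return (y, x, 2 * z)


def r(t):
    # Sample curve: a helix.
    return (math.cos(t), math.sin(t), t)


def r_prime(t):
    return (-math.sin(t), math.cos(t), 1.0)


def phi(t):
    return f(*r(t))


t = 0.8
h = 1e-6
numeric = (phi(t + h) - phi(t - h)) / (2 * h)                    # phi'(t) by central difference
chain = sum(g * d for g, d in zip(grad_f(*r(t)), r_prime(t)))    # grad f(r(t)) . r'(t)
print(numeric, chain)
```

For this choice \(\phi(t)=\cos t\sin t+t^2\), so both numbers should be close to \(\cos 2t+2t\).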

 

Example 9
(Multi-dimensional version of the Mean-Value Theorem) Let $U$ be an open set containing two points $\mathbf{a}$ and $\mathbf{b}$ and the line segment joining them. Show that if $f$ and its first partial derivatives are continuous on $U$, then there is a point $\mathbf{c}$ on the line segment joining $\mathbf{a}$ and $\mathbf{b}$ such that \[f(\mathbf{b})-f(\mathbf{a})=\overrightarrow{\nabla}f(\mathbf{c})\boldsymbol{\cdot}(\mathbf{b}-\mathbf{a})\]

[Hint: Express the line segment in parametric form and use the Mean-Value Theorem for functions of one variable.]

Solution

The line segment joining $\mathbf{a}$ and $\mathbf{b}$ can be parametrized by means of the equation

\[\mathbf{r}(t)=\mathbf{a}+t(\mathbf{b}-\mathbf{a})\quad 0\leq t\leq 1.\]

Let $g$ be a function from $[0,1]$ to $\mathbb{R}$ with

\[g(t)=f(\mathbf{r}(t)).\]

Because $g$ is continuous on $[0,1]$ and differentiable on $(0,1)$, it follows from the Mean-Value Theorem that there is a number $t_0$ between $0$ and $1$ such that

\[g(1)-g(0)=g'(t_0)(1-0).\]

Because \[g(0)=f(\mathbf{r}(0))=f(\mathbf{a}),\quad g(1)=f(\mathbf{r}(1))=f(\mathbf{b}),\] and

\begin{align*}
g'(t)=&\overrightarrow{\nabla}f(\mathbf{r}(t))\boldsymbol{\cdot}\mathbf{r}'(t)\\
=&\overrightarrow{\nabla}f(\mathbf{r}(t))\boldsymbol{\cdot}(\mathbf{b}-\mathbf{a}),
\end{align*}
we have

\[f(\mathbf{b})-f(\mathbf{a})=\overrightarrow{\nabla}f(\mathbf{r}(t_0))\boldsymbol{\cdot}(\mathbf{b}-\mathbf{a})\]

or

\[f(\mathbf{b})-f(\mathbf{a})=\overrightarrow{\nabla}f(\mathbf{c})\boldsymbol{\cdot}(\mathbf{b}-\mathbf{a}),\]

where $\mathbf{c}=\mathbf{r}(t_0)$.
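
To make the statement concrete, here is a Python sketch that locates such a point $\mathbf{c}$ for one specific case. Everything in it is our own assumption: the function $f(x,y)=x^3+y^2$, the endpoints $\mathbf{a}=(0,0)$ and $\mathbf{b}=(1,2)$, and the use of bisection, which works here only because $g'(t)$ happens to be increasing on $[0,1]$.

```python
def f(p):
    x, y = p
    return x**3 + y**2


def grad_f(p):
    x, y = p
    return (3 * x**2, 2 * y)


a = (0.0, 0.0)
b = (1.0, 2.0)
d = (b[0] - a[0], b[1] - a[1])   # the vector b - a


def r(t):
    # Parametrization of the segment from a to b, 0 <= t <= 1.
    return (a[0] + t * d[0], a[1] + t * d[1])


def g_prime(t):
    # g'(t) = grad f(r(t)) . (b - a)
    gx, gy = grad_f(r(t))
    return gx * d[0] + gy * d[1]


target = f(b) - f(a)   # equals g(1) - g(0)

# Bisection for g'(t0) = target; valid here since g' is increasing
# and g'(0) < target < g'(1).
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g_prime(mid) < target:
        lo = mid
    else:
        hi = mid

t0 = 0.5 * (lo + hi)
c = r(t0)
print(t0, c, g_prime(t0), target)
```

In general $g'$ need not be monotone, so a different root-finding strategy may be required. The theorem itself only guarantees that at least one such $t_0$ exists.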

Differentiability and Gradient (Optional)

 


Suppose \(f:U\subseteq\mathbb{R}^n\to\mathbb{R}\) is differentiable at \(\mathbf{x}_0\). According to Definition 3.10.2 we have:
\[f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\frac{\partial f}{\partial x_1}(\mathbf{x}_0)h_1+\cdots+\frac{\partial f}{\partial x_n}(\mathbf{x}_0)h_n+|\mathbf{h}|\varepsilon(\mathbf{h}),\tag{*}\]
where \(\mathbf{h}=(h_1,\dots,h_n)\) and, as usual, \(|\mathbf{h}|=\sqrt{h_1^2+\cdots+h_n^2}\). Because \[\frac{\partial f}{\partial x_1}(\mathbf{x}_0)h_1+\cdots+\frac{\partial f}{\partial x_n}(\mathbf{x}_0)h_n=\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h},\]  we can rewrite the above expression as:
\[f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}+|\mathbf{h}|\varepsilon(\mathbf{h}).\]
Rearranging the terms:
\[\Rightarrow f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}=|\mathbf{h}|\varepsilon(\mathbf{h}).\]
Taking the absolute value of both sides,
\[\Rightarrow \left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}\right|=|\mathbf{h}|\left|\varepsilon(\mathbf{h})\right|,\]
and dividing both sides by \(|\mathbf{h}|\) (if \(\mathbf{h}\neq\mathbf{0}\)), we have:
\[\frac{\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}\right|}{|\mathbf{h}|}=\left|\varepsilon(\mathbf{h})\right|.\]
Differentiability of \(f\) means \(\lim_{\mathbf{h}\to \mathbf{0}}\varepsilon(\mathbf{h})=0\), therefore \(\lim_{\mathbf{h}\to \mathbf{0}}\left|\varepsilon(\mathbf{h})\right|=0\), and finally
\[\lim_{\mathbf{h}\to\mathbf{0}}\frac{\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}\right|}{|\mathbf{h}|}=0.\tag{**}\]
Try to prove that if (**) holds, then we have (*). Consequently, we can conclude that \(f\) is differentiable at \(\mathbf{x}_0\) if and only if we have:
\[\lim_{\mathbf{h}\to\mathbf{0}}\frac{\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}\right|}{|\mathbf{h}|}=0,\tag{***}\]
and this can be used as an alternative definition of differentiability.
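
The quotient in this alternative definition is easy to evaluate numerically, which gives a practical way to see differentiability succeed or fail. The sketch below is our own illustration with assumed names. It computes the quotient along \(\mathbf{h}=(s,s)\) for a polynomial at \((1,1)\) and for the function of Example 3 at the origin, where both partial derivatives are \(0\).

```python
import math


def remainder_ratio(f, grad, x0, h):
    """|f(x0 + h) - f(x0) - grad . h| / |h| for two-variable f."""
    diff = (f(x0[0] + h[0], x0[1] + h[1]) - f(x0[0], x0[1])
            - (grad[0] * h[0] + grad[1] * h[1]))
    return abs(diff) / math.hypot(h[0], h[1])


def smooth(x, y):
    # Differentiable sample; its gradient at (1, 1) is (2, -4).
    return x**2 - 2 * y**2


def example3(x, y):
    # The function of Example 3; f_x(0,0) = f_y(0,0) = 0.
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y**2 / (x**2 + y**2)


for s in (1e-1, 1e-2, 1e-3):
    h = (s, s)
    print(s,
          remainder_ratio(smooth, (2.0, -4.0), (1.0, 1.0), h),
          remainder_ratio(example3, (0.0, 0.0), (0.0, 0.0), h))
```

The first ratio shrinks in proportion to \(s\), while the second stays near \(1/(2\sqrt{2})\approx 0.354\), so the limit cannot be \(0\) for the function of Example 3. Checking a single direction can only disprove differentiability, never prove it.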

 

Gradient Vector Field

Suppose \(f\) is a function of \(x\) and \(y\). The gradient of \(f\) assigns a two-dimensional vector \((f_x,f_y)\) to each point of the plane \(\mathbb{R}^2\) at which the partial derivatives exist. A rule that assigns a vector to each point of a region in two- or three-dimensional space is called a vector field. As such, \(\overrightarrow{\nabla} f\) is referred to as a gradient vector field. Other examples of vector fields in physics and engineering include the velocity of (steady) wind or water currents, gravitational fields, electric and magnetic fields, and the displacement field of a deformable body under external forces.

To visualize a vector field in two or three dimensions, we choose a sample of points and, at each of them, draw the vector that the field assigns to that point. The lengths of the vectors are often scaled down so that more vectors can be shown without overlapping. This is an effective way of representing a gradient field, and it is preferable to drawing a graph: if \(f:U\subseteq\mathbb{R}^2\to\mathbb{R}\), its gradient is a function from \(U\subseteq\mathbb{R}^2\) to \(\mathbb{R}^2\), so its graph is the set of points \((\mathbf{x},\overrightarrow{\nabla} f(\mathbf{x}))=\left(x,y,\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\), which is a subset of \(\mathbb{R}^4\) and impossible to plot.
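
The sampling step described above can be sketched in a few lines of Python. This is our own illustration: the grid, the scale factor, and the function names are assumptions, and the field is the gradient of \(f(x,y)=x^2-y^2+xy\) from Example 5. The output is a list of base points with scaled arrow components, which could then be passed to any plotting tool.

```python
def grad_f(x, y):
    # Gradient of f(x, y) = x^2 - y^2 + xy (the function of Example 5).
    return (2 * x + y, -2 * y + x)


def sample_field(xs, ys, scale=0.2):
    """Return (x, y, u, v) tuples: a base point and the scaled arrow drawn there."""
    arrows = []
    for y in ys:
        for x in xs:
            gx, gy = grad_f(x, y)
            arrows.append((x, y, scale * gx, scale * gy))
    return arrows


grid = [-1.0, 0.0, 1.0]
field = sample_field(grid, grid)
for x, y, u, v in field:
    print(f"({x:+.0f}, {y:+.0f}) -> arrow ({u:+.2f}, {v:+.2f})")
```

A single scale factor for all arrows keeps their relative lengths meaningful. Normalizing every arrow to the same length would show direction only.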


[1] Or, equivalently, parallel to the \(z\)-axis.