Directional Derivatives
We have learned that the rates of change of a function \(f\) along the coordinate axes are its first partial derivatives. Now we want to generalize this concept and find the rate of change of \(f\) along any arbitrary direction.
Consider a function \(z=f(x,y)\) and a unit vector \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}\). Suppose we approach the point \((x_0,y_0)\) along a ray that is parallel to \(\mathbf{v}\). Then the rate of change of \(f\) at the point \((x_0,y_0)\) with respect to distance is called the directional derivative of \(f\) at \((x_0,y_0)\) in the direction \(\mathbf{v}\).
For a geometrical interpretation, consider the point \(P=(x_0,y_0,f(x_0,y_0))\) on the surface \(z=f(x,y)\) and the points \(Q=(x_0,y_0,0)\) and \(R=(x_0+tv_1,y_0+tv_2,0)\) in the \(xy\)-plane, as illustrated in Fig. 1. Note that \(\overrightarrow{QR}\) is parallel to \(\mathbf{v}\). The plane \(\Omega\) through \(Q\) and \(R\) and perpendicular to the \(xy\)-plane[1] intersects the surface \(z=f(x,y)\) in the curve \(C\). The slope of the tangent line to the curve \(C\) at the point \(P\) in the plane \(\Omega\) is the directional derivative of \(f\) in the direction \(\mathbf{v}\), and is denoted by \(D_{\mathbf{v}} f(x_0,y_0)\).
Definition 1. The directional derivative of a function \(f\) at \((x_0,y_0)\) in the direction of \(\mathbf{v}\), denoted by \(D_{\mathbf{v}} f(x_0,y_0)\), is given by:
\[D_{\mathbf{v}}f(x_0,y_0)=\lim_{t\to 0}\frac{f(x_0+t v_1,y_0+t v_2)-f(x_0,y_0)}{t},\]
whenever the limit on the right hand side exists.
- The above definition is also meaningful when \(\mathbf{v}\) is not a unit vector. However, when \(|\mathbf{v}|\neq 1\), \(D_{\mathbf{v}}f(x_0,y_0)\) is NOT equal to the rate of change of \(f\) in the direction \(\mathbf{v}\). The rate of change of \(f\) in the direction \(\mathbf{v}\), or the slope of the tangent line to the curve \(C\) at \(P\), is equal to \(D_{\mathbf{v}/|\mathbf{v}|}f(x_0,y_0)\).
The rate of change of \(f\) at \((x_0,y_0)\) along an arbitrary line parallel to \(\mathbf{v}\) (\(\mathbf{v}\neq\mathbf{0}\)) is equal to: \[D_{\mathbf{v}/|\mathbf{v}|}f(x_0,y_0)\]
- If \(\mathbf{v}=\mathbf{0}\), Definition 1 gives:
\[D_{\mathbf{0}}f(x,y)=\lim_{t\to 0}\frac{f(x+t\times 0 ,y+t\times 0)-f(x,y)}{t}=0,\]
for every \((x,y)\) in the domain of \(f\).
- If \(\mathbf{v}=\mathbf{i}\), we have from the above definition
\[D_{\mathbf{i}}f(x_0,y_0)=\lim_{t\to 0}\frac{f(x_0+t ,y_0)-f(x_0,y_0)}{t},\]
which is clearly the partial derivative of \(f\) with respect to \(x\), \(f_x(x_0,y_0)=\frac{\partial f}{\partial x}(x_0,y_0)\). Similarly, if \(\mathbf{v}=\mathbf{j}\), the directional derivative of \(f\) in the direction of \(\mathbf{j}\) is the partial derivative of \(f\) with respect to \(y\). So
\[\begin{align} &D_{\mathbf{i}} f=f_x=\frac{\partial f}{\partial x},\\ &D_{\mathbf{j}} f=f_y=\frac{\partial f}{\partial y}.\end{align}\]
- If \(g(t)=f(x_0+tv_1,y_0+tv_2)\), then \(g'(0)=D_{\mathbf{v}}f(x_0,y_0)\).
To prove this, we use the definition of \(g'(0)\) and the fact that \(g(0)=f(x_0,y_0)\):
\[\begin{align} g'(0)&=\lim_{t\to 0}\frac{g(t)-g(0)}{t}\\
&=\lim_{t\to 0}\frac{f(x_0+tv_1,y_0+tv_2)-f(x_0,y_0)}{t}\\
&=D_{\mathbf{v}}f(x_0,y_0).\end{align}\]
We also note that \(g(t)=f(x(t),y(t))\), where \[x(t)=x_0+t\,v_1,\quad\text{and}\quad y(t)=y_0+t\ v_2.\] If \(f\) is differentiable, we can use the chain rule to find \(g'(t)\):
\[\begin{align} \frac{dg}{dt}&=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}\\ &=\frac{\partial f}{\partial x}\ v_1+\frac{\partial f}{\partial y}\ v_2.\end{align}\]
When \(t=0\), we have \(x(0)=x_0\) and \(y(0)=y_0\). Thus:
\[\begin{align} g'(0)&=\left.\frac{dg}{dt}\right|_{t=0}=\left.\frac{\partial f}{\partial x}\right|_{(x_0,y_0)} v_1+\left.\frac{\partial f}{\partial y}\right|_{(x_0,y_0)}v_2\\ &=\frac{\partial f}{\partial x}(x_0,y_0)\ v_1+\frac{\partial f}{\partial y}(x_0,y_0)\ v_2.\end{align}\]
We have therefore proved the following theorem.
Theorem 1. If \(f\) is a differentiable function at \((x_0,y_0)\), and \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}\), then
\[D_{\mathbf{v}}f(x_0,y_0)=\frac{\partial f}{\partial x}(x_0,y_0)\ v_1+\frac{\partial f}{\partial y}(x_0,y_0) v_2.\]
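As an illustration of Theorem 1, here is a short worked example. The function \(f(x,y)=x^2y\), the point \((1,2)\), and the vector \(\mathbf{v}=\frac{3}{5}\mathbf{i}+\frac{4}{5}\mathbf{j}\) are chosen only for this sketch and are not taken from the text above. Since \(f_x=2xy\) and \(f_y=x^2\), we have \(f_x(1,2)=4\) and \(f_y(1,2)=1\), so
\[D_{\mathbf{v}}f(1,2)=4\cdot\frac{3}{5}+1\cdot\frac{4}{5}=\frac{16}{5}.\]
The same value follows from \(g(t)=f\left(1+\tfrac{3}{5}t,\,2+\tfrac{4}{5}t\right)=\left(1+\tfrac{3}{5}t\right)^2\left(2+\tfrac{4}{5}t\right)\):
\[g'(0)=2\cdot\frac{3}{5}\cdot 2+\frac{4}{5}=\frac{16}{5}.\]
For the non-unit vector \(\mathbf{w}=3\mathbf{i}+4\mathbf{j}=5\mathbf{v}\), the same formula gives \(D_{\mathbf{w}}f(1,2)=4\cdot 3+1\cdot 4=16\), which is \(|\mathbf{w}|=5\) times the rate of change \(D_{\mathbf{w}/|\mathbf{w}|}f(1,2)=\frac{16}{5}\).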
Does the Existence of Directional Derivatives in All Directions Guarantee Differentiability?
We learned that if the first partial derivatives of a function are continuous in a neighborhood of a point, the function is differentiable at that point. However, the mere existence of the first partial derivatives does not imply that the function is differentiable. We may face this question: Will a stronger condition, namely that the directional derivatives of \(f\) exist in all directions (not just along the coordinate axes), guarantee the differentiability of \(f\)? The answer is still “no.” Consider the following example.
Even if a function has finite directional derivatives in all directions, it may fail to be continuous, let alone be differentiable. The following example illustrates such a situation.
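One standard function with this behavior, offered here as an illustration and not necessarily the example originally intended at this point, is
\[f(x,y)=\begin{cases}\dfrac{x^2y}{x^4+y^2} & \text{if } (x,y)\neq(0,0),\\[4pt] 0 & \text{if } (x,y)=(0,0).\end{cases}\]
For any nonzero vector \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}\) and \(t\neq 0\), the difference quotient at the origin is
\[\frac{f(tv_1,tv_2)-f(0,0)}{t}=\frac{v_1^2v_2}{t^2v_1^4+v_2^2},\]
so that
\[D_{\mathbf{v}}f(0,0)=\begin{cases}\dfrac{v_1^2}{v_2} & \text{if } v_2\neq 0,\\[4pt] 0 & \text{if } v_2=0.\end{cases}\]
Hence the directional derivative at the origin exists and is finite in every direction. However, along the parabola \(y=x^2\) we have \(f(x,x^2)=\frac{x^4}{2x^4}=\frac{1}{2}\) for every \(x\neq 0\), while \(f(0,0)=0\). Thus \(f\) is not continuous at \((0,0)\), and therefore it is not differentiable there.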
Gradients
Now let’s go back to Theorem 1. We learned that if \(f\) is a differentiable function at \((x,y)\), and \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}\), then
\[D_{\mathbf{v}}f(x,y)=\frac{\partial f}{\partial x}(x,y)\ v_1+\frac{\partial f}{\partial y}(x,y) v_2.\]
The right hand side of the above expression can be written as the dot product of two vectors:
\[\frac{\partial f}{\partial x}(x,y)\ v_1+\frac{\partial f}{\partial y}(x,y) v_2=\left(\frac{\partial f}{\partial x}(x,y)\ \mathbf{i}+\frac{\partial f}{\partial y}(x,y)\ \mathbf{j}\right)\boldsymbol{\cdot}\underbrace{\left(v_1\mathbf{i}+v_2\mathbf{j}\right)}_{\mathbf{v}}.\]
Therefore:
\[D_{\mathbf{v}}f(x,y)=\left(\frac{\partial f}{\partial x}(x,y)\ \mathbf{i}+\frac{\partial f}{\partial y}(x,y)\ \mathbf{j}\right)\boldsymbol{\cdot} \mathbf{v}.\]
The first vector on the right hand side is called the “gradient of \(f\)” and is denoted by “\(\overrightarrow{\nabla} f\)” or “\({\rm grad} f\).” The symbol “\(\overrightarrow{\nabla}\)” is an inverted capital delta, \(\Delta\), and is read “del” or “nabla.” We can also write: \(\overrightarrow{\nabla} f=(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y})\). Gradients have many applications that we will discuss in this chapter.
Definition 2. If \(f\) is a function of two variables \(x\) and \(y\) and if \(\frac{\partial f}{\partial x}(x,y)\) and \(\frac{\partial f}{\partial y}(x,y)\) exist, the gradient of \(f\), denoted by \(\overrightarrow{\nabla} f\) or \({\rm grad} f\), is defined by
\[\overrightarrow{\nabla} f(x,y)=\frac{\partial f}{\partial x}(x,y)\ \mathbf{i}+\frac{\partial f}{\partial y}(x,y)\ \mathbf{j}.\]
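As a quick illustration of Definition 2, take the function \(f(x,y)=x^2y\) (chosen only for this sketch, not taken from the text above). Its partial derivatives are \(\frac{\partial f}{\partial x}=2xy\) and \(\frac{\partial f}{\partial y}=x^2\), so
\[\overrightarrow{\nabla} f(x,y)=2xy\ \mathbf{i}+x^2\ \mathbf{j},\qquad \overrightarrow{\nabla} f(1,2)=4\ \mathbf{i}+\mathbf{j}.\]
Note that the gradient is a vector that generally changes from point to point: at \((2,1)\), for instance, the same formula gives \(\overrightarrow{\nabla} f(2,1)=4\ \mathbf{i}+4\ \mathbf{j}\).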
If \(f\) is a differentiable function at \((x,y)\), then
\[D_{\mathbf{v}}f(x,y)=\mathbf{v}\boldsymbol{\cdot}\overrightarrow{\nabla} f(x,y).\]
In polar coordinates \((r,\theta)\), with unit vectors \(\mathbf{e}_r\) and \(\mathbf{e}_\theta\), the gradient can be written as
\[\overrightarrow{\nabla} f=\left(\mathbf{e}_r\frac{\partial}{\partial r}+\mathbf{e}_\theta\frac{1}{r}\frac{\partial }{\partial \theta}\right)f.\]
Directional Derivatives and Gradients in 3- and n-Space
The extension of the concepts of the directional derivative and the gradient to the case where \(f\) is a function of three or more variables is easy. For example, if \(f:U\subseteq\mathbb{R}^3\to\mathbb{R}\), its directional derivative in the direction of a vector \(\mathbf{v}=v_1\mathbf{i}+v_2\mathbf{j}+v_3\mathbf{k}\) is:
\[\begin{align}
D_{\mathbf{v}}f(x,y,z)&=\lim_{t\to 0}\frac{f(x+tv_1,y+tv_2,z+tv_3)-f(x,y,z)}{t}\\
&=\frac{d}{dt}f(x+tv_1,y+tv_2,z+tv_3)\Bigg|_{t=0}
\end{align}\]
Note that \((x+tv_1,y+tv_2,z+tv_3)=(x,y,z)+t (v_1,v_2,v_3)=(x,y,z)+t\mathbf{v}\). The general definition of the directional derivative is as follows.
Definition 3. Consider a function \(f:U\subseteq\mathbb{R}^n\to\mathbb{R}\). The directional derivative of \(f\) at \(\mathbf{x}\in\mathbb{R}^n\) in the direction \(\mathbf{v}\in\mathbb{R}^n\), denoted by \(D_{\mathbf{v}} f(\mathbf{x})\), is defined by:
\[D_{\mathbf{v}} f(\mathbf{x})=\lim_{t\to 0}\frac{ f(\mathbf{x}+t\mathbf{v})-f(\mathbf{x})}{t}\]
whenever the limit on the right hand side exists.
The gradient of \(f(x,y,z)\) is
\[\overrightarrow{\nabla} f(x,y,z)=\left(\frac{\partial f}{\partial x},\frac{\partial f}{\partial y},\frac{\partial f}{\partial z}\right)=\frac{\partial f}{\partial x}\mathbf{i}+\frac{\partial f}{\partial y}\mathbf{j}+\frac{\partial f}{\partial z}\mathbf{k}\]
and for the general case \(f(x_1,\dots,x_n)\) we have the following definition.
Definition 4. Consider a function \(f:U\subseteq\mathbb{R}^n\to\mathbb{R}\) such that \(\frac{\partial f}{\partial x_1}(\mathbf{x}),\dots,\frac{\partial f}{\partial x_n}(\mathbf{x})\) exist. Then the gradient of \(f\), denoted by \(\overrightarrow{\nabla} f\) or \({\rm grad} f\), is the vector
\[\overrightarrow{\nabla} f(\mathbf{x})=\left(\frac{\partial f}{\partial x_1}(\mathbf{x}), \dots, \frac{\partial f}{\partial x_n}(\mathbf{x})\right).\]
Theorem 2. If \(f\) is a differentiable function at \(\mathbf{x}=(x_1,\dots,x_n)\), then
\[D_{\mathbf{v}}f(\mathbf{x})=\mathbf{v}\boldsymbol{\cdot}\overrightarrow{\nabla} f(\mathbf{x}),\]
for a vector \(\mathbf{v}\in\mathbb{R}^n\).
Properties of the Gradient
Properties of the gradient of a function are similar to the properties of the ordinary derivative of a function of a single variable. If \(f\) and \(g\) are differentiable functions from an open set \(U\subseteq\mathbb{R}^n\) to \(\mathbb{R}\), then rules analogous to the sum, product, and quotient rules hold for their gradients. In addition, if \(\mathbf{r}(t)\) is a differentiable curve lying in \(U\) and \(\phi(t)=f(\mathbf{r}(t))\), then
\[\phi'(t)=\overrightarrow{\nabla} f(\mathbf{r}(t))\boldsymbol{\cdot} \mathbf{r}'(t). \label{GradProp-6}\tag{$\dagger$}\]
Note that (\(\dagger\)) is not something new. It is just a new way of writing what we saw before. For example, when \(n=3\) and \(\mathbf{r}(t)=\left(x(t),y(t),z(t)\right)\), (\(\dagger\)) is the same as the following:
\[\phi'(t)=\frac{d\phi(t)}{dt}=\frac{\partial f}{\partial x}\frac{dx(t)}{dt}+\frac{\partial f}{\partial y}\frac{dy(t)}{dt}+\frac{\partial f}{\partial z}\frac{dz(t)}{dt}.\]
Differentiability and Gradient (Optional)
Suppose \(f:U\subseteq\mathbb{R}^n\to\mathbb{R}\) is differentiable at \(\mathbf{x}_0\). According to Definition 3.10.2 we have:
\[f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\frac{\partial f}{\partial x_1}(\mathbf{x}_0)h_1+\cdots+\frac{\partial f}{\partial x_n}(\mathbf{x}_0)h_n+|\mathbf{h}|\varepsilon(\mathbf{h}),\tag{*}\]
where \(\mathbf{h}=(h_1,\dots,h_n)\) and, as usual, \(|\mathbf{h}|=\sqrt{h_1^2+\cdots+h_n^2}\). Because \[\frac{\partial f}{\partial x_1}(\mathbf{x}_0)h_1+\cdots+\frac{\partial f}{\partial x_n}(\mathbf{x}_0)h_n=\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h},\] we can rewrite the above expression as:
\[f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}+|\mathbf{h}|\varepsilon(\mathbf{h}).\]
Rearranging the terms:
\[\Rightarrow f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}=|\mathbf{h}|\varepsilon(\mathbf{h}).\]
Taking the absolute value of both sides,
\[\Rightarrow \left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}\right|=|\mathbf{h}|\left|\varepsilon(\mathbf{h})\right|,\]
and dividing both sides by \(|\mathbf{h}|\) (if \(\mathbf{h}\neq\mathbf{0}\)), we have:
\[\frac{\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}\right|}{|\mathbf{h}|}=\left|\varepsilon(\mathbf{h})\right|.\]
Differentiability of \(f\) means \(\lim_{\mathbf{h}\to \mathbf{0}}\varepsilon(\mathbf{h})=0\), therefore \(\lim_{\mathbf{h}\to \mathbf{0}}\left|\varepsilon(\mathbf{h})\right|=0\), and finally
\[\lim_{\mathbf{h}\to\mathbf{0}}\frac{\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\boldsymbol{\cdot}\mathbf{h}\right|}{|\mathbf{h}|}=0.\tag{**}\]
Try to prove that if (**) holds, then we have (*). Consequently, we can conclude that
\(f\) is differentiable at \(\mathbf{x}_0\) if and only if we have:
\[\lim_{\mathbf{h}\to\mathbf{0}}\frac{\left|f(\mathbf{x}_0+\mathbf{h})-f(\mathbf{x}_0)-\overrightarrow{\nabla} f(\mathbf{x}_0)\cdot\mathbf{h}\right|}{|\mathbf{h}|}=0,\tag{***}\]
and this can be used as an alternative definition of differentiability.
Gradient Vector Field
Suppose \(f\) is a function of \(x\) and \(y\). The gradient of \(f\) assigns a two-dimensional vector \((f_x,f_y)\) to each point in the \(\mathbb{R}^2\) plane wherever the partial derivatives exist. A rule that associates a vector to each point in two- or three-dimensional space is called a vector field. As such, \(\overrightarrow{\nabla} f\) is referred to as a gradient vector field. Other examples of vector fields in physics and engineering include the velocity of (steady) wind or water currents, the gravitational field, electric and magnetic fields, and the displacement field of a deformable body under external forces.
To visualize a vector field in two or three dimensions, we draw, at each point (actually at some selected points), the vector that the vector field gives us at that point. The lengths of the vectors are often scaled to be able to show more vectors in the plane. This is an effective way of representing a gradient field, because its graph cannot be plotted directly. For example, if \(f:U\subseteq\mathbb{R}^2\to\mathbb{R}\), its gradient is a function from \(U\subseteq\mathbb{R}^2\) to \(\mathbb{R}^2\), and therefore its graph would be the set of points \((\mathbf{x},\overrightarrow{\nabla} f(\mathbf{x}))=\left(x,y,\frac{\partial f}{\partial x},\frac{\partial f}{\partial y}\right)\), which is a subset of \(\mathbb{R}^4\) and impossible to plot.
[1] Or, equivalently, parallel to the \(z\)-axis.