Review of Taylor’s Formula for Functions of a Single Variable

Let’s review the Taylor series for functions of a single variable. Suppose $y=f(x)$ is differentiable at $x_0$. Then it has a linear approximation at $x_0$, and we have:
$$f(x)=f(x_0)+f'(x_0)(x-x_0)+(x-x_0)\,\varepsilon_{1,x_0}(x-x_0),\quad\text{with}\quad \lim_{x\to x_0}\varepsilon_{1,x_0}(x-x_0)=0.$$
Therefore, $P_{1,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)$ is the linear approximation, and the error in this approximation is
$$R_{1,x_0}(x)=(x-x_0)\,\varepsilon_{1,x_0}(x-x_0).$$
The subscripts $1$ and $x_0$ in $P_{1,x_0}$ and $R_{1,x_0}$ indicate the maximum power of $x$ that appears in the polynomial and the point about which we approximate the function $f$.

Note that as $x\to x_0$, $R_{1,x_0}(x)$ tends to $0$ faster than $(x-x_0)$, because $R_{1,x_0}(x)/(x-x_0)=\varepsilon_{1,x_0}(x-x_0)\to 0$. Mathematically, we write $R_{1,x_0}(x)=o(x-x_0)$.
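To see this decay concretely, here is a minimal numerical sketch; the choices $f(x)=e^x$ and $x_0=0$ (so that $P_{1,x_0}(x)=1+x$) are ours, purely for illustration. It prints the ratio $R_{1,x_0}(x)/(x-x_0)$ for shrinking steps:

```python
# Numerical check that R_1(x)/(x - x0) -> 0 as x -> x0,
# using the illustrative choice f(x) = e^x at x0 = 0, where P_1(x) = 1 + x.
import math

x0 = 0.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    x = x0 + h
    R1 = math.exp(x) - (1.0 + x)          # f(x) - P_{1,x0}(x)
    print(f"x - x0 = {h:>7}:  R1/(x - x0) = {R1 / h:.6f}")
# The printed ratios shrink roughly like h/2, so R1 = o(x - x0).
```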


If we want a better approximation of $f$, instead of a linear function we may use a quadratic function $P_{2,x_0}(x)=a_0+a_1x+a_2x^2$. To determine the coefficients $a_0$, $a_1$, and $a_2$, we match the values of the function and its first and second derivatives at $x=x_0$. This means the graph of $P_{2,x_0}$ has the same value, the same slope, and the same concavity as the graph of $f$ at $x=x_0$. The quadratic polynomial reads:
$$P_{2,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)+\frac{1}{2!}f''(x_0)(x-x_0)^2,$$
and the error is
$$R_{2,x_0}(x)=(x-x_0)^2\,\varepsilon_{2,x_0}(x-x_0),\quad\text{where}\quad \lim_{x\to x_0}\varepsilon_{2,x_0}(x-x_0)=0.$$

We can still improve approximations of $f$ at $x=x_0$ by using higher-order polynomials and matching more derivatives at the selected base point. If we use a polynomial of order $m$, we can prove (see the following theorem) that the error in the approximation goes to zero faster than $(x-x_0)^m$ as $x\to x_0$. Mathematically, we write $R_{m,x_0}(x)=o\big((x-x_0)^m\big)$, which means $R_{m,x_0}(x)/(x-x_0)^m\to 0$ as $x\to x_0$. In general, we have the following theorem:

Theorem 1. Suppose $f$ is a function that is $m\,(\geq 1)$ times differentiable at $x_0$; that is, $f'(x_0), f''(x_0),\dots,f^{(m)}(x_0)$ all exist. Let
$$P_{m,x_0}(x)=a_0+a_1(x-x_0)+\dots+a_m(x-x_0)^m,\quad\text{where}\quad a_k=\frac{f^{(k)}(x_0)}{k!},\ 0\le k\le m,$$
and
$$\varepsilon_{m,x_0}(x)=\frac{f(x)-P_{m,x_0}(x)}{(x-x_0)^m}.$$
Then $\lim_{x\to x_0}\varepsilon_{m,x_0}(x)=0$.
  • Remark that $f^{(0)}(x_0)=f(x_0)$, $f^{(1)}(x_0)=f'(x_0)$, $f^{(2)}(x_0)=f''(x_0)$, $\dots$, and $0!=1$.
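The theorem is easy to experiment with symbolically. The following SymPy sketch (the choices $f=\sin x$, $x_0=0$, and $m=3$ are illustrative, not from the text) builds $P_{m,x_0}$ from $a_k=f^{(k)}(x_0)/k!$ and confirms that $\varepsilon_{m,x_0}(x)\to 0$:

```python
# Build P_{m,x0} from the coefficients in Theorem 1 and check that
# eps_{m,x0}(x) = (f(x) - P_{m,x0}(x)) / (x - x0)^m -> 0 as x -> x0.
import sympy as sp

x = sp.symbols('x')
f, x0, m = sp.sin(x), 0, 3          # illustrative test choices

P = sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
        for k in range(m + 1))
eps = (f - P) / (x - x0)**m

print(sp.expand(P))                 # -x**3/6 + x, the degree-3 Taylor polynomial of sin
print(sp.limit(eps, x, x0))         # 0, as Theorem 1 asserts
```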


Definitions of the “Taylor polynomial” and “remainder”

$P_{m,x_0}(x)$ is called the Taylor polynomial of degree $m$ for $f(x)$ at $x_0$. The error $R_{m,x_0}(x)$ is also called the remainder term. You should verify that
$$P_{m,x_0}(x_0)=f(x_0),\quad P'_{m,x_0}(x_0)=f'(x_0),\quad P''_{m,x_0}(x_0)=f''(x_0),\quad\dots,\quad P^{(m)}_{m,x_0}(x_0)=f^{(m)}(x_0).$$
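This verification is quick to automate. The SymPy sketch below checks all the matching conditions; the test function $f=e^x\cos x$, the base point $x_0=1$, and $m=4$ are arbitrary choices of ours:

```python
# Verify the derivative-matching property P^(k)(x0) = f^(k)(x0), 0 <= k <= m.
import sympy as sp

x = sp.symbols('x')
f, x0, m = sp.exp(x) * sp.cos(x), 1, 4   # arbitrary test choices

P = sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
        for k in range(m + 1))

for k in range(m + 1):
    lhs = sp.diff(P, x, k).subs(x, x0)
    rhs = sp.diff(f, x, k).subs(x, x0)
    assert sp.simplify(lhs - rhs) == 0   # P^(k)(x0) == f^(k)(x0)
print("all derivatives up to order", m, "match at x0 =", x0)
```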
To estimate the error of this approximation, we would like to have an expression for the remainder $R_{m,x_0}(x)$. Various expressions exist in the literature under stronger regularity assumptions on $f$. We mention one of them, called the Lagrange form of the remainder term after the great mathematician Joseph-Louis Lagrange.

Theorem 2. If $f^{(m+1)}$ is continuous on an open interval $I$ that contains $x_0$, and $x\in I$, then there exists a number $\xi$ between $x$ and $x_0$ such that
$$R_{m,x_0}(x)=\frac{f^{(m+1)}(\xi)}{(m+1)!}(x-x_0)^{m+1}.$$



We don’t know anything about $\xi$ except that $\xi$ is between $x_0$ and $x$.

If we place $x=x_0+h$, we have:

$$f(x_0+h)=f(x_0)+\frac{f'(x_0)}{1!}h+\frac{f''(x_0)}{2!}h^2+\dots+\frac{f^{(m)}(x_0)}{m!}h^m+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}=\sum_{k=0}^{m}\frac{f^{(k)}(x_0)}{k!}h^k+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}$$
for some $0<\theta<1$.
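As a concrete use of the Lagrange form, here is a small sketch. The choices $f=\sin$, $x_0=0$, and $m=3$ are ours for illustration; since $|\sin^{(m+1)}|\le 1$ everywhere, Theorem 2 gives the bound $|R_{m,x_0}(h)|\le |h|^{m+1}/(m+1)!$:

```python
# Compare the actual error of P_{3,0} for sin against the Lagrange bound
# |R_m| <= |h|^(m+1) / (m+1)!, valid here because |sin^(m+1)| <= 1.
import math

x0, h, m = 0.0, 0.5, 3
approx = h - h**3 / math.factorial(3)       # P_{3,0}(h) = h - h^3/3!
actual_err = abs(math.sin(x0 + h) - approx)
bound = abs(h)**(m + 1) / math.factorial(m + 1)

print(f"actual |R| = {actual_err:.2e}, Lagrange bound = {bound:.2e}")
# actual |R| = 2.59e-04, Lagrange bound = 2.60e-03 -- the bound holds.
```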


Taylor’s Formula for Functions of Several Variables


Now we wish to extend the polynomial expansion to functions of several variables. We learned that if $f(x,y)$ is differentiable at $(x_0,y_0)$, we can approximate it with a linear function (or, more accurately, an affine function) $P_{1,(x_0,y_0)}(x,y)=a_0+a_1x+a_2y$. Matching the value and the first partial derivatives and placing $x=x_0+h$ and $y=y_0+k$ results in
$$P_{1,(x_0,y_0)}(x,y)=f(x_0,y_0)+f_x(x_0,y_0)\,h+f_y(x_0,y_0)\,k.$$
For a better approximation we consider $P_{2,(x_0,y_0)}(x,y)=a_0+a_1x+a_2y+b_1x^2+b_2xy+b_3y^2$. Matching the zeroth-, first-, and second-order partial derivatives results in
$$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+h\,f_x+k\,f_y+\frac{1}{2!}\left[h^2 f_{xx}+2hk\,f_{xy}+k^2 f_{yy}\right],$$
where the partial derivatives are evaluated at $(x_0,y_0)$. The above expression can also be written as:
$$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f(x,y)\right]_{\substack{x=x_0\\ y=y_0}}+\frac{1}{2!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2 f(x,y)\right]_{\substack{x=x_0\\ y=y_0}},$$
where
$$\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2=h^2\frac{\partial^2}{\partial x^2}+2hk\frac{\partial^2}{\partial x\,\partial y}+k^2\frac{\partial^2}{\partial y^2}.$$

Another form of writing $P_{2,(x_0,y_0)}$ is:
$$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\begin{bmatrix} f_x & f_y\end{bmatrix}\begin{bmatrix} h\\ k\end{bmatrix}+\frac{1}{2!}\begin{bmatrix} h & k\end{bmatrix}\begin{bmatrix}\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y}\\[6pt] \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2}\end{bmatrix}\begin{bmatrix} h\\ k\end{bmatrix},$$
where again the partial derivatives are evaluated at $(x_0,y_0)$.

Hessian Matrix

The $2\times 2$ matrix in the above expression is called the Hessian matrix and is denoted by $H(x_0,y_0)$. We will talk about it later in this section.
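The matrix form is convenient to compute with. Below is a minimal SymPy sketch that assembles $P_2$ from the gradient row vector and the Hessian; the test function $f=x^2y+3y$ and base point $(1,2)$ are arbitrary choices of ours:

```python
# Assemble P_2 as f(x0,y0) + [fx fy][h k]^T + (1/2)[h k] H [h k]^T.
import sympy as sp

x, y, h, k = sp.symbols('x y h k')
f = x**2 * y + 3 * y                 # arbitrary test function
x0, y0 = 1, 2                        # arbitrary base point

grad = sp.Matrix([[sp.diff(f, x), sp.diff(f, y)]])   # row vector [fx, fy]
H = sp.hessian(f, (x, y))                            # 2x2 Hessian matrix
hk = sp.Matrix([h, k])                               # column vector [h, k]^T

sub = {x: x0, y: y0}
P2 = (f.subs(sub)
      + (grad.subs(sub) * hk)[0]
      + sp.Rational(1, 2) * (hk.T * H.subs(sub) * hk)[0])
print(sp.expand(P2))                 # 2*h**2 + 2*h*k + 4*h + 4*k + 8
```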

We can use the results for functions of a single variable to derive formulas for the Taylor polynomial and the remainder of a function $f$ of two or more variables. Assume $f(x,y)$ has continuous partial derivatives up to order $m+1$ near $(x_0,y_0)$. Let $x=x_0+ht$ and $y=y_0+kt$, where $x_0, y_0, h$, and $k$ are treated as constants and $t$ is a variable. Then
$$F(t)=f(x(t),y(t))=f(x_0+ht,\,y_0+kt).$$
By Taylor’s formula, we have:
$$F(t)=F(0)+F'(0)\,t+\frac{1}{2!}F''(0)\,t^2+\frac{1}{3!}F'''(0)\,t^3+\dots+\frac{1}{m!}F^{(m)}(0)\,t^m+\frac{1}{(m+1)!}F^{(m+1)}(\xi)\,t^{m+1}\qquad(*)$$
where $\xi$ is a number between $0$ and $t$. Using the chain rule, as we saw before, we have:
$$F'(t)=f_x\frac{dx}{dt}+f_y\frac{dy}{dt}=h\,f_x+k\,f_y=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f=\big((h,k)\cdot\nabla\big)f.$$
We can show
$$F''(t)=h^2\frac{\partial^2 f}{\partial x^2}+2hk\frac{\partial^2 f}{\partial x\,\partial y}+k^2\frac{\partial^2 f}{\partial y^2}=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2 f=\big((h,k)\cdot\nabla\big)^2 f,$$
the third derivative is 
$$F'''(t)=h^3\frac{\partial^3 f}{\partial x^3}+3h^2k\frac{\partial^3 f}{\partial x^2\,\partial y}+3hk^2\frac{\partial^3 f}{\partial x\,\partial y^2}+k^3\frac{\partial^3 f}{\partial y^3}=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^3 f=\big((h,k)\cdot\nabla\big)^3 f,$$
and in general:
$$F^{(m)}(t)=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^m f=\big((h,k)\cdot\nabla\big)^m f.\qquad(**)$$

This may be proved by induction. (Induction has two steps: (1) we prove $(**)$ holds true for $m=1$; (2) we prove that if $(**)$ is true for some value $m=j$, then it is also true for $m=j+1$.)

Therefore


$$F^{(m)}(0)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^m f(x,y)\right]_{\substack{x=x_0\\ y=y_0}}=\big((h,k)\cdot\nabla\big)^m f(x_0,y_0),\quad\text{and}$$
$$F^{(m+1)}(\xi)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1} f(x,y)\right]_{\substack{x=x_0+\xi h\\ y=y_0+\xi k}}=\big((h,k)\cdot\nabla\big)^{m+1} f(x_0+\xi h,\,y_0+\xi k).$$
Substituting in $(*)$, we have
$$\begin{aligned}F(t)&=\big[f\big]_{\substack{x=x_0\\ y=y_0}}+t\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f\right]_{\substack{x=x_0\\ y=y_0}}+\dots+\frac{t^m}{m!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^m f\right]_{\substack{x=x_0\\ y=y_0}}+\frac{t^{m+1}}{(m+1)!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1} f\right]_{\substack{x=x_0+\xi h\\ y=y_0+\xi k}}\\
&=f(x_0,y_0)+t\,\big((h,k)\cdot\nabla\big)f(x_0,y_0)+\dots+\frac{t^m}{m!}\big((h,k)\cdot\nabla\big)^m f(x_0,y_0)+\frac{t^{m+1}}{(m+1)!}\big((h,k)\cdot\nabla\big)^{m+1} f(x_0+\xi h,\,y_0+\xi k)\end{aligned}$$
for some $\xi$ between $0$ and $t$. Because this is true for all values of $t$, we can plug in $t=1$ and find Taylor’s formula for functions of two variables. Here we proved the following theorem for $n=2$, but the generalization is easy.

Theorem 3. Let $f:U\to\mathbb{R}$, where $U\subseteq\mathbb{R}^n$ is an open set. Suppose $f$ has continuous partial derivatives up to (at least) order $m+1$, and consider $\mathbf{x}_0\in U$ and $\mathbf{h}\in\mathbb{R}^n$ such that $\mathbf{x}_0+t\mathbf{h}\in U$ for $0\le t\le 1$. Then there is a number $0<\theta<1$ such that
$$f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+(\mathbf{h}\cdot\nabla)f(\mathbf{x}_0)+\frac{1}{2!}(\mathbf{h}\cdot\nabla)^2 f(\mathbf{x}_0)+\dots+\frac{1}{m!}(\mathbf{h}\cdot\nabla)^m f(\mathbf{x}_0)+\frac{1}{(m+1)!}(\mathbf{h}\cdot\nabla)^{m+1} f(\mathbf{x}_0+\theta\mathbf{h}).$$

Another form of writing Taylor’s formula is:
$$f(\mathbf{x}_0+\mathbf{h})=\sum_{k=0}^{m}\frac{1}{k!}(\mathbf{h}\cdot\nabla)^k f(\mathbf{x}_0)+\frac{1}{(m+1)!}(\mathbf{h}\cdot\nabla)^{m+1} f(\mathbf{x}_0+\theta\mathbf{h}),$$
for some $0<\theta<1$.

If we place $\mathbf{h}=\mathbf{x}-\mathbf{x}_0$ in the above formula, the polynomial that we obtain is called the polynomial approximation of $f$ of degree $m$ at $\mathbf{x}_0$.


Example 1
Given $f(x,y)=\sin x\; e^{y-2x}$, find a second-degree polynomial approximation to $f$ near the point $\left(\frac{\pi}{2},\pi\right)$ and use it to estimate the value $f\left(0.95\frac{\pi}{2},\,1.1\pi\right)$.

Solution
For the polynomial approximation of degree 2, we need to find the first and second partial derivatives of $f$.

$$\begin{aligned}
f(x,y)&=\sin x\; e^{y-2x} &\Rightarrow\quad f\left(\tfrac{\pi}{2},\pi\right)&=1,\\
f_x(x,y)&=\cos x\; e^{y-2x}-2\sin x\; e^{y-2x} &\Rightarrow\quad f_x\left(\tfrac{\pi}{2},\pi\right)&=-2,\\
f_y(x,y)&=\sin x\; e^{y-2x} &\Rightarrow\quad f_y\left(\tfrac{\pi}{2},\pi\right)&=1,\\
f_{xx}(x,y)&=3\sin x\; e^{y-2x}-4\cos x\; e^{y-2x} &\Rightarrow\quad f_{xx}\left(\tfrac{\pi}{2},\pi\right)&=3,\\
f_{xy}(x,y)&=\cos x\; e^{y-2x}-2\sin x\; e^{y-2x} &\Rightarrow\quad f_{xy}\left(\tfrac{\pi}{2},\pi\right)&=-2,\\
f_{yy}(x,y)&=\sin x\; e^{y-2x} &\Rightarrow\quad f_{yy}\left(\tfrac{\pi}{2},\pi\right)&=1.
\end{aligned}$$
Thus
$$f\left(\tfrac{\pi}{2}+h,\,\pi+k\right)\approx 1-2h+k+\frac{1}{2!}\left(3h^2-2\cdot 2\,hk+k^2\right).$$
If we place $x=\frac{\pi}{2}+h$ and $y=\pi+k$, we obtain the second-degree polynomial approximation of $f$ near $\left(\frac{\pi}{2},\pi\right)$:
$$f(x,y)\approx P_{2,(\frac{\pi}{2},\pi)}(x,y)=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)+\frac{3}{2}\left(x-\frac{\pi}{2}\right)^2-2\left(x-\frac{\pi}{2}\right)(y-\pi)+\frac{1}{2}(y-\pi)^2.$$
Therefore:
$$f\left(0.95\frac{\pi}{2},\,1.1\pi\right)\approx 1-2(-0.025\pi)+0.1\pi+\frac{3}{2}(-0.025\pi)^2-2(-0.025\pi)(0.1\pi)+\frac{1}{2}(0.1\pi)^2\approx 1.57919.$$
The exact value of $f\left(0.95\frac{\pi}{2},\,1.1\pi\right)$ is $1.59704$. The error of this approximation is $1.12\%$, while if we used the linear approximation $P_{1,(\frac{\pi}{2},\pi)}=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)$, the error would be $7.88\%$.
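The numbers above are easy to reproduce. This SymPy sketch re-evaluates $P_2$ and the exact value at $\left(0.95\frac{\pi}{2},\,1.1\pi\right)$:

```python
# Reproduce the Example 1 figures: P2 approx, exact value, and relative error.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x) * sp.exp(y - 2 * x)
x0, y0 = sp.pi / 2, sp.pi

hx, ky = x - x0, y - y0
P2 = (1 - 2 * hx + ky
      + sp.Rational(3, 2) * hx**2 - 2 * hx * ky + sp.Rational(1, 2) * ky**2)

pt = {x: 0.95 * sp.pi / 2, y: 1.1 * sp.pi}
approx = float(P2.subs(pt))
exact = float(f.subs(pt))
print(f"P2 = {approx:.5f}, exact = {exact:.5f}, "
      f"error = {abs(exact - approx) / exact:.2%}")
# P2 = 1.57919, exact = 1.59704, error = 1.12%
```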

The quadratic term $\displaystyle\sum_{i,j=1}^{n}\frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{x}_0)\,h_i h_j$ can be written as
$$\begin{bmatrix} h_1 & h_2 & \cdots & h_n\end{bmatrix}\begin{bmatrix}\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n}\\[6pt] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}\end{bmatrix}\begin{bmatrix} h_1\\ h_2\\ \vdots\\ h_n\end{bmatrix}.$$
Similar to the case of $n=2$, the $n\times n$ matrix of the second-order derivatives $\left[\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right]_{n\times n}$ is called the Hessian matrix and is denoted by $H(\mathbf{x})$. Therefore, we can write:
$$\sum_{i,j=1}^{n}\frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{x}_0)\,h_i h_j=(\mathbf{h}\cdot\nabla)^2 f(\mathbf{x}_0)=\mathbf{h}^T H(\mathbf{x}_0)\,\mathbf{h},$$
where $\mathbf{h}$ is considered an $n\times 1$ column matrix.¹ Using this notation, the linear expansion of $f$ can be written as:
$$f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\mathbf{h}\cdot\nabla f(\mathbf{x}_0)+\frac{1}{2!}\,\mathbf{h}^T H(\mathbf{x}_0+\theta\mathbf{h})\,\mathbf{h},\quad\text{for some } 0<\theta<1.$$
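As a quick illustration of the quadratic form $\mathbf{h}^T H\,\mathbf{h}$ for $n=3$, here is a NumPy sketch; the symmetric matrix and displacement vector below are arbitrary examples of ours, not tied to a particular $f$:

```python
# The matrix product h^T H h equals the double sum over H[i,j] * h_i * h_j.
import numpy as np

H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])          # plays the role of H(x0)
h = np.array([0.1, -0.2, 0.05])          # the displacement vector

quad = h @ H @ h                         # h^T H h as a matrix product
double_sum = sum(H[i, j] * h[i] * h[j] for i in range(3) for j in range(3))
print(quad, double_sum)                  # the two expressions agree
```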


¹ Some books consider $\mathbf{h}$ a $1\times n$ row matrix. When there is any ambiguity, it is better to write it as $\langle\mathbf{h}\,|\,H(\mathbf{x}_0)\,|\,\mathbf{h}\rangle$.