Review of Taylor’s Formula for Functions of a Single Variable

Let’s review Taylor’s formula for functions of a single variable. Suppose \(y=f(x)\) is differentiable at \(x_0\). Then it has a linear approximation at \(x_0\), and we have:
\[f(x)=f(x_0)+f'(x_0)(x-x_0)+(x-x_0)\, \varepsilon_{1,x_0}(x-x_0),\] with \[\lim_{x\to x_0}\varepsilon_{1,x_0}(x-x_0)=0.\] Therefore, \[P_{1,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)\] is the linear approximation, and the error in this approximation is \[R_{1,x_0}(x)=(x-x_0)\, \varepsilon_{1,x_0}(x-x_0).\] The subscripts \(1\) and \(x_0\) in \(P_{1,x_0}\) and \(R_{1,x_0}\) indicate the degree of the polynomial and the point about which we approximate the function \(f\).

Note that as \(x\to x_0\), \(R_{1,x_0}(x)\) tends to 0 faster than \((x-x_0)\) because \[R_{1,x_0}(x)/(x-x_0)=\varepsilon_{1,x_0}(x-x_0)\to 0.\] Mathematically we write \[R_{1,x_0}(x)={\color{red} o}\left(x-x_0\right).\]
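This little-\(o\) behavior is easy to see numerically. Below is a minimal Python sketch; the concrete choices \(f(x)=e^x\) and \(x_0=0\), so that \(P_{1,0}(x)=1+x\), are ours, purely for illustration:

```python
import math

# f(x) = exp(x) about x0 = 0, so P_1(x) = 1 + x and R_1(x) = exp(x) - (1 + x)
def remainder_ratio(x):
    """R_1(x)/(x - x0) with x0 = 0; it should tend to 0 as x -> 0."""
    return (math.exp(x) - (1.0 + x)) / x

# sample the ratio at x = 0.1, 0.01, ..., 0.00001
ratios = [abs(remainder_ratio(10.0 ** (-n))) for n in range(1, 6)]
```

For this \(f\), the ratio behaves like \(x/2\) near the origin, which makes the statement \(R_{1,0}(x)=o(x)\) concrete: the error dies out faster than the displacement itself.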


If we want a better approximation of \(f\), instead of a linear function, we may use a quadratic function \[P_{2,x_0}(x)=a_0+a_1 x+a_2 x^2.\] To determine the coefficients \(a_0, a_1\) and \(a_2\), we match the value of the function and its first and second derivatives at \(x=x_0\). This means the graph of \(P_{2,x_0}(x)\) has the same value, the same slope and the same concavity as the graph of \(f\) at \(x=x_0\). The quadratic polynomial reads:
\[P_{2,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)+\frac{1}{2!} f^{\prime\prime}(x_0)(x-x_0)^2,\]
and the error is
\[R_{2,x_0}(x)=(x-x_0)^2\varepsilon_{2,x_0}(x-x_0), \quad \text{where}\quad \lim_{x\to x_0}\varepsilon_{2,x_0}(x-x_0)=0.\]

We can still improve the approximation of \(f\) at \(x=x_0\) by using higher order polynomials and matching more derivatives at the selected base point. If we use a polynomial of order \(m\), we can prove (see the following theorem) that the error in the approximation goes to zero faster than \((x-x_0)^m\) as \(x\to x_0\). Mathematically we write \(R_{m,x_0}(x)=o\left((x-x_0)^m\right)\), which means \(R_{m,x_0}(x)/(x-x_0)^m\to 0\) as \(x\to x_0\). In general, we have the following theorem:

Theorem 1. Suppose \(f\) is a function that is \(m(\geq1)\) times differentiable at \(x_0\); that is \[f'(x_0), f^{\prime\prime}(x_0), \cdots, f^{(m)}(x_0)\] all exist. Let \[P_{m,x_0}(x)=a_0+a_1(x-x_0)+\cdots+a_m(x-x_0)^m,\] where \[a_k=\frac{f^{(k)}(x_0)}{k!}, \quad 0\leq k\leq m,\] and \[\varepsilon_{m,x_0}(x)=\frac{f(x)-P_{m,x_0}(x)}{(x-x_0)^m}.\] Then \(\lim_{x\to x_0}\varepsilon_{m,x_0}(x)=0\).
  • Remark that \(f^{(0)}(x_0)=f(x_0)\), \(f^{(1)}(x_0)=f^{\prime}(x_0)\), \(f^{(2)}(x_0)=f^{\prime \prime}(x_0)\), \(…\), and \(0!=1\).


Proof: Notice that
\[\begin{align}\lim_{x\to x_0}\varepsilon_{m,x_0}(x)&=\lim_{x\to x_0}\frac{f(x)-P_{m,x_0}(x)}{(x-x_0)^m}\\&=\lim_{x\to x_0}\frac{f(x)-f(x_0)-\frac{f'(x_0)}{1!}(x-x_0)-\cdots-\frac{f^{(m)}(x_0)}{m!}(x-x_0)^m}{(x-x_0)^m},\end{align}\]
which is an indeterminate form of type \(\frac{0}{0}\). If we apply l’Hôpital’s rule \(m\) times, we obtain:
\[\lim_{x\to x_0}\frac{f^{(m)}(x)-m!\frac{f^{(m)}(x_0)}{m!}}{m!}=\lim_{x\to x_0}\frac{f^{(m)}(x)-f^{(m)}(x_0)}{m!}=0.\ \blacksquare\]
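To see Theorem 1 in action, here is a small numerical check (a Python sketch with the illustrative choices \(f(x)=\sin x\), \(x_0=0\), \(m=3\), so \(P_{3,0}(x)=x-\frac{x^3}{3!}\)):

```python
import math

def eps(x):
    """epsilon_{3,0}(x) = (sin x - P_{3,0}(x)) / x**3 for x != 0."""
    p3 = x - x ** 3 / 6.0
    return (math.sin(x) - p3) / x ** 3

# sample at x = 1, 0.1, 0.01, 0.001
values = [abs(eps(10.0 ** (-n))) for n in range(0, 4)]
# sin x - P_3(x) ~ x**5/120, so eps behaves like x**2/120 and tends to 0
```

The sampled values shrink roughly like \(x^2/120\), exactly the faster-than-\((x-x_0)^3\) decay the theorem promises.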


Definitions of the “Taylor polynomial” and “remainder”

\(P_{m,x_0}(x)\) is called the Taylor polynomial of degree \(m\) for \(f(x)\) at \(x_0\). The error \(R_{m,x_0}(x)\) is also called the remainder term. You should verify that
\[\begin{align} P_{m,x_0}(x_0)&=f(x_0), \\ P^{\prime}_{m,x_0}(x_0)&=f^{\prime}(x_0), \\ P^{\prime\prime}_{m,x_0}(x_0)&=f^{\prime\prime}(x_0), \\ &\ \ \vdots \\ P^{(m)}_{m,x_0}(x_0)&=f^{(m)}(x_0).\end{align}\]
To estimate the error of this approximation, we would like to have an expression for the remainder \(R_{m,x_0}(x)\). Various expressions under stronger regularity assumptions on \(f\) exist in the literature. We mention one of them which is called the Lagrange form of the remainder term, after the great mathematician Joseph-Louis Lagrange.

Theorem 2. If \(f^{(m+1)}\) is continuous on an open interval \(I\) that contains \(x_0\) and \(x\in I\), then there exists a number \(\xi\) between \(x\) and \(x_0\) such that \[R_{m,x_0}(x)=\frac{f^{(m+1)}(\xi)}{(m+1)!}(x-x_0)^{m+1}.\]


Proof: For clarity we replace \(x\) by \(b\) and show that there is some \(\xi\) between \(x_0\) and \(b\) such that \[R_{m,x_0}(b)=\frac{f^{(m+1)}(\xi)}{(m+1)!}(b-x_0)^{m+1}.\] We choose \(K\) such that \[f(b)=P_{m,x_0}(b)+K(b-x_0)^{m+1} \tag{*}\] and define
\[\begin{align} g(x)&=f(x)-P_{m,x_0}(x)-K(x-x_0)^{m+1}\\ &=f(x)-P_{m,x_0}(x)-\frac{f(b)-P_{m,x_0}(b)}{(b-x_0)^{m+1}}(x-x_0)^{m+1}.\end{align}\]
Notice that
\[g(x_0)=g'(x_0)=g^{\prime\prime}(x_0)=\cdots =g^{(m)}(x_0)=0\] because we constructed \(P_{m,x_0}(x)\) by matching the first \(m\) derivatives of \(P_{m,x_0}(x)\) and \(f(x)\) at \(x=x_0\), and the first \(m\) derivatives of \(K(x-x_0)^{m+1}\) all vanish at \(x=x_0\). Also note that we chose \(K\) such that \(g(b)=0\).

According to Rolle’s theorem, there is \(\xi_1\) between \(x_0\) and \(b\) such that \(g'(\xi_1)=0\).

Again, because \(g'(x_0)=g'(\xi_1)=0\), by Rolle’s theorem there is a number \(\xi_2\) between \(x_0\) and \(\xi_1\) such that \(g^{\prime\prime}(\xi_2)=0\). We can repeat this argument, finding a \(\xi_m\) between \(x_0\) and \(\xi_{m-1}\) such that \(g^{(m)}(\xi_m)=0\). If we use this argument once more, there is a number \(\xi_{m+1}\) between \(x_0\) and \(\xi_m\) such that \(g^{(m+1)}(\xi_{m+1})=0\). Let’s evaluate \(g^{(m+1)}(x)\): \[g^{(m+1)}(x)=f^{(m+1)}(x)-(m+1)!\,K .\] Thus:
\[g^{(m+1)}(\xi_{m+1})=f^{(m+1)}(\xi_{m+1})-(m+1)!K=0 \Rightarrow K=\frac{f^{(m+1)}(\xi_{m+1})}{(m+1)!}.\]
Let’s put \(\xi=\xi_{m+1}\). If we use the definition of \(K\) in (*), we have:
\[R_{m,x_0}(b)=f(b)-P_{m,x_0}(b)=K (b-x_0)^{m+1}=\frac{f^{(m+1)}(\xi)}{(m+1)!}(b-x_0)^{m+1}.\ \blacksquare\]

Rolle’s theorem says that if \(f\) is continuous on \([a,b]\), differentiable between \(a\) and \(b\), and \(f(a)=f(b)\) for \(b\neq a\), then there is at least one number \(c\) between \(a\) and \(b\) such that \(f^{\prime}(c)=0\). This theorem is very intuitive just by looking at the following figure.

[Figure: illustration of Rolle’s theorem]

We don’t know anything about \(\xi\) except that \(\xi\) is between \(x_0\) and \(x\).

If we place \(x=x_0+h\), we have:

\[\bbox[#F2F2F2,5px,border:2px solid black] {\begin{align} \label{Eq:Taylor-1D} f(x_0+h)=&f(x_0)+\frac{f'(x_0)}{1!}h+\frac{f^{\prime\prime}(x_0)}{2!}h^2+\cdots+\frac{f^{(m)}(x_0)}{m!}h^m+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}\\ \nonumber =&\sum_{k=0}^m \frac{f^{(k)}(x_0)}{k!}h^k+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}\\ \nonumber &\text{for some } 0<\theta<1.\end{align}}\]
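Since every derivative of \(e^x\) is again \(e^x\), the boxed formula gives an explicit, computable bound on the truncation error: \(|R_m(h)|\leq e^{|h|}\,|h|^{m+1}/(m+1)!\). A quick Python sketch (the choices \(h=0.5\), \(m=4\) are illustrative):

```python
import math

def taylor_exp(h, m):
    """Partial sum  sum_{k=0}^{m} h**k / k!  of exp about x0 = 0."""
    return sum(h ** k / math.factorial(k) for k in range(m + 1))

h, m = 0.5, 4
actual_error = abs(math.exp(h) - taylor_exp(h, m))

# Lagrange remainder: |R_m| = e^{theta*h} h**(m+1)/(m+1)! <= e^{|h|} |h|**(m+1)/(m+1)!
lagrange_bound = math.exp(abs(h)) * abs(h) ** (m + 1) / math.factorial(m + 1)
```

The actual error here (about \(2.8\times 10^{-4}\)) indeed stays below the Lagrange bound, even though we never need to know \(\theta\).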



Taylor’s Formula for Functions of Several Variables


Now we wish to extend the polynomial expansion to functions of several variables. We learned that if \(f(x,y)\) is differentiable at \((x_0,y_0)\), we can approximate it with a linear function (or more accurately an affine function), \(P_{1,(x_0,y_0)}(x,y)=a_0+a_1x+a_2y\). Matching the value and the first partial derivatives and placing \(x=x_0+h\) and \(y=y_0+k\) results in \[P_{1,(x_0,y_0)}(x,y)=f(x_0,y_0)+f_x(x_0,y_0)h+f_y(x_0,y_0)k.\] For a better approximation, we consider \(P_{2,(x_0,y_0)}=a_0+a_1x+a_2y+b_1 x^2+b_2 xy+b_3 y^2\). Matching the value and the first and second partial derivatives results in
\[P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+h f_x+k f_y +\frac{1}{2!}\left[h^2 f_{xx}+2 hk f_{xy}+k^2 f_{yy}\right],\]
where the partial derivatives are evaluated at \((x_0,y_0)\). The above expression can also be written as:
\[P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right) f(x,y)\right]_{{x=x_0}\atop{y=y_0}}+\frac{1}{2!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2 f(x,y)\right]_{{x=x_0}\atop{y=y_0}},\]
where
\[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2=h^2\frac{\partial^2}{\partial x^2}+2hk\frac{\partial^2}{\partial x\partial y}+k^2\frac{\partial^2}{\partial y^2}.\]

Another form of writing \(P_{2,(x_0,y_0)}\) is:
\[P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y}\end{bmatrix} \begin{bmatrix} h\\ k \end{bmatrix}+\frac{1}{2!}\begin{bmatrix} h & k \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y}\\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} \begin{bmatrix} h\\ k \end{bmatrix},\]


where again the partial derivatives are evaluated at \((x_0,y_0)\). The \(2\times 2\) matrix in the above expression is called the Hessian matrix and is denoted by \(H(x_0,y_0)\). We will talk about it later in this section.
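As a sanity check of this matrix form, the Python sketch below evaluates \(P_{2,(1,1)}\) for the hypothetical test function \(f(x,y)=x^2y\) (the gradient and Hessian entries were computed by hand for this \(f\)) and compares it with the exact value:

```python
# quadratic Taylor approximation in the matrix form above,
# for the illustrative choice f(x, y) = x**2 * y about (x0, y0) = (1, 1)
f0 = 1.0                      # f(1, 1)
grad = [2.0, 1.0]             # [f_x, f_y] at (1, 1)
hess = [[2.0, 2.0],
        [2.0, 0.0]]           # [[f_xx, f_xy], [f_xy, f_yy]] at (1, 1)

def p2(h, k):
    linear = grad[0] * h + grad[1] * k
    quad = hess[0][0] * h * h + 2 * hess[0][1] * h * k + hess[1][1] * k * k
    return f0 + linear + 0.5 * quad      # f0 + grad . (h,k) + (1/2!) [h k] H [h k]^T

h, k = 0.1, 0.1
exact = (1 + h) ** 2 * (1 + k)           # f(1.1, 1.1)
approx = p2(h, k)
```

For this cubic \(f\), the gap \(f(1.1,1.1)-P_2(1.1,1.1)\) is exactly the third order term \(h^2k\), consistent with a remainder of order three.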

We can use the results for functions of a single variable to derive formulas for the Taylor polynomial and the remainder of a function \(f\) of two or more variables. Assume \(f(x,y)\) has continuous partial derivatives up to order \(m+1\) near \((x_0,y_0)\). Let \[x=x_0+ht,\quad y=y_0+kt,\] where \(x_0, y_0, h\) and \(k\) are treated as constants and \(t\) is a variable. Then \[F(t)=f(x(t),y(t))=f(x_0+ht,\ y_0+kt).\] By Taylor’s formula, we have:
\[\begin{align}F(t)=&F(0)+F'(0)t+\frac{1}{2!}F^{\prime\prime}(0) t^2+\frac{1}{3!}F^{\prime\prime\prime}(0) t^3+\cdots\\ &+\frac{1}{m!}F^{(m)}(0)\, t^m+\frac{1}{(m+1)!}F^{(m+1)}(\xi)\ t^{m+1}\tag{$\dagger$}\end{align}\]
where \(\xi\) is a number between 0 and \(t\). Using the chain rule, as we saw before, we have:
\[\begin{align} F'(t)&=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}\\ &=h \frac{\partial f}{\partial x}+k\frac{\partial f}{\partial y}\\ &=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)f\end{align}\]
We can show
\[\begin{align} F^{\prime\prime}(t)&=h^2 \frac{\partial^2 f}{\partial x^2}+2hk\frac{\partial^2 f}{\partial x \partial y}+k^2\frac{\partial^2 f}{\partial y^2}\\ &=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2f\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^2f,\end{align}\]
the third derivative is 
\[\begin{align} F^{\prime\prime\prime}(t)&=h^3 \frac{\partial^3 f}{\partial x^3}+3h^2k\frac{\partial^3 f}{\partial x^2 \partial y}+3hk^2\frac{\partial^3 f}{\partial x\partial y^2}+k^3\frac{\partial^3 f}{\partial y^3}\\ &=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^3f\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^3f\end{align}\]
and in general:
\[\begin{align} F^{(m)}(t)&=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^mf\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^mf \tag{**}\end{align}\]

This may be proved by induction.


Induction has two steps: (1) we prove (**) holds true for \(m=1\); (2) we prove that if (**) is true for some value \(m=k\), then it is also true for \(m=k+1\).
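The second-derivative case of this operator identity can be sanity-checked with a finite difference. A Python sketch (the test function \(f(x,y)=x^2y\) and the base point are hypothetical, chosen only for illustration):

```python
# finite-difference check of F''(0) = h^2 f_xx + 2hk f_xy + k^2 f_yy
# for the illustrative function f(x, y) = x**2 * y at (x0, y0) = (1, 2)
def f(x, y):
    return x * x * y

x0, y0, h, k = 1.0, 2.0, 0.3, -0.2

def F(t):
    return f(x0 + h * t, y0 + k * t)

dt = 1e-4
# central second difference approximates F''(0)
F2_numeric = (F(dt) - 2 * F(0.0) + F(-dt)) / dt ** 2

# analytic second partials of this f: f_xx = 2y, f_xy = 2x, f_yy = 0
F2_formula = h ** 2 * (2 * y0) + 2 * h * k * (2 * x0) + k ** 2 * 0.0
```

The two values agree to several digits, as the chain-rule computation predicts.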

In particular,
\[F^{(m)}(0)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^mf(x,y)\right]_{{x=x_0}\atop{y=y_0}}=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^m f(x_0,y_0),\] and
\[ F^{(m+1)}(\xi)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1}f(x,y)\right]_{{x=x_0+\xi h}\atop{y=y_0+\xi k}}=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^{m+1} f(x_0+\xi h,y_0+\xi k)\]
Substituting in (\(\dagger\)), we have
\[\begin{align} F(t)=&\Big[f\Big]_{{x=x_0}\atop{y=y_0}}+t \Bigg[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f\Bigg]_{{x=x_0}\atop{y=y_0}}+\cdots+\frac{t^m}{m!}\Bigg[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^mf\Bigg]_{{x=x_0}\atop{y=y_0}}\\ &\hspace{0.5cm}+\frac{t^{m+1}}{(m+1)!}\Bigg[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1}f\Bigg]_{{x=x_0+\xi h}\atop{y=y_0+\xi k}}\\
=&f(x_0,y_0)+t \left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla} \right)f(x_0,y_0)+\cdots+\frac{t^m}{m!} \left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla} \right)^m f(x_0,y_0)\\ &+\frac{t^{m+1}}{(m+1)!} \left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla} \right)^{m+1} f(x_0+\xi h,y_0+\xi k) \end{align}\]

for some \(\xi\) between 0 and \(t\). Because this holds for all admissible values of \(t\), we can plug \(t=1\) in and obtain Taylor’s formula for functions of two variables. Here we proved the following theorem for \(n=2\), but the generalization is straightforward.

Theorem 3. Let \(f:U\to\mathbb{R}\) where \(U\subseteq \mathbb{R}^n\) is an open set. Suppose \(f\) has continuous partial derivatives up to (at least) order \(m+1\), and consider \(\mathbf{x}_0\in U\) and \(\mathbf{h}\in\mathbb{R}^n\) such that \(\mathbf{x}_0+t\mathbf{h}\in U\) for \(0\leq t\leq 1\). Then there is a number \(0<\theta<1\) such that
\[\begin{align} f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)&+\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)f(\mathbf{x}_0)+\frac{1}{2!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^2 f(\mathbf{x}_0)+\cdots \\ &+\frac{1}{m!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^m f(\mathbf{x}_0)+\frac{1}{(m+1)!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^{m+1} f(\mathbf{x}_0+\theta \mathbf{h})\end{align}\]

Another form of writing Taylor’s formula is:
\[f(\mathbf{x}_0+\mathbf{h})=\sum_{k=0}^{m}\frac{1}{k!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^k f(\mathbf{x}_0)+\frac{1}{(m+1)!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^{m+1} f(\mathbf{x}_0+\theta \mathbf{h})\]
for some \(0<\theta<1\), where \(\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^0 f(\mathbf{x}_0)=f(\mathbf{x}_0)\).

If we place \(\mathbf{h}=\mathbf{x}-\mathbf{x}_0\) in the above formula, the polynomial that we obtain is called the polynomial approximation of \(f\) of degree \(m\) at \(\mathbf{x}_0\).


Example 1
Given \(f(x,y)=\sin x\ e^{y-2x}\), find a second degree polynomial approximation to \(f\) near the point \((\frac{\pi}{2},\pi)\) and use it to estimate the value \(f(0.95\frac{\pi}{2},1.1\pi)\).

For the polynomial approximation of degree 2, we need to find the first and second partial derivatives of \(f\).

\[\begin{align} &f(x,y)=\sin x\ e^{y-2x} \Rightarrow f\left(\frac{\pi}{2},\pi\right)=1,\\ &f_{x}(x,y)=\cos x\ e^{y-2x}-2\sin x\ e^{y-2x}\Rightarrow f_x\left(\frac{\pi}{2},\pi\right)=-2,\\ &f_y(x,y)=\sin x\ e^{y-2x}\Rightarrow f_y\left(\frac{\pi}{2},\pi\right)=1,\\ &f_{xx}(x,y)=3\sin x\ e^{y-2x}-4 \cos x\ e^{y-2x}\Rightarrow f_{xx}\left(\frac{\pi}{2},\pi\right)=3,\\ &f_{xy}(x,y)=\cos x\ e^{y-2x}-2\sin x\ e^{y-2x}\Rightarrow f_{xy}\left(\frac{\pi}{2},\pi\right)=-2,\\ &f_{yy}(x,y)=\sin x\ e^{y-2x} \Rightarrow f_{yy}\left(\frac{\pi}{2},\pi\right)=1.\end{align}\]
\[f\left(\frac{\pi}{2}+h,\pi+k\right)\approx 1-2 h+k+\frac{1}{2!}\left(3 h^2-2\times 2 h k+k^2\right).\]
If we place \(x=\frac{\pi}{2}+h\) and \(y=\pi+k\), we obtain the second degree polynomial approximation of \(f\) near \((\pi/2,\pi)\):
\[\begin{align}f(x,y)\approx & P_{2,\left(\frac{\pi}{2},\pi\right)}(x,y)=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)+\\ &\frac{3}{2}\left(x-\frac{\pi}{2}\right)^2-2\left(x-\frac{\pi}{2}\right)(y-\pi)+\frac{1}{2}(y-\pi)^2\end{align}\]
\[\begin{align}f\left(\frac{0.95}{2}\pi,1.1\pi\right)\approx & 1-2(-0.025\pi)+0.1\pi+\frac{3}{2}(-0.025\pi)^2 \\ & -2(-0.025\pi)(0.1\pi)+\frac{1}{2}(0.1\pi)^2\approx 1.57919.\end{align}\]
The exact value of \(f\left(\frac{0.95}{2}\pi,1.1\pi\right)\) is 1.59704. The error of this approximation is \(\approx 1.12\%\), while if we used the linear approximation \(P_{1,\left(\frac{\pi}{2},\pi\right)}=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)\), the error would be \(\approx 7.88\%\).
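The numbers in this example are easy to reproduce. A short Python sketch that evaluates both the exact value and \(P_{2,(\pi/2,\pi)}\) at \(h=-0.025\pi\), \(k=0.1\pi\):

```python
import math

def f(x, y):
    return math.sin(x) * math.exp(y - 2 * x)

h, k = -0.025 * math.pi, 0.1 * math.pi   # target point (0.475*pi, 1.1*pi)

# second-degree Taylor polynomial, using the derivative values found above
p2 = 1 - 2 * h + k + 0.5 * (3 * h ** 2 - 2 * 2 * h * k + k ** 2)

exact = f(math.pi / 2 + h, math.pi + k)
relative_error = abs(exact - p2) / abs(exact)   # about 1.12%
```

Swapping `p2` for the linear part `1 - 2 * h + k` reproduces the roughly 7.88% error quoted for the first degree approximation.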

The quadratic term \(\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)h_i h_j\) can be written as
\[\begin{bmatrix} h_1 & h_2 & \cdots & h_n\end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}\\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}\\ \vdots & \vdots &\ddots & \vdots\\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n\end{bmatrix}\]
Similar to the case of \(n=2\), the \(n\times n\) matrix of the second order derivatives \(\left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{n\times n}\) is called the Hessian matrix and is denoted by \(H(\mathbf{x})\). Therefore, we can write:
\[\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)h_i h_j=\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^2 f(\mathbf{x}_0)=\mathbf{h}^T H(\mathbf{x}_0) \mathbf{h},\]
where \(\mathbf{h}\) is considered an \(n\times 1\) column matrix.1 Using this notation, the first order expansion of \(f\), with the Lagrange form of the remainder, can be written as:
\[f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\mathbf{h}\boldsymbol{\cdot} \nabla f(\mathbf{x}_0)+\frac{1}{2!}\,\mathbf{h}^T H(\mathbf{x}_0+\theta \mathbf{h})\, \mathbf{h},\]
for some \(0<\theta<1\).
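The identity between the double sum and the matrix product can be checked directly with plain lists. In the sketch below, the \(3\times 3\) matrix \(H\) and the vector \(\mathbf{h}\) are arbitrary illustrative values (with \(H\) symmetric, as a Hessian of a twice continuously differentiable function would be):

```python
# the double sum  sum_{i,j} H[i][j] h_i h_j  and the matrix form  h^T H h
# are the same computation; H and h below are illustrative values
H = [[2.0, 1.0, 0.5],
     [1.0, 3.0, -1.0],
     [0.5, -1.0, 4.0]]
h = [0.3, -0.2, 0.1]
n = len(h)

double_sum = sum(H[i][j] * h[i] * h[j] for i in range(n) for j in range(n))

Hh = [sum(H[i][j] * h[j] for j in range(n)) for i in range(n)]   # H h
quadratic_form = sum(h[i] * Hh[i] for i in range(n))             # h^T (H h)
```

Both expressions evaluate to the same number, which is why the two notations are used interchangeably.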

1 Some books consider \(\mathbf{h}\) as a \(1\times n\) row matrix. When there is any ambiguity, it is better to write it as \(\langle \mathbf{h}\vert H(\mathbf{x}_0)\vert \mathbf{h}\rangle\).