## Review of Taylor’s Formula for Functions of a Single Variable

Let’s review Taylor’s formula for functions of a single variable. Suppose $$y=f(x)$$ is differentiable at $$x_0$$. Then it has a linear approximation at $$x_0$$, and we have:
$f(x)=f(x_0)+f'(x_0)(x-x_0)+(x-x_0)\,\varepsilon_{1,x_0}(x-x_0),$ with $\lim_{x\to x_0}\varepsilon_{1,x_0}(x-x_0)=0.$ Therefore, $P_{1,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)$ is the linear approximation, and the error in this approximation is $R_{1,x_0}(x)=(x-x_0)\,\varepsilon_{1,x_0}(x-x_0).$ The subscripts $$1$$ and $$x_0$$ in $$P_{1,x_0}$$ and $$R_{1,x_0}$$ indicate the maximum power of $$x$$ that appears in the polynomial and the point about which we approximate the function $$f$$.

Note that as $$x\to x_0$$, $$R_{1,x_0}(x)$$ tends to 0 faster than $$(x-x_0)$$ because $R_{1,x_0}(x)/(x-x_0)=\varepsilon_{1,x_0}(x-x_0)\to 0.$ Mathematically we write $R_{1,x_0}(x)={\color{red} o}\left(x-x_0\right).$

If we want a better approximation of $$f$$, instead of a linear function we may use a quadratic function $P_{2,x_0}(x)=a_0+a_1 x+a_2 x^2.$ To determine the coefficients $$a_0, a_1$$, and $$a_2$$, we match the values of the function and its first and second derivatives at $$x=x_0$$. This means the graph of $$P_{2,x_0}(x)$$ has the same value, the same slope, and the same concavity as the graph of $$f$$ at $$x=x_0$$. The quadratic polynomial reads:
$P_{2,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)+\frac{1}{2!} f^{\prime\prime}(x_0)(x-x_0)^2,$
and the error is
$R_{2,x_0}(x)=(x-x_0)^2\varepsilon_{2,x_0}(x-x_0), \quad \text{where}\quad \lim_{x\to x_0}\varepsilon_{2,x_0}(x-x_0)=0.$

We can still improve the approximation of $$f$$ at $$x=x_0$$ by using higher-order polynomials and matching more derivatives at the selected base point. If we use a polynomial of order $$m$$, we can prove (see the following theorem) that the error in the approximation goes to zero faster than $$(x-x_0)^m$$ as $$x\to x_0$$. Mathematically we write $$R_{m,x_0}(x)=o\left((x-x_0)^m\right)$$, which means $$R_{m,x_0}(x)/(x-x_0)^m\to 0$$ as $$x\to x_0$$. In general, we have the following theorem:

Theorem 1. Suppose $$f$$ is a function that is $$m$$ times ($$m\geq 1$$) differentiable at $$x_0$$; that is, $f'(x_0), f^{\prime\prime}(x_0), \cdots, f^{(m)}(x_0)$ all exist. Let $P_{m,x_0}(x)=a_0+a_1(x-x_0)+\cdots+a_m(x-x_0)^m,$ where $a_k=\frac{f^{(k)}(x_0)}{k!}, \quad 0\leq k\leq m,$ and $\varepsilon_{m,x_0}(x)=\frac{f(x)-P_{m,x_0}(x)}{(x-x_0)^m}.$ Then $$\lim_{x\to x_0}\varepsilon_{m,x_0}(x)=0$$.
• Remark that $$f^{(0)}(x_0)=f(x_0)$$, $$f^{(1)}(x_0)=f^{\prime}(x_0)$$, $$f^{(2)}(x_0)=f^{\prime \prime}(x_0)$$, $$…$$, and $$0!=1$$.

#### Show the proof

Proof: Notice that
\begin{align}\lim_{x\to x_0}\varepsilon_{m,x_0}(x)&=\lim_{x\to x_0}\frac{f(x)-P_{m,x_0}(x)}{(x-x_0)^m}\\&=\lim_{x\to x_0}\frac{f(x)-f(x_0)-\frac{f'(x_0)}{1!}(x-x_0)-\cdots-\frac{f^{(m)}(x_0)}{m!}(x-x_0)^m}{(x-x_0)^m},\end{align}
which is an indeterminate form of type $$\frac{0}{0}$$.
If we apply l’Hôpital’s rule $$m-1$$ times (each application again gives the indeterminate form $$\frac{0}{0}$$), we obtain:
$\lim_{x\to x_0}\frac{f^{(m-1)}(x)-f^{(m-1)}(x_0)-f^{(m)}(x_0)(x-x_0)}{m!\,(x-x_0)}=\frac{1}{m!}\left[\lim_{x\to x_0}\frac{f^{(m-1)}(x)-f^{(m-1)}(x_0)}{x-x_0}-f^{(m)}(x_0)\right]=0,$
because the limit in brackets is, by definition, $$f^{(m)}(x_0)$$. (We stop at $$m-1$$ applications because $$f^{(m)}$$ is only assumed to exist at $$x_0$$.) $\blacksquare$
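Theorem 1 can also be illustrated numerically. The sketch below (an illustrative choice, not from the text) uses $$f(x)=e^x$$ about $$x_0=0$$, where every coefficient is $$a_k=1/k!$$, and checks that $$\varepsilon_{m,x_0}(x)=R_{m,x_0}(x)/(x-x_0)^m$$ shrinks as $$x\to x_0$$:

```python
import math

# Taylor polynomial of f(x) = e^x about x0 = 0; every derivative of e^x
# is e^x, so the coefficients are a_k = f^(k)(0)/k! = 1/k!.
def taylor_poly(x, m):
    return sum(x**k / math.factorial(k) for k in range(m + 1))

m = 3
for x in [0.1, 0.01, 0.001]:
    R = math.exp(x) - taylor_poly(x, m)   # remainder R_{m,0}(x)
    eps = R / x**m                        # epsilon_{m,0}(x); should -> 0
    print(f"x={x}:  R={R:.3e}  R/x^m={eps:.3e}")
```

As $$x$$ shrinks by a factor of 10, the printed ratio shrinks by roughly a factor of 10 as well, consistent with $$R_{m,x_0}(x)=o\left((x-x_0)^m\right)$$.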

##### Definitions of the “Taylor polynomial” and “remainder”

$$P_{m,x_0}(x)$$ is called the Taylor polynomial of degree $$m$$ for $$f(x)$$ at $$x_0$$. The error $$R_{m,x_0}(x)$$ is also called the remainder term. You should verify that
\begin{align*}
P_{m,x_0}(x_0)&=f(x_0),\\
P'_{m,x_0}(x_0)&=f'(x_0),\\
P^{\prime\prime}_{m,x_0}(x_0)&=f^{\prime\prime}(x_0), \\
&\vdots\\
P^{(m)}_{m,x_0}(x_0)&=f^{(m)}(x_0).
\end{align*}
To estimate the error of this approximation, we would like to have an expression for the remainder $$R_{m,x_0}(x)$$. Various expressions under stronger regularity assumptions on $$f$$ exist in the literature. We mention one of them which is called the Lagrange form of the remainder term, after the great mathematician Joseph-Louis Lagrange.

Theorem 2. If $$f^{(m+1)}$$ is continuous on an open interval $$I$$ that contains $$x_0$$, and $$x\in I$$, then there exists a number $$\xi$$ between $$x$$ and $$x_0$$ such that $R_{m,x_0}(x)=\frac{f^{(m+1)}(\xi)}{(m+1)!}(x-x_0)^{m+1}.$

#### Show the proof

Proof: For clarity we replace $$x$$ by $$b$$ and show that there is some $$\xi$$ between $$x_0$$ and $$b$$ such that $R_{m,x_0}(b)=\frac{f^{(m+1)}(\xi)}{(m+1)!}(b-x_0)^{m+1}.$ We choose $$K$$ such that $f(b)=P_{m,x_0}(b)+K(b-x_0)^{m+1} \tag{*}$ and define
\begin{align} g(x)&=f(x)-P_{m,x_0}(x)-K(x-x_0)^{m+1}\\ &=f(x)-P_{m,x_0}(x)-\frac{f(b)-P_{m,x_0}(b)}{(b-x_0)^{m+1}}(x-x_0)^{m+1}.\end{align}
Notice that
$g(x_0)=g'(x_0)=g^{\prime\prime}(x_0)=\cdots =g^{(m)}(x_0)=0$ because we constructed $$P_{m,x_0}(x)$$ by matching the value and the first $$m$$ derivatives of $$f(x)$$ at $$x=x_0$$, and the value and the first $$m$$ derivatives of $$K(x-x_0)^{m+1}$$ all vanish at $$x=x_0$$. Also note that we chose $$K$$ such that $$g(b)=0$$.

According to Rolle’s theorem, there is $$\xi_1$$ between $$x_0$$ and $$b$$ such that $$g'(\xi_1)=0$$.

Again, because $$g'(x_0)=g'(\xi_1)=0$$, by Rolle’s theorem there is a number $$\xi_2$$ between $$x_0$$ and $$\xi_1$$ such that $$g^{\prime\prime}(\xi_2)=0$$. We can repeat this argument, finding a $$\xi_m$$ between $$x_0$$ and $$\xi_{m-1}$$ such that $$g^{(m)}(\xi_m)=0$$. If we use this argument once more, there is a number $$\xi_{m+1}$$ between $$x_0$$ and $$\xi_m$$ such that $$g^{(m+1)}(\xi_{m+1})=0$$. Let’s evaluate $$g^{(m+1)}(x)$$: $g^{(m+1)}(x)=f^{(m+1)}(x)-(m+1)!\,K .$ Thus:
$g^{(m+1)}(\xi_{m+1})=f^{(m+1)}(\xi_{m+1})-(m+1)!K=0 \Rightarrow K=\frac{f^{(m+1)}(\xi_{m+1})}{(m+1)!}.$
Let’s put $$\xi=\xi_{m+1}$$. If we use the definition of $$K$$ in (*), we have:
$R_{m,x_0}(b)=f(b)-P_{m,x_0}(b)=K (b-x_0)^{m+1}=\frac{f^{(m+1)}(\xi)}{(m+1)!}(b-x_0)^{m+1}.\ \blacksquare$

Rolle’s theorem says that if $$f(a)=f(b)$$ for $$b\neq a$$, and $$f$$ is continuous on $$[a,b]$$ and differentiable between $$a$$ and $$b$$, then there is at least one number $$c$$ between $$a$$ and $$b$$ such that $$f^{\prime}(c)=0$$. This theorem is very intuitive just by looking at the following figure. Note that we don’t know anything about $$\xi$$ except that it lies between $$x_0$$ and $$x$$.

If we place $$x=x_0+h$$, we have:

\bbox[#F2F2F2,5px,border:2px solid black] {\begin{align} \label{Eq:Taylor-1D} f(x_0+h)=&f(x_0)+\frac{f'(x_0)}{1!}h+\frac{f^{\prime\prime}(x_0)}{2!}h^2+\cdots+\frac{f^{(m)}(x_0)}{m!}h^m+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}\\ \nonumber =&\sum_{k=0}^m \frac{f^{(k)}(x_0)}{k!}h^k+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}\\ \nonumber &\text{for some } 0<\theta<1.\end{align}}
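The Lagrange form makes the error computable in practice. As a minimal sketch (the function and the point are illustrative choices), take $$f(x)=\sin x$$ about $$x_0=0$$ with $$m=3$$: since $$|f^{(4)}|\leq 1$$ everywhere, the remainder satisfies $$|R_{3,0}(x)|\leq |x|^4/4!$$:

```python
import math

# Error bound from the Lagrange remainder for f(x) = sin(x), x0 = 0, m = 3:
# P_3(x) = x - x**3/6 and |f^(4)(xi)| = |sin(xi)| <= 1, so
# |R_3(x)| <= |x|**4 / 4!.
x = 0.5
P3 = x - x**3 / 6
actual_error = abs(math.sin(x) - P3)
bound = x**4 / math.factorial(4)
print(actual_error, bound)   # the actual error stays below the bound
```

The bound is conservative here (for $$\sin x$$ the $$x^4$$ coefficient happens to vanish), but it is guaranteed by Theorem 2.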

## Taylor’s Formula for Functions of Several Variables

Now we wish to extend the polynomial expansion to functions of several variables. We learned that if $$f(x,y)$$ is differentiable at $$(x_0,y_0)$$, we can approximate it with a linear function (or, more accurately, an affine function) $$P_{1,(x_0,y_0)}(x,y)=a_0+a_1x+a_2y$$. Matching the value and the first partial derivatives, and placing $$x=x_0+h$$ and $$y=y_0+k$$, results in $P_{1,(x_0,y_0)}(x,y)=f(x_0,y_0)+f_x(x_0,y_0)h+f_y(x_0,y_0)k.$ For a better approximation we consider $$P_{2,(x_0,y_0)}=a_0+a_1x+a_2y+b_1 x^2+b_2 xy+b_3 y^2$$. Matching the value and the first and second partial derivatives results in
$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+h f_x+k f_y +\frac{1}{2!}\left[h^2 f_{xx}+2 hk f_{xy}+k^2 f_{yy}\right],$
where the partial derivatives are evaluated at $$(x_0,y_0)$$. The above expression can also be written as:
$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right) f(x,y)\right]_{{x=x_0}\atop{y=y_0}}+\frac{1}{2!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2 f(x,y)\right]_{{x=x_0}\atop{y=y_0}},$
where
$\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2=h^2\frac{\partial^2}{\partial x^2}+2hk\frac{\partial^2}{\partial x\partial y}+k^2\frac{\partial^2}{\partial y^2}.$

Another form of writing $$P_{2,(x_0,y_0)}$$ is:
$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y}\end{bmatrix} \begin{bmatrix} h\\ k \end{bmatrix}+\frac{1}{2!}\begin{bmatrix} h & k \end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y}\\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{bmatrix} \begin{bmatrix} h\\ k \end{bmatrix},$

##### Hessian Matrix

where again the partial derivatives are evaluated at $$(x_0,y_0)$$. The $$2\times 2$$ matrix in the above expression is called the Hessian matrix and is denoted by $$H(x_0,y_0)$$. We will talk about it later in this section.
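The gradient–Hessian form of $$P_{2,(x_0,y_0)}$$ translates directly into code. A minimal sketch, using $$f(x,y)=x^2y$$ about $$(1,2)$$ as an illustrative function whose partial derivatives are easy to write down:

```python
import numpy as np

# Quadratic Taylor polynomial of f(x,y) = x**2 * y about (x0, y0) = (1, 2),
# built from the gradient and the Hessian evaluated at the base point.
def f(x, y):
    return x**2 * y

x0, y0 = 1.0, 2.0
grad = np.array([2 * x0 * y0, x0**2])     # [f_x, f_y] at (x0, y0)
H = np.array([[2 * y0, 2 * x0],           # [[f_xx, f_xy],
              [2 * x0, 0.0]])             #  [f_xy, f_yy]] at (x0, y0)

def P2(x, y):
    hk = np.array([x - x0, y - y0])       # [h, k]
    return f(x0, y0) + grad @ hk + 0.5 * hk @ H @ hk

print(P2(1.1, 2.2))   # close to the true value f(1.1, 2.2) = 2.662
```

The difference between $$f$$ and $$P_2$$ at $$(1.1,2.2)$$ is exactly the cubic term $$h^2k=0.002$$, since all higher derivatives of this polynomial vanish.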

We can use the results for functions of a single variable to derive formulas for the Taylor polynomial and the remainder of a function $$f$$ of two or more variables. Assume $$f(x,y)$$ has continuous partial derivatives up to order $$m+1$$ in a neighborhood of $$(x_0,y_0)$$. Let $x=x_0+ht,\quad y=y_0+kt$ where $$x_0, y_0, h$$ and $$k$$ are treated as constants and $$t$$ is a variable. Then $F(t)=f(x(t),y(t))=f(x_0+ht,\,y_0+kt).$ By Taylor’s formula, we have:
\begin{align}F(t)=&F(0)+F'(0)t+\frac{1}{2!}F^{\prime\prime}(0) t^2+\frac{1}{3!}F^{\prime\prime\prime}(0) t^3+\cdots\\ +& \frac{1}{m!}F^{(m)}(0) t^m+\frac{1}{(m+1)!}F^{(m+1)}(\xi)\ t^{m+1}\tag{\dagger}\end{align}
where $$\xi$$ is a number between 0 and $$t$$. Using the chain rule, as we saw before, we have:
\begin{align} F'(t)&=\frac{\partial f}{\partial x}\frac{dx}{dt}+\frac{\partial f}{\partial y}\frac{dy}{dt}\\ &=h \frac{\partial f}{\partial x}+k\frac{\partial f}{\partial y}\\ &=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)f\end{align}
We can show
\begin{align} F^{\prime\prime}(t)&=h^2 \frac{\partial^2 f}{\partial x^2}+2hk\frac{\partial^2 f}{\partial x \partial y}+k^2\frac{\partial^2 f}{\partial y^2}\\ &=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2f\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^2f,\end{align}
the third derivative is
\begin{align} F^{\prime\prime\prime}(t)&=h^3 \frac{\partial^3 f}{\partial x^3}+3h^2k\frac{\partial^3 f}{\partial x^2 \partial y}+3hk^2\frac{\partial^3 f}{\partial x\partial y^2}+k^3\frac{\partial^3 f}{\partial y^3}\\ &=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^3f\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^3f\end{align}
and in general:
\begin{align} F^{(m)}(t)&=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^mf\\ &=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^mf \tag{**}\end{align}

This may be proved by induction.

Induction has two steps: (1) we prove (**) holds true for $$m=1$$; (2) we prove that if (**) is true for some value $$m=k$$, then it is also true for $$m=k+1$$.

Therefore

$F^{(m)}(0)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^mf(x,y)\right]_{{x=x_0}\atop{y=y_0}}=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^m f(x_0,y_0),$ and
$F^{(m+1)}(\xi)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1}f(x,y)\right]_{{x=x_0+\xi h}\atop{y=y_0+\xi k}}=\left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^{m+1} f(x_0+\xi h,y_0+\xi k)$
Substituting in ($$\dagger$$), we have
\begin{align} F(t)=&\Big[f\Big]_{{x=x_0}\atop{y=y_0}}+t \Bigg[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f\Bigg]_{{x=x_0}\atop{y=y_0}}+\cdots+\frac{t^m}{m!}\Bigg[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^mf\Bigg]_{{x=x_0}\atop{y=y_0}}\\ &\hspace{0.5cm}+\frac{t^{m+1}}{(m+1)!}\Bigg[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1}f\Bigg]_{{x=x_0+\xi h}\atop{y=y_0+\xi k}}\\ =&f(x_0,y_0)+t \left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla} \right)f(x_0,y_0)+\cdots+\frac{t^m}{m!} \left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla} \right)^m f(x_0,y_0)\\ &+\frac{t^{m+1}}{(m+1)!} \left((h,k)\boldsymbol{\cdot}\overrightarrow{\nabla} \right)^{m+1} f(x_0+\xi h,y_0+\xi k) \end{align}

for some $$\xi$$ between 0 and $$t$$. Because this holds for all admissible values of $$t$$, we can plug in $$t=1$$ and obtain Taylor’s formula for functions of two variables. Here we proved the following theorem for $$n=2$$, but the generalization is straightforward.
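The chain-rule identities used above can be spot-checked numerically. A minimal sketch, comparing the operator form of $$F''(t)$$ against a finite-difference second derivative for an arbitrary test function (all choices here are illustrative):

```python
import math

# Check F''(t) = (h d/dx + k d/dy)^2 f along the line x = x0 + h*t, y = y0 + k*t,
# for the test function f(x, y) = sin(x) * y**2.
x0, y0, h, k = 0.5, 1.0, 0.3, -0.2

def f(x, y):
    return math.sin(x) * y**2

def F(t):
    return f(x0 + h * t, y0 + k * t)

t = 0.7
x, y = x0 + h * t, y0 + k * t
# exact second partials of f, evaluated at (x(t), y(t))
f_xx = -math.sin(x) * y**2
f_xy = 2 * y * math.cos(x)
f_yy = 2 * math.sin(x)
operator_form = h**2 * f_xx + 2 * h * k * f_xy + k**2 * f_yy

dt = 1e-5                                  # central-difference step
F2_numeric = (F(t + dt) - 2 * F(t) + F(t - dt)) / dt**2
print(operator_form, F2_numeric)           # the two values agree
```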

Theorem 3. Let $$f:U\to\mathbb{R}$$ where $$U\subseteq \mathbb{R}^n$$ is an open set. Suppose $$f$$ has continuous partial derivatives up (at least) to order $$m+1$$, and consider $$\mathbf{x}_0\in U$$ and $$\mathbf{h}\in\mathbb{R}^n$$ such that $$\mathbf{x}_0+t\mathbf{h}\in U$$ for $$0\leq t\leq 1$$. Then there is a number $$0<\theta<1$$ such that
\begin{align} f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)&+\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)f(\mathbf{x}_0)+\frac{1}{2!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^2 f(\mathbf{x}_0)+\cdots \\ &+\frac{1}{m!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^m f(\mathbf{x}_0)+\frac{1}{(m+1)!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^{m+1} f(\mathbf{x}_0+\theta \mathbf{h})\end{align}

Another form of writing Taylor’s formula is: $f(\mathbf{x}_0+\mathbf{h})=\sum_{k=0}^{m}\frac{1}{k!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^k f(\mathbf{x}_0)+\frac{1}{(m+1)!}\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^{m+1} f(\mathbf{x}_0+\theta \mathbf{h}),$ for some $$0<\theta<1$$.

If we place $$\mathbf{h}=\mathbf{x}-\mathbf{x}_0$$ in the above formula, the polynomial that we obtain is called the polynomial approximation of $$f$$ of degree $$m$$ at $$\mathbf{x}_0$$.

Example 1
Given $$f(x,y)=\sin x\ e^{y-2x}$$, find a second degree polynomial approximation to $$f$$ near the point $$\left(\frac{\pi}{2},\pi\right)$$ and use it to estimate the value $$f\left(0.95\frac{\pi}{2},1.1\pi\right)$$.

Solution
For the polynomial approximation of degree 2, we need to find the first and second partial derivatives of $$f$$.

\begin{align} &f(x,y)=\sin x\ e^{y-2x} \Rightarrow f\left(\frac{\pi}{2},\pi\right)=1,\\ &f_{x}(x,y)=\cos x\ e^{y-2x}-2\sin x\ e^{y-2x}\Rightarrow f_x\left(\frac{\pi}{2},\pi\right)=-2,\\ &f_y(x,y)=\sin x\ e^{y-2x}\Rightarrow f_y\left(\frac{\pi}{2},\pi\right)=1,\\ &f_{xx}(x,y)=3\sin x\ e^{y-2x}-4 \cos x\ e^{y-2x}\Rightarrow f_{xx}\left(\frac{\pi}{2},\pi\right)=3,\\ &f_{xy}(x,y)=\cos x\ e^{y-2x}-2\sin x\ e^{y-2x}\Rightarrow f_{xy}\left(\frac{\pi}{2},\pi\right)=-2,\\ &f_{yy}(x,y)=\sin x\ e^{y-2x} \Rightarrow f_{yy}\left(\frac{\pi}{2},\pi\right)=1.\end{align}
Thus
$f\left(\frac{\pi}{2}+h,\pi+k\right)\approx 1-2 h+k+\frac{1}{2!}\left(3 h^2-2\times 2 h k+k^2\right).$
If we place $$x=\frac{\pi}{2}+h$$ and $$y=\pi+k$$, we obtain the second degree polynomial approximation of $$f$$ near $$(\pi/2,\pi)$$
\begin{align}f(x,y)\approx & P_{2,\left(\frac{\pi}{2},\pi\right)}(x,y)=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)+\\ &\frac{3}{2}\left(x-\frac{\pi}{2}\right)^2-2\left(x-\frac{\pi}{2}\right)(y-\pi)+\frac{1}{2}(y-\pi)^2\end{align}
Therefore:
\begin{align}f\left(\frac{0.95}{2}\pi,1.1\pi\right)\approx & 1-2(-0.025\pi)+0.1\pi+\frac{3}{2}(-0.025\pi)^2 \\ & -2(-0.025\pi)(0.1\pi)+\frac{1}{2}(0.1\pi)^2\approx 1.57919.\end{align}
The exact value of $$f\left(\frac{0.95}{2}\pi,1.1\pi\right)$$ is 1.59704. The error of this approximation is $$\approx 1.12\%$$, while if we used the linear approximation $$P_{1,\left(\frac{\pi}{2},\pi\right)}=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)$$, the error would be $$\approx 7.88\%$$.
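The arithmetic in Example 1 is easy to reproduce in a few lines of Python:

```python
import math

# Numerical check of Example 1: f(x,y) = sin(x) * exp(y - 2x) near (pi/2, pi).
def f(x, y):
    return math.sin(x) * math.exp(y - 2 * x)

h, k = -0.025 * math.pi, 0.1 * math.pi
# quadratic approximation: 1 - 2h + k + (1/2!)(3 h^2 - 4 h k + k^2)
approx = 1 - 2 * h + k + 0.5 * (3 * h**2 - 4 * h * k + k**2)
exact = f(0.95 * math.pi / 2, 1.1 * math.pi)
print(approx, exact)                          # ~1.57919 vs ~1.59704
print(100 * abs(approx - exact) / exact)      # relative error ~1.12 %
```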

The quadratic term $$\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)h_i h_j$$ can be written as
$\begin{bmatrix} h_1 & h_2 & \cdots & h_n\end{bmatrix} \begin{bmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}\\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}\\ \vdots & \vdots &\ddots & \vdots\\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \begin{bmatrix} h_1 \\ h_2 \\ \vdots \\ h_n\end{bmatrix}$
Similar to the case of $$n=2$$, the $$n\times n$$ matrix of the second order derivatives $$\left[\frac{\partial^2 f}{\partial x_i \partial x_j}\right]_{n\times n}$$ is called the Hessian matrix and is denoted by $$H(\mathbf{x})$$. Therefore, we can write:
$\sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)h_i h_j=\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^2 f(\mathbf{x}_0)=\mathbf{h}^T H(\mathbf{x}_0) \mathbf{h},$
where $$\mathbf{h}$$ is considered an $$n\times 1$$ column matrix.1 Using this notation, the first order Taylor formula with the Lagrange form of the remainder can be written as:
$f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\mathbf{h}\boldsymbol{\cdot} \nabla f(\mathbf{x}_0)+\frac{1}{2}\mathbf{h}^T H(\mathbf{x}_0+\theta \mathbf{h})\, \mathbf{h}.$
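The identity $$\left(\mathbf{h}\boldsymbol{\cdot}\overrightarrow{\nabla}\right)^2 f(\mathbf{x}_0)=\mathbf{h}^T H(\mathbf{x}_0)\,\mathbf{h}$$ can be checked numerically for $$n>2$$ as well. A minimal sketch with an arbitrary test function, approximating the Hessian by central finite differences:

```python
import numpy as np

# Build the Hessian of f(x, y, z) = x*y*z + x**2 at x0 by central finite
# differences, then evaluate the quadratic form h^T H h.
def f(v):
    x, y, z = v
    return x * y * z + x**2

def hessian_fd(f, x0, eps=1e-5):
    n = len(x0)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = eps
            ej = np.zeros(n); ej[j] = eps
            H[i, j] = (f(x0 + ei + ej) - f(x0 + ei - ej)
                       - f(x0 - ei + ej) + f(x0 - ei - ej)) / (4 * eps**2)
    return H

x0 = np.array([1.0, 2.0, 3.0])
h = np.array([0.1, -0.2, 0.3])
H = hessian_fd(f, x0)
print(H)            # approximately [[2, 3, 2], [3, 0, 1], [2, 1, 0]]
print(h @ H @ h)    # quadratic form h^T H h, approximately -0.1
```

Note that the computed matrix is symmetric, in agreement with the equality of mixed partial derivatives for smooth functions.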

1 Some books consider $$\mathbf{h}$$ as a $$1 × n$$ row matrix. When there is any ambiguity, it is better to write it as $$\langle \mathbf{h}\vert H(\mathbf{x}_0)\vert \mathbf{h}\rangle$$.