Review of Taylor’s Formula for Functions of a Single Variable

Let’s review the Taylor series for functions of a single variable. Suppose $y=f(x)$ is differentiable at $x_0$. Then it has a linear approximation at $x_0$, and we have:
$$f(x)=f(x_0)+f'(x_0)(x-x_0)+(x-x_0)\,\varepsilon_{1,x_0}(x-x_0),\quad\text{with}\quad \lim_{x\to x_0}\varepsilon_{1,x_0}(x-x_0)=0.$$
Therefore, $P_{1,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)$ is the linear approximation, and the error in this approximation is
$$R_{1,x_0}(x)=(x-x_0)\,\varepsilon_{1,x_0}(x-x_0).$$
The subscripts $1$ and $x_0$ in $P_{1,x_0}$ and $R_{1,x_0}$ indicate the maximum power of $x$ that appears in the polynomial and the point about which we approximate the function $f$.

Note that as $x\to x_0$, $R_{1,x_0}(x)$ tends to $0$ faster than $(x-x_0)$, because $R_{1,x_0}(x)/(x-x_0)=\varepsilon_{1,x_0}(x-x_0)\to 0$. Mathematically, we write $R_{1,x_0}(x)=o(x-x_0)$.
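To see this decay concretely, here is a minimal numerical sketch; the choices $f(x)=e^x$ and $x_0=0$ (so that $P_{1,x_0}(x)=1+x$) are ours, purely for illustration. It prints the ratio $R_{1,x_0}(x)/(x-x_0)$ for shrinking steps:

```python
# Numerical check that R_1(x)/(x - x0) -> 0 as x -> x0,
# using the illustrative choice f(x) = e^x at x0 = 0, where P_1(x) = 1 + x.
import math

x0 = 0.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    x = x0 + h
    R1 = math.exp(x) - (1.0 + x)          # f(x) - P_{1,x0}(x)
    print(f"x - x0 = {h:>7}:  R1/(x - x0) = {R1 / h:.6f}")
# The printed ratios shrink roughly like h/2, so R1 = o(x - x0).
```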


If we want a better approximation of $f$, instead of a linear function we may use a quadratic function $P_{2,x_0}(x)=a_0+a_1x+a_2x^2$. To determine the coefficients $a_0$, $a_1$, and $a_2$, we match the values of the function and its first and second derivatives at $x=x_0$. This means the graph of $P_{2,x_0}$ has the same value, the same slope, and the same concavity as the graph of $f$ at $x=x_0$. The quadratic polynomial reads:
$$P_{2,x_0}(x)=f(x_0)+f'(x_0)(x-x_0)+\frac{1}{2!}f''(x_0)(x-x_0)^2,$$
and the error is
$$R_{2,x_0}(x)=(x-x_0)^2\,\varepsilon_{2,x_0}(x-x_0),\quad\text{where}\quad \lim_{x\to x_0}\varepsilon_{2,x_0}(x-x_0)=0.$$

We can still improve approximations of $f$ at $x=x_0$ by using higher-order polynomials and matching more derivatives at the selected base point. If we use a polynomial of order $m$, we can prove (see the following theorem) that the error in the approximation goes to zero faster than $(x-x_0)^m$ as $x\to x_0$. Mathematically, we write $R_{m,x_0}(x)=o\big((x-x_0)^m\big)$, which means $R_{m,x_0}(x)/(x-x_0)^m\to 0$ as $x\to x_0$. In general, we have the following theorem:

Theorem 1. Suppose $f$ is a function that is $m\,(\geq 1)$ times differentiable at $x_0$; that is, $f'(x_0), f''(x_0),\dots,f^{(m)}(x_0)$ all exist. Let
$$P_{m,x_0}(x)=a_0+a_1(x-x_0)+\dots+a_m(x-x_0)^m,\quad\text{where}\quad a_k=\frac{f^{(k)}(x_0)}{k!},\ 0\le k\le m,$$
and
$$\varepsilon_{m,x_0}(x)=\frac{f(x)-P_{m,x_0}(x)}{(x-x_0)^m}.$$
Then $\lim_{x\to x_0}\varepsilon_{m,x_0}(x)=0$.
  • Remark that $f^{(0)}(x_0)=f(x_0)$, $f^{(1)}(x_0)=f'(x_0)$, $f^{(2)}(x_0)=f''(x_0)$, $\dots$, and $0!=1$.
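The theorem is easy to experiment with symbolically. The following SymPy sketch (the choices $f=\sin x$, $x_0=0$, and $m=3$ are illustrative, not from the text) builds $P_{m,x_0}$ from $a_k=f^{(k)}(x_0)/k!$ and confirms that $\varepsilon_{m,x_0}(x)\to 0$:

```python
# Build P_{m,x0} from the coefficients in Theorem 1 and check that
# eps_{m,x0}(x) = (f(x) - P_{m,x0}(x)) / (x - x0)^m -> 0 as x -> x0.
import sympy as sp

x = sp.symbols('x')
f, x0, m = sp.sin(x), 0, 3          # illustrative test choices

P = sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
        for k in range(m + 1))
eps = (f - P) / (x - x0)**m

print(sp.expand(P))                 # -x**3/6 + x, the degree-3 Taylor polynomial of sin
print(sp.limit(eps, x, x0))         # 0, as Theorem 1 asserts
```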


Definitions of the “Taylor polynomial” and “remainder”

$P_{m,x_0}(x)$ is called the Taylor polynomial of degree $m$ for $f(x)$ at $x_0$. The error $R_{m,x_0}(x)$ is also called the remainder term. You should verify that
$$P_{m,x_0}(x_0)=f(x_0),\quad P'_{m,x_0}(x_0)=f'(x_0),\quad P''_{m,x_0}(x_0)=f''(x_0),\quad\dots,\quad P^{(m)}_{m,x_0}(x_0)=f^{(m)}(x_0).$$
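This verification is quick to automate. The SymPy sketch below checks all the matching conditions; the test function $f=e^x\cos x$, the base point $x_0=1$, and $m=4$ are arbitrary choices of ours:

```python
# Verify the derivative-matching property P^(k)(x0) = f^(k)(x0), 0 <= k <= m.
import sympy as sp

x = sp.symbols('x')
f, x0, m = sp.exp(x) * sp.cos(x), 1, 4   # arbitrary test choices

P = sum(sp.diff(f, x, k).subs(x, x0) / sp.factorial(k) * (x - x0)**k
        for k in range(m + 1))

for k in range(m + 1):
    lhs = sp.diff(P, x, k).subs(x, x0)
    rhs = sp.diff(f, x, k).subs(x, x0)
    assert sp.simplify(lhs - rhs) == 0   # P^(k)(x0) == f^(k)(x0)
print("all derivatives up to order", m, "match at x0 =", x0)
```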
To estimate the error of this approximation, we would like to have an expression for the remainder $R_{m,x_0}(x)$. Various expressions exist in the literature under stronger regularity assumptions on $f$. We mention one of them, called the Lagrange form of the remainder term after the great mathematician Joseph-Louis Lagrange.

Theorem 2. If $f^{(m+1)}$ is continuous on an open interval $I$ that contains $x_0$, and $x\in I$, then there exists a number $\xi$ between $x$ and $x_0$ such that
$$R_{m,x_0}(x)=\frac{f^{(m+1)}(\xi)}{(m+1)!}(x-x_0)^{m+1}.$$



We don’t know anything about $\xi$ except that $\xi$ is between $x_0$ and $x$.

If we place $x=x_0+h$, we have:

$$f(x_0+h)=f(x_0)+\frac{f'(x_0)}{1!}h+\frac{f''(x_0)}{2!}h^2+\dots+\frac{f^{(m)}(x_0)}{m!}h^m+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}=\sum_{k=0}^{m}\frac{f^{(k)}(x_0)}{k!}h^k+\frac{f^{(m+1)}(x_0+\theta h)}{(m+1)!}h^{m+1}$$
for some $0<\theta<1$.
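As a concrete use of the Lagrange form, here is a small sketch. The choices $f=\sin$, $x_0=0$, and $m=3$ are ours for illustration; since $|\sin^{(m+1)}|\le 1$ everywhere, Theorem 2 gives the bound $|R_{m,x_0}(h)|\le |h|^{m+1}/(m+1)!$:

```python
# Compare the actual error of P_{3,0} for sin against the Lagrange bound
# |R_m| <= |h|^(m+1) / (m+1)!, valid here because |sin^(m+1)| <= 1.
import math

x0, h, m = 0.0, 0.5, 3
approx = h - h**3 / math.factorial(3)       # P_{3,0}(h) = h - h^3/3!
actual_err = abs(math.sin(x0 + h) - approx)
bound = abs(h)**(m + 1) / math.factorial(m + 1)

print(f"actual |R| = {actual_err:.2e}, Lagrange bound = {bound:.2e}")
# actual |R| = 2.59e-04, Lagrange bound = 2.60e-03 -- the bound holds.
```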


Taylor’s Formula for Functions of Several Variables


Now we wish to extend the polynomial expansion to functions of several variables. We learned that if $f(x,y)$ is differentiable at $(x_0,y_0)$, we can approximate it with a linear function (or, more accurately, an affine function) $P_{1,(x_0,y_0)}(x,y)=a_0+a_1x+a_2y$. Matching the value and the first partial derivatives and placing $x=x_0+h$ and $y=y_0+k$ results in
$$P_{1,(x_0,y_0)}(x,y)=f(x_0,y_0)+f_x(x_0,y_0)\,h+f_y(x_0,y_0)\,k.$$
For a better approximation we consider $P_{2,(x_0,y_0)}(x,y)=a_0+a_1x+a_2y+b_1x^2+b_2xy+b_3y^2$. Matching the zeroth-, first-, and second-order partial derivatives results in
$$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+h\,f_x+k\,f_y+\frac{1}{2!}\left[h^2 f_{xx}+2hk\,f_{xy}+k^2 f_{yy}\right],$$
where the partial derivatives are evaluated at $(x_0,y_0)$. The above expression can also be written as:
$$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f(x,y)\right]_{\substack{x=x_0\\ y=y_0}}+\frac{1}{2!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2 f(x,y)\right]_{\substack{x=x_0\\ y=y_0}},$$
where
$$\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2=h^2\frac{\partial^2}{\partial x^2}+2hk\frac{\partial^2}{\partial x\,\partial y}+k^2\frac{\partial^2}{\partial y^2}.$$

Another form of writing $P_{2,(x_0,y_0)}$ is:
$$P_{2,(x_0,y_0)}(x,y)=f(x_0,y_0)+\begin{bmatrix} f_x & f_y\end{bmatrix}\begin{bmatrix} h\\ k\end{bmatrix}+\frac{1}{2!}\begin{bmatrix} h & k\end{bmatrix}\begin{bmatrix}\dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x\,\partial y}\\[6pt] \dfrac{\partial^2 f}{\partial x\,\partial y} & \dfrac{\partial^2 f}{\partial y^2}\end{bmatrix}\begin{bmatrix} h\\ k\end{bmatrix},$$
where again the partial derivatives are evaluated at $(x_0,y_0)$.

Hessian Matrix

The $2\times 2$ matrix in the above expression is called the Hessian matrix and is denoted by $H(x_0,y_0)$. We will talk about it later in this section.
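The matrix form is convenient to compute with. Below is a minimal SymPy sketch that assembles $P_2$ from the gradient row vector and the Hessian; the test function $f=x^2y+3y$ and base point $(1,2)$ are arbitrary choices of ours:

```python
# Assemble P_2 as f(x0,y0) + [fx fy][h k]^T + (1/2)[h k] H [h k]^T.
import sympy as sp

x, y, h, k = sp.symbols('x y h k')
f = x**2 * y + 3 * y                 # arbitrary test function
x0, y0 = 1, 2                        # arbitrary base point

grad = sp.Matrix([[sp.diff(f, x), sp.diff(f, y)]])   # row vector [fx, fy]
H = sp.hessian(f, (x, y))                            # 2x2 Hessian matrix
hk = sp.Matrix([h, k])                               # column vector [h, k]^T

sub = {x: x0, y: y0}
P2 = (f.subs(sub)
      + (grad.subs(sub) * hk)[0]
      + sp.Rational(1, 2) * (hk.T * H.subs(sub) * hk)[0])
print(sp.expand(P2))                 # 2*h**2 + 2*h*k + 4*h + 4*k + 8
```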

We can use the results for functions of a single variable to derive formulas for the Taylor polynomial and the remainder of a function $f$ of two or more variables. Assume $f(x,y)$ has continuous partial derivatives up to order $m+1$ near $(x_0,y_0)$. Let $x=x_0+ht$ and $y=y_0+kt$, where $x_0, y_0, h$, and $k$ are treated as constants and $t$ is a variable. Then
$$F(t)=f(x(t),y(t))=f(x_0+ht,\,y_0+kt).$$
By Taylor’s formula, we have:
$$F(t)=F(0)+F'(0)\,t+\frac{1}{2!}F''(0)\,t^2+\frac{1}{3!}F'''(0)\,t^3+\dots+\frac{1}{m!}F^{(m)}(0)\,t^m+\frac{1}{(m+1)!}F^{(m+1)}(\xi)\,t^{m+1}\qquad(*)$$
where $\xi$ is a number between $0$ and $t$. Using the chain rule, as we saw before, we have:
$$F'(t)=f_x\frac{dx}{dt}+f_y\frac{dy}{dt}=h\,f_x+k\,f_y=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f=\big((h,k)\cdot\nabla\big)f.$$
We can show
$$F''(t)=h^2\frac{\partial^2 f}{\partial x^2}+2hk\frac{\partial^2 f}{\partial x\,\partial y}+k^2\frac{\partial^2 f}{\partial y^2}=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^2 f=\big((h,k)\cdot\nabla\big)^2 f,$$
the third derivative is 
$$F'''(t)=h^3\frac{\partial^3 f}{\partial x^3}+3h^2k\frac{\partial^3 f}{\partial x^2\,\partial y}+3hk^2\frac{\partial^3 f}{\partial x\,\partial y^2}+k^3\frac{\partial^3 f}{\partial y^3}=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^3 f=\big((h,k)\cdot\nabla\big)^3 f,$$
and in general:
$$F^{(m)}(t)=\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^m f=\big((h,k)\cdot\nabla\big)^m f.\qquad(**)$$

This may be proved by induction. (Induction has two steps: (1) we prove $(**)$ holds true for $m=1$; (2) we prove that if $(**)$ is true for some value $m=j$, then it is also true for $m=j+1$.)

Therefore


$$F^{(m)}(0)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^m f(x,y)\right]_{\substack{x=x_0\\ y=y_0}}=\big((h,k)\cdot\nabla\big)^m f(x_0,y_0),\quad\text{and}$$
$$F^{(m+1)}(\xi)=\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1} f(x,y)\right]_{\substack{x=x_0+\xi h\\ y=y_0+\xi k}}=\big((h,k)\cdot\nabla\big)^{m+1} f(x_0+\xi h,\,y_0+\xi k).$$
Substituting in $(*)$, we have
$$\begin{aligned}F(t)&=\big[f\big]_{\substack{x=x_0\\ y=y_0}}+t\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)f\right]_{\substack{x=x_0\\ y=y_0}}+\dots+\frac{t^m}{m!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^m f\right]_{\substack{x=x_0\\ y=y_0}}+\frac{t^{m+1}}{(m+1)!}\left[\left(h\frac{\partial}{\partial x}+k\frac{\partial}{\partial y}\right)^{m+1} f\right]_{\substack{x=x_0+\xi h\\ y=y_0+\xi k}}\\
&=f(x_0,y_0)+t\,\big((h,k)\cdot\nabla\big)f(x_0,y_0)+\dots+\frac{t^m}{m!}\big((h,k)\cdot\nabla\big)^m f(x_0,y_0)+\frac{t^{m+1}}{(m+1)!}\big((h,k)\cdot\nabla\big)^{m+1} f(x_0+\xi h,\,y_0+\xi k)\end{aligned}$$
for some $\xi$ between $0$ and $t$. Because this is true for all values of $t$, we can plug in $t=1$ and find Taylor’s formula for functions of two variables. Here we proved the following theorem for $n=2$, but the generalization is easy.

Theorem 3. Let $f:U\to\mathbb{R}$, where $U\subseteq\mathbb{R}^n$ is an open set. Suppose $f$ has continuous partial derivatives up to (at least) order $m+1$, and consider $\mathbf{x}_0\in U$ and $\mathbf{h}\in\mathbb{R}^n$ such that $\mathbf{x}_0+t\mathbf{h}\in U$ for $0\le t\le 1$. Then there is a number $0<\theta<1$ such that
$$f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+(\mathbf{h}\cdot\nabla)f(\mathbf{x}_0)+\frac{1}{2!}(\mathbf{h}\cdot\nabla)^2 f(\mathbf{x}_0)+\dots+\frac{1}{m!}(\mathbf{h}\cdot\nabla)^m f(\mathbf{x}_0)+\frac{1}{(m+1)!}(\mathbf{h}\cdot\nabla)^{m+1} f(\mathbf{x}_0+\theta\mathbf{h}).$$

Another form of writing Taylor’s formula is:
$$f(\mathbf{x}_0+\mathbf{h})=\sum_{k=0}^{m}\frac{1}{k!}(\mathbf{h}\cdot\nabla)^k f(\mathbf{x}_0)+\frac{1}{(m+1)!}(\mathbf{h}\cdot\nabla)^{m+1} f(\mathbf{x}_0+\theta\mathbf{h}),$$
for some $0<\theta<1$.

If we place $\mathbf{h}=\mathbf{x}-\mathbf{x}_0$ in the above formula, the polynomial that we obtain is called the polynomial approximation of $f$ of degree $m$ at $\mathbf{x}_0$.


Example 1
Given $f(x,y)=\sin x\; e^{y-2x}$, find a second-degree polynomial approximation to $f$ near the point $\left(\frac{\pi}{2},\pi\right)$ and use it to estimate the value $f\left(0.95\frac{\pi}{2},\,1.1\pi\right)$.

Solution
For the polynomial approximation of degree 2, we need to find the first and second partial derivatives of $f$.

$$\begin{aligned}
f(x,y)&=\sin x\; e^{y-2x} &\Rightarrow\quad f\left(\tfrac{\pi}{2},\pi\right)&=1,\\
f_x(x,y)&=\cos x\; e^{y-2x}-2\sin x\; e^{y-2x} &\Rightarrow\quad f_x\left(\tfrac{\pi}{2},\pi\right)&=-2,\\
f_y(x,y)&=\sin x\; e^{y-2x} &\Rightarrow\quad f_y\left(\tfrac{\pi}{2},\pi\right)&=1,\\
f_{xx}(x,y)&=3\sin x\; e^{y-2x}-4\cos x\; e^{y-2x} &\Rightarrow\quad f_{xx}\left(\tfrac{\pi}{2},\pi\right)&=3,\\
f_{xy}(x,y)&=\cos x\; e^{y-2x}-2\sin x\; e^{y-2x} &\Rightarrow\quad f_{xy}\left(\tfrac{\pi}{2},\pi\right)&=-2,\\
f_{yy}(x,y)&=\sin x\; e^{y-2x} &\Rightarrow\quad f_{yy}\left(\tfrac{\pi}{2},\pi\right)&=1.
\end{aligned}$$
Thus
$$f\left(\tfrac{\pi}{2}+h,\,\pi+k\right)\approx 1-2h+k+\frac{1}{2!}\left(3h^2-2\cdot 2\,hk+k^2\right).$$
If we place $x=\frac{\pi}{2}+h$ and $y=\pi+k$, we obtain the second-degree polynomial approximation of $f$ near $\left(\frac{\pi}{2},\pi\right)$:
$$f(x,y)\approx P_{2,(\frac{\pi}{2},\pi)}(x,y)=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)+\frac{3}{2}\left(x-\frac{\pi}{2}\right)^2-2\left(x-\frac{\pi}{2}\right)(y-\pi)+\frac{1}{2}(y-\pi)^2.$$
Therefore:
$$f\left(0.95\frac{\pi}{2},\,1.1\pi\right)\approx 1-2(-0.025\pi)+0.1\pi+\frac{3}{2}(-0.025\pi)^2-2(-0.025\pi)(0.1\pi)+\frac{1}{2}(0.1\pi)^2\approx 1.57919.$$
The exact value of $f\left(0.95\frac{\pi}{2},\,1.1\pi\right)$ is $1.59704$. The error of this approximation is $1.12\%$, while if we used the linear approximation $P_{1,(\frac{\pi}{2},\pi)}=1-2\left(x-\frac{\pi}{2}\right)+(y-\pi)$, the error would be $7.88\%$.
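The numbers above are easy to reproduce. This SymPy sketch re-evaluates $P_2$ and the exact value at $\left(0.95\frac{\pi}{2},\,1.1\pi\right)$:

```python
# Reproduce the Example 1 figures: P2 approx, exact value, and relative error.
import sympy as sp

x, y = sp.symbols('x y')
f = sp.sin(x) * sp.exp(y - 2 * x)
x0, y0 = sp.pi / 2, sp.pi

hx, ky = x - x0, y - y0
P2 = (1 - 2 * hx + ky
      + sp.Rational(3, 2) * hx**2 - 2 * hx * ky + sp.Rational(1, 2) * ky**2)

pt = {x: 0.95 * sp.pi / 2, y: 1.1 * sp.pi}
approx = float(P2.subs(pt))
exact = float(f.subs(pt))
print(f"P2 = {approx:.5f}, exact = {exact:.5f}, "
      f"error = {abs(exact - approx) / exact:.2%}")
# P2 = 1.57919, exact = 1.59704, error = 1.12%
```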

The quadratic term $\displaystyle\sum_{i,j=1}^{n}\frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{x}_0)\,h_i h_j$ can be written as
$$\begin{bmatrix} h_1 & h_2 & \cdots & h_n\end{bmatrix}\begin{bmatrix}\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_n}\\[6pt] \dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_n}\\ \vdots & \vdots & \ddots & \vdots\\ \dfrac{\partial^2 f}{\partial x_n\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}\end{bmatrix}\begin{bmatrix} h_1\\ h_2\\ \vdots\\ h_n\end{bmatrix}.$$
Similar to the case of $n=2$, the $n\times n$ matrix of the second-order derivatives $\left[\frac{\partial^2 f}{\partial x_i\,\partial x_j}\right]_{n\times n}$ is called the Hessian matrix and is denoted by $H(\mathbf{x})$. Therefore, we can write:
$$\sum_{i,j=1}^{n}\frac{\partial^2 f}{\partial x_i\,\partial x_j}(\mathbf{x}_0)\,h_i h_j=(\mathbf{h}\cdot\nabla)^2 f(\mathbf{x}_0)=\mathbf{h}^T H(\mathbf{x}_0)\,\mathbf{h},$$
where $\mathbf{h}$ is considered an $n\times 1$ column matrix.¹ Using this notation, the linear expansion of $f$ can be written as:
$$f(\mathbf{x}_0+\mathbf{h})=f(\mathbf{x}_0)+\mathbf{h}\cdot\nabla f(\mathbf{x}_0)+\frac{1}{2!}\,\mathbf{h}^T H(\mathbf{x}_0+\theta\mathbf{h})\,\mathbf{h},\quad\text{for some } 0<\theta<1.$$
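As a quick illustration of the quadratic form $\mathbf{h}^T H\,\mathbf{h}$ for $n=3$, here is a NumPy sketch; the symmetric matrix and displacement vector below are arbitrary examples of ours, not tied to a particular $f$:

```python
# The matrix product h^T H h equals the double sum over H[i,j] * h_i * h_j.
import numpy as np

H = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])          # plays the role of H(x0)
h = np.array([0.1, -0.2, 0.05])          # the displacement vector

quad = h @ H @ h                         # h^T H h as a matrix product
double_sum = sum(H[i, j] * h[i] * h[j] for i in range(3) for j in range(3))
print(quad, double_sum)                  # the two expressions agree
```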


¹ Some books consider $\mathbf{h}$ a $1\times n$ row matrix. When there is any ambiguity, it is better to write it as $\langle\mathbf{h}\,|\,H(\mathbf{x}_0)\,|\,\mathbf{h}\rangle$.