Inner Product Spaces

Definition. Let V be a vector space over F, where $F = \real$ or $\complex$ . An inner product on V is a function $(\cdot, \cdot): V
   \times V \rightarrow F$ which satisfies:

  1. (Linearity) $(ax + y, z) = a(x, z) + (y,
   z)$ , for $x, y, z \in V$ , $a \in F$ .
  2. (Symmetry) $(x, y) = \overline{(y,x)}$ , for $x, y \in V$ . ("$\overline x$ " denotes the complex conjugate of x.)
  3. (Positive-definiteness) If $x \ne 0$ , then $(x, x) > 0$ .

A vector space with an inner product is an inner product space. If $F = \real$ , V is a real inner product space; if $F =
   \complex$ , V is a complex inner product space.

Example. Suppose V is a complex inner product space. By properties 1 and 2,

$$(x, ay + z) = \overline{(ay + z, x)} = \overline{a(y, x) + (z, x)} = \overline a \overline{(y, x)} + \overline{(z, x)} = \overline a (x, y) + (x, z).$$

That is, the inner product is almost linear in the second variable --- constants pull out, up to complex conjugation. This is sometimes referred to as sesquilinearity.

Example. If $F =
   \real$ , $\overline x = x$ . In this case, the second property reads $(x, y) =
   (y, x)$ . Moreover, the result I derived in the last example becomes $(x,
   ay + z) = a(x, y) + (x, z)$ .

Thus, a real inner product is linear in each variable.

Why include complex conjugation in the symmetry axiom? If I had used $(x, y) = (y, x)$ in the complex case, then

$$0 < (ix, ix) = i(x, ix) = i(ix, x) = i\cdot i(x, x) = -(x, x).$$

This contradicts $(x, x) > 0$ . That is, I can't have both pure symmetry and positive definiteness.
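
The contradiction above can be seen numerically. Here is a minimal sketch, assuming the standard inner product $(x, y) = \sum_k x_k \overline{y_k}$ on $\complex^2$ ; that choice of inner product, the sample vector, and the helper names `inner` and `bilinear` are my own assumptions for illustration, not part of the definition above.

```python
# Hypothetical helper: the standard inner product on C^n, linear in the
# first slot and conjugate-linear in the second.
def inner(x, y):
    return sum(a * b.conjugate() for a, b in zip(x, y))

# The same sum without conjugation: symmetric, but not positive definite.
def bilinear(x, y):
    return sum(a * b for a, b in zip(x, y))

x = [1 + 2j, 3 - 1j]          # an arbitrary sample vector
ix = [1j * a for a in x]      # the vector ix

print(inner(x, x))            # real and positive
print(inner(ix, ix))          # the same value as inner(x, x)
print(bilinear(x, x))         # a complex number in general
print(bilinear(ix, ix))       # the negative of bilinear(x, x)
```

The conjugate-free pairing changes sign when x is replaced by ix, so it cannot be positive on both vectors.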


The inner product of the zero vector with any vector is 0. Indeed, by linearity,

$$(0, x) = (0 + 0, x) = (0, x) + (0, x), \quad\hbox{so}\quad 0 = (0, x).$$

By symmetry, $(x, 0) = 0$ as well.

Example. If $F =
   \complex$ , the condition $(x, x) > 0$ implies that $(x, x) \in
   \real$ .

Example. The dot product on $\real^n$ is given by

$$\langle a_1, \ldots, a_n\rangle \cdot \langle b_1, \ldots, b_n\rangle = a_1 b_1 + \cdots + a_n b_n.$$

It's easy to verify that the axioms for an inner product hold. For example,

$$\langle a_1, \ldots, a_n\rangle \cdot \langle a_1, \ldots, a_n\rangle = a_1^2 + \cdots + a_n^2 > 0$$

provided that $\langle a_1, \ldots,
   a_n\rangle \ne 0$ .

I can use an inner product to define lengths and angles. You can think of an inner product as an axiomatic way of introducing (metric) geometry into vector spaces.

Definition. Let V be an inner product space, and let $x, y \in V$ .

  1. The length of x is $\|x\| = (x, x)^{1/2}$ .
  2. The distance between x and y is $\|x - y\|$ .
  3. The angle between x and y is the smallest nonnegative real number $\theta$ satisfying

$$\cos \theta = \dfrac{(x, y)}{\|x\|\|y\|}.$$

Remark. The definition of the angle between x and y will make sense once I prove the Cauchy-Schwarz inequality below (since I need to know that $-1 \le \dfrac{(x, y)}{\|x\|\|y\|} \le 1$ ).

Proposition. Let V be a real inner product space, $a \in \real$ , $x, y \in V$ .

(a) $\|ax\| = |a|\|x\|$ . ("$|a|$ " denotes the absolute value of a.)

(b) $x \ne 0$ if and only if $\|x\| >
   0$ .

(c) (Cauchy-Schwarz inequality) $|(x,y)| \le \|x\|\|y\|$ .

(d) (Triangle inequality) $\|x + y\| \le \|x\| + \|y\|$ .

Proof. (a) Since $(ax, ax) = a^2(x, x)$ ,

$$\|ax\| = \sqrt{a^2} \|x\| = |a|\|x\|.$$

(b) $x \ne 0$ implies $(x, x) > 0$ , and hence $\|x\| > 0$ . Conversely, if $x = 0$ , then $(0, 0) = 0$ , so $\|x\| = 0$ .

(c) First, observe that if $a, b \in
   \real$ , then

$$0 \le (ax + by, ax + by) = a^2 (x, x) + 2ab (x, y) + b^2 (y, y) = a^2 \|x\|^2 + 2ab (x, y) + b^2 \|y\|^2.$$

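If $x = 0$ or $y = 0$ , both sides of the Cauchy-Schwarz inequality are 0, so assume $x \ne 0$ and $y \ne 0$ . Since the inequality above is true for any $a, b \in \real$ , I may set $a = \|y\|$ , $b = \|x\|$ . Then

$$0 \le 2\|x\|^2 \|y\|^2 + 2\|x\|\|y\|(x, y).$$

Dividing by $2\|x\|\|y\| > 0$ gives

$$0 \le \|x\|\|y\| + (x, y),$$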

$$\|x\|\|y\| \ge -(x, y).$$

Replace x with $-x$ ; this yields

$$\|x\|\|y\| \ge (x,y).$$

Therefore, $\|x\|\|y\| \ge |(x,y)|$ .


(d) By the Cauchy-Schwarz inequality, $(x, y) \le \|x\|\|y\|$ , so

$$\|x + y\|^2 = (x + y, x + y) = \|x\|^2 + 2(x, y) + \|y\|^2 \le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2.$$

Hence, $\|x + y\| \le \|x\| + \|y\|$ .

Example. $\real^3$ is an inner product space using the standard dot product of vectors. The cosine of the angle between $\langle
   2,-2,1\rangle$ and $\langle 6,-8,24\rangle$ is

$$\cos \theta = \dfrac{\langle 2,-2,1\rangle\cdot \langle 6,-8,24\rangle} {\|\langle 2,-2,1\rangle\|\|\langle 6,-8,24\rangle\|} = \dfrac{12 + 16 + 24}{(3)(26)} = \dfrac{2}{3}.\quad\halmos$$
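
The arithmetic in this example can be checked with a few lines of Python. This is only a sketch; the helper names `dot` and `norm` are hypothetical and simply implement the dot product and the length defined above.

```python
import math

def dot(u, v):
    # Standard dot product on R^n.
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    # Length of u, defined as the square root of (u, u).
    return math.sqrt(dot(u, u))

u = (2, -2, 1)
v = (6, -8, 24)

cos_theta = dot(u, v) / (norm(u) * norm(v))
theta = math.acos(cos_theta)   # the angle itself, in radians

print(dot(u, v), norm(u), norm(v))
print(cos_theta)
```

The printed dot product and lengths should match the numerator 52 and the factors 3 and 26 in the display above.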

Example. Let $C[0, 1]$ denote the real vector space of continuous functions on the interval $[0, 1]$ . Define an inner product on $C[0, 1]$ by

$$(f, g) = \int_0^1 f(x) g(x)\,dx.$$

Note that $f(x) g(x)$ is integrable, since it's continuous on a closed interval.

The verification that this gives an inner product relies on standard properties of Riemann integrals. For example, if $f \ne 0$ ,

$$(f, f) = \int_0^1 f(x)^2\,dx > 0.$$

Given that this is a real inner product, I may apply the preceding proposition to produce some useful results. For example, the Schwarz inequality says that

$$\left(\int_0^1 f(x)^2\,dx\right)^{1/2} \left(\int_0^1 g(x)^2\,dx\right)^{1/2} \ge \left|\int_0^1 f(x) g(x)\,dx\right|.\quad\halmos$$
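
As a rough numerical illustration of this inequality, the sketch below approximates the integrals with a midpoint rule. The functions $f(x) = x$ and $g(x) = x^2$ , the helper name `inner`, and the number of subintervals are all assumptions chosen for the example; the results are approximations, not exact values.

```python
import math

def inner(f, g, n=10000):
    # Midpoint-rule approximation of the integral of f(x) g(x) over [0, 1].
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h) for k in range(n))

def f(x):
    return x

def g(x):
    return x * x

lhs = math.sqrt(inner(f, f)) * math.sqrt(inner(g, g))   # product of lengths
rhs = abs(inner(f, g))                                  # |(f, g)|

print(lhs, rhs)
```

For these two functions the exact values are $1/\sqrt{15}$ and $1/4$ , so the left side is only slightly larger than the right.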

Definition. A set of vectors S in an inner product space V is orthogonal if $(v_i, v_j) = 0$ for $v_i, v_j \in S$ , $v_i
   \ne v_j$ .

An orthogonal set S is orthonormal if $\|v_i\| = 1$ for all $v_i \in S$ .

The vectors in an orthogonal set are mutually perpendicular. The vectors in an orthonormal set are mutually perpendicular unit vectors.

Notation. If I is an index set, the Kronecker delta $\delta_{ij}$ (or $\delta(i,j)$ ) is defined by

$$\delta_{ij} = \cases{0 & if $i \ne j$\cr 1 & if $i = j$\cr}$$

With this notation, a set $S = \{v_i\}$ is orthonormal if

$$(v_i, v_j) = \delta_{ij}.$$

Note that the $n \times n$ matrix whose $(i, j)$ -th component is $\delta_{ij}$ is the $n \times n$ identity matrix.

Example. The standard basis for $\real^n$ is orthonormal.

Example. Here is an orthonormal basis for $\real^2$ :

$$\left\{\left[\matrix{ \dfrac{3}{5} \cr \dfrac{4}{5} \cr}\right], \left[\matrix{ -\dfrac{4}{5} \cr \dfrac{3}{5} \cr}\right]\right\} \quad\halmos$$

Example. Let $C[0, 2\pi]$ denote the complex-valued continuous functions on $[0, 2\pi]$ . Define an inner product by

$$(f,g) = \int_0^{2\pi} f(x) \overline{g(x)}\,dx.$$

Let $m, n \in \integer$ . Then

$$\left(\dfrac{1}{\sqrt{2\pi}} e^{imx}, \dfrac{1}{\sqrt{2\pi}} e^{inx}\right) = \dfrac{1}{2\pi} \int_0^{2\pi} e^{imx} e^{-inx}\,dx = \delta_{mn}.$$

It follows that the set

$$\left\{\dfrac{1}{\sqrt{2\pi}} e^{mix} \bigm| m = \ldots, -1, 0, 1, \ldots \right\}$$

is orthonormal in $C[0, 2\pi]$ .
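
The integral formula above can be spot-checked numerically. The sketch below uses a midpoint rule; the helper name `averaged_integral` and the step count are my own choices, and the output is only an approximation of the integral.

```python
import cmath
import math

def averaged_integral(m, n, steps=2000):
    # Midpoint-rule approximation of
    # (1/(2 pi)) * integral over [0, 2 pi] of e^{imx} e^{-inx} dx.
    h = 2 * math.pi / steps
    total = sum(cmath.exp(1j * (m - n) * (k + 0.5) * h) for k in range(steps))
    return total * h / (2 * math.pi)

print(abs(averaged_integral(3, 3)))    # close to 1
print(abs(averaged_integral(2, -1)))   # close to 0
```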

Proposition. Let $\{v_i\}$ be an orthogonal set of vectors, $v_i \ne 0$ for all i. Then $\{v_i\}$ is independent.

Proof. Suppose

$$a_1 v_{i_1} + \cdots + a_n v_{i_n} = 0.$$

Take the inner product of both sides with $v_{i_1}$ . Since $\{v_i\}$ is orthogonal, this gives $a_1 (v_{i_1},
   v_{i_1}) = 0$ . Now $(v_{i_1}, v_{i_1}) > 0$ since $v_{i_1} \ne 0$ , so $a_1 = 0$ . Similarly, $a_j = 0$ for all j. Therefore, $\{v_i\}$ is independent.

An orthonormal set consists of vectors of length 1, so the vectors are obviously nonzero. Hence, an orthonormal set is independent, and forms a basis for the subspace it spans. A basis which is an orthonormal set is called an orthonormal basis.

It is very easy to find the components of a vector relative to an orthonormal basis.

Proposition. Let $\{v_i\}$ be an orthonormal basis for V, and let $v \in V$ . Then

$$v = \sum_{i} (v, v_i) v_i.$$

Note: In fact, the sum above is a finite sum --- that is, only finitely many terms are nonzero.

Proof. Since $\{v_i\}$ is a basis,

$$v = a_1 v_{i_1} + \cdots + a_n v_{i_n}, \quad\hbox{where}\quad a_j \in F, v_{i_j} \in \{v_i\}.$$

Let $i_j \in \{i_1, \ldots, i_n\}$ . Take the inner product of both sides with $v_{i_j}$ . Then

$$(v, v_{i_j}) = a_1 (v_{i_1}, v_{i_j}) + \cdots + a_n (v_{i_n}, v_{i_j}) = a_j,$$

since by orthonormality $(v_{i_k}, v_{i_j}) = 0$ if $i_k \ne i_j$ and $(v_{i_j}, v_{i_j}) = 1$ .


Therefore,

$$v = (v, v_{i_1}) v_{i_1} + \cdots + (v, v_{i_n}) v_{i_n}.$$

Observe that if $i \notin \{i_1, \ldots,
   i_n\}$ , then

$$(v_i, v) = (v_i, a_1 v_{i_1} + \cdots + a_n v_{i_n}) = 0.$$

It follows that the only nonzero terms in the sum $\sum_{i} (v, v_i) v_i$ are those for which $i
   \in \{i_1, \ldots, i_n\}$ , and

$$v = (v, v_{i_1}) v_{i_1} + \cdots + (v, v_{i_n}) v_{i_n} = \sum_{i} (v, v_i) v_i.\quad\halmos$$

Example. Here is an orthonormal basis for $\real^2$ :

$$\left\{\left[\matrix{\dfrac{3}{5} \cr \noalign{\vskip2pt} \dfrac{4}{5} \cr}\right], \left[\matrix{ -\dfrac{4}{5} \cr \noalign{\vskip2pt} \dfrac{3}{5} \cr}\right]\right\}$$

To express $\langle -7, 6\rangle$ in terms of this basis, take the dot product of the vector with each element of the basis:

$$\left[\matrix{-7 \cr 6 \cr}\right] = \dfrac{3}{5}\left[\matrix{\dfrac{3}{5} \cr \noalign{\vskip2pt} \dfrac{4}{5} \cr}\right] + \dfrac{46}{5}\left[\matrix{-\dfrac{4}{5} \cr \noalign{\vskip2pt} \dfrac{3}{5} \cr}\right]. \quad\halmos$$
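
The same computation can be scripted. The sketch below uses exact rational arithmetic so that the fractions come out exactly; the helper name `dot` and the variable names are hypothetical.

```python
from fractions import Fraction as F

def dot(u, v):
    # Standard dot product on R^n.
    return sum(a * b for a, b in zip(u, v))

# The orthonormal basis from the example and the vector to expand.
basis = [(F(3, 5), F(4, 5)), (F(-4, 5), F(3, 5))]
v = (F(-7), F(6))

# Components are the inner products of v with the basis vectors.
coeffs = [dot(v, b) for b in basis]

# Rebuild v from its components as a check.
rebuilt = tuple(sum(c * b[k] for c, b in zip(coeffs, basis)) for k in range(2))

print(coeffs)
print(rebuilt)
```

Rebuilding the vector from its components is a cheap way to catch an arithmetic slip in the dot products.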

Example. Let $C[0, 2\pi]$ denote the complex inner product space of complex-valued continuous functions on $[0, 2\pi]$ , where the inner product is defined by

$$(f,g) = \int_0^{2\pi} f(x) \overline{g(x)}\,dx.$$

I showed earlier that the following set is orthonormal:

$$S = \left\{\dfrac{1}{\sqrt{2\pi}} e^{mix} \bigm| m = \ldots, -1, 0, 1, \ldots \right\}.$$

Suppose I try to compute the "components" of $f(x) = x$ relative to this orthonormal set by taking inner products --- that is, using the approach of the preceding example.

For $m = 0$ ,

$$\dfrac{1}{\sqrt{2\pi}} \int_0^{2\pi} x\,dx = \pi\sqrt{2\pi}.$$

Suppose $m \ne 0$ . Then

$$\dfrac{1}{\sqrt{2\pi}} \int_0^{2\pi} x e^{-mix}\,dx = \dfrac{1}{\sqrt{2\pi}}\left[\dfrac{i}{m} xe^{-mix} + \dfrac{1}{m^2} e^{-mix}\right]_0^{2\pi} = \dfrac{i \sqrt{2\pi}}{m}.$$

There are infinitely many nonzero components! Of course, the reason this does not contradict the earlier result is that $f(x) = x$ does not lie in the span of S. S is orthonormal, hence independent, but it is not a basis for $C[0, 2\pi]$ .

In fact, since $e^{mix} = \cos mx + i \sin mx$ , a finite linear combination of elements of S must be periodic with period $2\pi$ , so it takes the same value at 0 and at $2\pi$ . The function $f(x) = x$ does not.
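
The component integrals computed above can be approximated numerically as a sanity check. This is a sketch under my own assumptions: a midpoint rule with a hypothetical helper `component(m)` that approximates $\dfrac{1}{\sqrt{2\pi}} \int_0^{2\pi} x e^{-mix}\,dx$ , and a step count chosen arbitrarily.

```python
import cmath
import math

def component(m, steps=20000):
    # Midpoint-rule approximation of
    # (1/sqrt(2 pi)) * integral over [0, 2 pi] of x e^{-imx} dx.
    h = 2 * math.pi / steps
    total = 0j
    for k in range(steps):
        x = (k + 0.5) * h
        total += x * cmath.exp(-1j * m * x)
    return total * h / math.sqrt(2 * math.pi)

root = math.sqrt(2 * math.pi)
print(component(0), math.pi * root)     # m = 0 case
print(component(1), 1j * root)          # m = 1 case
print(component(-2), 1j * root / -2)    # a negative m
```

Each pair of printed values should agree to several decimal places, and none of the components is zero.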

Multiplying each component by the corresponding element of S gives the terms of a formal series. It is still reasonable to ask whether (or in what sense) the infinite sum

$$\pi + \sum_{m=1}^\infty \left(\dfrac{i}{m} e^{mix} - \dfrac{i}{m} e^{-mix}\right)$$

represents the function $f(x) = x$ . For example, it is reasonable to ask whether the series converges to $f(x)$ at each point of $[0, 2\pi]$ , or whether the convergence is uniform. The answers to these kinds of questions would require an excursion into the theory of Fourier series.

Since it's so easy to find the components of a vector relative to an orthonormal basis, it's of interest to have an algorithm which converts a given basis to an orthonormal one.

The Gram-Schmidt algorithm converts a basis to an orthonormal basis by "straightening out" the vectors one by one.

[Figure: the vector $v_2$ , its projection onto $v_1$ , and the component of $v_2$ perpendicular to $v_1$ .]

The picture shows the first step in the straightening process. Given vectors $v_1$ and $v_2$ , I want to replace $v_2$ with a vector perpendicular to $v_1$ . I can do this by taking the component of $v_2$ perpendicular to $v_1$ , which is

$$v_2 - \dfrac{(v_1, v_2)}{(v_1, v_1)} v_1.$$

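Lemma. (Gram-Schmidt algorithm) Let $\{v_1, \ldots, v_k\}$ be a set of nonzero vectors in a real inner product space V. Suppose $v_1$ , ..., $v_{k-1}$ are pairwise orthogonal. Let

$$v_k' = v_k - \sum_{i<k} \dfrac{(v_i, v_k)}{(v_i, v_i)} v_i.$$

Then $v_k'$ is orthogonal to $v_1$ , ..., $v_{k-1}$ . (In a complex inner product space, the same argument works with the coefficients $\dfrac{(v_k, v_i)}{(v_i, v_i)}$ , since constants pull out of the second variable with a conjugate.)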

Proof. Let $j
   \in \{1, \ldots, k - 1\}$ . Then

$$(v_j, v_k') = (v_j, v_k) - \sum_{i<k} \dfrac{(v_i, v_k)}{(v_i, v_i)} (v_j, v_i).$$

Now $(v_j, v_i) = 0$ for $i \ne j$ , so the right side collapses to

$$(v_j, v_k) - \dfrac{(v_j, v_k)}{(v_j, v_j)} (v_j, v_j) = (v_j, v_k) - (v_j, v_k) = 0.\quad\halmos$$

Suppose that I start with an independent set $\{v_1, \ldots, v_n\}$ . Apply the Gram-Schmidt procedure to the set, beginning with $v_1' = v_1$ . This produces an orthogonal set $\{v_1', \ldots, v_n'\}$ . In fact, $\{v_1',
   \ldots, v_n'\}$ is a nonzero orthogonal set, so it is independent as well.

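To see that each $v_k'$ is nonzero, suppose

$$0 = v_k' = v_k - \sum_{i<k} \dfrac{(v_i', v_k)}{(v_i', v_i')} v_i'.$$

Then

$$v_k = \sum_{i<k} \dfrac{(v_i', v_k)}{(v_i', v_i')} v_i'.$$

Each $v_i'$ is a linear combination of $v_1$ , ..., $v_i$ , so this expresses $v_k$ as a linear combination of $v_1$ , ..., $v_{k-1}$ , which contradicts the independence of $\{v_i\}$ .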

In general, if the algorithm is applied iteratively to a set of vectors, the span is preserved at each stage. That is,

$$\langle v_1, \ldots, v_k\rangle = \langle v_1', \ldots, v_k'\rangle.$$

This is trivially true at the start, since $v_1 = v_1'$ . Assume inductively that

$$\langle v_1, \ldots, v_{k-1}\rangle = \langle v_1', \ldots, v_{k-1}'\rangle.$$

The equation

$$v_k' = v_k - \sum_{i<k} \dfrac{(v_i', v_k)}{(v_i', v_i')} v_i'$$

shows that $v_k' \in \langle v_k\rangle + \langle v_1', \ldots, v_{k-1}'\rangle = \langle v_k\rangle + \langle v_1, \ldots, v_{k-1}\rangle$ , so $\langle v_1', \ldots, v_k'\rangle \subset \langle v_1, \ldots, v_k\rangle$ .

Conversely,

$$v_k = v_k' + \sum_{i<k} \dfrac{(v_i', v_k)}{(v_i', v_i')} v_i' \in \langle v_k'\rangle + \langle v_1', \ldots, v_{k-1}'\rangle = \langle v_1', \ldots, v_k'\rangle.$$

It follows that $\langle v_1, \ldots, v_k\rangle \subset \langle v_1', \ldots, v_k'\rangle$ , so $\langle v_1, \ldots, v_k\rangle = \langle v_1', \ldots, v_k'\rangle$ , as claimed.

To summarize: If you apply Gram-Schmidt to a set of vectors, the algorithm produces a new set of vectors with the same span as the old set. If the original set was independent, the new set is independent (and orthogonal) as well.

So, for example, if Gram-Schmidt is applied to a basis for an inner product space, it will produce an orthogonal basis for the space.

Finally, you can always produce an orthonormal set from an orthogonal set (of nonzero vectors): merely divide each vector in the orthogonal set by its length.

Example. (Gram-Schmidt) Apply Gram-Schmidt to the set

$$\left\{\left[\matrix{3 \cr 0 \cr 4 \cr}\right], \left[\matrix{-1 \cr 0 \cr 7 \cr}\right], \left[\matrix{2 \cr 9 \cr 11 \cr}\right]\right\}.$$

$$v_1' = v_1 = \langle 3, 0, 4\rangle,$$

$$v_2' = \langle -1, 0, 7\rangle - \dfrac{\langle -1, 0, 7\rangle\cdot \langle 3, 0, 4\rangle} {\langle 3, 0, 4\rangle\cdot \langle 3, 0, 4\rangle} \langle 3, 0, 4\rangle = \langle -4, 0, 3\rangle,$$

$$v_3' = \langle 2, 9, 11\rangle - \dfrac{\langle 2, 9, 11\rangle\cdot \langle 3, 0, 4\rangle} {\langle 3, 0, 4\rangle\cdot \langle 3, 0, 4\rangle} \langle 3, 0, 4\rangle - \dfrac{\langle 2, 9, 11\rangle\cdot \langle -4, 0, 3\rangle} {\langle -4, 0, 3\rangle\cdot \langle -4, 0, 3\rangle} \langle -4, 0, 3\rangle = \langle 0, 9, 0\rangle.$$

(A common mistake here is to project onto the original vectors $v_1$ , $v_2$ , and so on. I need to project onto the vectors that have already been orthogonalized. That is why I projected onto $\langle 3, 0, 4\rangle$ and $\langle -4, 0, 3\rangle$ rather than $\langle 3, 0, 4\rangle$ and $\langle -1, 0, 7\rangle$ .)

The set

$$\left\{\left[\matrix{3 \cr 0 \cr 4 \cr}\right], \left[\matrix{-4 \cr 0 \cr 3 \cr}\right], \left[\matrix{0 \cr 9 \cr 0 \cr}\right]\right\}$$

is orthogonal.

The corresponding orthonormal set is

$$\left\{\dfrac{1}{5}\left[\matrix{3 \cr 0 \cr 4 \cr}\right], \dfrac{1}{5}\left[\matrix{-4 \cr 0 \cr 3 \cr}\right], \left[\matrix{0 \cr 1 \cr 0 \cr}\right]\right\}.\quad\halmos$$
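
The procedure in this example can be sketched in code. The version below is one possible implementation for the real dot product, written with exact fractions; the function names `dot` and `gram_schmidt` are hypothetical, and the code assumes the input vectors are independent so that no division by zero occurs.

```python
from fractions import Fraction

def dot(u, v):
    # Standard dot product on R^n.
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return an orthogonal list of vectors with the same span.

    Assumes the input vectors are linearly independent, so that every
    orthogonalized vector is nonzero.
    """
    ortho = []
    for v in vectors:
        w = [Fraction(a) for a in v]
        for u in ortho:
            # Subtract the projection of v onto u, where u has ALREADY
            # been orthogonalized (not onto the original vectors).
            c = dot(v, u) / dot(u, u)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        ortho.append(w)
    return ortho

vs = [(3, 0, 4), (-1, 0, 7), (2, 9, 11)]
result = gram_schmidt(vs)
for w in result:
    print([str(a) for a in w])
```

Dividing each output vector by its length then gives the orthonormal set; I left that step out so the arithmetic stays rational.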

Example. (Gram-Schmidt) Find an orthonormal basis for the subspace spanned by the vectors

$$v_1 = \langle 1,0,2,2\rangle, \quad v_2 = \langle 10,1,0,4\rangle, \quad v_3 = \langle 1,1,0,13\rangle.$$

I'll use $v_1'$ , $v_2'$ , $v_3'$ to denote the orthogonal vectors produced by Gram-Schmidt, and normalize them at the end.

To simplify the computations, you should fix the vectors so they're mutually perpendicular first. Then you can divide each by its length to get vectors of length 1.


$$v_1' = v_1 = \langle 1,0,2,2\rangle.$$


$$v_2' = v_2 - \dfrac{v_2\cdot v_1'} {v_1'\cdot v_1'}v_1' = \langle 10,1,0,4\rangle - \dfrac{\langle 10,1,0,4\rangle\cdot \langle 1,0,2,2\rangle} {\langle 1,0,2,2\rangle\cdot \langle 1,0,2,2\rangle} \langle 1,0,2,2\rangle =$$

$$\langle 10,1,0,4\rangle - (2)\langle 1,0,2,2\rangle = \langle 10,1,0,4\rangle - \langle 2,0,4,4\rangle = \langle 8,1,-4,0\rangle.$$

You can check that $v_2'\cdot v_1' = 0$ , so the first two are perpendicular.


$$v_3' = v_3 - \dfrac{v_3\cdot v_1'} {v_1'\cdot v_1'}v_1' - \dfrac{v_3\cdot v_2'} {v_2'\cdot v_2'}v_2' =$$

$$\langle 1,1,0,13\rangle - \dfrac{\langle 1,1,0,13\rangle\cdot \langle 1,0,2,2\rangle} {\langle 1,0,2,2\rangle\cdot \langle 1,0,2,2\rangle} \langle 1,0,2,2\rangle - \dfrac{\langle 1,1,0,13\rangle\cdot \langle 8,1,-4,0\rangle} {\langle 8,1,-4,0\rangle\cdot \langle 8,1,-4,0\rangle} \langle 8,1,-4,0\rangle =$$

$$\langle 1,1,0,13\rangle - \left(\dfrac{27}{9}\right) \langle 1,0,2,2\rangle - \left(\dfrac{9}{81}\right) \langle 8,1,-4,0\rangle = \langle 1,1,0,13\rangle - \langle 3,0,6,6\rangle - \left\langle \dfrac{8}{9}, \dfrac{1}{9},-\dfrac{4}{9},0\right\rangle =$$

$$\left\langle -\dfrac{26}{9},\dfrac{8}{9}, -\dfrac{50}{9},7\right\rangle.$$

Thus, the orthogonal set is

$$v_1' = \langle 1,0,2,2\rangle, \quad v_2' = \langle 8,1,-4,0\rangle, \quad v_3' = \left\langle -\dfrac{26}{9},\dfrac{8}{9}, -\dfrac{50}{9},7\right\rangle.$$

To get an orthonormal basis, divide each of these vectors by its length:

$$\left\langle \dfrac{1}{3},0,\dfrac{2}{3},\dfrac{2}{3}\right\rangle, \left\langle \dfrac{8}{9},\dfrac{1}{9},-\dfrac{4}{9},0\right\rangle, \left\langle -\dfrac{26}{9\sqrt{89}},\dfrac{8}{9\sqrt{89}}, -\dfrac{50}{9\sqrt{89}},\dfrac{7}{\sqrt{89}}\right\rangle. \quad\halmos$$

For the next result, I'll use the following convention. Vectors will be understood to be column vectors; that is, v will refer to an n-dimensional column vector

$$v = \left[\matrix{v_1 \cr v_2 \cr \vdots \cr v_n \cr}\right].$$

If I need an n-dimensional row vector, I'll take the transpose. Thus,

$$v^T = \left[\matrix{v_1 & v_2 & \cdots & v_n \cr}\right].$$

Lemma. Let A be an invertible $n \times n$ matrix with entries in $\real$ . Then

$$(x, y) = x^TA^TAy$$

defines an inner product on $\real^n$ .

Proof. I have to check linearity, symmetry, and positive definiteness.

First, if $a \in \real$ , then

$$(ax_1 + x_2, y) = (ax_1 + x_2)^TA^TAy = a(x_1^TA^TAy) + x_2^TA^TAy = a(x_1, y) + (x_2, y).$$

This proves that the function is linear in the first slot.


Next,

$$(x, y) = x^TA^TAy = (y^TA^TAx)^T = y^TA^TAx = (y, x).$$

The second equality comes from the fact that $(BC)^T = C^TB^T$ for matrices. The third equality comes from the fact that $y^TA^TAx$ is a $1 \times 1$ matrix, so it equals its transpose.

This proves that the function is symmetric.


For positive definiteness, note that

$$(x, x) = x^TA^TAx = (Ax)^T(Ax).$$

Now $Ax$ is an $n \times 1$ vector. I'll label its components this way:

$$Ax = \left[\matrix{u_1 \cr u_2 \cr \vdots \cr u_n \cr}\right].$$

Then

$$(x, x) = (Ax)^T(Ax) = \left[\matrix{u_1 & u_2 & \cdots & u_n \cr}\right] \left[\matrix{u_1 \cr u_2 \cr \vdots \cr u_n \cr}\right] = u_1^2 + u_2^2 + \cdots + u_n^2 \ge 0.$$

That is, the inner product of a vector with itself is a nonnegative number. All that remains is to show that if the inner product of a vector with itself is 0, then the vector is $\vec{0}$ .

But with the notation above,

$$0 = (x, x) = u_1^2 + u_2^2 + \cdots + u_n^2$$

implies $u_1 = u_2 = \cdots = u_n = 0$ , i.e.

$$Ax = \left[\matrix{u_1 \cr u_2 \cr \vdots \cr u_n \cr}\right] = \left[\matrix{0 \cr 0 \cr \vdots \cr 0 \cr}\right] = 0.$$

Finally, I'll use the fact that A is invertible:

$$A^{-1}Ax = A^{-1}0, \quad x = 0.$$

This proves that the function is positive definite, so it's an inner product.

Example. The previous lemma provides lots of examples of inner products on $\real^n$ besides the usual dot product. All I have to do is take an invertible matrix A and form $A^TA$ , defining the inner product as above.

For example,

$$A = \left[\matrix{5 & 2 \cr 2 & 1 \cr}\right] \quad\hbox{is invertible}.$$


Then

$$A^TA = \left[\matrix{29 & 12 \cr 12 & 5 \cr}\right].$$

(Notice that $A^TA$ will always be symmetric.) The inner product defined by this matrix is

$$(\langle x_1,x_2\rangle, \langle y_1,y_2\rangle) = \left[\matrix{x_1 & x_2 \cr}\right] \left[\matrix{29 & 12 \cr 12 & 5 \cr}\right] \left[\matrix{y_1 \cr y_2 \cr}\right] = 29x_1y_1 + 12x_2y_1 + 12x_1y_2 + 5x_2y_2.$$

For example, under this inner product,

$$(\langle 1,2\rangle, \langle -8,3\rangle) = -358, \quad \|\langle 5,-2\rangle\| = \sqrt{505}.\quad\halmos$$
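
The numbers in this example can be reproduced with plain Python lists. This is a sketch only; the helper names `transpose`, `matmul`, and `inner` are hypothetical, and the code is written for the $2 \times 2$ matrix of the example.

```python
def transpose(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    # Product of two matrices stored as lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

A = [[5, 2], [2, 1]]
G = matmul(transpose(A), A)    # the matrix A^T A

def inner(x, y):
    # The inner product x^T (A^T A) y for vectors given as flat lists.
    return sum(x[i] * G[i][j] * y[j] for i in range(2) for j in range(2))

print(G)
print(inner([1, 2], [-8, 3]))
print(inner([5, -2], [5, -2]))   # the square of the length of <5, -2>
```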

Definition. A matrix A in $M(n,\real)$ is orthogonal if $AA^T = I$ .

Proposition. Let A be an orthogonal matrix.

(a) $\det(A) = \pm 1$ .

(b) $AA^T = I = A^TA$ --- in other words, $A^T = A^{-1}$ .

(c) The rows of A form an orthonormal set. The columns of A form an orthonormal set.

(d) A preserves dot products --- and hence, lengths and angles --- in the sense that

$$(Ax)\cdot(Ay) = x\cdot y.$$

Proof. (a) If A is orthogonal, then, since $\det(A^T) = \det(A)$ ,

$$\det(AA^T) = \det(I) = 1, \quad\hbox{so}\quad \det(A)\det(A^T) = 1, \quad\hbox{or}\quad (\det(A))^2 = 1.$$

Therefore, $\det(A) = \pm 1$ .

(b) Since $\det A = \pm 1$ , the determinant is certainly nonzero, so A is invertible. Hence,

$$A^{-1}AA^T = A^{-1}I = A^{-1}, \quad\hbox{or}\quad A^T = A^{-1}.$$

But $A^{-1}A = I$ , so $A^TA= I$ as well.

(c) The $(i, j)$ -th entry of $AA^T$ is the dot product of the i-th and j-th rows of A, so the equation $AA^T = I$ says that the rows of A form an orthonormal set of vectors. Likewise, $A^TA = I$ shows that the same is true for the columns of A.

(d) The ordinary dot product of vectors $x = \langle x_1,x_2,\ldots,x_n\rangle$ and $y = \langle
   y_1,y_2,\ldots,y_n\rangle$ can be written as a matrix multiplication:

$$x\cdot y = \left[\matrix{x_1 & x_2 & \cdots & x_n \cr}\right] \left[\matrix{y_1 \cr y_2 \cr \vdots \cr y_n \cr}\right] = x^Ty.$$

(Remember the convention that vectors are column vectors.)

Suppose A is orthogonal. Then

$$(Ax)\cdot(Ay) = (Ax)^T(Ay) = x^TA^TAy = x^TIy = x^Ty = x\cdot y.$$

In other words, orthogonal matrices preserve dot products. It follows that orthogonal matrices will also preserve lengths of vectors and angles between vectors, because these are defined in terms of dot products.

Example. Find real numbers a and b such that the following matrix is orthogonal:

$$A = \left[\matrix{ a & 0.6 \cr b & 0.8 \cr}\right].$$

Since the columns of A must form an orthonormal set, I must have

$$(a, b) \cdot (0.6, 0.8) = 0 \quad\hbox{and}\quad \|(a, b)\| = 1.$$

(Note that $\|(0.6, 0.8)\| = 1$ already.) The first equation gives

$$0.6 a + 0.8 b = 0.$$

The easy way to get a solution is to swap 0.6 and 0.8 and negate one of them; thus, $a = -0.8$ and $b = 0.6$ .

Since $\|(-0.8, 0.6)\| = 1$ , I'm done. (If the a and b I chose had made $\|(a, b)\| \ne 1$ , then I'd simply divide $(a, b)$ by its length.)
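
The matrix found in this example can be checked directly. The sketch below writes the entries as exact fractions ( $-0.8 = -4/5$ and so on) to avoid rounding; the helper names and the two test vectors are my own choices for illustration.

```python
from fractions import Fraction as F

A = [[F(-4, 5), F(3, 5)],
     [F(3, 5), F(4, 5)]]

def transpose(M):
    # Transpose of a matrix stored as a list of rows.
    return [list(row) for row in zip(*M)]

def matmul(M, N):
    # Product of two matrices stored as lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

def apply(M, v):
    # Matrix times column vector.
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

identity = [[1, 0], [0, 1]]
print(matmul(A, transpose(A)) == identity)   # rows are orthonormal
print(matmul(transpose(A), A) == identity)   # columns are orthonormal

x, y = [1, 2], [-8, 3]                       # arbitrary test vectors
print(dot(apply(A, x), apply(A, y)), dot(x, y))
```

The last line compares $(Ax)\cdot(Ay)$ with $x\cdot y$ for one pair of vectors, which illustrates part (d) of the proposition but of course does not prove it.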

Example. Orthogonal $2 \times 2$ matrices represent rotations of the plane about the origin or reflections across a line through the origin.

Rotations are represented by matrices

$$\left[\matrix{\cos \theta & -\sin \theta \cr \sin \theta & \cos \theta \cr}\right].$$

You can check that this works by considering the effect of multiplying the standard basis vectors $(1, 0)$ and $(0, 1)$ by this matrix.

Multiplying a vector by the following matrix product reflects the vector across the line L that makes an angle $\theta$ with the x-axis:

$$\left[\matrix{\cos \theta & -\sin \theta \cr \sin \theta & \cos \theta \cr}\right] \left[\matrix{1 & 0 \cr 0 & -1 \cr}\right] \left[\matrix{\cos \theta & \sin \theta \cr -\sin \theta & \cos \theta \cr}\right].$$

Reading from right to left, the first matrix rotates everything by $-\theta$ radians, so L coincides with the x-axis. The second matrix reflects everything across the x-axis. The third matrix rotates everything by $\theta$ radians. Hence, a given vector is rotated by $-\theta$ and reflected across the x-axis, after which the reflected vector is rotated by $\theta$ . The net effect is to reflect across L.
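
Here is a small numerical sketch of the three-step description above. The angle $\theta = \pi/6$ and the helper names are assumptions chosen for illustration. The check uses the fact that a reflection across L should fix a vector lying along L and negate a vector perpendicular to L.

```python
import math

def matmul(M, N):
    # Product of two matrices stored as lists of rows.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*N)]
            for row in M]

def apply(M, v):
    # Matrix times column vector.
    return [sum(a * b for a, b in zip(row, v)) for row in M]

def reflection(theta):
    c, s = math.cos(theta), math.sin(theta)
    rotate_back = [[c, s], [-s, c]]   # rotation by -theta
    flip = [[1, 0], [0, -1]]          # reflection across the x-axis
    rotate = [[c, -s], [s, c]]        # rotation by theta
    return matmul(matmul(rotate, flip), rotate_back)

theta = math.pi / 6
M = reflection(theta)
along = [math.cos(theta), math.sin(theta)]     # direction of the line L
across = [-math.sin(theta), math.cos(theta)]   # perpendicular to L

print(apply(M, along))    # approximately equal to along
print(apply(M, across))   # approximately the negative of across
```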

Many transformation problems can be easily accomplished by doing transformations to reduce a general problem to a special case.


Copyright 2008 by Bruce Ikenaga