A tiny remark about the Cauchy-Schwarz inequality

The Cauchy-Schwarz inequality is not hard to prove, so there is not much reason for a page devoted to simplifying the usual proof, or rather simplifying the usual presentation of the usual proof. What is more, the idea that follows is so natural that it must be well known to a significant proportion of mathematicians. Hence the word `tiny' above.

Nevertheless, most textbooks and all analysis courses I have attended favour the approach where you write down a quadratic form, use the fact that it is non-negative everywhere, and observe that this implies the Cauchy-Schwarz inequality. No explanation is usually given of where the quadratic form comes from. This page is intended for those who happen not to have observed, or been shown, that more or less the same argument can be made to seem much more natural. Indeed, this is another example of a proof that a well-programmed computer could reasonably be expected to discover.

First, let us consider the basic, real-analysis version of the inequality, namely

a_1b_1 + ... + a_nb_n ≤ (a_1^2 + ... + a_n^2)^{1/2} (b_1^2 + ... + b_n^2)^{1/2}

with equality if and only if the sequences (a_i) and (b_i) are proportional.

How might one go about proving this statement using no tricks? One idea is to try to find a natural way to express the fact that two sequences are proportional. Of course, we could say something like `there exists a constant lambda such that a_i = lambda b_i for every i', but this introduces an unknown constant lambda, and it will make our proof harder later on if we have to find this lambda.

This is not a serious problem though, as we can identify lambda as something like a_1/b_1. And if we dislike the lack of symmetry involved in choosing a_1/b_1 rather than some other a_i/b_i, we could simply say that all the a_i/b_i are equal.

This still leaves the minor problem that some of the b_i may be zero, and the related minor problem that we are not dealing with the a_i and b_i symmetrically. To get round these small difficulties, let us define (a_i) and (b_i) to be proportional if a_ib_j = a_jb_i for every pair i,j.

Now we would like to express this fact analytically, and for this there is a very standard idea. If you want lots of real numbers to be zero then you can achieve this by insisting that the sum of their squares is zero. In this case we want all the numbers a_ib_j - a_jb_i to be zero, so the sequences (a_i) and (b_i) are proportional if and only if

Sum_{i,j} (a_ib_j - a_jb_i)^2 = 0

and the expression on the left is trivially at least zero.

Expanding out the bracket on the left hand side we get

Sum_{i,j} (a_i^2 b_j^2 + a_j^2 b_i^2 - 2 a_i b_j a_j b_i)

which equals

2 (Sum_i a_i^2)(Sum_j b_j^2) - 2 (Sum_i a_i b_i)^2

The inequality, together with the equality case, follows immediately, provided that the two sequences are positive, which we may clearly assume.
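The mechanical expansion above can be checked numerically. The following sketch (the helper names `lagrange_lhs` and `lagrange_rhs` are mine, not from the text) verifies Lagrange's identity for a random pair of sequences and deduces the inequality from it:

```python
# Sanity check of Lagrange's identity:
#   sum_{i,j} (a_i b_j - a_j b_i)^2
#     = 2 (sum_i a_i^2)(sum_j b_j^2) - 2 (sum_i a_i b_i)^2
import random

def lagrange_lhs(a, b):
    # Expand the double sum over all pairs i, j directly.
    return sum((a[i] * b[j] - a[j] * b[i]) ** 2
               for i in range(len(a)) for j in range(len(b)))

def lagrange_rhs(a, b):
    return 2 * sum(x * x for x in a) * sum(y * y for y in b) \
         - 2 * sum(x * y for x, y in zip(a, b)) ** 2

random.seed(0)
a = [random.uniform(-1, 1) for _ in range(5)]
b = [random.uniform(-1, 1) for _ in range(5)]

assert abs(lagrange_lhs(a, b) - lagrange_rhs(a, b)) < 1e-12
# The left-hand side is a sum of squares, so the identity gives
# (sum a_i b_i)^2 <= (sum a_i^2)(sum b_i^2), i.e. Cauchy-Schwarz.
assert sum(x * y for x, y in zip(a, b)) ** 2 \
     <= sum(x * x for x in a) * sum(y * y for y in b) + 1e-12
```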

Note that the only idea above was to write down the proportionality of the two sequences in a nice way. The rest of the argument was an entirely mechanical manipulation. Can we do something similar for the more abstract, inner-product-space version of the inequality?

For some reason the keyboard I am writing this on refuses to do vertical bars, so I shall write [x] for the norm of x and < x,y > for the inner product of x and y. Beginning with the real case, we would like to show that < x,y > is at most [x][y], with equality if and only if x and y are proportional with a positive constant. Can we express the proportionality of x and y without using coordinates?

A first attempt is to say that x and y are proportional if and only if x/[x] and y/[y] are equal. This is not quite accurate (for example, y might be -x), but the inaccuracy works in our favour as the condition is in fact equivalent to x and y being proportional with a positive constant. Bearing in mind that we eventually want a nice expression to deal with, let us rewrite this equality as x[y]-y[x]=0.

We now want some way of distinguishing zero amongst all vectors in an inner-product space. We need go no further than the axioms! Indeed, x[y]-y[x]=0 if and only if

[x[y] - y[x]]^2 = 0.
I put the square in because one always likes to expand such an expression in terms of inner products. Indeed, let us do just that, obtaining that

2[x]^2[y]^2 - 2[x][y]< x,y >

is greater than or equal to zero, with equality only if x[y]-y[x]=0. If either [x] or [y] is zero then the Cauchy-Schwarz inequality is trivial. Otherwise, we can divide through by 2[x][y] and obtain the inequality in general, with equality if and only if x/[x] and y/[y] are equal, that is, if and only if x and y are proportional with a positive constant of proportionality.
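The coordinate-free expansion admits the same kind of sanity check. In the sketch below, `inner` and `norm` are the ordinary Euclidean ones, used only as one concrete instance of a real inner-product space:

```python
# Check that [x[y] - y[x]]^2 = 2[x]^2[y]^2 - 2[x][y]<x,y>
# for random real vectors, and that it is non-negative.
import math
import random

def inner(x, y):
    return sum(u * v for u, v in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x))

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(4)]
y = [random.uniform(-1, 1) for _ in range(4)]

z = [norm(y) * u - norm(x) * v for u, v in zip(x, y)]  # the vector x[y] - y[x]
lhs = inner(z, z)
rhs = 2 * norm(x) ** 2 * norm(y) ** 2 - 2 * norm(x) * norm(y) * inner(x, y)
assert abs(lhs - rhs) < 1e-12
assert lhs >= 0                                   # hence <x,y> <= [x][y]
assert inner(x, y) <= norm(x) * norm(y) + 1e-12   # Cauchy-Schwarz itself
```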

The complex case is not much harder. This time [x[y]-y[x]]^2 expands out as

2[x]^2[y]^2 - [x][y](< x,y > + < y,x >)

Let w be a complex number of modulus 1 with the property that < x,wy > is real and non-negative, and therefore equal to the modulus of < x,y >. Replacing y with wy we find that the modulus of < x,y > is at most [x][y], with equality if and only if x[y]-wy[x]=0. Thus, equality holds for the modulus of the inner product if and only if x and y are proportional, from which it is easy to see that it holds for the inner product itself if and only if the constant of proportionality is real and positive. (Choosing w above is, admittedly, a trick, but it is a very standard one.)
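The choice of w can also be checked numerically. The sketch below assumes the convention that the inner product is conjugate-linear in the first slot (the text does not fix a convention), and assumes < x,y > is non-zero so that w is well defined:

```python
# With w = conj(<x,y>)/|<x,y>| of modulus 1, <x, wy> equals |<x,y>|,
# so the real argument applied to x and wy gives |<x,y>| <= [x][y].
import random

def inner(x, y):
    # Hermitian inner product, conjugate-linear in the first slot (an assumption).
    return sum(u.conjugate() * v for u, v in zip(x, y))

def norm(x):
    return abs(inner(x, x)) ** 0.5

random.seed(2)
x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]
y = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(3)]

ip = inner(x, y)                 # non-zero for this seed
w = ip.conjugate() / abs(ip)     # modulus 1
wy = [w * v for v in y]
assert abs(abs(w) - 1) < 1e-12
assert abs(inner(x, wy) - abs(ip)) < 1e-12   # real and non-negative
assert abs(ip) <= norm(x) * norm(y) + 1e-12  # |<x,y>| <= [x][y]
```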

The point of the above arguments emerges when they are contrasted with the usual, slightly less motivated approach of considering the expression [x-cy]^2, which is real and non-negative, and then choosing a `clever' value of c from which to deduce the Cauchy-Schwarz inequality. Of course, c can be justified as the value that minimizes the quadratic expression that results from expanding [x-cy]^2, but even so the idea of writing down [x-cy]^2 in the first place is not an obvious one.

Actually (this paragraph was added a day or two later) it can be justified as follows. Two vectors x and y are proportional if and only if 0 is a non-trivial linear combination of the two. Moreover, if neither is zero, then they are proportional if and only if x-cy=0 for some constant c. If this does not happen, then the line of points of the form x-cy has some positive distance from 0, which we can calculate by minimizing [x-cy]. However, it seems perverse to bother with this calculation when we know that if x-cy is ever zero, then c must equal [x]/[y].
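For comparison, the minimization in the standard proof can be checked the same way: expanding [x-cy]^2 gives a quadratic in c whose minimum is at c* = < x,y >/[y]^2, and non-negativity of the minimum value is exactly Cauchy-Schwarz. A quick numerical sketch (illustrative values only):

```python
# The quadratic [x-cy]^2 = [x]^2 - 2c<x,y> + c^2[y]^2 is minimized at
# c* = <x,y>/[y]^2, with minimum value [x]^2 - <x,y>^2/[y]^2 >= 0.
import random

def inner(x, y):
    return sum(u * v for u, v in zip(x, y))

random.seed(3)
x = [random.uniform(-1, 1) for _ in range(4)]
y = [random.uniform(-1, 1) for _ in range(4)]

c_star = inner(x, y) / inner(y, y)
min_val = inner(x, x) - inner(x, y) ** 2 / inner(y, y)
z = [u - c_star * v for u, v in zip(x, y)]
assert abs(inner(z, z) - min_val) < 1e-12
assert min_val >= -1e-12                 # i.e. <x,y>^2 <= [x]^2 [y]^2
# c* really is at least as good as nearby values of c:
for dc in (-0.1, 0.1):
    zc = [u - (c_star + dc) * v for u, v in zip(x, y)]
    assert inner(zc, zc) >= min_val - 1e-12
```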

Just for the record, I looked through my bookshelf for all proofs that I could find of the Cauchy-Schwarz inequality. Only Apostol (Mathematical Analysis, p.20 exercise 1-15) and Jeffreys and Jeffreys (Methods of Mathematical Physics, 3rd Ed. p.54) prove the inequality (for real numbers) this way. The identity that proves it is known as Lagrange's identity. Even they merely ask you to note that Lagrange's identity is true and that it implies the Cauchy-Schwarz inequality. My point above is that the identity is an obvious thing to write down.