How to think of a proof of the fundamental theorem of algebra

Prerequisites

A familiarity with polynomials and with basic real analysis.

Statement

Every polynomial (with arbitrary complex coefficients) has a root in the complex plane. (Hence, by the factor theorem, the number of roots of a polynomial, up to multiplicity, equals its degree.)

Preamble

The proofs of this theorem that I learnt as an undergraduate appeared some way into a course on complex analysis. One proof appealed to Liouville's theorem - if P does not have a root, then one can show that 1/P is bounded and analytic, and therefore constant. Another used Rouché's theorem and the principle of the argument to calculate the number of zeros inside a large circle. This page is to demonstrate that the theorem is intuitively very obvious, even if the details take some work. (In fact, I shall not even give the details.) As with my other pages of this kind, the ideas are well known, but for some reason they often fail to find their way into courses and textbooks, so perhaps they bear repeating.

How to come up with a proof.

If you have heard of the impossibility of solving the quintic by radicals, or if you have simply tried and failed to solve such equations, then you will understand that it is unlikely that algebra alone will help us to find a solution of an arbitrary polynomial equation.

In fact, what does it mean to solve a polynomial equation? When we `solve' quadratics, what we actually do is reduce the problem to solving quadratics of the particularly simple form x2=C. In other words, our achievement is relative: if it is possible to find square roots, then it is possible to solve arbitrary quadratic equations. But is it possible to find square roots? Algebra cannot help us here. (What it can do is tell us that the existence of square roots does not lead to a contradiction of the field axioms. We simply "adjoin" square roots to the rational numbers and go ahead and do calculations with them - just as we adjoin i to the reals without worrying about its existence. See my beginnings of Galois theory page for related ideas.) The way we prove the existence of the square root of two uses analysis. (See my dialogue concerning the existence of root two for a discussion of this.) What is more, although one can easily devise an algorithm for computing the digits of root two, the cleanest proof of its existence is the following: 12 < 2 and 22 > 2, and the function x2 is continuous, so by the intermediate value theorem there exists z between 1 and 2 such that z2=2.

These thoughts suggest the following constraints on what we might expect a proof of the fundamental theorem of algebra to be like. First, we expect it to involve analysis, and second, it seems reasonable to look for an indirect existence proof rather than trying to construct a solution explicitly.

The following general principle is one of the most important, if not the most important, of mathematical problem solving.

Principle 1.

If you are trying to prove a statement P, then find a similar statement Q that already has a proof, and try to modify the proof of Q so that it establishes P.

Is there some statement, similar to the fundamental theorem of algebra, that we know how to prove already? Yes there is - we have just been discussing it. We know how to prove the existence of solutions of many polynomials with real coefficients. If they are sometimes negative and sometimes positive, then the intermediate value theorem tells us that they have solutions. In particular, all real polynomials of odd degree have at least one root.

So we should now ask ourselves whether this proof can be appropriately modified. Before doing this, let us think about it for a while, bearing in mind the following principle.

Principle 2.

If you are hoping to modify a statement or proof, then it is often helpful to express it in more general (and therefore flexible) terms. It may well be worth being imprecise in order to achieve this - once the correct modification has been spotted, precision can be added in again.

If P is a real polynomial, then the intermediate value theorem tells us the following: if P(x) < 0 and P(y) > 0 then there exists z between x and y such that P(z)=0. What does `between' mean? It means that either x < z < y or y < z < x. Is anything similar if P is a complex polynomial?

There are several immediate difficulties. One is that we cannot give a meaning to statements such as P(x) > 0, since the complex numbers do not have a natural ordering. Another problem is that we do not have a good notion of `betweenness', for much the same reason. (We could, perhaps, say that z is between x and y if it lies on the line segment joining x to y, but this definition does not lead anywhere useful.)

Let us try to think about the intermediate value theorem in a more visual way. It tells us that if we pick up a stretchy line segment and put it on the real line (possibly folding it in places) without breaking it, and if one end of the segment lies on one side of zero and the other end on the other side, then it must cross zero.

If we put a line segment down in the complex plane, then it is quite obvious that no information about the positions of the end points will force some interior point to hit zero - it can always go round the side. If, on the other hand, we were to put one end in the upper half plane and the other end in the lower half plane, we would be able to find a point that lay on the real axis. Is there some pattern emerging here?

Yes there is. We have to work in the correct dimension. If we live in a one-dimensional world (such as the real numbers) then a one-dimensional set (image of a line segment) must, under appropriate conditions, intersect a zero-dimensional set (the point zero). In a two-dimensional world (such as the complex numbers) a one-dimensional set will intersect a zero-dimensional set only under very exceptional circumstances (and if it does, then it can be made not to by a small perturbation), but it is not hard to force two one-dimensional sets to intersect. In general, in d dimensions, it is easy for an r-dimensional set to miss an s-dimensional set if r+s < d, and not so easy otherwise.

Notice that I have not been precise about what I mean by dimension. For the purposes of thinking about a problem, that does not seem to matter.

Now we would like to find a root of a complex polynomial P. The polynomial can be thought of as picking up the complex plane, stretching it about a bit, and putting it down again. (Once more, this vagueness is useful.) We are trying to show that, no matter how we do this, we will cover the point zero. That is, the image of P must contain zero. Now we may have a chance of proving this, because the image of P is (likely to be) two-dimensional, and the point zero is zero-dimensional. This gives us more of a clue about what our adaptation of the intermediate-value theorem ought to be like.

First, we need a two-dimensional analogue of a closed bounded interval in the reals. What could this be? Closed intervals are usually defined using the ordering on the reals, and we don't have such an ordering in the plane. Let us go back to our discussion of `betweenness'. The closed interval [x,y] consists of all points z that lie between x and y. Can we think of this in a way that does not involve the ordering? Yes we can. We can imagine that x and y are sort of guards, that stop any z in the interval from escaping, or as a zero-dimensional `fence', or boundary. If a one-dimensional set in R needs a zero-dimensional boundary, then we might expect a two-dimensional set in C to need a one-dimensional boundary. (This is because a possible escape route will be one-dimensional, so needs something at least 2-1=1-dimensional to get in its way.)

By now, we have been led to the idea of looking at a region in the complex numbers which is surrounded by some closed curve. Indeed, why not go for the simplest thing and take a circular disc? (We could have reached this idea if we had already thought of an interval as a one-dimensional disc with the end points being a one-dimensional circle - a well known and useful analogy.)

Now we have to decide what to do with this disc. Let us return to the one-dimensional case for inspiration. Here are a few things we can say about the hypotheses needed for a successful application of the intermediate value theorem. Let the interval be [x,y] and the polynomial be P.

  1. P(x) and P(y) lie on either side of zero.
  2. Zero lies inside the interval [P(x),P(y)].
  3. No path from zero can avoid meeting P(x) or P(y).
  4. Zero lies between P(x) and P(y).

Does that get us anywhere? Well, the points P(x) and P(y) seem to be important. Indeed, they are the images of the boundary of the closed interval [x,y]. The condition we need is that these two points should somehow surround zero. So in the two-dimensional case, when we come to look at a disc, we should look at its boundary circle, and see if it it makes sense to talk of the image of this circle `surrounding' zero.

Here is an example where the concept seems to make sense. Suppose we have a continuous complex-valued function defined on the closed unit disc. Suppose also that f(z)=z whenever |z|=1. (That is, the boundary of the disc remains fixed.) Then the image of the boundary is the unit circle, which certainly seems to surround zero. What is more, it is not at all easy to see how the interior of the disc can possibly avoid hitting zero under these circumstances. (Imagine a circular piece of rubber sheeting, attached all around its boundary to the top of a table, with the interior free to be stretched. It seems that no amount of stretching will ever allow you to see the bit of the table that lies inside the boundary, so that the centre of the circle, in particular, is always covered.)

Here, on the other hand, is an example that should give us pause. Define a continuous function on the closed unit disc as follows. First, stretch the disc into a (two-dimensional) sausage shape. Next, bend the ends of the sausage right round, so that they overlap and we have a ring shape. Now slide this ring shape until zero lies in the middle of it, but is not part of it. It seems that zero is `surrounded' by the image of the disc, but is not contained in the image. What has gone wrong?

The problem seems to be that as you trace the image of the boundary, it goes all the way round zero, but then it doubles back on itself and goes round the other way. In other words, the difference between this example and the first one is that with the previous example, the boundary went round zero once, whereas here it goes round zero times (in total).

Now let us think why it might be true that if the image of the bounding circle goes round once, then zero is covered by the image of the disc. Again, the analogy with the one-dimensional situation can guide us. There, we find that we cannot get from P(x) to P(y) without crossing zero. That is, if we imagine a point moving continuously from P(x) to P(y), then it eventually crosses zero. Now this will not work in two dimensions, because, as I said before, a path can easily avoid zero. However, a path of lines does not find it so easy (because it traces out a two-dimensional set). That is, we may be getting somewhere if we can find a continuous one-parameter family of curves to look at.

Given that we have already got one such curve, namely the image of the boundary of the disc, a natural choice suggests itself. The curve we have is the image of the circle of radius 1, but the disc itself is the union of the circles of radius t, for t between 0 and 1. What can we say about the images of these curves? Let us write Ct for the image of the circle of radius t.

We are assuming that C1 goes once round zero. Can we say anything else? Well, not all that much, but we do know that C0 consists of a single point, and thus does not go round zero at all. (If the point equals zero, then we have found our solution.) We also know that as t varies, the curve Ct varies continuously. So somewhere in between comes a point (that is, a value of t) where something changes . (I have applied a vague form of the intermediate value theorem in my mind.) But how could anything change?

If a curve goes once round zero, then a nearby curve goes once round zero. The only way of moving it continuously to stop it going round zero is for a point of the curve to cross over zero. But if this happens as t varies from 1 to 0, then we have a solution.

Well, that idea can be made rigorous. First, one has to decide what it means to `go round zero' a certain number of times. This can be done by defining winding numbers carefully, but it can also be done with no reference to complex analysis. Secondly, one must show that the winding number of a curve does not change under a sufficiently small perturbation. Then, one must use the fact that a continuous integer-valued function defined on an interval (in this case, the winding number of Ct about zero) is constant, which is more or less the same as the intermediate value theorem.

However, do we actually get the hypotheses we are looking for? Given a polynomial, can we find a circle C such that P(C) goes round zero once?

Actually, that is not quite the right question. All we need for the above argument to work is that it should go round zero a non-zero number of times, and this is easy to ensure. The equivalent step in showing that an odd-degree real polynomial has a root is to show that it is positive somewhere and negative somewhere. To do this, one takes x large and negative and y large and positive. If they are sufficiently large, then the leading term in the polynomial dominates everything else, and we are done. If we try the analogous idea for complex polynomials, we end up considering a very large circle, so large that we can ignore everything except the leading term of the polynomial. Since zn goes round zero n times as z goes round a circle of radius C about the origin, so does P(z). Define Ct to be the image under P of the circle of radius t. As t goes down from C to 0, Ct shrinks to a point, which does not go round the origin at all. By our earlier argument, at some value of t the number of times the curve goes round 0 must change, and for such a value of t the curve Ct contains the origin. (A careful version of this argument establishes that something changes n times, up to multiplicity, and therefore that there are n roots.)

How to make the above ideas rigorous

Here is a sketch of one argument that makes the above ideas rigorous using Cauchy's integral formula and the homotopy invariance of path integrals. (The second statement is the version of Cauchy's theorem that states that if D is a domain, f is an analytic function defined on D and p and q are closed paths that are homotopic in D, then the integral of f round p equals the integral of f round q. It would be possible but slightly messy to deduce this directly from Cauchy's theorem in the case we shall need.) I stress once again that it is not necessary to use complex analysis - it just provides us with a convenient way of defining winding numbers.

A fundamental fact of complex analysis is that the integral round the unit circle of 1/z is 2(pi)i. It follows from homotopy invariance, or a direct calculation, that the same is true if you integrate round a circle of radius R about 0, whatever the value of R. Now let P be a monic polynomial of degree n and let r be large enough for the leading term to dominate. Let pr be the path [t goes to P(r e2(pi)it)] and let qr be the path [t goes to r^n e2(pi)int]. Thus, qr just goes n times round the circle of radius rn about the origin. It is easy to check that pr and qr are homotopic in the complex plane with the origin removed - simply let the intermediate paths be (1-s)pr+sqr as s varies from 0 to 1.

It follows that the integral of 1/z round pr is 2(pi)in, since this is certainly the integral round qr. Now let r shrinks down to zero. This defines a homotopy from pr to the trivial path p0, unless there exists a t such that pt contains zero. But it cannot define a homotopy, since the integral `round' the trivial path is zero and not n. Therefore, one of the intermediate paths does indeed contain zero, which implies that P has a root.