A familiarity with polynomials and with basic real analysis.

Every polynomial (with arbitrary complex coefficients) has a root in the complex plane. (Hence, by the factor theorem, the number of roots of a polynomial, up to multiplicity, equals its degree.)

The proofs of this theorem that I learnt as an undergraduate appeared some way into a course on complex analysis. One proof appealed to Liouville's theorem - if P does not have a root, then one can show that 1/P is bounded and analytic, and therefore constant. Another used Rouché's theorem and the principle of the argument to calculate the number of zeros inside a large circle. This page is to demonstrate that the theorem is intuitively very obvious, even if the details take some work. (In fact, I shall not even give the details.) As with my other pages of this kind, the ideas are well known, but for some reason they often fail to find their way into courses and textbooks, so perhaps they bear repeating.

If you have heard of the impossibility of solving the quintic by radicals, or if you have simply tried and failed to solve such equations, then you will understand that it is unlikely that algebra alone will help us to find a solution of an arbitrary polynomial equation.

In fact, what does it mean to solve a polynomial equation? When we
`solve' quadratics, what we actually do is reduce the problem to
solving quadratics of the particularly simple form x^{2}=C.
In other words, our achievement is relative: * if * it is
possible to find square roots, * then * it is possible to solve
arbitrary quadratic equations. But is it possible to find square
roots? Algebra cannot help us here. (What it * can * do is tell
us that the existence of square roots does not lead to a contradiction
of the field axioms. We simply "adjoin" square roots to the rational
numbers and go ahead and do calculations with them - just as we adjoin
i to the reals without worrying about its existence. See my beginnings of Galois theory page for related
ideas.) The way we prove the existence of the square root of two uses
analysis. (See my dialogue concerning the
existence of root two for a discussion of this.) What is more,
although one can easily devise an algorithm for computing the digits
of root two, the cleanest proof of its existence is the following:
1^{2} < 2 and 2^{2} > 2, and the function
x^{2} is continuous, so by the intermediate value theorem
there exists z between 1 and 2 such that z^{2}=2.

These thoughts suggest the following constraints on what we might expect a proof of the fundamental theorem of algebra to be like. First, we expect it to involve analysis, and second, it seems reasonable to look for an indirect existence proof rather than trying to construct a solution explicitly.

The following general principle is one of the most important,
if not * the * most important, of mathematical problem solving.

* If you are trying to prove a statement P, then find a similar
statement Q that already has a proof, and try to modify the proof
of Q so that it establishes P. *

Is there some statement, similar to the fundamental theorem of algebra, that we know how to prove already? Yes there is - we have just been discussing it. We know how to prove the existence of solutions of many polynomials with real coefficients. If they are sometimes negative and sometimes positive, then the intermediate value theorem tells us that they have solutions. In particular, all real polynomials of odd degree have at least one root.

So we should now ask ourselves whether this proof can be appropriately modified. Before doing this, let us think about it for a while, bearing in mind the following principle.

* If you are hoping to modify a statement or proof, then it is
often helpful to express it in more general (and therefore flexible)
terms. It may well be worth being imprecise in order to achieve this -
once the correct modification has been spotted, precision can be added
in again. *

If P is a real polynomial, then the intermediate value theorem tells us the following: if P(x) < 0 and P(y) > 0 then there exists z between x and y such that P(z)=0. What does `between' mean? It means that either x < z < y or y < z < x. Is anything similar if P is a complex polynomial?

There are several immediate difficulties. One is that we cannot give a meaning to statements such as P(x) > 0, since the complex numbers do not have a natural ordering. Another problem is that we do not have a good notion of `betweenness', for much the same reason. (We could, perhaps, say that z is between x and y if it lies on the line segment joining x to y, but this definition does not lead anywhere useful.)

Let us try to think about the intermediate value theorem in a more visual way. It tells us that if we pick up a stretchy line segment and put it on the real line (possibly folding it in places) without breaking it, and if one end of the segment lies on one side of zero and the other end on the other side, then it must cross zero.

If we put a line segment down in the complex plane, then it is quite obvious that no information about the positions of the end points will force some interior point to hit zero - it can always go round the side. If, on the other hand, we were to put one end in the upper half plane and the other end in the lower half plane, we would be able to find a point that lay on the real axis. Is there some pattern emerging here?

Yes there is. We have to work in the correct dimension. If we live in a one-dimensional world (such as the real numbers) then a one-dimensional set (image of a line segment) must, under appropriate conditions, intersect a zero-dimensional set (the point zero). In a two-dimensional world (such as the complex numbers) a one-dimensional set will intersect a zero-dimensional set only under very exceptional circumstances (and if it does, then it can be made not to by a small perturbation), but it is not hard to force two one-dimensional sets to intersect. In general, in d dimensions, it is easy for an r-dimensional set to miss an s-dimensional set if r+s < d, and not so easy otherwise.

Notice that I have not been precise about what I mean by dimension. For the purposes of thinking about a problem, that does not seem to matter.

Now we would like to find a root of a complex polynomial P. The polynomial can be thought of as picking up the complex plane, stretching it about a bit, and putting it down again. (Once more, this vagueness is useful.) We are trying to show that, no matter how we do this, we will cover the point zero. That is, the image of P must contain zero. Now we may have a chance of proving this, because the image of P is (likely to be) two-dimensional, and the point zero is zero-dimensional. This gives us more of a clue about what our adaptation of the intermediate-value theorem ought to be like.

First, we need a two-dimensional analogue of a closed bounded interval in the reals. What could this be? Closed intervals are usually defined using the ordering on the reals, and we don't have such an ordering in the plane. Let us go back to our discussion of `betweenness'. The closed interval [x,y] consists of all points z that lie between x and y. Can we think of this in a way that does not involve the ordering? Yes we can. We can imagine that x and y are sort of guards, that stop any z in the interval from escaping, or as a zero-dimensional `fence', or boundary. If a one-dimensional set in R needs a zero-dimensional boundary, then we might expect a two-dimensional set in C to need a one-dimensional boundary. (This is because a possible escape route will be one-dimensional, so needs something at least 2-1=1-dimensional to get in its way.)

By now, we have been led to the idea of looking at a region in the complex numbers which is surrounded by some closed curve. Indeed, why not go for the simplest thing and take a circular disc? (We could have reached this idea if we had already thought of an interval as a one-dimensional disc with the end points being a one-dimensional circle - a well known and useful analogy.)

Now we have to decide what to do with this disc. Let us return to the one-dimensional case for inspiration. Here are a few things we can say about the hypotheses needed for a successful application of the intermediate value theorem. Let the interval be [x,y] and the polynomial be P.

- P(x) and P(y) lie on either side of zero.
- Zero lies inside the interval [P(x),P(y)].
- No path from zero can avoid meeting P(x) or P(y).
- Zero lies between P(x) and P(y).

Does that get us anywhere? Well, the points P(x) and P(y) seem to be important. Indeed, they are the images of the boundary of the closed interval [x,y]. The condition we need is that these two points should somehow surround zero. So in the two-dimensional case, when we come to look at a disc, we should look at its boundary circle, and see if it it makes sense to talk of the image of this circle `surrounding' zero.

Here is an example where the concept seems to make sense. Suppose we have a continuous complex-valued function defined on the closed unit disc. Suppose also that f(z)=z whenever |z|=1. (That is, the boundary of the disc remains fixed.) Then the image of the boundary is the unit circle, which certainly seems to surround zero. What is more, it is not at all easy to see how the interior of the disc can possibly avoid hitting zero under these circumstances. (Imagine a circular piece of rubber sheeting, attached all around its boundary to the top of a table, with the interior free to be stretched. It seems that no amount of stretching will ever allow you to see the bit of the table that lies inside the boundary, so that the centre of the circle, in particular, is always covered.)

Here, on the other hand, is an example that should give us pause. Define a continuous function on the closed unit disc as follows. First, stretch the disc into a (two-dimensional) sausage shape. Next, bend the ends of the sausage right round, so that they overlap and we have a ring shape. Now slide this ring shape until zero lies in the middle of it, but is not part of it. It seems that zero is `surrounded' by the image of the disc, but is not contained in the image. What has gone wrong?

The problem seems to be that as you trace the image of the boundary, it goes all the way round zero, but then it doubles back on itself and goes round the other way. In other words, the difference between this example and the first one is that with the previous example, the boundary went round zero once, whereas here it goes round zero times (in total).

Now let us think why it might be true that if the image of the
bounding circle goes round once, then zero is covered by the image of
the disc. Again, the analogy with the one-dimensional situation can
guide us. There, we find that we cannot get from P(x) to P(y) without
crossing zero. That is, if we imagine a point moving continuously
from P(x) to P(y), then it eventually crosses zero. Now this will
not work in two dimensions, because, as I said before, a path can
easily avoid zero. However, * a path of lines * does not
find it so easy (because it traces out a two-dimensional set).
That is, we may be getting somewhere if we can find a continuous
one-parameter family of curves to look at.

Given that we have already got one such curve, namely the image of
the boundary of the disc, a natural choice suggests itself. The curve
we have is the image of the circle of radius 1, but the disc itself is
the union of the circles of radius t, for t between 0 and 1. What can
we say about the images of these curves? Let us write C_{t}
for the image of the circle of radius t.

We are assuming that C_{1} goes once round zero. Can we
say anything else? Well, not all that much, but we do know that
C_{0} consists of a single point, and thus does not go round
zero at all. (If the point equals zero, then we have found our
solution.) We also know that as t varies, the curve C_{t}
varies continuously. So somewhere in between comes a point (that is, a
value of t) where * something changes *. (I have applied a
vague form of the intermediate value theorem in my mind.) But how
could anything change?

If a curve goes once round zero, then a nearby curve goes once round zero. The only way of moving it continuously to stop it going round zero is for a point of the curve to cross over zero. But if this happens as t varies from 1 to 0, then we have a solution.

Well, that idea can be made rigorous. First, one has to decide
what it means to `go round zero' a certain number of times. This
can be done by defining winding numbers carefully, but it can also
be done with no reference to complex analysis. Secondly, one must
show that the winding number of a curve does not change under a
sufficiently small perturbation. Then, one must use the fact that
a continuous integer-valued function defined on an interval (in
this case, the winding number of C_{t} about zero) is
constant, which is more or less the same as the intermediate
value theorem.

However, do we actually get the hypotheses we are looking for? Given a polynomial, can we find a circle C such that P(C) goes round zero once?

Actually, that is not quite the right question. All we need for
the above argument to work is that it should go round zero a non-zero
number of times, and this is easy to ensure. The equivalent step in
showing that an odd-degree real polynomial has a root is to show that
it is positive somewhere and negative somewhere. To do this, one
takes x large and negative and y large and positive. If they are
sufficiently large, then the leading term in the polynomial
dominates everything else, and we are done. If we try the analogous
idea for complex polynomials, we end up considering a very large
circle, so large that we can ignore everything except the leading
term of the polynomial. Since z^{n} goes round zero n times
as z goes round a circle of radius C about the origin, so does
P(z). Define C_{t} to be the image under P of the circle
of radius t. As t goes down from C to 0, C_{t} shrinks to
a point, which does not go round the origin at all. By our earlier
argument, at some value of t the number of times the curve goes
round 0 must change, and for such a value of t the curve C_{t}
contains the origin. (A careful version of this argument establishes
that something changes n times, up to multiplicity, and therefore
that there are n roots.)

Here is a sketch of one argument that makes the above ideas rigorous using Cauchy's integral formula and the homotopy invariance of path integrals. (The second statement is the version of Cauchy's theorem that states that if D is a domain, f is an analytic function defined on D and p and q are closed paths that are homotopic in D, then the integral of f round p equals the integral of f round q. It would be possible but slightly messy to deduce this directly from Cauchy's theorem in the case we shall need.) I stress once again that it is not necessary to use complex analysis - it just provides us with a convenient way of defining winding numbers.

A fundamental fact of complex analysis is that the integral round
the unit circle of 1/z is 2(pi)i. It follows from homotopy invariance,
or a direct calculation, that the same is true if you integrate round
a circle of radius R about 0, whatever the value of R. Now let P be a
monic polynomial of degree n and let r be large enough for the leading
term to dominate. Let p_{r} be the path [t goes to P(r
e^{2(pi)it})] and let q_{r} be the path [t goes to r^n
e^{2(pi)int}]. Thus, q_{r} just goes n times round the
circle of radius r^{n} about the origin. It is easy to check
that p_{r} and q_{r} are homotopic in the complex
plane with the origin removed - simply let the intermediate paths be
(1-s)p_{r}+sq_{r} as s varies from 0 to 1.

It follows that the integral of 1/z round p_{r} is
2(pi)in, since this is certainly the integral round q_{r}. Now
let r shrinks down to zero. This defines a homotopy from p_{r}
to the trivial path p_{0}, unless there exists a t such that
p_{t} contains zero. But it cannot define a homotopy, since
the integral `round' the trivial path is zero and not n. Therefore,
one of the intermediate paths does indeed contain zero, which implies
that P has a root.