What is the point of the mean value theorem?

Let me begin by stating the theorem.

Theorem.

Let f be a continuous function from a closed interval [a,b] to the real numbers, and suppose that f is differentiable throughout the open interval (a,b). Then there is some point c in (a,b) such that f'(c) equals (f(b)-f(a))/(b-a).
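To get a feel for what the theorem asserts, here is a minimal numerical sketch. It assumes a concrete function, f(x)=x^3 on [0,2], which is not taken from the text above, and the helper name mean_value_point is hypothetical. It uses bisection, which needs the extra assumption that f' is continuous and that f' minus the chord slope changes sign between the endpoints; the theorem itself needs neither.

```python
def mean_value_point(f, df, a, b, tol=1e-12):
    """Bisect for a point c in (a, b) where df(c) equals the chord slope.

    Assumption (stronger than the theorem needs): df is continuous and
    df - slope has opposite signs at a and b.
    """
    slope = (f(b) - f(a)) / (b - a)
    g = lambda t: df(t) - slope
    lo, hi = a, b
    if g(lo) * g(hi) > 0:
        raise ValueError("bisection needs a sign change of f' - slope")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(lo) * g(mid) <= 0:
            hi = mid  # a sign change lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


# Illustrative choice: f(x) = x^3 on [0, 2], whose chord slope is 4.
f = lambda x: x ** 3
df = lambda x: 3 * x ** 2
c = mean_value_point(f, df, 0.0, 2.0)
print(c, df(c))
```

For this particular f the point can also be found by hand: 3c^2 = 4 gives c equal to the square root of 4/3.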

This statement, when one first sees it, appears to be little more than a curiosity. It may be true, but one wouldn't choose it as an example to persuade somebody of the joys of calculus.

If you have just covered the theorem in lectures, then your lecturer will probably have explained what it is used for, so this page is for those who do not always take in one hundred percent of their lectures, or who have done some calculus but not yet seen the theorem.


The main use of the mean value theorem is in justifying statements that many people wrongly take to be too obvious to need justification. One example of such a statement is the following.

(*) If the derivative of a function f is everywhere strictly positive, then f is a strictly increasing function.

Here, I take f to be real valued and defined on some interval of real numbers. Why is this statement so often thought to be obvious, and why is it not obvious? To answer these two questions, let me first discuss the word `obvious'. And here I cannot improve on a definition told to me by a former colleague of mine (Milne Anderson at UCL).

A statement is obvious if a proof instantly springs to mind.

The virtue of this definition is, of course, that it silences undergraduates who ask, `But why isn't that obvious?' However, it also helps to diagnose their problem, since very often when a statement seems obvious it is because a proof seems to spring instantly to mind. Only when one tries to work out the details of the half-formed argument does one realize why the argument is wrong and the statement is not obvious after all.


What, then, is the half-formed argument that leads so many people to think that statement (*) above is obvious? Here are two possibilities.


Now let us see what is wrong with these two arguments. The problem with the first is that in order to make sense of it one must be precise about concepts such as `slope' and `gradient' at x. Once one is, one probably ends up with argument (ii), so I shall concentrate on that one. It is not hard to choose where to attack it, since it starts off very precisely (and one can imagine it spoken with great confidence) but becomes a little vague (and the tone of voice shriller) towards the end. In particular, the italics, and the word `surely' are a giveaway. An important assumption, made almost but not quite explicit at the end of the argument, is the following one.

(**) Define a function f to be locally strictly increasing if, for every x, there is a y > x such that f(t) > f(x) for every t in the half-open interval (x,y]. Then a function is strictly increasing if (and of course only if) it is locally strictly increasing.

Presumably somebody who believes that (*) is obvious and justifies this belief by proposing argument (ii) is taking assumption (**) as obvious. But is it obvious? Let us try the same technique as before and try to prove it by writing down whatever comes instantly to mind. We ought to start by remembering the definition of a strictly increasing function: f is strictly increasing if x < z implies that f(x) < f(z). (I have chosen the letter z to avoid confusion with the y that appeared earlier.) Suppose, then, that x < z (and that x and z both belong to the domain of f). How are we to go about proving that f(x) < f(z) knowing only that f is locally strictly increasing?

Well, we had better use what we know, so why don't we start by choosing some y > x such that f(t) > f(x) for every t in the interval (x,y]. If y >= z then we will have proved what we want, but life may not be so kind. However, even if y < z it is closer to z than x and we can always repeat the process. With the benefit of a little foresight, let us rename y as y1. Since f is locally strictly increasing, we can now find y2 such that f(t) > f(y1) for every t in the interval (y1,y2]. This process can continue, and we obtain from it a sequence y1 < y2 < y3 < ... with the property that f(y1) < f(y2) < f(y3) < ... The trouble is, of course, that there is no guarantee that any of the yn will exceed z.

On the other hand, if every yn is less than z, then we have a bounded sequence which is monotone increasing. It therefore has a limit, u say. It looks plausible that f(u) might be bigger than all the f(yn). How should we prove this?

There seems to be a problem: next to every yn we have an interval where the function is bigger, but there is no reason for any of these intervals to go beyond u. In fact, as they are defined they don't. Worse still, this observation leads rather easily to a counterexample to assertion (**): just take f to be a function which increases up to u and then suddenly drops at u itself. For example, one could define the function f from [0,2] to the reals by setting f(x)=x if x is in [0,1) and f(x)=x-1 if x is in [1,2]. This function is locally strictly increasing according to the definition given, but it is not strictly increasing, since f(1/2)=1/2 while f(1)=0.
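The counterexample can be checked mechanically. The following is a sketch only: the helper right_witness is a hypothetical name, and its formula is one possible choice of the point y in the one-sided definition, not something given in the text. The check samples finitely many points, so it illustrates the claim and does not prove it.

```python
def f(x):
    """The counterexample on [0, 2]: rises on [0, 1), then drops at 1."""
    return x if x < 1 else x - 1


def right_witness(x):
    """One possible y > x with f(t) > f(x) for every t in (x, y].

    Hypothetical helper: it stops halfway to the next 'bad' point
    (1 if x < 1, otherwise the right endpoint 2).
    """
    return (x + 1) / 2 if x < 1 else (x + 2) / 2


# Sample x in [0, 2) and ten points t in (x, y] for each x.
xs = [i / 100 for i in range(200)]
locally_ok = all(
    f(x + k * (right_witness(x) - x) / 10) > f(x)
    for x in xs
    for k in range(1, 11)
)

# The sequence y1 < y2 < ... starting from 1/2 creeps up towards 1
# but never reaches it, exactly as in the argument above.
ys = [0.5]
for _ in range(40):
    ys.append(right_witness(ys[-1]))

print(locally_ok, max(ys) < 1, f(0.5) > f(1.0))
```

The last comparison is the failure of monotonicity: f takes a larger value at 1/2 than at 1.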

This apparently serious difficulty is in fact not too hard to deal with. There was something rather unnatural about the definition of locally strictly increasing functions in that it looked only to the right of any given number. Surely (and especially in the light of the example just given) it is more natural to look on both sides. Let us therefore reformulate (**) as follows.

(***) Define a function f to be locally strictly increasing if, for every x, there is a c > 0 such that f(t) < f(x) for every t in the half-open interval [x-c,x) and f(t) > f(x) for every t in the half-open interval (x,x+c]. Then a function is strictly increasing if (and only if) it is locally strictly increasing.

Notice that a function with strictly positive derivative is locally strictly increasing in this new sense, so let us return to our argument. Since our definition is stronger than it was before, we still have our sequence y1, y2, y3,... but this time we can do a bit more with its limit u. By the new definition of locally strictly increasing, we can find a c > 0 such that f(t) < f(u) for every t in the interval [u-c,u). However, this interval contains lots of yn (as yn tends to u) from which it follows, in particular, that f(u) > f(x). So we have made a tiny bit of further progress towards z.

Now we can repeat the whole process, setting u0=u and generating a sequence u0 < u1 < u2 < ... such that f(u0) < f(u1) < f(u2) < ... Needless to say, all our problems will be repeated, and, equally needless to say, we can step past them and produce a whole new sequence, then repeat that whole process again, and again, and so on. Then we can take all those sequences and so on and so on.

Somehow it becomes difficult to organize one's thoughts at this point. However, if you have looked at another page of mine concerning continuous functions on [0,1] you will recognise that identical difficulties occurred there. It is possible to get round these difficulties using ordinal numbers, but there is a much tidier way, and it is the following argument, which proves (***), shows that it is not obvious (though not especially hard either) and brings out the connection with the result on continuous functions.

Proof of (***).

Let x < z. We would like to show that f(x) < f(z). To do this, suppose that f(z) is not greater than f(x) and define u to be `the first point where anything goes wrong', or, to be more precise, the infimum of the set B={t in (x,z]: f(t)<=f(x)}. (This infimum exists, since B is bounded below and contains z.) Since f is locally strictly increasing, there is a v > u such that f(t) > f(u) for every t in the interval (u,v]. It follows that u cannot equal x, since x does not belong to B and if u=x then neither does any point in (x,v]. Moreover, f(u) <= f(x): if u does not belong to B, then B contains points t of (u,v], and for such t we have f(u) < f(t) <= f(x). But f is locally strictly increasing, so we can find w with x < w < u such that f(w) < f(u). Then f(w) < f(x), so this w belongs to B, which contradicts the definition of u.


Notice that we have now proved that a function which has strictly positive derivative is strictly increasing - since, as I remarked earlier, it is easy to show that it is locally strictly increasing, according to the second definition. Notice also that there was a small but definite difficulty to overcome. The point of the mean value theorem is that it can be used to deal with that difficulty. Here, then, is the usual one-line argument that deduces (*) from the mean value theorem.

A second proof of (*)

Suppose that f is not strictly increasing. Then we can find x < y such that f(y) <= f(x). By the mean value theorem there exists some c in the interval (x,y) such that f'(c)=(f(y)-f(x))/(y-x), which is less than or equal to zero. This contradicts our assumption that f' was positive everywhere.
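Written as a single display, with the failure of strict monotonicity expressed as f(y) <= f(x) for some x < y, the sign of the difference quotient is forced as follows. This only restates the step above and adds no new hypothesis.

```latex
\[
  f'(c) \;=\; \frac{f(y)-f(x)}{y-x} \;\le\; 0
  \qquad\text{since } f(y)-f(x)\le 0 \text{ and } y-x>0 .
\]
```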

It should come as absolutely no surprise to learn that the proof of the mean value theorem depends on the fact that continuous functions on closed intervals are bounded and attain their bounds, and that proving that involves, as I have already mentioned, exactly the difficulties that we encountered in proving (*). In other words, there is a genuine difficulty in proving (*), but once you have proved the mean value theorem you can use it and forget about the difficulty. There is still work involved in proving (*), but it is now hidden from view. This is a simple example of a very common phenomenon in mathematics.


Let us explore further the connection between the proof of (***) and the proof that continuous functions on closed intervals are bounded. Since they involved very similar difficulties, is there some lemma which can be proved and then used to make both results easy?

Well, in both cases we had a situation something like this. We had a function f which behaved itself locally and we wanted to show that it behaved itself everywhere. Let us compare the two sorts of local good behaviour.

In both cases, we started with the knowledge that f was locally well behaved. We also found it very useful if we could cover the whole interval [a,b] with finitely many of the intervals I. In the first case, that guarantees that f is bounded on [a,b], and what we really need is a lemma like this.

Lemma

Suppose that for each x in the closed interval [a,b] we have an open interval Ix containing x. (If x is a or b then we ask only that Ix should be half-open.) Then there is a finite sequence a=x0, x1, x2, ..., xn=b such that the union of the corresponding intervals is the whole of [a,b].

This result is easily seen to be equivalent to the Heine-Borel theorem, and can be proved just as I proved it in the special cases.

What about the second case? Unfortunately, the lemma isn't quite enough, owing to examples like the following. (I should remark that I have only just noticed this - I am writing on 13th December 2001.) Suppose that a=0, b=1 and let x1=1/3 and x2=2/3. Let I(x1)=(1/6,1] and I(x2)=[0,5/6). Then the two intervals (which are open in [0,1]) cover [0,1], but all they tell us is that f(0)< f(2/3), f(1/3)< f(2/3) and f(1/3)< f(1), which does not prove that f(0)< f(1).
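The example can be checked by brute force. The sketch below records, for the four points 0, 1/3, 2/3 and 1 only, the inequalities that the two intervals provide, then closes them under transitivity. Restricting attention to these four points is a simplifying assumption of the sketch, and the names cover, contains and less are hypothetical.

```python
from fractions import Fraction as F

# centre x -> (left end, right end, left end included?, right end included?)
cover = {
    F(1, 3): (F(1, 6), F(1), False, True),  # I(1/3) = (1/6, 1]
    F(2, 3): (F(0), F(5, 6), True, False),  # I(2/3) = [0, 5/6)
}
points = [F(0), F(1, 3), F(2, 3), F(1)]


def contains(interval, t):
    """Membership test that respects open and closed endpoints."""
    lo, hi, lo_closed, hi_closed = interval
    left_ok = t > lo or (lo_closed and t == lo)
    right_ok = t < hi or (hi_closed and t == hi)
    return left_ok and right_ok


# A pair (s, t) in `less` records the known inequality f(s) < f(t).
less = set()
for x, interval in cover.items():
    for t in points:
        if contains(interval, t) and t < x:
            less.add((t, x))
        elif contains(interval, t) and t > x:
            less.add((x, t))

# Close the relation under transitivity.
changed = True
while changed:
    changed = False
    for (p, q) in list(less):
        for (r, s) in list(less):
            if q == r and (p, s) not in less:
                less.add((p, s))
                changed = True

# The two intervals do cover [0, 1] (checked on a grid of step 1/60).
covered = all(
    any(contains(iv, F(k, 60)) for iv in cover.values()) for k in range(61)
)

print(covered, (F(0), F(1)) in less)
```

The pair (0, 1) never appears in the closed relation, which is the failure described above: the cover is fine as a cover, but the chain of inequalities does not reach from 0 to 1.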

Here is a slight strengthening of the Heine-Borel theorem that rules out examples of the above kind.

Lemma

Suppose that for each x in the closed interval [a,b] we have an open interval Ix containing x. (If x is a or b then we ask only that Ix should be half-open.) Then there is a finite sequence a=x0< x1< x2< ... < xn=b such that the union of the corresponding intervals is the whole of [a,b] and for every j the intersection of Ixj-1 and Ixj is non-empty.

Proof

This is more or less the same as the usual proof of the Heine-Borel theorem. To simplify the notation, take a=0 and b=1; the general case is identical. Let us say that [0,s] can be properly covered if it can be written as a subset of a finite union of intervals Ixj with the intersection of Ixj-1 and Ixj always non-empty. Obviously if [0,s] can be properly covered, then so can [0,t] for any t< s. Now let u be the supremum of the set of all s such that [0,s] can be properly covered. It cannot be that u< 1 and [0,u] can be properly covered, since one of the open intervals used would contain u and hence some v> u, contradicting the fact that u is an upper bound. On the other hand, if [0,v] can be properly covered for every v< u, then take the interval Iu=(w,x) and combine it with a proper covering of [0,w]. Assume that this proper covering is minimal, which implies that w belongs to Ixk where k is the largest j used. Since Ixk is open, it overlaps with Iu and therefore we have a proper covering of [0,u]. The only way this can fail to be a contradiction is if u=1 and [0,1] can be properly covered.


Here, spelt out, is how to deduce (*) from the slightly modified Heine-Borel theorem.

A third proof of (*)

It follows easily from the definition of differentiability that for every x there is an interval I=(x-c,x+c) such that f(t) < f(x) for every t in I less than x and f(t) > f(x) for every t in I greater than x. (Take half-open intervals when x is a or b.) This gives us an open cover of [a,b]. By the lemma above there is a proper covering of [a,b] by finitely many of these intervals. This provides us with a sequence a=x0 < x1 < x2 < ... < xn=b such that the intervals corresponding to neighbouring xj overlap, which implies that f(a)=f(x0) < f(x1) < ... < f(xn)=f(b). (For example, to see that f(x1) < f(x2), pick y between x1 and x2 and belonging to both the intervals. Then f(x1) < f(y) < f(x2) by the way the intervals were constructed.) Since this can be done for any subinterval [c,d] of [a,b], f is strictly increasing in the whole interval.

Exercise.

Anything that can be proved using the Heine-Borel theorem can also be proved using the Bolzano-Weierstrass theorem. Can you find a proof of (*) using an argument in a Bolzano-Weierstrass spirit?


A fourth proof of (*)

Let a < b. By the fundamental theorem of calculus, f(b)-f(a) is the integral from a to b of f'. Since f' is everywhere positive, this integral is positive.

Why on earth should one bother with the mean value theorem, or indeed any of the above arguments, if we can deduce the result so much more simply and naturally? One answer is that the usual proof of this version of the fundamental theorem of calculus uses the mean value theorem. Another is that it requires the derivative of f to be Riemann integrable (not true for all differentiable functions), so it proves (*) only under a slightly stronger assumption, though one which will hold for most interesting functions.