Let me begin by stating the theorem.
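For reference, the standard statement is the following.

```latex
\textbf{The mean value theorem.} Let $f:[a,b] \to \mathbb{R}$ be continuous on
$[a,b]$ and differentiable on $(a,b)$. Then there is some $c \in (a,b)$ such
that
\[
  f'(c) = \frac{f(b) - f(a)}{b - a}.
\]
```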
This statement, when one first sees it, appears to be little more than a curiosity. It may be true, but one wouldn't choose it as an example to persuade somebody of the joys of calculus.
If you have just covered the theorem in lectures, then your lecturer will probably have explained what it is used for, so this page is for those who do not always take in one hundred percent of their lectures, or who have done some calculus but not yet seen the theorem.
The main use of the mean value theorem is in justifying statements that many people wrongly take to be too obvious to need justification. One example of such a statement is the following.
(*) If the derivative of a function f is everywhere strictly positive, then f is a strictly increasing function.
Here, I take f to be real valued and defined on some interval of real numbers. Why is this statement so often thought to be obvious, and why is it not obvious? To answer these two questions, let me first discuss the word `obvious'. And here I cannot improve on a definition told to me by a former colleague of mine (Milne Anderson at UCL).
A statement is obvious if a proof instantly springs to mind.
The virtue of this definition is, of course, that it silences undergraduates who ask, `But why isn't that obvious?' However, it also helps to diagnose their problem, since very often when a statement seems obvious it is because a proof seems to spring instantly to mind. Only when one tries to work out the details of the half-formed argument does one realize why the argument is wrong and the statement is not obvious after all.
What, then, is the half-formed argument that leads so many people to think that statement (*) above is obvious? Here are two possibilities.
Now let us see what is wrong with these two arguments. The problem with the first is that in order to make sense of it one must be precise about concepts such as `slope' and `gradient' at x. Once one is, one probably ends up with argument (ii), so I shall concentrate on that one. It is not hard to choose where to attack it, since it starts off very precisely (and one can imagine it spoken with great confidence) but becomes a little vague (and the tone of voice shriller) towards the end. In particular, the italics and the word `surely' are a giveaway. An important assumption, made almost but not quite explicit at the end of the argument, is the following one.
(**) Define a function f to be locally strictly increasing if, for every x, there is a y > x such that f(t) > f(x) for every t in the half-open interval (x,y]. Then a function is strictly increasing if (and of course only if) it is locally strictly increasing.
Presumably somebody who believes that (*) is obvious and justifies this belief by proposing argument (ii) is taking assumption (**) as obvious. But is it obvious? Let us try the same technique as before and try to prove it by writing down whatever comes instantly to mind. We ought to start by remembering the definition of a strictly increasing function: f is strictly increasing if x < z implies that f(x) < f(z). (I have chosen the letter z to avoid confusion with the y that appeared earlier.) Suppose, then, that x < z (and that x and z both belong to the domain of f). How are we to go about proving that f(x) < f(z) knowing only that f is locally strictly increasing?
Well, we had better use what we know, so why don't we start by choosing some y > x such that f(t) > f(x) for every t in the interval (x,y]. If y > z (or indeed y = z) then z belongs to (x,y] and we will have proved what we want, but life may not be so kind. However, even if y < z, it is at least closer to z than x was, and we can always repeat the process. With the benefit of a little foresight, let us rename y as y_{1}. Since f is locally strictly increasing, we can now find y_{2} such that f(t) > f(y_{1}) for every t in the interval (y_{1},y_{2}]. This process can continue, and we obtain from it a sequence y_{1} < y_{2} < y_{3} < ... with the property that f(y_{1}) < f(y_{2}) < f(y_{3}) < ... The trouble is, of course, that there is no guarantee that any of the y_{n} will exceed z.
On the other hand, if every y_{n} is less than z, then we have a bounded sequence which is monotone increasing. It therefore has a limit, u say. It looks plausible that f(u) might be bigger than all the f(y_{n}). How should we prove this?
There seems to be a problem: next to every y_{n} we have an interval where the function is bigger, but there is no reason for any of these intervals to go beyond u. In fact, as they are defined, they don't. Worse still, this observation leads rather easily to a counterexample to assertion (**): just take f to be a function which increases up to u and then suddenly drops at u itself. For example, one could define the function f from [0,2] to the reals by setting f(x)=x if x is in [0,1) and f(x)=x-1 if x is in [1,2]. This function is locally strictly increasing according to the definition given.
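A quick numerical sanity check of this counterexample. The helper names, sample points and step sizes below are my own choices, and the local condition is only tested at a few sample points, not proved; the last part also illustrates how the sequence of y_{n} from the construction above gets trapped below the jump.

```python
def f(x):
    # The counterexample from the text: f(x) = x on [0, 1), f(x) = x - 1 on [1, 2].
    return x if x < 1 else x - 1

# Not strictly increasing: 0.5 < 1.2 but f(0.5) = 0.5 > 0.2 = f(1.2).
assert f(0.5) > f(1.2)

def right_locally_increasing_at(x, samples=100):
    """Numerically test the one-sided condition at x: is there some y > x
    with f(t) > f(x) for every sampled t in (x, y]?  (A check at sample
    points only, not a proof.)"""
    for step in (0.5, 0.1, 0.01, 0.001):
        y = x + step
        if y > 2.0:                      # stay inside the domain [0, 2]
            continue
        ts = [x + step * k / samples for k in range(1, samples + 1)]
        if all(f(t) > f(x) for t in ts):
            return True
    return False

# The one-sided condition holds at these sample points, including x = 1,
# where the function jumps down.
assert all(right_locally_increasing_at(x) for x in (0.0, 0.5, 0.9, 1.0, 1.5))

# The construction from the text gets trapped: starting at 0 and aiming for
# z = 1.2, greedily chosen steps never cross the jump at u = 1.
y = 0.0
for _ in range(50):
    step = 0.3
    while step > 1e-9:
        cand = min(y + step, 1.2)
        ts = [y + (cand - y) * k / 100 for k in range(1, 101)]
        if all(f(t) > f(y) for t in ts):
            y = cand
            break
        step /= 2
assert y < 1.0  # the sequence of y_n stays below the jump
```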
This apparently serious difficulty is in fact not too hard to deal with. There was something rather unnatural about the definition of locally strictly increasing functions in that it looked only to the right of any given number. Surely (and especially in the light of the example just given) it is more natural to look on both sides. Let us therefore reformulate (**) as follows.
(***) Define a function f to be locally strictly increasing if, for every x, there is a c > 0 such that f(t) < f(x) for every t in the half-open interval [x-c,x) and f(t) > f(x) for every t in the half-open interval (x,x+c]. Then a function is strictly increasing if (and only if) it is locally strictly increasing.
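Under this strengthened definition, the earlier counterexample is no longer locally strictly increasing: at x = 1 the left-hand condition fails for every c > 0, since just to the left of 1 the function takes values close to 1, which is bigger than f(1) = 0. A quick check (the sample values of c are mine):

```python
def f(x):
    # The earlier counterexample: f(x) = x on [0, 1), f(x) = x - 1 on [1, 2].
    return x if x < 1 else x - 1

def left_condition_holds_at_1(c):
    # The two-sided definition requires f(t) < f(1) for all t in [1-c, 1);
    # test it at the single sample point t = 1 - c/2.
    t = 1 - c / 2
    return f(t) < f(1)

# For every tested c the condition already fails at t = 1 - c/2, since
# f(t) = t is close to 1 while f(1) = 0.
assert not any(left_condition_holds_at_1(c) for c in (0.5, 0.1, 0.01))
```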
Notice that a function with strictly positive derivative is locally strictly increasing in this new sense, so let us return to our argument. Since our definition is stronger than it was before, we still have our sequence y_{1}, y_{2}, y_{3},... but this time we can do a bit more with its limit u. By the new definition of locally strictly increasing, we can find a c > 0 such that f(t) < f(u) for every t in the interval [u-c,u). However, this interval contains all but finitely many of the y_{n} (since y_{n} tends to u), so f(y_{n}) < f(u) for those n, from which it follows, in particular, that f(u) > f(x). So we have made a tiny bit of further progress towards z.
Now we can repeat the whole process, setting u_{0}=u and generating a sequence u_{0} < u_{1} < u_{2} < ... such that f(u_{0}) < f(u_{1}) < f(u_{2}) < ... Needless to say, all our problems will be repeated, and, equally needless to say, we can step past them and produce a whole new sequence, then repeat that whole process again, and again, and so on. Then we can take all those sequences and so on and so on.
Somehow it becomes difficult to organize one's thoughts at this point. However, if you have looked at another page of mine concerning continuous functions on [0,1] you will recognise that identical difficulties occurred there. It is possible to get round these difficulties using ordinal numbers, but there is a much tidier way, and it is the following argument, which proves (***), shows that it is not obvious (though not especially hard either) and brings out the connection with the result on continuous functions.
Notice that we have now proved that a function which has strictly positive derivative is strictly increasing - since, as I remarked earlier, it is easy to show that it is locally strictly increasing, according to the second definition. Notice also that there was a small but definite difficulty to overcome. The point of the mean value theorem is that it can be used to deal with that difficulty. Here, then, is the usual one-line argument that deduces (*) from the mean value theorem.
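The one-line argument is presumably the standard one:

```latex
Let $x < z$ in the domain of $f$. By the mean value theorem there is some
$c \in (x,z)$ with
\[
  f(z) - f(x) = f'(c)(z - x).
\]
Since $f'(c) > 0$ and $z - x > 0$, it follows that $f(z) > f(x)$, so $f$ is
strictly increasing.
```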
It should come as absolutely no surprise to learn that the proof of the mean value theorem depends on the fact that continuous functions on closed intervals are bounded and attain their bounds, and that proving that involves, as I have already mentioned, exactly the difficulties that we encountered in proving (*). In other words, there is a genuine difficulty in proving (*), but once you have proved the mean value theorem you can use it and forget about the difficulty. There is still work involved in proving (*), but it is now hidden from view. This is a simple example of a very common phenomenon in mathematics.
Let us explore further the connection between the proof of (***) and the proof that continuous functions on closed intervals are bounded. Since they involved very similar difficulties, is there some lemma which can be proved and then used to make both results easy?
Well, in both cases we had a situation something like this. We had a function f which behaved itself locally and we wanted to show that it behaved itself everywhere. Let us compare the two sorts of local good behaviour.
In both cases, we started with the knowledge that f was locally well behaved. We also found it very useful if we could cover the whole interval [a,b] with finitely many of the intervals I. In the first case, that guarantees that f is bounded on [a,b], and what we really need is a lemma like this.
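A lemma of the kind being pointed at here (my formulation, guided by the remark that it is equivalent to the Heine-Borel theorem) would be the following.

```latex
\textbf{Lemma.} Suppose that for every $x \in [a,b]$ we are given an interval
$I(x)$ containing $x$ which is open in $[a,b]$. Then there are finitely many
points $x_1,\dots,x_n \in [a,b]$ such that
\[
  I(x_1) \cup \dots \cup I(x_n) \supseteq [a,b].
\]
```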
This result is easily seen to be equivalent to the Heine-Borel theorem, and can be proved just as I proved it in the special cases.
What about the second case? Unfortunately, the lemma isn't quite enough, owing to examples like the following. (I should remark that I have only just noticed this - I am writing on 13th December 2001.) Suppose that a=0, b=1 and let x_{1}=1/3 and x_{2}=2/3. Let I(x_{1})=(1/6,1] and I(x_{2})=[0,5/6). Then the two intervals (which are open in [0,1]) cover [0,1], but all they tell us is that f(0) < f(2/3), f(1/3) < f(2/3) and f(1/3) < f(1), which does not prove that f(0) < f(1).
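To see concretely that those three inequalities really do not force f(0) < f(1), here is an assignment of values at the four relevant points (the particular numbers are my own choice) satisfying all three inequalities while still having f(0) > f(1):

```python
# Values at the four points that the cover constrains, chosen to satisfy the
# three inequalities the cover yields while still having f(0) > f(1).
vals = {"0": 0.0, "1/3": -1.0, "2/3": 1.0, "1": -0.5}

assert vals["0"] < vals["2/3"]      # f(0) < f(2/3)
assert vals["1/3"] < vals["2/3"]    # f(1/3) < f(2/3)
assert vals["1/3"] < vals["1"]      # f(1/3) < f(1)
assert not (vals["0"] < vals["1"])  # ...and yet f(0) > f(1)
```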
Here is a slight strengthening of the Heine-Borel theorem that rules out examples of the above kind.
Here, spelt out, is how to deduce (*) from the slightly modified Heine-Borel theorem.
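One way this can go (my reconstruction, taking the strengthening to say that the finite subcover can be chosen as a chain, with consecutive intervals overlapping between their centres):

```latex
Let $x < z$ and apply the strengthened theorem to $[x,z]$: we obtain points
$x = x_1 < x_2 < \dots < x_n = z$ whose intervals $I(x_1),\dots,I(x_n)$ cover
$[x,z]$, with the further property that for each $i$ the overlap
$I(x_i) \cap I(x_{i+1})$ contains a point $t_i$ with $x_i < t_i < x_{i+1}$.
Local strict increase in the two-sided sense then gives
\[
  f(x_i) < f(t_i) < f(x_{i+1}) \qquad (1 \le i \le n-1),
\]
and chaining these inequalities yields $f(x) = f(x_1) < f(x_n) = f(z)$.
```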
Why on earth should one bother with the mean value theorem, or indeed any of the above arguments, if we can deduce the result so much more simply and naturally? One answer is that the usual proof of this version of the fundamental theorem of calculus uses the mean value theorem. Another is that it requires the derivative of f to be Riemann integrable (not true for all differentiable functions), so it proves (*) only under a slightly stronger assumption, though one which will hold for most interesting functions.
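The simpler deduction in question is presumably the usual one via the fundamental theorem of calculus: if f' is Riemann integrable, then for x < z,

```latex
f(z) - f(x) = \int_x^z f'(t)\,dt > 0.
```

(That the integral of an everywhere strictly positive Riemann-integrable function over a nondegenerate interval is strictly positive is itself a small exercise: such a function has a point of continuity in the interval, near which it is bounded below by a positive constant.)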