One way to arrive at the definition of continuous functions

If you are seeing it for the first time, then the following definition is quite a lot to take in all at once:

f is continuous at x if, for every e > 0, there exists d > 0 such that |f(y)-f(x)| < e whenever |y-x| < d.

The definition is usually presented as the rigorous formulation of our intuitive notion that a continuous function is one with a graph that can be drawn without lifting pen from paper. On the other hand, a beginning analysis course usually includes various standard examples of strange functions that demonstrate that the rigorous definition and the intuitive one do not quite coincide. (For example, there are functions that are continuous at every irrational x and discontinuous at every rational x. It is impossible to make intuitive sense of this statement in terms of drawing the graph of such a function.)

It may be easier to forget all about drawing graphs, and begin with a different set of intuitive ideas, coming from how we take measurements in real life. Suppose we have some physical quantity y that depends in some way on another physical quantity x. For example, x might be the number of seconds after dropping a stone from the top of a high building and y might be the number of metres travelled by the stone since it was dropped. Suppose further that we wish to determine y, but can only actually measure x. In the example given, it might be night time. If we want to know how far the stone has gone, we cannot look at the stone because it is dark. However, elementary mechanics tells us that y = g x^2/2, where g is the acceleration due to gravity (I ignore wind resistance etc.), so instead we could look at a stopwatch, determine x and then calculate y.
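For instance, taking g to be approximately 9.8 metres per second squared, after x = 2 seconds the stone has fallen y = 9.8 · 2^2/2 = 19.6 metres.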

One problem with this is that, by the time we have done the calculation, the stone will probably have reached the ground. However, if we were really keen, we could set up a computer with a diagram of the building and a small dot indicating the position of the stone, and we could do it in such a way that the computer graphic really did indicate where the stone was until it hit the ground.

Much more important from the point of view of continuity is the following (very standard) observation: physical measurements are not real numbers. That is, a measurement of a physical quantity will not be an exactly accurate infinite decimal. Rather, it will usually be given in the form of a finite decimal together with some error estimate: x=3.14 +/- 0.02 or something like that.

In the example of the stone, the fact that one cannot be exactly accurate does not much matter. The computer can contain a stopwatch which is accurate to a very small fraction of a second, and there are enough pixels on the screen for the small dot to indicate the position of the stone at least as accurately as our brains can track it. The definition of continuity begins to emerge as soon as one asks why small inaccuracies do not matter.

One must be careful to separate two different aspects of this question. One is why we can tolerate small inaccuracies in y (the position) and the other is why we can tolerate small inaccuracies in x (the time). The first question is not a particularly mathematical one. We don't mind knowing y to `only' fifteen decimal places because we cannot detect changes in y of order 10^(-15). One could go further and say that it is not obvious what it even means to talk about the position of a stone, which after all occupies quite a bit of space, to an accuracy of 10^(-15). Are we talking about its centre of gravity? That doesn't solve all our problems, because we then have to decide where the boundary of the stone is - that is, what counts as stone and what doesn't. At a distance scale of 10^(-15), this becomes a somewhat arbitrary decision.

The second question is more interesting mathematically - why do small inaccuracies in x not matter too much? Remember that what we are really interested in is y, but circumstances are forcing us to determine y indirectly, through a measurement of x. Thus, we are in the following situation.

Let us write y=f(x). The scientific theory tells us what f is in our given situation. (For the stone, f(x) was g x^2/2.) If we are forced to have some inaccuracy in our measurement of x, but can tolerate a small inaccuracy in y, then the following question is of paramount importance: what is the effect on y of a small change in x?

To make this question more precise, let us suppose that we need to know y for some practical reason, and we are told that it is vitally important that y should be accurate to five significant figures. Can we guarantee this (assuming our scientific theory to be correct) if we are able to measure x accurately enough?

Two contrasting examples.

In the case of the function f(x)=g x^2/2, the answer is yes. Let us consider the effect of perturbing x very slightly, which we can do by replacing x by x+h, where h is very small (and possibly negative). We find that

f(x+h) = (g/2)(x^2 + 2xh + h^2)

If h is small enough, then both 2xh and h^2 will be smaller than 10^(-6) in absolute value. Since g is less than 10, it follows that (g/2)(2xh+h^2) is less than 10^(-5) in absolute value. In other words, as long as x is measured accurately enough, f(x) will be accurate to within 10^(-5). Now it is quite obvious that there was nothing special about the number 10^(-5) in the above argument. I could have replaced it by any positive number. Therefore, the following statement holds true.

Given any desired accuracy for y=f(x), one can ensure that accuracy by measuring x sufficiently accurately.
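Here is a quick numerical illustration of the argument (a sketch in Python, taking g to be 9.8, the standard approximate value):

    g = 9.8

    def f(x):
        return g * x * x / 2    # distance fallen after x seconds

    x = 2.0                             # a sample measurement of the time
    for h in (1e-1, 1e-3, 1e-5, 1e-7):
        print(h, abs(f(x + h) - f(x)))  # the error in y shrinks with h

Each tenfold improvement in the accuracy of x buys roughly a tenfold improvement in the accuracy of y, just as the expansion of (g/2)(x+h)^2 predicts.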

Now let us consider a different physical situation. Imagine that we have a rod with one end attached, via a hinge, to the ground in such a way that the rod can move freely in some vertical plane. The position of the rod is determined by the angle it makes with the vertical, which can vary from -90 (the rod is lying on the ground pointing to our left) through 0 (the rod is vertical) to +90 (the rod is on the ground pointing right). Suppose that the hinge is very well oiled, and we now perform the following simple experiment. First, we hold the rod still in some position, then we let go. If the rod moves, we wait until it stops and then record its position.

Let x be the initial position of the rod, given as an angle to the vertical, and let y be the position after it stops. Notice that y can take one of three values (assuming a frictionless hinge): -90, 0 or 90. Moreover, it takes the value -90 when x is negative (if the rod slants to the left then it falls to the left), 0 when x is 0 (if the rod starts exactly vertical, then it balances and does not move) and +90 when x is positive (if it slants to the right then it falls to the right). Writing y=h(x), we have that h(x) is -90 when x is negative, 0 when x is zero and 90 when x is positive.

Now suppose that we cannot measure x exactly and we want to know y to a good accuracy. If the rod slants noticeably, then we can do it. For example, if x=-10 (i.e., the rod slants to the left by 10 degrees) then the rod falls to the left, so y=-90, and even if our measurement was only accurate to within one degree, our calculation would still be correct, since all numbers between -11 and -9 are negative.

On the other hand, we will have a problem if x=0. If it happens that the rod is exactly vertical, we cannot say with any confidence where it will end up, since, in the absence of infinitely accurate measurement, we do not know that the rod is exactly vertical. If there is even the slightest slant to the left, then it will fall (possibly after a very long time) to the left, and similarly for the right.

Of course, I have made some physically unrealistic assumptions, such as the frictionless hinge, or the absence of small vibrations that might affect the experiment. However, mathematically speaking, we can say that the function h has the following property (and a more physically realistic function would approximate the property).

If all we know about x is that it is very close to zero, then we cannot determine h(x) to within 10 degrees. That is, if we know that x lies between -d and d, then, however small d is, we still cannot approximate h(x): it could be any of -90, 0 or 90, and two of those values differ by 180 degrees.
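To see this failure concretely, here is a sketch of h in Python (my own illustration of the function just described):

    def h(x):
        # final angle of the rod, given its initial angle x (frictionless hinge)
        if x < 0:
            return -90   # falls to the left
        elif x == 0:
            return 0     # balances exactly
        else:
            return 90    # falls to the right

    for d in (1.0, 0.001, 1e-9):
        # both -d/2 and d/2 lie within d of zero, yet the outputs differ by 180
        print(d, h(-d / 2), h(d / 2))

However small d is made, the interval from -d to d always contains inputs whose outputs differ by 180 degrees, so no single estimate of h(x) can be accurate even to within 90.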

The definition of continuity.

The difference between the functions f and h above is that f is continuous and h is discontinuous. Roughly speaking, this means that small changes in x cause only small changes in f(x), while they may cause large changes in h(x) if x is close to zero. How do we get from this idea to the usual epsilon-delta definition? Since this is an HTML document, I shall write e and d instead of epsilon and delta.

Well, I have talked a lot about wanting to know f(x) to a given accuracy. How can we represent this mathematically? We must choose an amount of error that we are prepared to tolerate. Let us call this number e. We then hope that by determining x accurately enough, we can determine f(x) to within e. What do we mean by the phrase `accurately enough'? We mean that there is some amount of error in our specification of x, say d, which is small enough that knowing x to within d tells us f(x) to within e. That is, if y is some number within d of x - in other words, if |y-x| < d - then f(y) will equal f(x) to within an accuracy of e - in other words, |f(y)-f(x)| < e.

Now remember that we wanted to be able to do this for any given error e in the quantity f(x). In other words, whatever accuracy e is demanded of f(x), if d is small enough, then knowing x to within d determines f(x) to within e. Finally, writing this out as a definition, we say

f is continuous at x if for every e > 0 (the accuracy demanded of f(x)) there exists a d > 0 (the accuracy that will be sufficient for x) such that if |y-x| < d then |f(y)-f(x)| < e.

This may well still seem complicated. However, as I have said, it is just a mathematical formalization of the following idea: in a world where perfect accuracy is neither necessary nor possible, one can determine f(x) to any desired accuracy as long as x itself can be determined accurately enough.
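For the stone function the recipe can even be made explicit. Since |f(x+h)-f(x)| = (g/2)|h||2x+h|, which is at most g(|x|+1)|h| whenever |h| is at most 1, the choice d = min(1, e/(g(|x|+1))) always works. Here is that calculation as a small Python sketch (again with g = 9.8, and the name good_d is mine):

    g = 9.8

    def f(x):
        return g * x * x / 2

    def good_d(x, e):
        # a d > 0 such that |y - x| < d guarantees |f(y) - f(x)| < e
        return min(1.0, e / (g * (abs(x) + 1.0)))

    x, e = 2.0, 1e-5
    d = good_d(x, e)
    y = x + d / 2                    # any y within d of x
    print(d, abs(f(y) - f(x)) < e)   # True

Notice, as the formula for d makes plain, that d is allowed to depend not just on e but also on x.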

Another way of thinking about it.

Here is an alternative way of thinking about continuity. It is not that different from what I have given already, but perhaps just different enough to be worth mentioning. Imagine that you have a very powerful computer which can handle very long decimal expansions: in fact, if you say that you want to work to n decimal places, the computer will be happy to oblige, however big n is (much as Maple and Mathematica will, provided n is kept realistic). The one thing it cannot do, of course, is deal with infinite decimals.

Now suppose you want to write some routines to calculate certain functions, and you want to be able to calculate these functions to arbitrary precision. If you take a function like x^2 you can write a program roughly like this: input n (the desired number of significant figures); input first n+3 digits of x; calculate the square, y, of the resulting (long, but finite) decimal; output y, truncated to n significant figures. Here, the output is very close to x^2 because the input is very close to x.
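That rough program might look something like this in Python (a sketch using the standard decimal module; the name square_to_sig_figs is mine):

    from decimal import Decimal, localcontext

    def square_to_sig_figs(x_digits, n):
        # x_digits: the first n+3 significant figures of x, as a decimal string
        with localcontext() as ctx:
            ctx.prec = n + 10              # enough working precision for the square
            y = Decimal(x_digits) ** 2     # square the long but finite decimal
            ctx.prec = n                   # now round to n significant figures
            return +y                      # unary plus applies the context precision

    print(square_to_sig_figs("1.41421356", 6))   # prints 2.00000: squaring 9 digits of sqrt(2)

The extra three digits of input are what make the last displayed digit of the output trustworthy.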

By contrast, if a function f(x) is defined to be 0 when x is less than pi and 1 when x is greater than or equal to pi, we cannot design a satisfactory program for calculating f. If x is nowhere near pi then of course we can calculate f(x) easily, but f(pi) itself cannot even be approximated if we do not have infinite precision. Why not? Because, however many significant figures of pi you give as your input, you will not determine for the computer whether the number you are talking about is less than pi or greater than or equal to pi, so all it can tell you is that f(x) is either 0 or 1, which is hardly a good approximation. This is not a surprise - it just expresses the fact that f is discontinuous at pi.
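One can exhibit the difficulty directly (a Python sketch; the 30-digit value of pi is standard, and the function name is mine):

    from decimal import Decimal

    PI_30 = "3.14159265358979323846264338327"   # pi truncated to 30 significant figures

    def two_candidates(n):
        # two numbers that an n-digit measurement cannot tell apart,
        # one strictly below pi and one at or above it
        below = Decimal(PI_30[:n + 1])          # pi truncated to n significant figures
        above = below + Decimal(10) ** (1 - n)  # one unit in the last digit
        return below, above

    print(two_candidates(5))    # 3.1415 and 3.1416: f gives 0 on one and 1 on the other

However many digits n you supply, both candidates are consistent with an input known only to that accuracy, and f takes the value 0 on one of them and 1 on the other.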