If you are seeing it for the first time, then the following definition is quite a lot to take in all at once:

f is * continuous at * x if, for every e > 0
there exists d > 0 such that |f(y)-f(x)| < e whenever
|y-x| < d.

The definition is usually presented as the rigorous formulation of our intuitive notion that a continuous function is one with a graph that can be drawn without lifting pen from paper. On the other hand, a beginning analysis course usually includes various standard examples of strange functions that demonstrate that the rigorous definition and the intuitive one do not quite coincide. (For example, there are functions that are continuous at every irrational x and discontinuous at every rational x. It is impossible to make intuitive sense of this statement in terms of drawing the graph of such a function.)

It may be easier to forget all about drawing graphs, and begin
with a different set of intuitive ideas, coming from how we take
measurements in real life. Suppose we have some physical quantity y
that depends in some way on another physical quantity x. For example,
x might be the number of seconds after dropping a stone from the top
of a high building and y might be the number of metres travelled by the
stone since it was dropped. Suppose further that we wish to determine
y, but can only actually measure x. In the example given, it might
be night time. If we want to know how far the stone has gone, we
cannot look at the stone because it is dark. However, elementary
mechanics tells us that y=g x^{2}/2, where g is the acceleration
due to gravity (I ignore wind resistance etc.) so instead we could
look at a stopwatch, determine x and then * calculate * y.

One problem with this is that, by the time we have done the calculation, the stone will probably have reached the ground. However, if we were really keen, we could set up a computer with a diagram of the building and a small dot indicating the position of the stone, and we could do it in such a way that the computer graphic really did indicate where the stone was until it hit the ground.
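The calculation the computer would repeatedly perform is a one-liner. Here is a minimal sketch in Python (my choice of language; the function name and the sample time are illustrative, not from the text):

```python
# Sketch of the stone-tracking idea: measure the time x on a stopwatch,
# then *calculate* the distance fallen, y = g*x^2/2.

G = 9.81  # acceleration due to gravity in m/s^2 (wind resistance ignored)

def distance_fallen(x):
    """Distance in metres fallen x seconds after the stone is released."""
    return G * x**2 / 2

# A stopwatch reading of 2 seconds tells us how far the stone has gone:
print(distance_fallen(2.0))  # about 19.62 metres
```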

Much more important from the point of view of continuity is the following (very standard) observation: physical measurements are not real numbers. That is, a measurement of a physical quantity will not be an exactly accurate infinite decimal. Rather, it will usually be given in the form of a finite decimal together with some error estimate: x=3.14 +/- 0.02 or something like that.

In the example of the stone, the fact that one cannot be
exactly accurate does not much matter. The computer can contain
a stopwatch which is accurate to a very small fraction of a second,
and there are enough pixels on the screen for the small dot to
indicate the position of the stone at least as accurately as our
brains can track it. The definition of continuity begins to
emerge as soon as one asks * why * small inaccuracies
do not matter.

One must be careful to separate two different aspects of this
question. One is why we can tolerate small inaccuracies in y (the
position) and the other is why we can tolerate small inaccuracies
in x (the time). The first question is not a particularly mathematical
one. We don't mind knowing y to `only' fifteen decimal places because
we cannot detect changes in y of order 10^{-15}. One could
go further and say that it is not obvious what it even * means *
to talk about the position of a stone, which after all occupies
quite a bit of space, to an accuracy of 10^{-15}. Are
we talking about its centre of gravity? That doesn't solve all
our problems, because we then have to decide where the boundary of
the stone is - that is, what counts as stone and what doesn't. At
a distance scale of 10^{-15}, this becomes a somewhat
arbitrary decision.

The second question is more interesting mathematically - why do small inaccuracies in x not matter too much? Remember that what we are really interested in is y, but circumstances are forcing us to determine y indirectly, through a measurement of x. Thus, we are in the following situation.

- We want to know y.

- We do not mind small inaccuracies in our knowledge of y.

- We can measure x to a high accuracy, but not a perfect one.

- We have a scientific theory that tells us how y depends on x.

To make this question more precise, let us suppose that we need to know y for some practical reason, and we are told that it is vitally important that y should be accurate to five significant figures. Can we guarantee this (assuming our scientific theory to be correct) if we are able to measure x accurately enough?

In the case of the function f(x)=g x^{2}/2, the
answer is yes. Let us consider the effect of perturbing x
very slightly, which we can do by replacing x by x+h, where
h is very small (and possibly negative). We find that

f(x+h)=(g/2)(x^{2}+2xh+h^{2})

If h is small enough, then both |2xh| and h^{2} will
be smaller than 10^{-6}. Since g is less than 10,
it follows that |(g/2)(2xh+h^{2})| is less than
10^{-5}. In other words, as long as x is measured
accurately enough, f(x) will be accurate to within 10^{-5}.
Now it is quite obvious that there was nothing special about
the number 10^{-5} in the above argument. I could
have replaced it by * any * positive number. Therefore,
the following statement holds true.
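One can watch this argument work numerically. The following sketch (with x and h values of my own choosing, within the tolerances the argument uses) checks that such a perturbation changes f(x) by less than 10^{-5}:

```python
# Numerical check of the bound above: for f(x) = g*x^2/2 with g < 10,
# if |2xh| and h^2 are both below 10^-6, then |f(x+h) - f(x)| < 10^-5.

G = 9.81

def f(x):
    return G * x**2 / 2

x = 2.0
h = 1e-7   # then |2xh| = 4e-7 < 1e-6 and h^2 = 1e-14 < 1e-6

change = abs(f(x + h) - f(x))
assert change < 1e-5
print(change)  # comfortably below 10^-5
```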

* Given any desired accuracy for y=f(x), one can ensure that
accuracy by measuring x sufficiently accurately. *

Now let us consider a different physical situation. Imagine that we have a rod with one end attached, via a hinge, to the ground in such a way that the rod can move freely in some vertical plane. The position of the rod is determined by the angle it makes with the vertical, which can vary from -90 (the rod is lying on the ground pointing to our left) through 0 (the rod is vertical) to +90 (the rod is on the ground pointing right). Suppose that the hinge is very well oiled, and we now perform the following simple experiment. First, we hold the rod still in some position, then we let go. If the rod moves, we wait until it stops and then record its position.

Let x be the initial position of the rod, given as an angle to the vertical, and let y be the position after it stops. Notice that y can take one of three values (assuming a frictionless hinge): -90, 0 or 90. Moreover, it takes the value -90 when x is negative (if the rod slants to the left then it falls to the left), 0 when x is 0 (if the rod starts exactly vertical, then it balances and does not move) and +90 when x is positive (if it slants to the right then it falls to the right). Writing y=h(x), we have that h(x) is -90 when x is negative, 0 when x is zero and 90 when x is positive.
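The function h is easy to write down in code. Here is a direct transcription of the three cases (a sketch of the idealized frictionless rod, nothing more):

```python
def h(x):
    """Final resting angle of the rod, given its initial angle x
    in degrees from the vertical."""
    if x < 0:
        return -90   # the rod slants left, so it falls to the left
    elif x == 0:
        return 0     # the rod is exactly vertical, so it balances
    else:
        return 90    # the rod slants right, so it falls to the right

print(h(-10), h(0), h(10))  # -90 0 90
```

Note that even the tiniest nonzero x sends the output all the way to -90 or 90, which is exactly the behaviour discussed below.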

Now suppose that we cannot measure x exactly and we want to know y to a good accuracy. If the rod slants noticeably, then we can do it. For example, if x=-10 (i.e., the rod slants to the left by 10 degrees) then the rod falls to the left, so y=-90, and even if our measurement was only accurate to within one degree, our calculation would still be correct, since all numbers between -11 and -9 are negative.

On the other hand, we will have a problem if x=0. If it happens
that the rod is exactly vertical, we cannot say with any confidence
where it will end up, since, in the absence of infinitely
accurate measurement, we do not * know * that the rod is
exactly vertical. If there is even the slightest slant to the
left, then it will fall (possibly after a very long time) to
the left, and similarly for the right.

Of course, I have made some physically unrealistic assumptions, such as the frictionless hinge, or the absence of small vibrations that might affect the experiment. However, mathematically speaking, we can say that the function h has the following property (and a more physically realistic function would approximate the property).

* If all we know about x is that it is very close to zero,
then we cannot determine h(x) to within 10 degrees. That is,
if we know that x lies between -d and d, then, however small
d is, we still cannot approximate h(x). *

The difference between the functions f and h above is that f is continuous and h is discontinuous. Roughly speaking, this means that small changes in x cause only small changes in f(x), while they may cause large changes in h(x) if x is close to zero. How do we get from this idea to the usual epsilon-delta definition? Since this is an HTML document, I shall write e and d instead of epsilon and delta.

Well, I have talked a lot about wanting to know f(x) to a given accuracy. How can we represent this mathematically? We must choose an amount of error that we are prepared to tolerate. Let us call this number e. We then hope that by determining x accurately enough, we can determine f(x) to within e. What do we mean by the phrase `accurately enough'? We mean that there is some amount of error in our specification of x, say d, which is small enough that knowing x to within d tells us f(x) to within e. That is, if y is some number within d of x - in other words, if |y-x| < d - then f(y) will equal f(x) to within an accuracy of e - in other words, |f(y)-f(x)| < e.

Now remember that we wanted to be able to do this for any given error e in the quantity f(x). In other words, whatever accuracy e is demanded of f(x), if d is small enough, then knowing x to within d determines f(x) to within e. Finally, writing this out as a definition, we say

* f is continuous at x if for every e > 0 (the
accuracy demanded of f(x)) there exists a d > 0 (the
accuracy that will be sufficient for x) such that
if |y-x| < d then |f(y)-f(x)| < e. *
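The definition translates almost word for word into a numerical test. The sketch below is necessarily approximate (it samples finitely many y, which is my own crude device and no substitute for a proof), but it captures the logic: a given d either survives every sampled y or it fails.

```python
def seems_continuous_at(f, x, e, d, samples=1000):
    """Check |f(y) - f(x)| < e for sampled y with |y - x| < d.
    Passing does not prove continuity; failing disproves this choice of d."""
    for i in range(samples):
        y = x - d + (2 * d) * i / samples
        if abs(f(y) - f(x)) >= e:
            return False
    return True

# For f(x) = x^2 at x = 3, accuracy e = 0.01 is achieved once d is small:
assert seems_continuous_at(lambda t: t * t, 3, e=0.01, d=0.001)

# For the rod function, no d works at x = 0: a jump of 90 degrees
# appears however small d is.
rod = lambda t: -90 if t < 0 else (0 if t == 0 else 90)
assert not seems_continuous_at(rod, 0, e=10, d=1e-12)
```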

This may well still seem complicated. However, as I have said, it
is just a mathematical formalization of the following idea: in
a world where perfect accuracy is neither necessary nor
possible, one can determine f(x) to * any desired * accuracy
as long as x itself can be determined accurately enough.

Here is an alternative way of thinking about continuity. It is
not that different from what I have given already, but perhaps just
different enough to be worth mentioning. Imagine that you have
a very powerful computer which can handle very long decimal
expansions: in fact, if you say that you want to work to n
decimal places, the computer will be happy to oblige, however big n
is (a bit like Maple and Mathematica if n is kept realistic). The
one thing it cannot do, of course, is deal with * infinite *
decimals.

Now suppose you want to write some routines to calculate certain
functions, and you want to be able to calculate these functions to
arbitrary precision. If you take a function like x^{2} you
can write a program roughly like this: input n (the desired
number of significant figures); input first n+3 digits of x; calculate
the square, y, of the resulting (long, but finite) decimal; output y,
truncated to n significant figures. Here, the output is very close to
x^{2} because the input is very close to x.
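With Python's decimal module (my choice of tool; the text does not commit to one) the rough program above might look like this, with the "n+3 digits" appearing as guard digits in the working precision:

```python
from decimal import Decimal, getcontext

def square_to_n_figures(x_digits, n):
    """Square a number given as a decimal string (with at least n+3
    significant digits) and return the result to n significant figures."""
    getcontext().prec = n + 3          # work with a few guard digits
    y = Decimal(x_digits) ** 2
    getcontext().prec = n
    return +y                          # unary plus rounds to n figures

# 1.4142136 is the first eight digits of sqrt(2), so its square is
# very close to 2:
print(square_to_n_figures("1.4142136", 5))  # prints 2.0000
```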

By contrast, if a function f(x) is defined to be 0 when x is less
than pi and 1 when x is greater than or equal to pi, we cannot design a
satisfactory program for calculating f. If x is nowhere near pi then
of course we can calculate f(x) easily, but f(pi) itself cannot even
be * approximated * if we do not have infinite precision. Why
not? Because, however many significant figures of pi you give as your
input, you will not determine for the computer whether the number you
are talking about is less than pi or greater than or equal to pi, so all
it can tell you is that f(x) is either 0 or 1, which is hardly a good
approximation. This is not a surprise - it just expresses the fact that
f is discontinuous at pi.
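To see the problem concretely (the particular ten-digit numbers below are my own illustration): any finite string of digits of pi is consistent with true values on both sides of pi, and f gives different answers for the two cases.

```python
import math

def f(x):
    return 0 if x < math.pi else 1

# Both of these agree with pi = 3.14159265358... to ten significant
# figures, yet one lies below pi and the other above it:
lower = 3.141592653
upper = 3.141592654

print(f(lower), f(upper))  # 0 1
```

So an input accurate to ten (or any number of) figures leaves the computer unable to choose between the outputs 0 and 1.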