This page ended up being much longer than I had expected when I started writing it. If you wish, you can get to the main point straight away.
What is a set? At school one is told that a set is a collection of objects, usually with some property in common. For example, one could divide a class into the set G of all girls and the set B of all boys. One could also divide it into three sets X, Y and Z, consisting of all those whose surnames begin with a letter from A to E, from F to N and from O to Z respectively. Simple examples like this allow one to define the basic set-theoretic operations such as unions and intersections. For example, G cap Y is the set of all girls with surnames beginning with a letter between F to N.
Later one comes to regard such sets as a little babyish. The real sets of interest consist of mathematical objects, such as numbers or points in space. So what is a set, if we restrict ourselves to the mathematical world? Well, it is a collection of objects of a mathematical kind. But what does that mean? What is a collection?
Here is an attempt to answer that question which fails badly. Let V be the totality of all mathematical objects. A set is just a function f from V to the pair {0,1}. I call v an element of the set if and only if f(v)=1.
A few objections to the above suggestion: (i) the definition rests on the notion of a function, which itself is usually defined in terms of sets; (ii) it is dangerous to talk about the totality of all mathematical objects, as it lays one open to Russell's paradox (and in any case it is difficult to make the notion precise without using set theory); (iii) it also uses the notion of the pair {0,1}, which is a set. The third objection is not too serious, as one can probably find a way of defining small sets. However, the first two certainly are troublesome. Let us focus on each of them in turn.
In order to avoid difficulty (ii), let us consider a simpler question: what is a set of positive integers? I shall assume that we know what the set N of all positive integers is, so what I am asking is what is meant by the notion of a subset of N. Intuitively, the answer to this question seems clear - you just pick out some of the positive integers and not others. So how can we formalize this idea? We would like to think of a set of integers as something like a method of dividing them into two classes - those that are chosen and those that are not. This sounds suspiciously like a function f from N to a set of size two (which we might as well take to be the set {0,1}). Unfortunately, we are now facing difficulty (i) again. Is there some way to define functions without relying on the notion of a set?
It is clear that we haven't made much progress, since there is a straightforward equivalence between the notion of a subset of N and the notion of a function from N to {0,1}: for each set X define f(x) to be 1 if x is in X and 0 otherwise, and for each 01-valued function f define X to be the set {x:f(x)=1}. To try to break the deadlock, it may help if we look at some examples of subsets of N.
For each of the sets X_{i} above, we specified some rule that determined which positive integers lay in the set and which did not. For example, for X_{1} the rule is that n lies in the set if and only if n is one of the numbers 1,4,7 or 17, and for X_{8} the rule is that n lies in the set if and only if it is the r^{th} prime for some integer r such that, for some other positive integer k, r is the smallest integer that can be written as the sum of three kth powers in three distinct ways. In each case, the rule is something that we can specify unambiguously. Could it be that what we are searching for is a definition along the following lines: a set X of positive integers is a precisely specified rule that determines, for each positive integer n, whether or not n belongs to X?
An easy argument shows that this is not what we mean when we talk about sets: there are only countably many rules (we use finite strings of finitely many symbols to specify them) and uncountably many subsets of N. Therefore, there are undefinable sets of positive integers. In fact, almost all sets of integers are undefinable.
This observation leads to well known paradoxes, but ones that can be dealt with. A discussion of them can be found here .
For now, though, let us merely remark that if almost all sets of positive integers are undefinable, then we will not understand what the word `set' means, in general, by looking at specific examples. What we are trying to capture is the following intuition: a set X of positive integers is a rule which tells you, for each positive integer n, whether or not n belongs to X, except that the rule can be totally arbitrary and impossible to specify in a finite time .
There is something a little hazy about the idea of subsets of N that cannot be defined. In particular, one cannot actually give examples of them. (But once again see here for a fuller discussion of this point.) So why should one bother with them? The answer is that although one does not study specific undefinable sets of positive integers (since that is impossible), it is very useful to talk about the class of all sets of positive integers. This class has properties which can be discussed without reference to specific sets. To give a simple example: if I take two subsets A and B, I can define their union to be the set of all n such that n belongs either to A or to B (or to both). This definition makes sense even if A and B are undefinable, and so do many other operations.
Now, simple operations such as finite unions and intersections, when applied to definable sets, give rise to definable sets. So for what kind of operation is the class of definable sets inadequate?
There is an obvious way to come up with at least some sort of answer to this question, and that is to examine carefully the proof that there exists an undefinable set, trying to interpret the set that appears at the end of the proof as the result of some operation applied to sets that are definable.
Here, then, is a very brief sketch of the diagonal argument needed to prove that there is an undefinable set.
Haven't I just defined A? That is the paradox I discuss elsewhere . However, it is clear that I have not defined A according to the method chosen in step 1, and in any case what concerns me here is how we built the set A out of the sets A_{n}. Though the method was simple, it was somewhat artificial. Could we not do all useful mathematics just using `natural' operations, and never leaving the world of definable sets?
The answer to that question is less clear cut than one might think, but the following statement is hard to deny: there are many circumstances in which it is very convenient to use operations that allow us to prove that there are uncountably many sets, thereby forcing us to refer to classes of objects that are almost all undefinable. This becomes very clear when we talk not about subsets of N but about real numbers.
Here is a version of Cantor's diagonal argument for the real numbers (due to Cantor himself).
As a corollary, there are undefinable real numbers. (Note that we could have proved this by taking an undefinable set of positive integers and converting it into the binary expansion of a real number. However, that would have relied on what you might regard as the unnatural process of diagonalization.)
The above argument of Cantor works as long as you believe in the completeness axiom for the reals. That is, if you are prepared to accept that every monotone increasing sequence that is bounded above has a limit (which we would apply to the sequence a_{1},a_{2},a_{3},....) then you must accept the conclusion that there are uncountably many real numbers, and hence that not all real numbers are definable.
If you do not like indirect proofs of existence, and believe only in those mathematical objects you can construct (for example, you might believe only in those real numbers for which there is an algorithm which outputs their decimal expansion - though this is not the same as the set of definable real numbers, since one can apply a diagonal argument to this set in a definable way) then you will have to modify the completeness axiom in some way. You might wish to say not that every sequence has a limit, but rather that every definable sequence has a limit. However, then you will have to do much more work. For example, suppose that you would like to prove the following useful statement of elementary real analysis: if A is bounded above, then there is a sequence x_{1},x_{2},x_{3},.... converging to sup A. The usual proof goes as follows.
How might we produce a definable version of the above argument? Well, we would presumably start with a definable set A. Next, for step 1 we would try to choose the numbers x_{n} in some completely explicit way. Well, if you define for me a set A, I can probably come up with the x_{n}, but for a general proof, we must specify some method that always works. Here is an attempt.
Unfortunately, the above approach, while looking explicit, is in fact not, because the ordering on the definable real numbers is not itself definable. (If it were, then the diagonal argument would produce a new definable real number - contradiction.)
I do not want to go further and discuss what to do next. All I wish to demonstrate is that much of real analysis would become significantly more complicated if everything had to be definable all the time. Since a major use of sets is to help us talk about real numbers (e.g. via decimal expansions), the same is true of sets. See this dialogue for a further discussion of these matters.
After all this discussion, we are no nearer a definition that does justice to our notion of an arbitrary, not necessarily definable, subset of the natural numbers. Since it is possible to give an explicit bijection between the set of all subsets of the natural numbers and the set of all real numbers, our difficulties apply to real numbers as well. What is a real number?
The abstract approach to this question is simply to ignore it, or else give the following glib answer: a real number is an element of the set of all real numbers. The reason this is not a totally useless definition is that we assume that the set of real numbers has some structure. For example, real numbers can be added and multiplied, put in order and so on. Most importantly, they satisfy the completeness axiom.
Thus, what I am really talking about is the axiomatic approach. In the case of the real numbers, one starts with the axioms for a complete ordered field, proves that any two complete ordered fields are isomorphic, and then defines a real number to be an element of one. It is possible to deduce all the statements of ordinary mathematics, and in particular basic analysis, from the axioms without worrying about what real numbers actually are. That is, what matters about the real numbers is not their essence but how they behave.
Now it is important to show that the axioms one uses are consistent, and for this one should show that there actually is something that satisfies them. In other words, it is dangerous to study complete ordered fields without constructing an example of one. There are several methods known for doing this, such as using Cauchy sequences of rationals, Dedekind cuts or good old-fashioned decimal expansions .
But if we can construct a complete ordered field, does that not give us a good definition of what it means to be a real number, and does it not show that real numbers exist? Unfortunately not: it doesn't really help us with our earlier difficulties, because the method of construction uses notions such as `an arbitrary Cauchy sequence of rationals', or `an arbitrary sequence of numbers between 0 and 9', and we were already uncomfortable with those.
The famous Principia Mathematica of Russell and Whitehead was an attempt to build up all of mathematics from solid foundations. These foundations were set-theoretic axioms and the hope was that these chosen axioms were so manifestly and undeniably true that any logical consequence of them was obviously true as well.
Unfortunately, in order to avoid Russell's paradox, they invented the rather artificial and elaborate theory of types, which meant that their axioms were not quite as self-evident as they might have liked. However, another system of axioms, due to Zermelo and Fraenkel , did the job much more neatly, and most mathematicians today regard their work as secure on the grounds that it can, in principle, be expressed as a purely logical consequence of these axioms (perhaps with the axiom of choice thrown in).
How, roughly, does this reduction-in-principle to set theory work? Well, for example, one can do real analysis using the axioms for a complete ordered field, but a complete ordered field can be constructed out of the rationals (though for this one needs axioms that allow one to talk about arbitrary sequences of rationals). The rationals can be constructed from the integers. (One defines a rational to be an equivalence class of ordered pairs (m,n) of integers with n not zero. (m,n) is equivalent to (p,q) if mq=np. (m,n)+(p,q) is defined , rather than calculated, to be (mq+np,nq) and (m,n)(p,q)=(mp,nq). One then checks that the field axioms are satisfied, and finally writes m/n instead of (m,n).) The integers can be constructed from the positive integers. (Here one defines an integer to be an equivalence class of pairs (m,n), this time with (m,n) equivalent to (p,q) when m+q=n+p. One is thinking of (m,n) as being m-n.)
Finally, the positive integers are defined inductively out of sets, with 1 being {0} (here, owing to the deficiencies of HTML, I write 0 for the empty set, which in fact it is according to this definition), and n being defined as {0,1,2,...,n-1}. The successor of an integer n is defined to be the union of n and {n}.
For this last stage of the reduction to work, we must check that the Peano axioms - the axioms we need to make the integers do what we want of them (listed here along with related material ) are valid. For example, we need to show that if S is a subset of the positive integers (as I have just defined them), if 1 belongs to S and if n+1 belongs to S whenever n does, then S consists of all the positive integers. Checking that the Peano axioms are valid means deducing them from the axioms we have chosen for sets.
Thinking about real numbers abstractly means concentrating not on their essence but on the role they play in the real number system (that is, the unique complete ordered field, up to isomorphism). Similarly, the abstract approach to linear algebra involves regarding a vector not as something like a magnitude and direction (though it is very helpful to bear this picture in mind) but rather as an element of a vector space .
In exactly the same way, in the abstract approach to set theory there is no need to define a set. Rather, a set is something like `an element of the system of all sets'. To make this more precise, let us briefly consider vector spaces again. What are the axioms they satisfy? Well, the main two take the form of closure properties : a vector space is closed under addition and scalar multiplication. That is, if v and w are vectors then so is v+w, and if c is a scalar then cv is a vector. In other words, the axioms give us rules that allow us to create new vectors from old ones. They also give us certain properties that will be satisfied by the new vectors, which imply various relationships with the old ones. If we start with n vectors and see what we can generate using those rules, we obtain a vector space.
What is the analogue for set theory? One defines a model of Zermelo-Fraenkel set theory (or ZF for short) to be a set which is closed under some rather complicated operations given by the ZF-axioms.
Before I go any further, I must quickly correct myself. A model M of ZF is not a set closed under various operations, since its elements consist of all the sets (by which I mean not all the sets that have some transcendent existence but all sets in the model, which satisfies the ZF axioms). If M were a set, in the sense of belonging to the model, then we would have a set which was an element of itself, which would lead to Russell's paradox. It would also contradict the axiom of foundation.
Nevertheless, in order to understand the abstract view of set theory, it can be helpful to stop calling it set theory, just temporarily, and instead call it something else, such as s** theory. A model of s** theory is a set, whose elements are s**s, which satisfy some complicated axioms due to Zermelo and Fraenkel. This set has associated with it a relation, which we happen to call `is an element of', but again all that matters about it is that it has certain properties. The axioms allow you to build new s**s out of old ones in certain ways, and the new s**s have various relationships to the old ones. The model is what you generate by starting with the empty s** and using the axioms. The model itself is not a s** (just as a vector space is not itself a vector in the space) but it is a set.
Now let us revert to usual practice and call the elements of the model M sets and `subsets' of M, which may be too large to be sets, classes. What we are doing when we use these words is somehow standing outside the whole of set theory (and hence mathematics) and realizing that actually, from this godlike perspective, we can think of set theory as just another branch of mathematics.
But, one might object, isn't set theory supposed to be the foundation of the rest of mathematics? If we stand outside set theory in order to reason about it, then what is the status of this reasoning, which is itself mathematical in character?
If you are seriously worried by this question, then you should think about what can possibly be expected from any attempt to provide a foundation for mathematics. As Lewis Carroll noted, if I ever try to demonstrate that some reasoning is sound, then you can always ask why the demonstration itself is sound. All justification has to come to an end somewhere (and now I am quoting Wittgenstein out of context). There is a further, very serious obstacle, which is that, as Gödel showed, it is impossible to prove the consistency of ZF within ZF. This means that we have to take the consistency of our axioms on faith, or else adopt the pragmatic attitude that there doesn't seem to be any short proof that they are inconsistent, and if anybody does manage to find an inconsistency, then somebody else will probably manage to find a new set of axioms that avoids the inconsistency and still allows us to do all normal mathematics.
Given these difficulties, what can we say about the great results of logic, such as the independence of the axiom of choice and the continuum hypothesis? The rough structure of such a result is as follows. To prove, for example, that the axiom of choice is unprovable in ZF, one assumes that a model of ZF exists (i.e., that ZF is consistent) and builds out of this model a new model in which the axiom of choice is false. The process of building uses the axioms of ZF. This shows that if ZF+(not AC) is inconsistent, then so is ZF. Of course, the reasoning might be invalid because ZF was not to be trusted, but that does not affect the conditional statement: if you are happy with ZF (from the point of view of consistency) then you are just as happy with ZF+(not AC).
It is fair to say that, amongst professional mathematicians, there is much less interest in the foundations of mathematics than there was a hundred years ago. I myself do not feel particularly reassured to know that the integers can be constructed out of sets - I am happy to take the integers, with their familiar properties, as given, and build up from there. This sort of attitude is fine for most purposes, but occasionally problems emerge from `normal' mathematics which turn out to be statements independent of ZFC. To establish the independence, it is of course important to understand in detail about how to construct models of set theory (in a way that, as I hope is not too painfully obvious, I do not).