Set Implementations

In contrast with other programming languages from its time, Pascal (the programming language) also supports a set type, implemented as a bit pattern. The bit patterns associated with the Pascal set type are $256$ bits wide, but this limitation is not essential and can been replaced with other (larger) values eventually. See Wikipedia for a reference. A rather detailed description of the set type implementation can be found at the Delphi Basics website. So we have the following practice:

a bit pattern in a computer is a set type

However, we also know that

a bit pattern in a computer is a natural number type

Indeed, everybody knows that a natural number can be represented as a binary i.e. a bit pattern. The word "type" has been employed here in order to avoid confusion with other (i.e the standard mathematical) "set" and "number" definitions.

The text in this article will be accompanied with source code in Delphi Pascal. Most of the time, only the headers of a possible implementation are displayed in the text body. The rest is stored as an archive and can be downloaded from here:

Delphi Pascal software

It is emphasized that all (source) code are just rapid prototype programs for demonstration purposes. So please don't expect a plug and play package.

Clearly it is possible to implement a set within the hardware and software of common digital computers in a number of ways. Only four possibilities are employed in this chapter. They are:

a set is a bitmap type
a set is a number type
a set is an array type
a set is a string type

It will be assumed in the sequel that the size of all set implementations can be as large as needed, which is the same as saying that we have a virtually unlimited amount of memory at our disposal. Thus any possible limitations on current (as well as future) digital computer hardware are not relevant.

Let us clarify the above with a few simple examples. Consider the set: $$ \{b,d,d,c,a\} $$ Such a set can conveniently be implemented as an array of objects. In my favorite programming language (Delphi Pascal) it reads:

type
  objects = array of TObject;

A set with the above objects as members might be created as follows:

procedure only_here;
var
  a,b,c,d : TObject;
  S : objects;
begin
{ Urelements }
  a := TObject.Create; b := TObject.Create;
  c := TObject.Create; d := TObject.Create;
{ Set }
  SetLength(S,5);
  S[0] := b; S[1] := d; S[2] := d; S[3] := c; S[4] := a;
end;

But, suppose that we consider instead the following set, containing only natural numbers: $$ \{4,3,9,7,7,1\} $$ Then the same set can also be implemented by a series of bits, a bitmap, where a "bit up" $|$ means that the corresponding element is a member of the set:

    .. 1010011010
       | |  || |
    .. 9876543210

The bits are numbered from the right to the left, starting with zero. Thus bit 0 is down i.e. 0 is not a member; bit 1 is up i.e. 1 is a member; bit 2 is down, i.e. 2 is not a member; bit 3 is up i.e. 3 is a member; bit 4 is up i.e. 4 is a member; bit 5 is down i.e. 5 is not a member; bit 6 is down i.e. 6 is not a member; bit 7 is up i.e. 7 is a member, bit 8 is down i.e. 8 is not a member, bit 9 is up i.e. 9 is a member. The calling sequence for encoding an array of integers as a bitmap could be defined by:

function makeSet(S : integers) : bitmap;

But one and the same bitmap can also interpreted as a number:

    .. 1010011010 (binary) = 666 (decimal)

With natural numbers as members, something typical is observed: in the bitmap representation of the set, both numbers $7$ have only one bit up attached to them. Therefore we must conclude that $$ \{,4,3,9,7,7,1\} = \{9,7,4,3,1\} = \{1,3,4,7,9\} = 666 $$ where the sets on the right hand side are sorted in ascending or descending order. If the curly brackets $\{\}$ are replaced by square brackets $\left[\,\right]$ - what's in a bracket - then another important fact is observed:

a set type is is a natural number type is a sorted array type of natural numbers without duplicates

If now we devise an equivalent of the elementary number operations with sorted arrays, then we have virtual unlimited precision at our disposal. That this approach indeed works, has been demonstrated at hand of a question at the Mathematics (Stack Echange) forum:

What is the *middle* digit of $3^{100000}$?

At last, the set $O = \{b,d,d,c,a\}$ as well as the set $N = \{4,3,9,7,7,1\}$ can be implemented (more or less trivially) as strings. In Delphi Pascal this would read:

  O := '{b,d,d,c,a}';
  N := '{4,3,9,7,7,1}';

But in this form, there is not much useful to it. We shall employ the string definition in a far more restrictive way, which will be introduced later on. We shall also dismiss sets that have no bitmap equivalent.

Definition. The standard implementation of a set, as an array, is an ordered array, consisting of integers greater than or equal to zero. It is a terse set as well, meaning that all duplicates have been removed.

With the above definition, sets arrays can be implemented far more efficiently than in the "general" TObject case. Sorting and wiping out the duplicates with set creation, binary search with member finding, these shall be useful methods to enhance efficiency, indeed. See accompanying source code for details.

In order to be sure not to miss anything, several sections in this chapter are named as chapters in reference [1] : Paul R. Halmos, Naive Set Theory, The book is followed as closely as possible. (But not too closely :-)