More Collatz Fiction & Facts

This is an outstanding unsolved SPAM(*) anyway. And yes, I have found kind of a "solution" for the problem, though it may seem much less satisfactory than most people would like it to be. The gist of my "solution" is that the (3x+1) Conjecture is not a Theorem within the whole number system, at all. Instead, it's just a Statistical Property, which has a multitude of mappings, even within the real numbers, that is: outside the integers.
Maybe this is all that has to be said about the infamous Conjecture: that it is not very interesting, in the end. Herewith I mean that its contribution to Number Theory may be of NULL value. But Probability Theory is another story. probably ;-) Yes, I think the major hurdle to be taken is the idea: that the Theory of Probability can yet be used as a proper description, of processes which are quite Deterministic in nature ! (Meanwhile claiming that I am Not the first one who says this)

*) Sample of Pure Applicable Mathematics

Modified definition

To begin with, the definition of the problem (by Eric Roosendaal) will be modified slightly, as follows:
For any positive integer N a sequence Si can be defined
by putting
     S0 = N
and for all i > 0: 
     Si = Si-1 / 2             if Si-1 is even
     Si = (Si-1 * 3 + 1) / 2   if Si-1 is odd
Division by 2 in the 'odd' case eliminates the fact that any subsequent value in the sequence is necessarily an even number and hence eliminates trivial 'even' steps.

Statistical Approach

Suppose that the number N is completely arbitrary, essentially meaning that it is not known. Now consider an arbitrary large set of subsequent natural numbers, for example the collection:
   1  2  3  4  5  6  7 ... 132  133  134 ... 123421412  
And suppose the number N is in such a set, called S . If the number of elements in S is T, then there are two possibilities:
   If T = even then the number of even integers in S is T/2
                and the number of odd integers in S is T/2
   If T = odd  then the number of even integers in S is (T-1)/2
                and the number of odd integers in S is (T+1)/2
Now suppose that the number of elements in S is very large. Then we may define the probability P that an integer N in S will be even or odd:
   If T = even then P(even) = (T/2) / T = 1/2
                and P(odd) = (T/2) / T = 1/2
   If T = odd  then P(even) = ((T-1)/2) / T -> 1/2
                and P(odd) = ((T+1)/2) / T -> 1/2
In the limiting case, of an infinitely large set S, we find, quite as expected:
     P(even) = P(odd) = 1/2  
And this property does not change, no matter how large the set S will become. Such an limiting case - an almost infinitely large set S of contiguous natural numbers - could easily be renamed to: THE NATURAL NUMBERS. For who on Earth can possibly explain me what the difference would be with a "truly" infinite set of natural numbers ?

Cantor's Counter-Intuition

The
above means that there are "as many" even numbers as there are odd numbers. But it also means that Half of the natural numbers would be Even (and odd for the other half, of course). Quite in agreement with Common Sense.
But "NO", says the Pure Mathematician. The following reasoning has been coined up by Cantor, and has been swallowed quite uncritically ever since, by the entire mathematical community. A one-to-one relationship (bijection) is assumed to exist between all Natural numbers and all Even numbers:
         1  2  3  4  5  6  7  8  9 10 11 12 ...   
         2  4  6  8 10 12 14 16 18 20 22 24 ...   
I would say now: Give me more, give me more ! But no, these theorists have already drawn their "inevitable" conclusion: the cardinality of the set of all natural numbers must be equal to the cardinality of the set of all even numbers. In layman's terms: there are as many even numbers as there are natural numbers. And this is Wrong !
Let's re-create earthly circumstances in the first place. Instead of Cantor's Actual Infinity, which can only exist in the Heaven of Mathematics, there is only Potential Infinity, kind of "two times larger every time again". Re-consider now a very large, but nevertheless finite set of contiguous natural numbers. Then it is very clear that the number of even integers in that set is approximately half the total number of integers. And this approximation becomes better as the set becomes larger. This is the common way in which limiting cases are explored, everywhere in mathematics where it has accomplished its great success stories, such as in differential and integral calculus. But when it comes to number theory, all these wise lessons seem to have been forgotten. Thus rendering further fruitful mathematical research virtually impossible ! So please, let us finally speak the Truth and only but the Truth:

      There exist precisely Half as many Even numbers as there exist Whole numbers.

But Han, what are you doing ?! Don't you understand that so many great results in Mathematics will no longer be valid, if you dare to take such a position !
My answer is simple: this only proves that sound scientific reasoning gradually has been replaced by intimidation. I have been involved with mathematics for 40 years. And I have never encountered any significant theory which could not have been developed without the Cantor doctrine. But yes, some of the "results" which have been rendered with help of Cantor's methodology certainly will become unavailable. Such as the (in)famous Continuum Hypothesis. As a physicist, I find it utterly incredible that the whole discussion about Transfinite Numbers is not treated on the same footing as the stories about Flying Saucers. But NO ! Is it a coincidence that at least one of the board members of the sceptics organization in Holland is a mathematician ? By creating another smoke screen, it seems, they are effectively preventing the de-mystification of their own profession.

Resuming Statistics

Having said all this, consider the 'even' step in the Collatz algorithm:
     Si = Si-1 / 2  
This results in:
   Si-1 = 2  4  6  8  10 ... 132  134  136  138  ...
     Si = 1  2  3  4   5 ...  66   67   68   69  ...   
Hence the probability that the result of the division will become even is equal to the probability that it will become odd. Both probabilities are equal to 1/2.
Rewrite the 'odd' step in the Collatz algorithm as follows:
     Si = Si-1 + (Si-1+1)/2   
This results in:
        Si-1+1 = 2  4  6  8  10 ... 132  134  136  138  ...
    Si = odd  +  1  2  3  4   5 ...  66   67   68   69  ...
Hence the probability that the result of the addition will become even is equal to the probability that it will become odd. Both probabilities are equal again to 1/2.

Summarizing. Given an arbitrary (that is: unknown) positive integer, which is either even or odd with a probability of 50 % , then any step in the Collatz process will generate another positive integer, which is even or odd with the same probability, namely 50 % . Numbers are enlarged by the odd steps, while they are diminished by the even steps.

Infinite Loops

It is not known (yet) whether infinite loops can actually occur with the original Collatz sequence (with exception of the trivial loop (1,2)). But if we modify the procedure a little bit (: 5 instead of 1), then non-trivial closed loops definitely can occur:
     S0 = N
and for all i > 0: 
     Si = Si-1 / 2             if Si-1 is even
     Si = (Si-1 * 3 + 5) / 2   if Si-1 is odd
We find, for example:
  19  31  49  76  38  19  
Which from now on will repeat itself.

Not every modification of the (3x+1) problem necessarily has the property that it will loop somewhere. Let us consider, for example, the sequence:

     S0 = N
 and for all i > 0: 
     Si = Si-1 / 2             if Si-1 is even
     Si = (Si-1 * 3 + 3) / 2   if Si-1 is odd
It can easily be shown that this "(3x+3)" sequence is, in fact, completely equivalent to the original Collatz problem, where every number in the Collatz sequence corresponds with 3 times this number in the (3x+3) = 3(x+1) problem.

Some Lemmas

Let O = the number of 'odd' iterations and D = the total length of the sequence.
An arbitrary (part of a) Collatz sequence can be written in the form:
     SD = PD . S0 + QD
with PD = 3O / 2D
While QD is defined by a (3x+1)-like algorithm:
     Q0 = 0
and for all 1 < i ≤ D : 
     Qi = Qi-1 / 2             if Si-1 is even
     Qi = (Qi-1 * 3 + 1) / 2   if Si-1 is odd
The proof of this Lemma is by complete induction to D . The Lemma is certainly true for D = 0, since:
     S0 = P.S0 + Q   where  P = 30 / 20  and   Q = 0
Suppose that the Lemma is true for a sequence of length D. Then for D+1 and for odd numbers:
     SD+1 = ( 3 ( PD.S0 + QD ) + 1 ) / 2 = 3O+1 / 2D+1 . S0 + (QD.3+1)/2
And for even numbers:
     SD+1 = ( PD.S0 + QD ) / 2 = 3O / 2D+1 . S0 + QD/2

Let O = the number of 'odd' iterations, D = the total length of the sequence and N = S0 the number the sequence starts with.
Then, for an arbitrary (part of a) Collatz sequence, the following holds:

    SD < 3O/2D ( N + 2D-O )
It is easy to see that Q in the previous Lemma has a maximum value, which occurs in sequences of the form eeeee ... ooooo, where e are even steps and o are odd steps. Experimentally, we find that this maximum value of Q is given by:
   Qmax = 3O/2O - 1  
It's easy to verify this by complete induction. The Lemma is obviously true for O = 1, since, starting with Q = 0:
   Qmax := (3*0+1)/2 = 1/2 = 31/21 - 1
Suppose that the Lemma is true for O, then for O+1 we find:
   Qmax = (3*(3O/2O-1)+1)/2 = 3O+1/2O+1-1
Rearranging terms gives for the maximum value of S:
    SD = 3O/2D.N + 3O/2O - 1 <  3O/2D ( N + 2D-O )

Real Valued Bound

From the above Lemma, we may conclude that an arbitrary but finite (part of a) Collatz sequence has always another, real valued, sequence as its Upper Bound. For define the other sequence by:
     R0 = N + 2D-O
and for all i > 0: 
     Ri = Ri-1 / 2       if Si-1 is even
     Ri = 3 * Ri-1 / 2   if Si-1 is odd
The the outcome after D steps will be:
   RD = 3O/2D ( N + 2D-O ) > SD
It's possible to construct a much tighter Upper Bound by traversing the Collatz sequence in reverse order:
   y = (3.x + 1)/2  ==>  x = (2.y - 1)/3 < 2.y/3
           y = x/2  ==>  x ≤ 2.y
Resulting in the following starting value for the bounding Real Valued sequence:
   R0 = 2D/3O   
Any sequence of integer numbers may thus be conveniently bounded by a similar one of real-valued numbers. We have already seen, in addition, that the choices 'odd' and 'even' are in fact random choices, with equal probabilities.
The convergence behaviour of Collatz sequences is thus mimicked by a procedure where random decisions are made, for example by generating random numbers in the interval (0,1) and setting up the following algorithm, for some real and sufficiently large number S:
   R := Random;
   if R > 0.5 then S := S + S/2;
   if R < 0.5 then S := S - S/2;  
This can even be generalized to arbitrary 0 < p < 1:
   p := Random;
   while true do
   begin
     R := Random;
     if R > 0.5 then S := S*(1+p);
     if R < 0.5 then S := S*(1-p);
   end;   
A sloppy argument already reveals that, in the long run, there will be as many even steps as there are odd steps, on the average. Thus the behaviour of S is roughly described by:
    S := (1+p)D/2(1-p)D/2.S = (1 - p2)D/2.S   
Which converges to zero, given the fact that (1 - p2) < 1.

Binomial Distribution

A definite advantage of the Real number sequences, associated with the integer Collatz sequences, is the fact that their convergence behaviour is solely determined by the factor 3O/2D :
   RD = 3O/2D.R0  
And we know that the accompanying Collatz sequence is entirely below it. Hence if the Real number sequence converges, then the Collatz sequence must also converge.
Now the convergence of the Real number sequence is only dependent upon the factor 3O/2D, that is: quite independent on the precise layout of the even/odd sequence. Thus we may consider all even/ odd sequences with just the same number of even/odd choices as equivalent. Then we may consider the Probability P that an even/odd sequence of length D contains a number of O odd choices. It is easy to see that this gives rise to a Binimial Distribution:
   P(O) = D! / ((D-O)!.O!).(1/2)O.(1-1/2)D-O
        = D! / ((D-O)!.O!) / 2D   
Mean value and variance of this distribution are given by:
   <O> = D.1/2 = D/2  ;  <O2> - (<O>)2 = D.1/2.(1-1/2) = D/4   
The Binomial Distribution may be approximated well by a Normal Distribution for moderate and large values of D. Consider values of O/D instead of O, thus restricting the independent variable to the interval [0,1]. Then the mean value becomes 1/2. And the spread of the Gaussian curve is determined by the square root of the variance, which now becomes sqrt(D/4) / D = 1/(2 sqrt(D)). We see that the bell shaped curve will converge to a sharp peaked "delta function" in the long run, that is for large values of D. On the other hand, we can see that concergence of the real valued sequence will be guaranteed in the long run if, at a certain moment:
   3O/2D.x < x  ==>  3O/2D < 1  ==>  O.ln(3) - D.ln(2) < 0  ==>  

O/D < ln(2)/ln(3) = 0.630929753571457

This is the same as demanding that the starting value of the real valued upper bound sequence should be a number which is greater than 1 :
   2D/3O > 1  <==>  3O/2D < 1   
When combined with the above, this means that it will be increasingly difficult to find sequences where the ratio O/D of the odd and the total number of transitions differs much from the mean value 1/2. The Convergence Criterion will become more and more satisfied as the size of the sequences grows to infinity. This could be written as:
      O/D < ln(2)/ln(3) = 0.630929753571457 >> 0.5  

Endless Loop, in the end ?

An escape from this "reality" is remaining, though, because we are not dealing with real but with integer numbers, in the end. While such is highly improbable for a real number (due to round-off errors), an integer number can "find itself" again - quite by coincidence, of course (;-) - exactly as it is. Resulting, finally, in an Endless Loop. Such endless loops are quite common, even with small variations on the problem: (3.x+5) , (3.x+7) , (3.x+11) , (3x+13) , (3x-1) . And we find even more of them by allowing negative numbers with the original (3x+1) sequence. But  . . . does not already the original sequence end in an infinite loop, namely: 2 1 2 1 2 1 . . . ? Giving rise to the following Note. Any trivial sequence, with length T :
    1  2  1  2  1  2  1  2  1  2  ...  
can be used for extending an arbitrary Collatz sequence to one of predetermined length D+T. According to:
     3O/2D ==> 3O+T/2/2D+T = 3O/2D . (3/4)T/2     if T is even
          ==> 3O+(T+1)/2/2D+T = 3O/2D . 2.(3/4)(T+1)/2    if T is odd   
Meaning that all sequences are contained in the collection of sequences with large size D, maybe so large that they are approaching infinity. Then for the smaller sequences, it becomes even more certain that O/D is close to the value 0.5 .

Accompanying Software

Please remember: anything free comes without guarantee !
Usage: Make a directory. Unzip in that directory and Run the 'exe'.