An Experiment in Evolution

Software developers have a nice advantage in life: they can create a world on their computers, in which they are a god, setting the environment, the rules and the behavior of their programs. A skilled and experienced software engineer can create games and simulation. While the games are primarily for entertainment, the simulations are tools by which one can study the real world. Of course, they are useful and true only as much as they accurately follow the laws of the real world.

About a year ago I had an idea on how to simulate the process of Darwinian evolution. This is not a first. Many people have tried it, including the famous atheist, Richard Dawkins. His simulation of evolution suffers of a conceptual problem, which many people have noticed, and I will explain below.

Dawkins’ simulation

His idea was quite simple. The experiment “proved” the power of Darwinian evolution by attempting to take a scrambled phrase, in which the original letters are rearranged in a random order, representing an unevolved system from where evolution starts.

The point of the experiment is to see if Darwinian evolution can find the unscrambled phrase

  • METHINKS IT IS LIKE A WEASEL

A possible character string to be used as a starting point may be

  • MWR SWTNUZMLDCLEUBXTQHNZVJQF

Now the evolution starts, which is based on random mutations and selection of the fittest. A random mutation consists of a number of random permutations of the original string. Fitness is defined as similarity to the target phrase (“methinks it is like a weasel”). A selection process eliminates the resulting strings which are far from the target and retains the ones which are closest to the target. In the next step (or generation) the process is repeated. After ten generations, the string looks like

  • MDLDMNLS ITJISWHRQEZ MECS P

After thirteen generations the string becomes

  • METHINGS IT ISWLIKE B WECSEL

Finally, after about forty generations, evolution finds the target represented by the string (1). According to Dawkins, this proves the power of the evolution to create a fit result after just a few generations of random mutations and selection of the fittest.

Is there anything wrong with Dawkins’ simulation experiment? Yes, there is. Darwinian evolution does not have a target, which makes this simulation unrealistic.

At the center of any evolution simulation is the meaning of “fitness.” In Darwin’s theory of evolution, reptiles did not evolve to birds because “bird” was a target. Rather, reptiles went through mutations and selections based on fitness parameters which have no target. If a reptile was able to make big jumps and eventually fly, it might have gained an advantage in finding and collecting food and being protected from predators.

The concept of fitness has some other problems in the current theory of neo Darwinian evolution. What does fitness mean? It is generally defined as the capability to survive, reproduce and leave many descendants. To make it short, let’s call this just survivability.

Now the definition of fitness becomes

“An organism is fit if it survives and leaves descendants.”

… but when does an organism survive?

“An organism survives because it is has fitness.”

Please notice the circular character of these definitions. Fitness is survivability. Survivability is fitness.

My simulation: SortWorld

My simulation avoids Dawkins’ mistake. In my simulation there is an environment, (digital) organisms, random mutations and selection based on survival of the fittest. There is a calculated fitness, without any preordained target. Organisms evolve based on objective fitness (see the definition of fitness below) and in each generation the most fit organisms survive.

I will give the description of the SortWorld, attempting to avoid going into technical details. Such technical details can be made available if anybody is interested in them.

In the SortWorld the digital organisms are algorithms (or programs) capable to sort lists of numbers. For example, if one has the list L1 = {3,5,4,1,3}, a sort algorithm will change it to L2 = {1,2,3,4,5}, in which the numbers appear in ascending order.

It is known in Computer Science (kind of lesson 101) that there are many sort algorithms, some better or more efficient than others. A sort algorithm is better than another if it can perform the sorting with fewer operations. There are two classical sort algorithms, called Bubble Sort and Quick Sort. Quick Sort is generally much more efficient than Bubble Sort. It is possible that on a particular list Bubble Sort does better, but across all lists, Quick Sort is superior.

If you read the description of the SortWorld below, please notice that, unlike the Dawkins simulations

  • Evolution process has no target
  • There is an objective fitness function
  • There is an environment and organisms “living” in this environment.
  • Like in the biological world, organisms must do something to survive and live descendants (namely to sort lists)
  • Organisms are complex

The objects of SortWorld

(If this seems too technical, you may skip this section.)

Environment

The environment of SortWorld consists of lists of numbers to be sorted. To be more exact, I considered not just lists, but “list sets” each containing one or more (for instance one hundred) lists.

Organisms

The organisms of the SortWorld are programs which are more or less capable to sort lists from the environment. One may look at the sorting process as metabolism.

Random mutations

The mutations consist of adding, changing and deleting instructions from the existing programs. The mutations are of such a nature that they do not destroy the syntactical integrity of a program. For instance, the instruction x = y + 5 may become x = z – 3, but not x??3=.

Fitness

An organism is more fit than another if it is capable to sort a list in less operations then the other. Thus, if program P sorts the list in 30 swaps of the list elements, it is more fit than a program Q which accomplishes the sorting in 100 swaps. To complicate a little, the lists do not have to be perfectly sorted. Fitness takes it into account, such that a program which perfectly sorts a list is more fit than one which only partially sorts it, as for example generating the list (1, 2, 3, 5, 4).

Reproduction

Each program can generate a number of descendants (in my experiments usually about 100), which are copies of the parent, but with slight small mutations.

Selection

In each generation the most fit programs (capable to more efficiently sort the list sets) are selected, while the others are discarded.

Simulation parameters

I can run the evolution process as many times I want, with different parameter. I can vary

  • The number of lists to be sorted in a list set
  • The number of descendants of each organism
  • The number of generations
  • The number of survivors in each generation
  • The starting point (usually a functional sorting program)
  • The mutation schema, which defines the probabilities of various types of mutations (changes, insertions or deletions of program instructions)

Based on these varying parameters, I run a number of experiments. The big question of this exercise is to find if positive evolution is possible. In a more concrete form, the question is this:

Can a Bubble Sort program (with lower efficiency) evolve into a program as efficient (or almost as efficient) as a Quick Sort?

SortWorld results

I want to emphasize from the beginning that processes in SortWorld are based on some random variables. For instance, there is neither predetermined order of mutations, nor predetermined places in which a program is modified by them. This means that even when running with the same parameters, different runs with the same parameters may result in different end points. However, running many times with the same and different parameters, I have obtained some consistent results.

I would like to first present some intermediate results, leaving the conclusions to the very end of this article.

For simplification, I will describe here a pivot scenario from which I constructed other scenarios, with small variations from the pivot. The pivot scenario is defined by the following parameters:

Case 1: L=5 M=3 D=10 S=5 G=300

Environment: Lists to sort5
Mutations per generation3
Descendants of each organism10
Survivors per generation5
Generations300

It is interesting to observe a number of characteristics which emerged from evolution.

Fitness

Figure 1: Evolution of fitness

You may notice that fitness continues to increase until about generation 200, after which it becomes flat.

Failures

As five survivors produce ten descendants each, some of these descendants fail (they die). The reason is that some of the resulting organisms crash while performing the sorting. Some crash because they enter infinite loops, some because they attempt to divide by zero. However, there are enough survivors to continue the process.

Figure 2: Evolution of failures

Program size

Mutations add instructions (genes), change instructions and delete instructions. Deletions are generally dangerous so I calibrated the mutation scheme to produce more changes and insertions. Unfortunately this leads to a growth in program size. Moreover, the many extra instructions (genes) are mostly neutral, having no consequences for the fitness of the program. Here is, below, how the program size evolved in this scenario:

Figure 3: Evolution of program size

Performance of survivor across other environments

This is really the most interesting and telling graph.

The X axis does not represent numbers, but environments. They are denoted as Lx (L1, L5, L20, L100) where x indicates the number of lists to be sorted. For example, L20 indicates a set of 20 lists to be sorted.

The Y axis indicates the fitness of a particular sorting algorithm corresponding.

The three curves show the fitness of three particular sort programs in each if the four environments considered. They are represented as follows:

Current Scenario Best Survivor:  Black continuous line

Bubble Sort:                                       Blue dotted line

Quick Sort:                                           Red dashes line

Figure 4: Fitness comparison

As one can observe, after evolving through 300 generations, the best survivor of evolution compares with the Bubble Sort and Quick Sort as follows:

  • On a 1-list environment, the survivor of evolution has a lower fitness than Bubble Sort and Quick Sort.
  • On a 5-lists environment, the best survivor does better than Bubble Sort, but less than Quick Sort. The reason is that the best survivor is the result of evolution in a 5-list environment.
  • On a 20-lists environment and a 100-lists environment, the best survivor has a lower fitness than Bubble Sort and Quick Sort.

Conclusions

Although I have only shown one scenario here, results on various scenarios show consistent results, from which we can draw certain conclusions.

  1. In a particular environment consisting of a number of lists to be sorted, there is a real evolution of the initial program, as shown in Figure 1.
  2. Since the mutations are random, some lead to failures, as shown in Figure 2.
  3. During the evolution process, the programs (digital organisms) accumulate a lot of redundant instructions, which do not necessarily degrade the performance (although they may) but are in must cases unnecessary for the goal of sorting the lists. This is shown in Figure 3.
  4. Although the fitness of the organism improves in a particular environment, it generally degrades in other environments. This can be seen in Figure 4.
  5. Figure 4 also shows that if the evolution started from the original Bubble Sort organism, it never reaches the efficiency of Quick Sort. The Quick Sort program is different in structure from the Bubble Sort program, which was the original program in the evolution process. The evolution was not capable to change the structure of the program to reach the Quick Sort fitness.
  6. A look at the code of the best survivor shows that the increase in fitness is based on SPECIALIZATION ON A PARTICULAR ENVIRONMENT. For instance, if a list is {1,2,3,5,4}, the best survivor learns that it does not have to bother with the first three numbers {1,2,3}, and it is sufficient to sort the last two {5,4}. However, this strategy of skipping the first three, does not work in another environment, for instance in the list {3,2,1,5,4}.

Further discussion

The results of this experiment have an almost perfect parallel in the natural world. Let’s consider the following parallelism:

We replace the Bubble Sort with a specie of apes.

We replace the Quick Sort with a specie of birds.

Both species feed on certain fruits found in trees.

There are three types of environments:

E1 – trees very close to each other, but with a large variety of heights. The apes must be capable to climb high trees and perfectly jump to trees of lower height.

E2 – trees of approximate same height, but far from each other. The apes have less to climb, but must be able to make very large horizontal jumps.

E2 – a random combination of E1 and E2.

The six conclusions listed above will sound like this:

  1. Apes which live in either E1 or E2 will become more efficient if they live for a long time in E1 or E2. This is adaption which is observed in nature and which is predicted by the theory of evolution. We may call this micro-evolution, since it consists of small changes.
  2. If the apes suffer random mutations, some of these mutation lead to diseases or simply make some apes non-viable. This is consistent with observations in nature, however in my experiment I have a larger number of failures than expected.
  3. The apes which evolve in a particular environment will accumulate junk DNA. It was believed for a long time that most of a species’ DNA consists of junk (i.e. non-functional code), but in recent years it was found that very little of the DNA is really junk, most of it having some function or another. Since my simulation is based on Darwinian evolution, it follows that Darwinian evolution does not correctly predict the amount of junk DNA. It should be much higher than observed in nature.
  4. It is expected that apes which adapt to E1 (tall trees very close to each other) will be less adapted to E2 (trees of about the same height, but far from each other). In the fist case, apes will be able to climb very well and jump from high trees down, but not make large jumps between more distant trees. In the same way, apes adapted to E2 will be less fit for E1.
  5. Birds are capable to raise to the top of the highest trees and move without difficulty between trees that are distant from each other. Apes will never be able to reach the same ability.
  6. Adaptation to a particular environment, E1 or E2, results in less capability to survive in the other. For instance, in E1 the apes do not have to make very long horizontal jumps, and over many generations will lose this capability. Also, apes in E2 do not have to climb or jump down from very high trees and land graciously, and in many generations will lose this capability.

The final conclusions are these:

Conclusion 1: Microevolution does happen. It is observed in nature, and it is observed in my simulation.

Conclusion 2: Microevolution with adaptation to a particular environment may result in loss of function. As Michael Behe observed, you can make a car more fuel efficient by throwing out the radio and the air conditioning. It becomes better in one dimension of fitness (fuel efficiency) but worse in another (comfort). It is improvement with loss of function. A species which adapts to a particular environment becomes less adapted to other environments.

Conclusion 3: Species do not evolve into other species. Evolved Bubble Sort programs do not become Quick Sort programs, which are based on a different plan. Neo Darwinism assumes but cannot demonstrate that reptiles or mammals can evolve into birds. Evolution can improve marginal characteristics (color, size, speed) but not change body plans.

One may observe that evolution in the SortWorld follows exactly the rules of Darwinian evolution as it is believed to happen in the biological world (let’s call it BioWorld).  Of course, one may argue that in BioWorld there are some other factors which I did not include in SortWorld.

Let’s assume that BioWorld contains a Factor X which makes macro evolution possible. Until we know what this Factor X is, the SortWorld is a perfect parallel to the BioWorld, and any conclusions based on observing evolution in the SortWorld should also apply to the BioWorld.