[Third draft. Further formatting changes plus some additions. This article was originally published elsewhere.]
Evolution
Let us begin with a narrative... Let us talk about evolution.
Roughly speaking, what we mean by evolution is a generational process consisting of heritable variation, mutations, and selection pressures acting on that variation. We can describe a model for the fundamentally complex and powerful effects of this process. But, before that, we must emphasize that the common idea of evolution as a sequence of individual beneficial mutations, like climbing a ladder, is misleading and insufficient. If evolution really were a ladder-climb, it could never have produced the kind of complexity we see today. A far better, and more plausible, model is one in which every single mutation that occurs is either neutral or harmful.
A good place to start is with the common complaint that mutation and evolution cannot create information. This is only valid at the moment a mutation occurs -the mutation simply introduces noise and hence tends to degrade information. But look what happens the moment that mutation gets passed on to an offspring: it is no longer random noise, it now carries a small bit of information. It carries a little tag saying this is a nonfatal mutation. The presence of this mutation in the offspring is newly created information, the discovery and living record of a new nonfatal mutation. Over time, the population builds up a library of nonfatal mutations. This library is a vast accumulation of new information.
That information then undergoes even more processing and synthesis. Over generations, beneficial mutations would obviously multiply, but we are assuming there are none of those here. However, there is nothing stopping entirely neutral mutations from accumulating and multiplying. Nearly harmless mutations would also accumulate and multiply, though to a lesser extent, as would those mutations that are somewhat harmful. Finally, there will be mutations that are extremely harmful (but nonfatal) -these will pop up and disappear at the rarest frequencies.
So, what we have here is a process in which we not only build up a library of nonfatal mutations, we also have a library of mutations each tagged with a frequency: the percentage of the population carrying that mutation. In other words, each mutation carries a measurement tag -a cost/benefit tag at the population level. The best ones have a high percentage representation and the most harmful ones have a near-zero percentage. Clearly, we now have a library containing far more valuable and sophisticated newly created information.
The individuals in the population will, on average, carry a roughly stable load of harmful mutations, and hence a roughly constant cost. Individuals carrying more than the average load are generally going to die, removing a more-than-average amount of harm from the population and pulling the average load down; individuals with a less-than-average load will multiply and pull the population average down even further. The cleansing effect of selection removing damage from the gene pool will automatically scale to offset the exact rate at which mutation is causing damage. Negative effects (high cost, harm, or damage) will be weeded out by selection at the same rate as they are added by mutation. Neutral mutations will steadily accumulate in the library, and negative mutations will remain at a roughly fixed level, constantly measured and scaled by the cost of each. Some mutations will disappear while new ones appear.
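If a concrete illustration helps, here is a minimal toy simulation of this mutation-selection balance. Everything in it -the haploid, fixed-size population, the multiplicative fitness costs, and every parameter value- is my own assumption for illustration, not something specified by the narrative above.

    import random
    from itertools import count

    # Toy, Wright-Fisher-style sketch of mutation-selection balance.
    # All parameters and modelling choices are assumptions made for
    # illustration only.

    N = 500              # population size (haploid individuals)
    GENERATIONS = 300
    MUTATION_PROB = 0.5  # chance an offspring picks up one new mutation
    P_NEUTRAL = 0.7      # fraction of new mutations that are neutral
    mutation_id = count()

    def new_mutation():
        # a mutation is (unique id, fitness cost); cost 0.0 means neutral
        cost = 0.0 if random.random() < P_NEUTRAL else random.uniform(0.01, 0.2)
        return (next(mutation_id), cost)

    def fitness(genome):
        w = 1.0
        for _, cost in genome:
            w *= 1.0 - cost
        return w

    population = [frozenset() for _ in range(N)]

    for gen in range(1, GENERATIONS + 1):
        # selection: parents are drawn in proportion to their fitness
        weights = [fitness(g) for g in population]
        parents = random.choices(population, weights=weights, k=N)
        # reproduction, with occasional new mutations
        offspring = []
        for parent in parents:
            genome = set(parent)
            if random.random() < MUTATION_PROB:
                genome.add(new_mutation())
            offspring.append(frozenset(genome))
        population = offspring
        if gen % 50 == 0:
            harmful = sum(1 for g in population for _, c in g if c > 0) / N
            neutral = sum(1 for g in population for _, c in g if c == 0) / N
            print(f"gen {gen}: avg harmful load {harmful:.1f}, "
                  f"avg neutral load {neutral:.1f}")

Run for a few hundred generations, the average harmful load settles around a roughly constant value while the neutral part of the library keeps growing, which is exactly the balance described above.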
The real power of evolution is in recombination. Every offspring contains a random mixture of mutations from that library. Every offspring is a test case searching for a jackpot beneficial combination of mutations. Let us assume an individual carries 1,000,000 random mutations drawn from its population's library. There are then roughly 500,000,000,000 mutation-pairs being simultaneously tested within that individual, in parallel. Perhaps one is a mutation creating a toxin and another is a mutation for modified skin pores. Either mutation alone may be harmful, but the pairing could be a breakthrough protecting against predators.
There are roughly 170,000,000,000,000,000 mutation-triples. Each individual is also testing all of these triples in parallel. One mutation might be for a toxin, a second one might accelerate the production of that toxin to fatal levels (which would ordinarily be fatal, and hence an evolutionary dead end), and the third one might be a costly yet ordinarily useless anti-toxin. The triplet is now a breakthrough: either a powerful defense against predators, or a weapon for a predator to use, or even both at once.
Each individual is also testing roughly 40,000,000,000,000,000,000,000 mutation-quadruples in parallel, for free. Maybe those four mutations individually yield useless proteins and enzymes, but the chain of four together may yield a breakthrough new digestive pathway. Each individual also tests mutation quintuplets and sextuplets and more -it acts as a test of a nearly infinite number of possibilities, and it does this testing in parallel and for free. This is called implicit parallelism. It multiplies the power of evolution's search for jackpot breakthroughs astronomically.
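As a quick sanity check on these counts, here is a short Python snippet (the 1,000,000-mutation library is the text's assumption) that computes the number of k-way combinations directly:

    from math import comb

    # Number of k-way mutation combinations carried by one individual,
    # for the text's assumed library of 1,000,000 mutations.
    M = 1_000_000
    for k in (2, 3, 4):
        print(f"{k}-way combinations: {comb(M, k):.2e}")

    # prints approximately:
    #   2-way combinations: 5.00e+11
    #   3-way combinations: 1.67e+17
    #   4-way combinations: 4.17e+22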
Another point raised above but not yet applied is the fact that each mutation is present at some frequency in the population -a measurement of the cost/benefit of that mutation. When you want the most efficient search pattern, you want to minimize wasted effort and costs while maximizing the return on your available resources. Each offspring is an investment of resources and a test effort. When you are investing effort looking for a payoff, you want to spend most of it on the mutations that have paid off best in the past, and the least of it on the almost-fatal ones. You mostly want to test combinations of good stuff with good stuff, and you almost never want to bother testing two nearly fatal mutations that will most likely combine to produce a dead offspring and a wasted investment.
However, you do still want to make a very rare test of two nearly fatal mutations, because it might just be a jackpot payoff. This exact investment-of-effort search pattern has already been studied and a mathematical optimum found. And, by an almost staggering coincidence, the population frequency of each mutation, carried over into the offspring, produces exactly that mathematically optimal and most efficient search pattern for the next generation. You invest lots of effort and lots of offspring in testing the best mutations and groups of the best mutations, and you invest exactly the right, very rare, level of testing on the really bad combinations that will probably be fatal but just might find a jackpot payoff. Mutations at every level are tested in proportion to the measured cost or benefit each imposes on its host.
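Here is a small sketch of that allocation idea. It assumes, purely for illustration, that each offspring inherits each mutation independently with its population frequency; the names and frequencies are invented. Pairs of common (well-performing) mutations get tested together constantly, while pairs involving a nearly fatal mutation are tested only very rarely -but they do still get tested.

    import random

    # Effort allocation by population frequency: a pair of mutations gets
    # tested together at a rate close to the product of their frequencies.
    # The frequencies below are invented for illustration.
    frequencies = {
        "beneficial_A":   0.60,
        "beneficial_B":   0.45,
        "neutral_C":      0.10,
        "nearly_fatal_D": 0.001,
    }

    TRIALS = 1_000_000
    pair_counts = {}
    for _ in range(TRIALS):
        # each offspring inherits each mutation independently with its frequency
        genome = [m for m, f in frequencies.items() if random.random() < f]
        for i in range(len(genome)):
            for j in range(i + 1, len(genome)):
                pair = (genome[i], genome[j])
                pair_counts[pair] = pair_counts.get(pair, 0) + 1

    for pair, n in sorted(pair_counts.items(), key=lambda kv: -kv[1]):
        print(f"{pair}: tested together in {n} of {TRIALS} offspring")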
So, evolution has a nearly infinite multiplier on its search power, and it just happens to invest its search effort in the mathematically optimal, most efficient allocation. These are two fairly deep and powerful mathematical results that are hardly apparent in the way evolution is usually explained.
A further point is that once some beneficial mutation or combination of mutations is found, evolution then searches that vast library of stored nonfatal mutations. Most new breakthroughs will be extremely crude at whatever it is they do, and they will probably come with harmful side effects. A set of organs might be mutated into some useful form for getting some new food source, yet be horribly mutated and otherwise dysfunctional. Evolution then searches the library for mutations that combine to further improve that new breakthrough, and it also searches the library for mutations that will repair or offset any harmful side effects of the breakthrough. A search for ways to further improve the mutated organs for the new purpose, and a search for modifications to repair problems caused by these malformed organs.
Evolution is very rarely a simple ladder-climb series of beneficial mutations. Evolution is an information processing system: it builds a vast database of information, synthesizes complex measurements of that information, and performs an incredibly powerful search and mining of that database to discover and refine improvements.
And this fits in perfectly with punctuated equilibrium. During the quiet phase the library is accumulating new mutations and measuring each of them into a percentage of the population; then, when a breakthrough is discovered or the environment shifts, evolution goes into overdrive. It mines the database for contributions to the new development, or for adaptations to the new environment. The frequencies of all of the mutations also get re-measured, their cost/benefit ratios re-weighted in light of the new development or the new environment. Not only can this radically shift the frequencies of a vast portion of the genes and mutations in the population, it can also quite easily trigger the discovery of other, independent breakthroughs. If the population comes under heavy selection pressure, if most of the population is exterminated or displaced by the change, then the gene pool gets decimated. Much of the accumulated library is wiped out along with the losing majority of the population. With a depleted library, the new population is naturally going to show little change or progress. You see a stable population, an equilibrium, until that library can be slowly rebuilt through the accumulation of new mutations.
Complexity
The above narrative describes a complex and complicated process that is easier to fathom in that form -as a narrative. It is not so easy to express in scientific terms, and indeed it is not very scientific. Why?
There is an indirect answer to that. Let us look at a different but relevant question: how can we distinguish a world which can be explained by science from one that cannot? How do we tell whether something we observe in the world around us is subject to some scientific law, or is just patternless and random?
This question is nothing very new: Leibniz, for example, discusses it in his Discours de métaphysique. Imagine, Leibniz says, that someone has splattered a piece of paper with ink spots, determining in this manner a finite set of points on the page. Leibniz observes that, even though the points were splattered randomly, there will always be a mathematical curve that passes through this finite set of points. Indeed, many good ways of constructing such a curve are now known; Lagrange interpolation, for example, will do.
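To make Leibniz's observation concrete, here is a short, self-contained sketch of Lagrange interpolation: scatter a handful of 'ink spots' at random, and a polynomial curve passes exactly through all of them. The number of points and their coordinates are, of course, arbitrary choices for illustration.

    import random

    # Lagrange interpolation: a curve through any finite set of points
    # with distinct x-coordinates.
    points = [(x, random.uniform(0.0, 10.0)) for x in range(8)]  # 8 "ink spots"

    def lagrange(points, x):
        """Evaluate the interpolating polynomial through `points` at x."""
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total

    # the curve reproduces every spot exactly (up to floating-point error)...
    for xi, yi in points:
        assert abs(lagrange(points, xi) - yi) < 1e-9

    # ...but it needs as many pieces of data as there are spots, so it
    # explains nothing -which is exactly Leibniz's point.
    print([round(lagrange(points, x), 3) for x, _ in points])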
So the existence of a mathematical curve passing through a set of points cannot enable us to distinguish between points that are chosen at random and those that obey some kind of a scientific law. How, then, can we tell the difference? According to Leibniz, if the curve that contains the points must be extremely complex (fort composée), then it's not much use in explaining the pattern of the ink splashes. It doesn't really help to simplify matters and therefore isn't valid as a scientific law -the points are random (irrégulier). The important insight here is that something is random if any description of it is extremely complex -i.e. randomness is complexity.
The next person to take up this subject was Hermann Weyl (The Open World, 1932). He points out that Leibniz's way of distinguishing between points that are random and points that follow a law, by invoking the complexity of a mathematical formula, is unfortunately not well defined: it depends on what functions you are allowed to use in writing that formula. What is complex to one person at one particular time may not appear complex to another person a few years later; defined in this way, complexity is in the eye of the beholder. We need to do better than that.
We can turn to a new field called algorithmic information theory to provide a possible solution for the problem of how to measure complexity. The main idea is that any scientific law, which explains or describes mathematical objects or sets of data, can be turned into a computer program that can compute the original object or dataset.
Say, for example, that you haven't splattered the ink on the page randomly, but that you've carefully placed the spots on a straight line which runs through the page, each spot exactly one centimeter away from the previous one. The theory describing your set of points would consist of four pieces of information: the equation for the straight line, the total number of spots, the precise location of the first spot, and the fact that the spots are one centimeter apart. You can now easily write a computer program, based on this information, which computes the precise location of each spot.
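A minimal sketch of what such a program might look like, with the four pieces of information written out as data (the particular values -the line's direction, the number of spots, the first spot's position- are invented for illustration):

    import math

    # The "theory" of the ink spots as a program: four pieces of information,
    # from which every spot's location can be computed.
    angle_deg  = 30.0        # the direction of the straight line on the page
    num_spots  = 20          # total number of spots
    first_spot = (1.0, 2.0)  # location of the first spot, in centimeters
    spacing_cm = 1.0         # distance between consecutive spots

    dx = spacing_cm * math.cos(math.radians(angle_deg))
    dy = spacing_cm * math.sin(math.radians(angle_deg))

    spots = [(first_spot[0] + k * dx, first_spot[1] + k * dy)
             for k in range(num_spots)]

    for x, y in spots:
        print(f"({x:.2f} cm, {y:.2f} cm)")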
In algorithmic information theory, we don't just say that such a program is based on this underlying theory, we say it is the theory. This gives a way of measuring the complexity of the underlying object (in this case, our ink stains): it is simply the size of the smallest computer program that can compute the object. The size of a computer program is the number of bits it contains: As you will know, computers store their information in strings (sequences) of 0s and 1s, and each 0 or 1 is called a bit. The more complicated the program, the longer it is and the more bits it contains. If something we observe is subject to a scientific law, then this law can be encoded as a program. What we desire from a scientific law is that it be simple - the simpler it is, the better our understanding, and the more useful it is. And its simplicity -or lack of it- is reflected in the length of the program.
In our example, the complexity of the ink stains is precisely the length in bits of the smallest computer program which comprises our four pieces of information and can compute the location of the spots. In fact, the ink spots in this case are not very complex at all. We have added two ideas to Leibniz's 1686 proposal. First, we measure complexity in terms of bits of information, i.e. 0s and 1s. Second, instead of mathematical equations, we use binary computer programs. Crucially, this enables us to compare the complexity of a scientific theory (the computer program) with the complexity of the data that it explains (the output of the computer program, the location of our ink stains). As Leibniz observed, for any data there is always a complicated theory, which is a computer program that is the same size as the data. But that doesn't count. It is only a real theory if there is compression, if the program is much smaller than its output, both measured in bit count of 0s and 1s. And if there can be no proper theory, then the bit string (sequence) is called algorithmically random or irreducible. That's how we define a random string in algorithmic information theory.
Let's look at our ink stains again. To know where each spot is, rather than writing down its precise location, you're much better off remembering the four pieces of information. They give a very efficient theory which explains the data. But what if you place the ink spots in a truly random fashion, by looking away and flicking your pen? Then a computer program which can compute the location of each spot for you has no choice but to store the co-ordinates that give you each location. It is just as long as its output and doesn't simplify your dataset at all. In this case, there is no good theory, the dataset is irreducible, or algorithmically random.
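A crude way to see the contrast is to run both kinds of data through a general-purpose compressor. This is only a stand-in -the 'smallest program' of algorithmic information theory is not computable in general, and zlib is nothing like an optimal compressor- but the regularly spaced spots shrink dramatically, while the randomly flicked ones barely shrink at all:

    import random
    import zlib

    # Regularly spaced "spots" versus randomly flicked ones, both written
    # out as text and then compressed with zlib.
    regular = ",".join(f"{k * 1.0:.2f}" for k in range(1000))
    flicked = ",".join(f"{random.uniform(0.0, 1000.0):.2f}" for _ in range(1000))

    for name, data in (("regular", regular), ("flicked", flicked)):
        raw = data.encode()
        packed = zlib.compress(raw, 9)
        print(f"{name}: {len(raw)} bytes -> {len(packed)} bytes after compression")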
In 1936, Alan Turing stunned the mathematical world by presenting an abstract model of the digital computer, the Turing machine. And as soon as you start thinking about computer programs, you are faced with the following, very basic, question: given any program, is there an algorithm which decides whether the program will eventually stop, or whether it will keep on running forever?
Let's look at a couple of examples. Suppose your program consists of the instruction 'take every number between 1 and 10, add 2 to it and then output the result'. It's obvious that this program halts after 10 steps. If, however, the instruction is 'take a number x, which is not negative, and keep multiplying it by 2 until the result is bigger than 1', then the program will stop as long as the input x is not 0. If it is 0, it will keep going forever. In these two examples it is easy to see whether the program stops or not. But what if the program is much more complicated? Of course you can simply run it and see if it stops, but how long should you wait before you decide that it doesn't? A week, a month, a year? The basic question is this: is there a test which, in a finite amount of time, decides whether or not any given program ever halts? As Turing proved, the answer is no.
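Written out as code (a sketch, with the loop condition as I read it from the description above), the two examples look like this; the first obviously halts, while the second halts for any positive x but loops forever when x is 0:

    def program_one():
        # take every number between 1 and 10, add 2, output the result
        for n in range(1, 11):
            print(n + 2)            # halts after 10 steps

    def program_two(x):
        # x is assumed non-negative; keep doubling until the result exceeds 1
        while x <= 1:
            x = x * 2               # if x == 0, this loop never terminates
        return x

    program_one()
    print(program_two(0.25))        # halts: 0.25 -> 0.5 -> 1.0 -> 2.0
    # program_two(0) would run forever -- and Turing proved there is no
    # general test that can make this call for every possible program.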
Now, instead of looking at individual instances of Turing's famous halting problem, you just put all possible computer programs into a bag, shake it well, pick out a program, and ask: what is the probability that it will eventually halt? Let's call that probability the number Omega. And what is the most surprising property of Omega? It is the fact that it is irreducible, or algorithmically random, and that it is infinitely complex. Let's try to explain why this is so. Like any number, we can, theoretically at least, write Omega in binary notation, as a string of 0s and 1s. In fact, Omega has an infinite binary expansion, just as the square root of two has an infinite decimal expansion. But unlike the square root of two, which a short program can compute to any desired precision, the bits of Omega cannot be generated by any finite program: knowing the first N bits of Omega would let you decide the halting problem for every program up to N bits long, so no program much shorter than N bits can produce them.
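For reference, Chaitin's definition of Omega can be written down in one line. Here |p| denotes the length in bits of a program p, and the sum runs over all programs that halt when run on a fixed prefix-free universal machine:

    \Omega \;=\; \sum_{p\ \text{halts}} 2^{-|p|}

Each halting program of length n contributes 2^{-n} to the total, and the prefix-free requirement is what keeps the sum below 1, so that Omega really is a probability.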
The question, Why can't we have a TOE (Theory of Everything)?, is now easy to answer. A mathematical theory consists of a set of axioms -basic facts which we perceive to be self-evident and which need no further justification- and a set of rules about how to draw logical conclusions. So a Theory of Everything would be a set of axioms from which we can deduce all mathematical truths and derive all mathematical objects. It would also have to have finite complexity, otherwise it wouldn't be a theory. Since it's a Theory of Everything, it would have to be able to compute Omega, a perfectly decent mathematical object. The theory would have to provide us with a finite program which contains enough information to compute any one of the bits in Omega's binary expansion. But this is impossible because Omega, as we've just seen, is infinitely complex -no finite program can compute it. There is no theory of finite complexity that can deduce Omega.
So, this is an area in which mathematical truth has absolutely no structure, no structure that we will ever be able to appreciate in detail, only statistically. The best way of thinking about the bits of Omega is to say that each bit has probability 1/2 of being zero and probability 1/2 of being one, even though each bit is mathematically determined. That's where Turing's halting problem has led us, to the discovery of pure randomness in a part of mathematics. Turing and Leibniz would be delighted at this remarkable turn of events. Gödel's incompleteness theorem tells us that within mathematics there are statements that are unknowable, or undecidable. Omega tells us that there are in fact infinitely many such statements: whether any one of the infinitely many bits of Omega is a 0 or a 1 is something we cannot deduce from any mathematical theory. More precisely, any maths theory enables us to determine at most finitely many bits of Omega.
Building on the incompleteness phenomenon discovered by Gödel in 1931, on Turing's later work, and on more recent results, the Omega number takes us to a mathematical incompleteness that goes further than Gödel's; and it therefore forces our hand philosophically in a quasi-empirical direction: because of incompleteness, it is not sensible, even in mathematics, always to demand proofs for everything. It could even be argued that, rather than attempting to prove results such as the celebrated Riemann hypothesis, mathematicians should accept that they may not be provable and simply adopt them as axioms.
In the above, we have seen how we can produce a complex dataset -one that we cannot reduce to a simple formula- or, in other words, something that looks like random information. This dataset could either be a truly random dataset (i.e. created without our intervention), or something that we produced (described) but cannot simplify. In either case, for any practical purpose, it can be called a random dataset, one which has irreducible complexity. Now we are faced with the question of what this irreducible complexity means to us. Well, it means just that: we are unable to express the information contained in the dataset in a scientific manner. Or, in other words, whatever information is in there, it is beyond our capability of scientific expression. Nothing more, nothing less.
So, if we return to our original topic, evolution, what we are left with is this: We can narratively describe the mechanism of it, we can philosophize about the process; but when it comes to simplifying it so that we can express it scientifically, we fail. It is just too complex for us.
And here, by it, we have to mean two things: The process and the product of that process.
Inferring design out of complexity
I think I should give a simple example of how we –mortal humans– can set up a relatively well-designed process that produces products containing complex information.
Consider the art of 'paper marbling' [see Wikipedia page here], where we can control almost everything, from how big the tank is all the way to the dye colors and the water temperature. Yet the product contains complex information –we can never produce the exact same thing again.
In this case, what we have is a designed process; not the product.
The fact that we can tell the process itself was designed is simply because we designed it.
I mean, a piece of paper might simply have landed flat on a tank of water with some floating dyes. The result would probably have been quite similar. Thinking about it, while I have no way of knowing, I would hazard a guess that this is how the 'art of paper marbling' must have come about. Which means the very first original sample was not the product of a designed process.
So... how do we tell whether an item that contains complex information is the product of a designed process?
To highlight how hard this is, let's look at the man-made diamonds issue –a multi-billion dollar business. After years of denial, the mining cartels have had to give in, simply because they had no scientifically sound way of knowing whether the diamond they were looking at was a 'natural' one or whether it came out of a man-made (designed) process. The only way to tell is if they already know that it is the product of a man-made process.
In other words, the simple answer has to be: Whether we can emulate the process or not, we cannot –with any scientific certainty– tell if the item (that contains complex information) is a product of a designed process.
The only way we can tell is if we have designed the process ourselves; or, alternatively, if we are absolutely certain that it was a designed process. This is the case of 'reducible complexity' –somewhat of an oxymoron, since –by definition– it is not complex anymore.
In all other cases, i.e. when we do meet with a case of 'irreducible' complexity, it (i.e. the process or the product) really tells us nothing more –nothing more than the fact that we have encountered 'irreducible' complexity.
It would be nice if we could equate complexity (irreducible or otherwise) with, say, design (intelligent or otherwise); but, we cannot. Irreducible complexity is random data –as far as science is concerned.
Consequently, it would hardly be meaningful to attempt to infer its content, or its origin, or its designer or creator, or, indeed, that it is a product of design at all.
Notes:
For further details of the Omega number, please see G. J. Chaitin's home page, which I have made heavy use of. Although not directly related, the interested reader is also invited to read the New Scientist article 'The Omega Man'.
The definition of 'irreducible complexity', as given by Michael Behe for the purposes of Intelligent Design, can be found here [a copy of which is below].
By irreducibly complex I mean a single system composed of several well-matched, interacting parts that contribute to the basic function, wherein the removal of any one of the parts causes the system to effectively cease functioning. An irreducibly complex system cannot be produced directly (that is, by continuously improving the initial function, which continues to work by the same mechanism) by slight, successive modifications of a precursor system, because any precursor to an irreducibly complex system that is missing a part is by definition nonfunctional. An irreducibly complex biological system, if there is such a thing, would be a powerful challenge to Darwinian evolution. (p. 39)
On the subject of irreducible complexity, a Wikipedia page is also available.