“Life’s Conservation Law: Why Darwinian Evolution Cannot Create Biological Information”

Here’s our newest paper: “Life’s Conservation Law: Why Darwinian Evolution Cannot Create Biological Information,” by William A. Dembski and Robert J. Marks II, forthcoming chapter in Bruce L. Gordon and William A. Dembski, eds., The Nature of Nature: Examining the Role of Naturalism in Science (Wilmington, Del.: ISI Books, 2009).

1 The Creation of Information
2 Biology’s Information Problem
3 The Darwinian Solution
4 Computational vs. Biological Evolution
5 Active Information
6 Three Conservation of Information Theorems
7 The Law of Conservation of Information
8 Applying LCI to Biology
9 Conclusion: “A Plan for Experimental Verification”

ABSTRACT: Laws of nature are universal in scope, hold with unfailing regularity, and receive support from a wide array of facts and observations. The Law of Conservation of Information (LCI) is such a law. LCI characterizes the information costs that searches incur in outperforming blind search. Searches that operate by Darwinian selection, for instance, often significantly outperform blind search. But when they do, it is because they exploit information supplied by a fitness function—information that is unavailable to blind search. Searches that have a greater probability of success than blind search do not just magically materialize. They form by some process. According to LCI, any such search-forming process must build into the search at least as much information as the search displays in raising the probability of success. More formally, LCI states that raising the probability of success of a search by a factor of q/p (> 1) incurs an information cost of at least log(q/p). LCI shows that information is a commodity that, like money, obeys strict accounting principles. This paper proves three conservation of information theorems: a function-theoretic, a measure-theoretic, and a fitness-theoretic version. These are representative of conservation of information theorems in general. Such theorems provide the theoretical underpinnings for the Law of Conservation of Information. Though not denying Darwinian evolution or even limiting its role in the history of life, the Law of Conservation of Information shows that Darwinian evolution is inherently teleological. Moreover, it shows that this teleology can be measured in precise information-theoretic terms.
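As a minimal numerical sketch of the bound stated in the abstract, the snippet below computes log(q/p) in bits. The base-2 logarithm, the function name, and the sample probabilities are assumptions made for illustration; they are not taken from the paper.

```python
import math


def information_cost_lower_bound(p: float, q: float) -> float:
    """Return log2(q/p): the minimum information cost, in bits, that the
    abstract attributes to raising a success probability from p to q."""
    if not 0 < p < q <= 1:
        raise ValueError("expected 0 < p < q <= 1")
    return math.log2(q / p)


# Hypothetical numbers: blind search succeeds with probability 1/1024,
# and the alternative search succeeds with probability 1/2.
bound_bits = information_cost_lower_bound(1 / 1024, 1 / 2)
print(bound_bits)
```

Under these assumed numbers the improvement factor q/p is 512, so the bound says that whatever process produced the better search must have supplied at least log2(512) bits.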

197 Responses to “Life’s Conservation Law: Why Darwinian Evolution Cannot Create Biological Information”

  1. Atom, with regards to Shannon information:

    Shannon info is a relative measure based on epistemic probability. It measures the reduction of uncertainty in the receiver, so it’s explicitly relative to the receiver’s prior knowledge.

    The active info framework, on the other hand, attempts to provide absolute measures by regressing probabilities to a point of no prior conditions. But it can’t do so, because an ultimate, unconditional search space is an undefined search space, and there’s no way to derive probabilities from it.

  2. Joseph (#153):

    You don’t need Dembski to respond.

    All you need to do is take something that is alleged to have active information and show it can arise via nature, operating freely.

    I could be wrong here, but IF I read Dembski and Marks correctly, then if active information was found to arise “via nature, operating freely”, then this would, in fact, have been caused by intelligence smuggling the information in somehow.

    IF this is the case, then I guess that the oft-repeated argument that ID would be falsified if natural processes could produce CSI is wrong.

  3. R0b,

    I agree that there may have been some talking past points. I’ll try to get this back on track.

    I thought you had expanded the higher order space to include different search strategies (this wasn’t clear to me), so if I mischaracterized your argument, I apologize.

    Let us assume an evolutionary strategy is a given.

    We will further assume a base search space, which will be the permutation space of all 24 letter long base 27 (per our alphabet) strings. We agreed this space had 27^24 elements.

    Now our evolutionary strategy will have to use a fitness function (following the standard implementation of an evolutionary search). What is the search space of possible fitness functions?

    First, we’d want to use deterministic functions that assign only one fitness value to each element in our original space. Furthermore, we want to limit the number of fitness functions, which we will do in two ways. First, we limit the function to only take as inputs the elements of our original search space (in other words, the domain is all x such that x is a permutation in our original set.) Secondly, we will limit the possible output values of the function to integers between 0 and n, so that our search space becomes well defined. This I will label Reduction A.

    Given this set-up, we can now calculate the informational cost of choosing one fitness function from that new set in a straightforward manner. (Call this Reduction B.)

    But the question becomes, if I understand you correctly, why do we include the informational costs of Reduction B and not of Reduction A (which is infinite)? More importantly, if we can ignore the informational cost of Reduction A, why can’t we also ignore the cost of Reduction B?

    If this is not your position, then please clarify, because I have misunderstood you.

    If so, I will reiterate my earlier response. Reduction A is the reduction from the set of all possible fitness functions (setting n to ∞, effectively) which as you correctly point out is a reduction of an infinite set of possibilities to a finite set, which would incur an infinite informational cost.

    But as I correctly pointed out, Reduction A does not improve search performance over blind search. Showing this is easy. Imagine you perform your search using fitness function 1 of your reduced set (assuming that you can order the fitness functions in our reduced set, which you can), then use fitness function 2, then fitness function 3, etc, until you’ve performed the same search using all of the fitness functions. You then average the performance of all the functions and will find that your evolutionary strategy performed only as well as blind search.
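A toy version of this averaging argument can be checked by brute force. The sketch below is an illustration built on assumptions, not the strategy discussed in the thread: it uses a five-point ring as the search space, fitness values 0 to 2, a simple hill-climbing rule invented for the example, a uniformly chosen starting point, and a target fixed independently of the fitness function. Averaged over all 3^5 fitness functions, the climber hits the target exactly as often as blind sampling without replacement.

```python
from fractions import Fraction
from itertools import product

K = 5        # toy search space: points 0..K-1 arranged on a ring (assumption)
M = 2        # fitness values range over 0..M (assumption)
TARGET = 3   # fixed target, chosen independently of any fitness function


def climb(fitness, start, budget):
    """Return the distinct points queried by a fitness-guided search.

    Hypothetical rule: if the latest query is the best point so far, keep
    moving clockwise from it; otherwise move counterclockwise from the best.
    Requires budget <= K so that an unvisited point always exists.
    """
    visited = [start]
    while len(visited) < budget:
        # max() returns the first maximum, so the earliest visited wins ties.
        best = max(visited, key=lambda x: fitness[x])
        direction = 1 if best == visited[-1] else -1
        step = direction
        while (best + step) % K in visited:
            step += direction
        visited.append((best + step) % K)
    return visited


def average_success(budget):
    """Average the hit rate over every fitness function and every start."""
    hits = 0
    runs = 0
    for fitness in product(range(M + 1), repeat=K):
        for start in range(K):
            runs += 1
            hits += TARGET in climb(fitness, start, budget)
    return Fraction(hits, runs)


budget = 3
print(average_success(budget), Fraction(budget, K))  # climber vs. blind search
```

The equality relies on the rule treating every point of the ring symmetrically. A rule that encodes the target, or a restricted set of fitness functions, would not be covered by this check.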

    So if Reduction A results in a subset that still only performs as well as blind search, either a) the reduction didn’t improve search performance, and so incurs no informational cost, or b) it did improve search performance, which means that the original set of all possible functions (the set prior to Reduction A, which is infinite) somehow performs worse than blind search. But since that set includes all possible fitness functions, it will perform as well as blind search, per the NFL theorems. (I could be wrong on this point, since my understanding of the NFL isn’t as strong as that of some of the other commenters, but I think the NFL would apply in this case as well.)

    So if Reduction A didn’t improve search performance, it is irrelevant to our calculation. We can go further than Dembski (I believe) and define our higher order search baseline as the smallest set that 1. assigns a value to each and every permutation in the original space and 2. still, when averaged, performs only as well as blind search. That would be an objective baseline to measure our subsequent reductions from.

    If you find this disagreeable, then please show a reduction from a set that performs as well as blind search to one that performs better, and show how this reduction does not incur an informational cost of at least the active information.

    Atom

    PS You are correct in your point on Shannon info about reduction in receiver uncertainty. However, I don’t think you understood my larger point, being that we could inflate any uncertainty/probability/information measure, even that of an observer, by including irrelevant reductions. (What about the reduction for the receiver to limit him to his current state, from all possible messages he could have expected, to just a few? We only consider the issue from a baseline, being defined for us in the Shannon case as the receiver’s current state of uncertainty, but implicitly defined for us in the Dembski case, using the criterion I outlined.) Regardless, it was a side issue which isn’t necessary to understanding my argument and I brought it up only as a way of hopefully getting you to see my original point. I will drop it.

  4. Addendum,

    I should make explicit that I’m not considering the trivial set of only one fitness function that assigns the same value to all permutations as the baseline set, or for that matter, any set that is smaller than our reduced set. For a reduction to make sense, the reduced set needs to be a subset of the baseline set. Sorry I didn’t spell that out explicitly.

  5. 185

    Atom (and R0b),

    It seems that the key claim is that a regress of mechanism gets materialistic explanation nowhere in accounting for active information because there are at least as many alternatives in a higher-order space of material configurations as in the lower-order space. Dembski and Marks seem not to object to treating the known universe as a finite computing machine, and I’m going to proceed more or less along those lines. A huge fraction of alternatives we can allude to in mathematics have no physical realization simply because they require excessive resources to “fit in the universe.”

    In the third theorem, there are

    (M + 1) ^ K

    fitness functions, where K is the size of the base-level search space Omega. For any binary representation (e.g., the machine language of the computer you’re using), almost all fitness functions have no description of length much less than K log (M + 1). Even though Dembski and Marks indicate that M is large, I set M = 1 for simplicity. Now the typical fitness function requires K bits to describe.
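The counting behind this paragraph can be sketched on a small scale. The helper names and the sample sizes below are assumptions made for illustration. The last function uses a standard counting fact: there are only 2^(K-c) - 1 binary strings shorter than K - c bits, so fewer than 1 in 2^c of the 2^K binary fitness functions (the M = 1 case) can have a description that short.

```python
import math
from fractions import Fraction


def count_fitness_functions(K: int, M: int) -> int:
    """Number of functions from a K-point space into the values 0..M."""
    return (M + 1) ** K


def bits_to_index(K: int, M: int) -> float:
    """Bits needed to single out one such function: K * log2(M + 1)."""
    return K * math.log2(M + 1)


def short_description_fraction(K: int, c: int) -> Fraction:
    """Upper bound on the fraction of the 2**K binary fitness functions
    (M = 1) that can have a description shorter than K - c bits."""
    shorter_strings = 2 ** (K - c) - 1  # binary strings of length < K - c
    return Fraction(shorter_strings, 2 ** K)


# Hypothetical small sizes; the K in the comment above is far too large to enumerate.
print(count_fitness_functions(4, 2))       # functions on 4 points with values 0..2
print(bits_to_index(20, 1))                # bits for one binary function on 20 points
print(short_description_fraction(20, 10))  # share compressible by 10 or more bits
```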

    As Dembski and Marks observe, if Omega is the set of all length-100 sentences over a 20-amino-acid alphabet, then K is about 10^130. But Seth Lloyd estimates that the observed universe registers at most 10^120 bits of information. The upshot is that if the entire known universe were searching a space of descriptions of fitness functions, only a minuscule fraction of the descriptions would be sufficiently compact to arise:

    1 / 2^10000000000 [10 zeros].

    I have to note the absurdity of this scenario. We are within the universe, and to posit the existence of an entity that can observe a succession of states of the universe begs the question of the existence of a supernatural entity. Similarly, we cannot regard the evolution of the universe as a search process. We cannot frame the universe that has in its unfolding included us as an alternative to a null universe. There is no way to assign a physical probability to the universe. Thus we cannot associate active information with the universe. Some fitness functions are physically possible, and others are not — and you cannot attribute the mere existence of physical constraints to intelligence.

  6. Dr. English,

    Thank you for your contribution. I feel that your exasperation, however, has led you to make a couple of leaps towards the end of your comment.

    You wrote:

    We are within the universe, and to posit the existence of an entity that can observe a succession of states of the universe begs the question of the existence of a supernatural entity.

    Whoa whoa whoa. No one I’m aware of was discussing a “supernatural entity” or assuming one. Dembski and Marks’s paper is about the mathematics underlying conservation of information; to begin discussing metaphysical interpretations is beyond this thread, as the contents of the paper itself have barely begun to be discussed.

    The universe can only instantiate at most a fraction of the total number of possible fitness functions, which is correct. So a reduction has already taken place due to the physical constraints. But this reduction, inasmuch as it improves the performance of our original search, would incur an information cost of at least the active information, if the math in the paper holds. You have not criticized the math, only its application, so I’ll assume it does hold.

    Now, you may argue “You cannot calculate this informational cost, since we don’t know the ‘probability’ of the universe.” It is true that we don’t know the probability of the universe. But the paper also provides a measure-theoretic version of the theorem, which would apply if the probability distribution differed substantially from the uniform in a way that eventually assisted our lowest level search. (Even if the constraints are necessary, that is a probability of 1 for that one state and zero for the others.)

    In short, you’d have a search-for-a-search-for-a-search. The universe would be assigned a (non-?)uniform probability (reduction 1, measure theoretic version) for assigning the set of possible fitness functions (reduction 2, measure theoretic version), from which we choose our actual fitness function (reduction 3, fitness theoretic version.)

    Unless I’m missing something (which is always a possibility) the LCI would seem to also hold vertically, for your tri-layered search.

    Atom

  7. Atom, I think we’re getting pretty close to the same page.

    I think a major discrepancy in our thinking is your association of information cost with performance averaged over all of the functions in the higher-order space. For instance:

    So if Reduction A results in a subset that still only performs as well as blind search, either a) the reduction didn’t improve search performance, and so incurs no informational cost

    [Emphasis mine]

    One counterintuitive aspect of Marks and Dembski’s framework is that information cost is not based on the average performance of elements in the higher-order search space. Rather, it’s based on the fraction of those elements that perform at a level of at least q. Information cost does not tell us whether the average performance of the higher-order space is better or worse than the null search. It only tells us what the odds are of randomly selecting a search that performs at least as well as the given alternate search.

    Consider that the set of functions that indicate proximity to a target performs no better on average than the larger set mentioned in endnote 49, i.e. they both perform on average the same as the null search. Yet Marks and Dembski say that the reduction from the latter to the former entails a heavy information cost.
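The distinction between average performance and information cost can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the two ten-element higher-order spaces, the threshold q, and the reading of information cost as the negative base-2 logarithm of the fraction of searches that perform at least as well as q. The final comparison uses Markov's inequality, a general fact rather than a claim about the paper's proofs: if the average performance is p, at most a fraction p/q of the elements can reach q.

```python
import math
from fractions import Fraction


def average_performance(space):
    """Mean success probability over the searches in a higher-order space."""
    return sum(space, Fraction(0)) / len(space)


def information_cost(space, q):
    """-log2 of the fraction of searches whose performance is at least q."""
    good = sum(1 for performance in space if performance >= q)
    if good == 0:
        return math.inf
    return -math.log2(good / len(space))


# Two hypothetical spaces with the same average performance p = 1/10.
space_a = [Fraction(1)] + [Fraction(0)] * 9         # one perfect search
space_b = [Fraction(1, 2)] * 2 + [Fraction(0)] * 8  # two mediocre searches
q = Fraction(1, 2)

p = average_performance(space_a)
cost_a = information_cost(space_a, q)
cost_b = information_cost(space_b, q)
markov_bound = math.log2(q / p)  # no space with average p can cost less than this

print(p, average_performance(space_b))
print(cost_a, cost_b, markov_bound)
```

Both spaces average the same as the assumed null search, yet selecting a search of quality q from them carries different costs, which is the point being made about the framework.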

    More later, probably after Mothers’ Day.

  8. Dr. English, an addendum,

    I have been thinking about my response and wanted to make a distinction. When I say we could possibly deal with the physical constraints (which I referred to as a form of “necessity”), what I meant was physical necessity, given the number of particles in the universe. I don’t want this confused with logical necessity, which wouldn’t make sense to treat as contingent (obviously, by definition).

    I just wanted to make sure I was clear on that point. Given that there is no logically reason we’re aware of that the universe has this number of particles, which causes a reduction to take place, then measuring a cost on that reduction could be meaningful (via the tri-level search outlined above.) If however there is a logical necessity to that number of particles, the reduction requires no explanation, as necessary entities are their own explanation.

    Atom

  9. “no logically reason”* => “no logical reason”

  10. 190

    Atom,

    You dropped my qualifier in “physical probability.” See more on this in the new thread Bill started.

    I’m guessing that you, like me, are more engineer than philosopher. I accuse myself of a serious error in neglecting computational complexity in my investigation of NFL. Dembski and Marks are making the same error in focusing entirely on information costs. There are huge distinctions in search programs when time and memory are limited. I don’t have to go with Seth Lloyd in saying that the universe literally is a computer to say that there are analogous distinctions in nature.

    This discussion has turned interesting at just the wrong time. I really need to put on the blinders and deal with the end-of-semester drudge work.

  11. Atom,

    Hopefully we’ve gotten past the confusion about average performance vs. information cost. I can’t remember my train of thought from a few days ago, so I’ll just reiterate the point that you’re disputing:

    1. Information cost depends on the definition of the higher-order search space.

    2. We can define the higher-order search space to contain only good searches, thus making the information cost zero and falsifying the LCI.

    3. In response to the objection that this higher-order search space must incur an information cost from an even higher-order search space, we can point out that this is true for all search spaces that have a non-zero probability of yielding a good search. If the LCI requires us to regress probabilities all the way up, then we’re stuck with an infinite information cost in every case.

  12. R0b wrote:

    One counterintuitive aspect of Marks and Dembski’s framework is that information cost is not based on the average performance of elements in the higher-order search space. Rather, it’s based on the fraction of those elements that perform at a level of at least q. Information cost does not tell us whether the average performance of the higher-order space is better or worse than the null search. It only tells us what the odds are of randomly selecting a search that performs at least as well as the given alternate search.

    R0b,

    You’ve almost got it. I didn’t say that the average performance of the higher level search was used to calculate the incurred cost, only that it can be used as an objective basis for deciding which informational costs are relevant, and hence, must be accounted for. It also provides a handy method for setting an objective baseline for the higher level informational cost measure.

    My reply has been consistent and I fail to see any issue with using the method I outlined to define the higher order space in a non-ad hoc way.

    Atom

  13. R0b wrote in the next post:

    1. Information cost [of the higher order reduction] depends on the definition of the higher-order search space.

    Correct and agreed.

    2. We can define the higher-order search space to contain only good searches, thus making the information cost zero and falsifying the LCI.

    No, we can’t, since doing so would result in a search performance on the lower level search. If a reduction leads to improved search performance on the lower level search, then we cannot ignore that cost. If it leads to no search improvement (and no hindrance, since we can contribute negative active information), then we can ignore it.

    3. In response to the objection that this higher-order search space must incur an information cost from an even higher-order search space, we can point out that this is true for all search spaces that have a non-zero probability of yielding a good search. If the LCI requires us to regress probabilities all the way up, then we’re stuck with an infinite information cost in every case.

    Either that, or a source that can generate information without relying on search spaces. But this is a side issue.

    Atom

  14. ” in a search performance on the lower level search ” = ” in improved search performance on the lower level search”

    Sorry, I type too fast sometimes.

    Atom

  15. Atom,

    Okay, I think I’ve finally got it. Sorry it took so long to sink in. I think your idea for defining a higher-order baseline is a good one, but I don’t believe it works with Marks and Dembski’s framework.

    First of all, back in [168] where I agreed with your point about all algorithms performing equally over the whole set of fitness functions, I was wrong. Marks and Dembski’s model is not, in general, NFL-compatible. The problem is that Wolpert and Macready define the goodness of a search in terms of the codomain of the fitness function, but Marks and Dembski define the target independent of the fitness function, as Tom English pointed out above.

    Consider an algorithm that finds the WEASEL target with the following logic: It randomly selects points in the search space until it finds a point whose fitness plus the number of the query is even. In other words, if it’s the 3rd query and the fitness is 127, then the condition is satisfied. After finding such a point, it immediately goes to “METHINKS IT IS LIKE A WEASEL”.
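The algorithm described here is short enough to sketch directly. The function name, the 27-symbol alphabet, the query cap, and the sample fitness functions are assumptions made for illustration, and the sketch counts only the random queries made before the jump to the target.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "  # assumed 27-symbol alphabet
TARGET = "METHINKS IT IS LIKE A WEASEL"


def parity_search(fitness, rng, max_queries=1000):
    """Sample random points until fitness + query number is even, then jump
    straight to TARGET. Returns (result, number of random queries used)."""
    for query in range(1, max_queries + 1):
        candidate = "".join(rng.choice(ALPHABET) for _ in TARGET)
        if (fitness(candidate) + query) % 2 == 0:
            return TARGET, query
    raise RuntimeError("parity condition never satisfied within the cap")


rng = random.Random(0)
sample_fitness_functions = {
    "constant 0": lambda s: 0,
    "constant 127": lambda s: 127,
    "count of E": lambda s: s.count("E"),
}
for name, fitness in sample_fitness_functions.items():
    result, queries = parity_search(fitness, rng)
    print(name, queries, result == TARGET)
```

The fitness function only influences how many random queries precede the jump; the target itself is written into the algorithm, which is why its success does not depend on which fitness function is supplied.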

    No matter what fitness function we use, this algorithm will likely find the target within a few queries. So how do we apply your condition that the higher-order space of fitness functions must have the same average performance as the null search?

    I think that coming up with generally applicable constraints on the higher-order space definition is harder than meets the eye. As it says in the paper, the ways to search and to metasearch are endlessly varied, and the higher-order space definition can include or exclude any aspect of any conceivable search.

  16. R0b,

    Thank you for the reply. You wrote:

    Consider an algorithm that finds the WEASEL target with the following logic: It randomly selects points in the search space until it finds a point whose fitness plus the number of the query is even. In other words, if it’s the 3rd query and the fitness is 127, then the condition is satisfied. After finding such a point, it immediately goes to “METHINKS IT IS LIKE A WEASEL”.

    This strategy would no longer be using a standard evolutionary strategy, which could find different targets simply by using different fitness functions, but would constitute a new search strategy/algorithm. We could also say “What about an algorithm that simply tries one query, no matter what the fitness function, then goes to the target?” or any other variation of that. But these are different search strategies, so the fitness function method I outlined isn’t directly applicable, since they aren’t really evolutionary strategies in the normal sense of the word.

    However, your example wouldn’t escape the LCI.

    Going with your new set-up, we can see that there exists a similar set-up for every target in your lower level search space: for example, it could go to “Meblinks it is like a weasel” after satisfying the condition. So why did we choose the one algorithm that goes to our target rather than to “Meblinks…”, “Rethinks…”, “hstjdins…” or any other of the 10^40 permutation choices we have?

    More importantly, what is the minimum informational cost incurred by going from the set of all such algorithms (bounded by our original search space) to the set that chooses “Methinks…” with the same efficiency as the algorithm you constructed?

    As I mentioned, the “goto” target of your algorithm could have been any of the roughly 10^40 permutations in the original search space, so we have at least 10^40 algorithms to choose from. The search for your particular algorithm (or one that performs equivalently well) is as hard as, and likely much harder than, our original search.
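The size of this family of “goto” algorithms can be checked with a few lines. The count assumes the 28-character phrase quoted in the thread and a 27-symbol alphabet, and it assumes a uniform choice among the algorithms with base-2 logarithms; under those assumptions the result is close to the 10^40 figure used above.

```python
import math

TARGET = "METHINKS IT IS LIKE A WEASEL"
ALPHABET_SIZE = 27  # assumed: 26 letters plus the space character

# One hypothetical "goto" algorithm per possible destination string.
algorithm_count = ALPHABET_SIZE ** len(TARGET)

# Bits needed to single out the one algorithm that jumps to TARGET.
selection_bits = len(TARGET) * math.log2(ALPHABET_SIZE)

print(f"{algorithm_count:.3e} algorithms, about {selection_bits:.1f} bits")
```

Because the family has one member per string in the original space, picking the right member by blind choice is no easier than the original blind search for the string itself.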

    The LCI still holds.

    Atom
