ScienceBlogs praises disses Dembski-Marks paper on Conservation of Information

ScienceBlogs has just posted what can only be called a rant (go here) against the paper by Robert Marks and me that was the subject of a post here at UD (for the paper, “Life’s Conservation Law,” go here; for the UD post, go here).

According to ScienceBlogs, the paper fails (or as they put it, “it’s stupid”) because

(1) As a search, evolution is a multidimensional search. Most of our intuitions about search landscapes is based on two or three dimensions. But evolution as a landscape has hundreds or thousands of dimensions; our intuitions don’t work.

(2) Evolution is a dynamic landscape – that is, a landscape that changes in response to the progress of the search. Pretty much every argument that Dembski makes can be thrown out on the basis of this one fact: all of his arguments are based on static landscapes. Once the landscape can change, every single one of his arguments become invalid – none of them work in dynamic landscapes.

(3) As a search, evolution doesn’t have to work on all possible landscapes. It doesn’t even need to work on most landscapes. It works on landscapes that have a particular kind of structure. It doesn’t matter whether evolution will work in every possible landscape — just like it doesn’t matter that fraction notation doesn’t work for every possible real number. What matters is whether it works in the particular kind of landscape in which our theory says it works. And on that question, the answer is quite clear: yes, it works.

Regarding (1), the work by Robert Marks and me typically focuses on compact metric spaces, which can include infinite-dimensional spaces; for the purposes of this paper, which simplifies some of our previous work, we went with finite spaces. But even these can approximate any dimensionality we like for empirical investigations.

Regarding (2), we explicitly point out that our approach is general enough to model time-dependent fitness functions (see section 8; hey, why bother reading a paper if you know it’s wrong and can simply intuit the mistakes the authors must make?). What ScienceBlogs appears not to appreciate or understand is that time-dependent fitness functions can be modeled by time-independent fitness functions (“static landscapes”) provided that one represents the search space with sufficiently many dimensions (by going to a Cartesian product, as we point out explicitly in our paper).

Regarding (3), our point is that precisely because evolution works with constrained landscapes, those constraints require prior information. Yes, the environment is pumping in information; so where did that information come from? ScienceBlogs resents the very question. But what’s the alternative? Simply to say, “Oh, it’s just there.” The Law of Conservation of Information, despite ScienceBlogs’ caricatures, provides cogent grounds for thinking that the information had to come from somewhere, i.e., from an information source.
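The Cartesian-product remark under (2) can be made concrete with a toy sketch. This is only one possible reading, not the construction from the paper: the three-point space, the two time steps, and the names `fitness_at` and `static_fitness` are all illustrative assumptions.

```python
from itertools import product

# Hypothetical toy setup: three candidate points and two time steps.
POINTS = ["a", "b", "c"]
TIMES = [0, 1]

def fitness_at(x, t):
    """Time-dependent fitness: the best point moves from 'a' to 'c'."""
    table = {0: {"a": 3, "b": 1, "c": 0},
             1: {"a": 0, "b": 1, "c": 3}}
    return table[t][x]

# Static view: one fixed function defined on the product POINTS x TIMES.
PRODUCT_SPACE = list(product(POINTS, TIMES))

def static_fitness(pair):
    """Time-independent fitness on (point, time) pairs."""
    x, t = pair
    return fitness_at(x, t)

static_table = {pair: static_fitness(pair) for pair in PRODUCT_SPACE}
```

Nothing in `static_table` changes over time; the time dependence has been absorbed into an extra coordinate of the search space.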

82 Responses to ScienceBlogs praises disses Dembski-Marks paper on Conservation of Information

  1. Atom:

    No we can’t. You measure the fraction of “efficient” functions from the total number of elements in the next largest set inducing an average performance equal to blind search.

    Atom, four answers to this:

    1) I’m not sure what you mean by “next largest”. Next to what? A lot of sets of functions can have an average performance the same as the null search, and some of the sets can be very small. Consider the set that consists of a single function in which every point has a fitness of 0. For some algorithms, this will result in a performance equal to the null search.

    2) I don’t see anything in the paper that states or implies the bolded part above. If your idea remedies this problem, then Marks and Dembski need to add it as a condition to the LCI.

    3) Having said that, I don’t think it remedies the problem. Consider a case in which q=2*p. To falsify the LCI, we need to show that more than 1/2 of the higher-order space consists of searches that succeed with a probability of at least q. We can define our higher-order space so that, say, two-thirds of it consists of these “good” searches, and the other third consists of searches that are bad enough to offset them, so the average of the whole set is the same as the null search.

    4) Your condition doesn’t seem to be generally applicable. See my comment here.

  2. Good morning R0b.

    1) I already discussed this trivial set in a previous post and said we’re looking for the next largest set from the “reduced” set. Since Dembski/Marks’ paper begins with a set-up where someone shows an improved performance over blind search (such as by using a fitness function, f1), we begin with that set and add to the higher level space until we reach a null performance baseline. Then we measure the fraction of “good” functions (with efficiency at least as good as the first proposed function, f1) to this total set. According to the paper, the informational cost of this reduction will be at least the active information.

    2) You are completely correct, though I believe this is implied in the paper due to the way they set up the problem. It is a straightforward extension of their work and I agree they should probably state it explicitly.

    3) If you can do what you propose – begin with a higher level search space with performance averaged to blind search on the lower level search then reduce that set to a good fitness function that increases performance such that the active information gained is greater than the informational cost incurred by your reduction – I will concede. If my ideas were not what Dembski and Marks had in mind with their paper they may clarify and argue against your point, but I won’t. So you will have proved your point to me, at the very least.

    Atom

  3. Joseph:

    As I said there isn’t anything to search for.

    So Darwinian selection in a scenario without something to search for would be nature, operating freely.

    Darwinian selection with a target is not nature, operating freely.

    From page 8 of the Dembski/Marks paper:

    In other words, viability and functionality, by facilitating survival and reproduction, set the targets of evolutionary biology. Evolution, despite Dawkins’s denials, is therefore a targeted search after all.

    Do you think that you and Dembski agree with each other?

  4. R0b, continued,

    4) I posted a reply here. Although a fitness function method would not work well when we’re using a different search strategy, a similar way of setting a baseline could be used in other cases as well. But since I can’t enumerate all cases (being an infinite number), I can explain the applicability on a case-by-case basis until it is clear to you that the problem you posed, while insightful and demonstrating a good place that the paper could have been more explicit, does not represent an insurmountable obstacle.

    Atom

  5. Hoki,

    See comment 12

  6. Joseph,

    This is just going around in a circle and getting quite tedious.

    Dembski is arguing that Darwinian evolution is teleological and a targeted search. Agree?

    If not, why not?

  7. Atom, you were right and I was wrong. You’re a genius, man. (Not that it takes a genius to be right when I’m wrong.)

    Not only does the LCI follow from your condition, but you’ve also pointed the way to much easier proofs for the three CoI theorems in the paper.

    Here’s a way that the LCI can be derived from your condition.

    Definitions:
    p,q: Same as in the paper
    O2: Higher-order space
    Q: Set of “good” functions in O2
    sum(X): Sum of all probabilities in set X
    |X|: Cardinality of set X

    Derivation:
    1. Since the probabilities in Q are at least q:
      sum(Q) >= q*|Q|

    2. Since sum(O2) >= sum(Q)
      sum(O2) >= q*|Q|

    3. Divide both sides by |O2|:
      sum(O2)/|O2| >= q*|Q|/|O2|

    4. Your condition is sum(O2)/|O2| = p. So:
      p >= q*|Q|/|O2|

    5. So:
      p/q >= |Q|/|O2|

    And that’s the LCI.

    And since your condition obviously holds in the scenarios posited by the three CoI theorems, the above constitutes a simple proof for those theorems also.

    Unless I’m wrong again. Did I mess up somewhere?
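The five steps above can be checked numerically on a small case. This is a minimal sketch with a made-up higher-order space: the four success probabilities and the helper name `lci_ratio_check` are assumptions chosen for illustration, not taken from the paper.

```python
from fractions import Fraction

def lci_ratio_check(success_probs, q):
    """Return (p, |Q|/|O2|, p/q) for a higher-order space O2.

    success_probs holds the success probability of each search in O2.
    p is taken to be the plain average, i.e. the condition
    sum(O2)/|O2| = p is imposed by construction.
    """
    size = len(success_probs)
    p = sum(success_probs) / size
    good = [s for s in success_probs if s >= q]
    return p, Fraction(len(good), size), p / q

# Hypothetical O2: two searches succeed half the time, two never succeed.
O2 = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
p, good_fraction, bound = lci_ratio_check(O2, q=Fraction(1, 2))
```

Here p is 1/4, and both |Q|/|O2| and p/q equal 1/2, so the bound in step 5 is met with equality. The same arithmetic shows why the two-thirds scenario from the earlier exchange cannot satisfy the averaging condition: if two thirds of O2 succeeded with probability at least 2p, the average would be at least 4p/3.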

  8. Re #68

    I apologise for being too lazy to trace back all the posts – where did Atom’s condition:

    sum(O2)/|O2| = p

    come from?

    Also, even granting that condition:

    p/q >= |Q|/|O2|

    is not the LCI unless you assume all members of O2 are equally probable. D&M do assume this when they write of their “epistemic rights” to assume a uniform probability distribution. But there are massive problems with this assumption and it is key to the whole paper.

  9. R0b,

    I’ve just gone through your proof step-by-step and you are in fact correct: it is a simpler method of proving the COI. You didn’t make any mistakes in your derivation (that I saw) and the final step is equivalent to Dembski’s function-theoretic derivation.

    I wish I could take credit for being a genius, but you’re the genius who built the proof. So let’s just say we’re both pretty smart guys. :) (Feel free to share any credit for that discovery, as your proof was elegant.)

    Mark Frank,

    The condition

    sum(O2)/|O2| = p

    is based on the definition of O2, which is the next largest set containing Q as a proper subset and has an average performance (on the lower-level search) equal to null, blind search. This is what I said was the logical definition of our higher order search space and as R0b and I have shown, is a sufficient condition for the LCI to hold.

    As for your second objection, you can assume that O2 has a non-uniform probability distribution on its elements that makes “good” functions more likely than bad, the same way that O2 induces a higher probability on “good” elements in the original search space, O. Since the probability distribution on O2 is only one of many possible, you now have to explain what the cost of choosing that probability distribution over the others was. So you have a search-for-a-search-for-a-search. Dembski has proven a measure-theoretic version for probability distributions and demonstrated that the LCI still holds. So your regress doesn’t solve the problem, it only exacerbates it.

    Atom

  10. Mark Frank,

    On further reflection I think you may not even need the uniformity assumption to get from step 5 (p/q >= |Q|/|O2|) to the LCI.

    I don’t believe we have made use of the uniformity assumption in the initial steps (steps 1-5), or in our definitions. (Though I could be wrong…) All that we’ve said is that p is some probability and that q is a greater probability than p, so it has been improved over p. Furthermore, sum(O2)/|O2| = p, so that the O2 set has on average the same performance as the original search p and so can serve as an objective baseline. Those were the important definitions and I don’t think we’d have to change anything if p differed from a uniform search probability, since we left p as a variable. The above will work for any value of p.

    From there, we do the following to get LCI:

    6. Rearrange, by multiplication and division:
      |O2|/|Q| >= q/p

    7. Take the log (base 2) of both sides:
      log(|O2|/|Q|) >= log(q/p)

    8. log(q/p) is the active information (I+), by definition:
      log(|O2|/|Q|) >= I+

    9. Break up the log, using the quotient rule:
      log(|O2|) - log(|Q|) >= I+

    10. Rearrange the logs and factor out -1:
      -[log(|Q|) - log(|O2|)] >= I+

    11. Combine, using the quotient rule, to get:
      -log(|Q|/|O2|) >= I+

    …which is the LCI.

    Atom
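Steps 6 through 11 can be evaluated in bits for a concrete case. The numbers below are hypothetical (p = 1/8, q = 1/2, and a higher-order space of 16 searches of which 2 are “good”); the function names are illustrative only.

```python
import math

def active_information(p, q):
    """I+ = log2(q/p), the gain of the improved search over the baseline."""
    return math.log2(q / p)

def information_cost(num_good, num_total):
    """-log2(|Q|/|O2|), the cost of landing in Q by a uniform pick from O2."""
    return -math.log2(num_good / num_total)

# Hypothetical values, chosen so the logs come out as whole bits.
p, q = 0.125, 0.5
i_plus = active_information(p, q)   # log2(4)
cost = information_cost(2, 16)      # -log2(1/8)
```

With these values the active information is 2 bits and the cost is 3 bits, so the inequality in step 11 holds with room to spare.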

  11. Atom

    Loads of comments I could make – but to quickly address your last post. Taking logs makes little difference – except to confuse things slightly.

    The LCI is that:

    -log(probability(Q))>=I+

    But probability(Q) only equals |Q|/|O2| if you assume that all functions in O2 are equally likely i.e. a uniform probability distribution.
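The gap between |Q|/|O2| and probability(Q) is easy to see in a small sketch. Everything here is a made-up illustration: the four-element space, the `skewed` weights, and the helper `prob_good` are assumptions, not anything from the paper.

```python
from fractions import Fraction
import math

# Hypothetical O2: success probability of each of four searches.
success = [Fraction(1, 2), Fraction(1, 2), Fraction(0), Fraction(0)]
p = sum(success) / len(success)          # unweighted average, 1/4
q = Fraction(1, 2)
good = [i for i, s in enumerate(success) if s >= q]

def prob_good(weights):
    """Probability of picking a search in Q under the given weights on O2."""
    return sum(weights[i] for i in good)

uniform = [Fraction(1, 4)] * 4
skewed = [Fraction(2, 5), Fraction(2, 5), Fraction(1, 10), Fraction(1, 10)]

i_plus = math.log2(float(q / p))                       # active information
cost_uniform = -math.log2(float(prob_good(uniform)))   # uses |Q|/|O2|
cost_skewed = -math.log2(float(prob_good(skewed)))     # uses skewed weights
```

Under the uniform weights the cost equals the active information (1 bit each). Under the skewed weights |Q|/|O2| is unchanged, but probability(Q) rises to 4/5 and the cost drops to about 0.32 bits, below I+.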

  12. Atom

    More on uniform probability distributions (UPDs)

    D&M’s measure-theoretic version assumes a UPD itself. It assumes that all pdfs across the search space are equally probable. So it can’t be used to prove that a UPD is justified.

    See Häggström 2007 (pp 6-7) for some of the problems with UPDs. One of them is that UPDs are not closed under non-linear transformations. In most real situations there is more than one UPD to choose from. Häggström uses the example of the size of a square. Do we say all lengths of the side are equally likely or all areas are equally likely? We can’t have both. Something similar applies to choosing algorithms. For example, M&D give three “definitions” of an algorithm. All three assume UPDs. However, in at least some cases, the UPD assumptions of the definitions are incompatible.
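The square example can be worked through in a few lines. This is a sketch under a simplifying assumption that side lengths lie in (0, 1]; the function name and the choice of 1/2 as the threshold are illustrative.

```python
def prob_side_at_most(s, uniform_over):
    """P(side <= s) for a square with side in (0, 1].

    uniform_over="side": the side length is uniform on (0, 1].
    uniform_over="area": the area is uniform on (0, 1], and
    side <= s is the same event as area <= s**2.
    """
    if uniform_over == "side":
        return s
    if uniform_over == "area":
        return s ** 2
    raise ValueError("uniform_over must be 'side' or 'area'")

by_side = prob_side_at_most(0.5, "side")
by_area = prob_side_at_most(0.5, "area")
```

The same event gets probability 0.5 under one “uniform” assumption and 0.25 under the other, which is the sense in which the two cannot both hold.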

    I can illustrate with a simple example. Suppose:

    The space we are searching (?) is the digits 1 2 and 3.

    The target (T) is the digit 1.

    So p=1/3

    Using the function theoretic approach let the other space (?’) be the two letters a and b.

    Then here is the set of all possible functions from ?’ to ? and the associated value of q

    a b q
    1 1 1
    1 2 0.5
    1 3 0.5
    2 1 0.5
    2 2 0
    2 3 0
    3 1 0.5
    3 2 0
    3 3 0
    ?
    We could assume that each of these is equally likely. But each function is also associated with a probability distribution function on ?. Thus (sorry about the formatting):

    a b -1- -2- -3-
    1 1: 1.0 0.0 0.0
    1 2: 0.5 0.5 0.0
    1 3: 0.5 0.0 0.5
    2 1: 0.5 0.5 0.0
    2 2: 0.0 1.0 0.0
    2 3: 0.0 0.5 0.5
    3 1: 0.5 0.0 0.5
    3 2: 0.0 0.5 0.5
    3 3: 0.0 0.0 1.0

    And you will see that there are only six unique pdfs (e.g. 1 2 and 2 1 give the same pdf).

    But in the measure-theoretic version M&D assume that all pdfs are equally probable. In which case the function 1 2 and the function 2 1 should count as one algorithm. Which UPD is it?
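The two tables in this example can be regenerated by enumeration. The sketch below follows the example as stated (digits 1 to 3, target 1, letters a and b); the variable names are assumptions made for the illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

OMEGA = (1, 2, 3)          # the space being searched
OMEGA_PRIME = ("a", "b")   # the other space
TARGET = 1

# Every function from OMEGA_PRIME to OMEGA, written as (f(a), f(b)).
functions = list(product(OMEGA, repeat=len(OMEGA_PRIME)))

def induced_pdf(f):
    """Distribution on OMEGA from a uniform pick of a or b, mapped through f."""
    return tuple(Fraction(f.count(x), len(f)) for x in OMEGA)

pdfs = [induced_pdf(f) for f in functions]
q_values = [pdf[OMEGA.index(TARGET)] for pdf in pdfs]

multiplicity = Counter(pdfs)               # how many functions give each pdf
average_q = sum(q_values) / len(q_values)  # unweighted mean over functions
```

There are 9 functions but only 6 distinct pdfs, and three of those pdfs arise from two functions each. Weighting functions equally therefore gives the pdf (1/2, 1/2, 0) probability 2/9, while weighting pdfs equally gives it 1/6. The unweighted mean of q over the 9 functions is 1/3, matching p.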

  13. I see that WordPress has turned my greek omegas to ?. I hope it still makes sense.

  14. Folks:

    Pardon a quick note:

    H = -SUM (pi log pi) does not at all assume a uniform probability distribution. (We do use info theory with say English text, which has a significant degree of redundancy, i.e. non-uniformity of probability. Also cf Bradley’s working out of ICSI for 110-aa Cytochrome-C here, which treats of the non-uniformity per Yockey et al.)

    [I would be most interested to find out that the laws of physics and chemistry had in effect written into them, the DNA code, processing algorithms and associated molecular nanomachinery; onward the integration of proteins to form the complex, interwoven systems of life in the cell! If that is the effective objection to inference to design on seeing FSCI in DNA and its cognates, that looks a lot like jumping from the frying pan into the fire.]

    Also, that much derided uniform probability distribution is saying that this is the maximum uncertainty case, where the symbols i are least constrained. (It is a generally accepted principle of probability that absent reason to constrain otherwise, we default to equiprobable individual outcomes. Bernoulli and Laplace among others, if I recall. A classic and effective approach to statistical mechanics is based on just that.)

    We can then make shifts to account for non-uniformity; and H the average information per symbol is an application of that.
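A short sketch of the average information per symbol, H = -SUM(pi log pi), for a uniform and a non-uniform source. The two four-symbol distributions are made-up examples, and `shannon_entropy` is an illustrative helper name.

```python
import math

def shannon_entropy(probs):
    """Average information per symbol in bits; zero-probability terms are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

uniform = [0.25, 0.25, 0.25, 0.25]   # four equiprobable symbols
skewed = [0.7, 0.1, 0.1, 0.1]        # hypothetical redundant source

h_uniform = shannon_entropy(uniform)
h_skewed = shannon_entropy(skewed)
```

The uniform source gives 2 bits per symbol, the maximum for four symbols; the skewed source gives less (about 1.36 bits), and the formula handles both without assuming uniformity.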

    GEM of TKI

    PS: Atom et al — good stuff.

  15. Mark Frank:

    The LCI is that:

    -log(probability(Q))>=I+

    Marks and Dembski’s stated formulation of the LCI is vague on the condition of that probability; that is, probability(Q) given what? But their examples make it clear that they’re talking about a null higher-order search. Notice that each of their three CoI theorems ends with the statement, “Equivalently, the (higher-order) endogenous information … is bounded by the (lower-order) active information…”

  16. Mark Frank:

    D&M’s measure-theoretic version assumes a UPD itself. It assumes that all pdfs across the search space are equally probable. So it can’t be used to prove that a UPD is justified.

    Yes, as they regress probabilities up the hierarchy, they keep moving their assumption of uniformity to a higher level. Ultimately they justify that assumption by the principle of insufficient reason. As you point out, Häggström and others have explained why this justification doesn’t work.

    As you also point out, Marks and Dembski’s “information cost” is arbitrary, as it depends on how we define the higher-order space. Without Atom’s condition, the information cost can range from 0 to infinity. With Atom’s condition, the lower bound is at least log(q/p), but the upper bound is still infinity.

  17. Mark Frank:

    where did Atom’s condition:

    sum(O2)/|O2| = p

    come from?

    Atom’s position is that this condition is implied in Marks and Dembski’s work. Indeed, in each of their examples, they define the higher-order space with a symmetry that evenly distributes probabilities over the lower-order space, which satisfies Atom’s condition.

    This symmetry is how Marks and Dembski neutralize any deviation from uniformity. Here’s the game:

    1) They posit a completely unbiased search space.

    2) You counter with a fitness function (or search space translation, or probability distribution, etc.) that biases some points over others.

    3) They counter with a uniform space of fitness functions (or of search space translations, or of probability distributions, etc.) that again renders the original search space unbiased.

    4) etc.

    Without Atom’s condition, the LCI is easily falsified. With Atom’s condition, the LCI is easily proven. Interestingly, the paper says that the LCI is neither falsified nor provable.

  18. R0b

    I think this thread is pretty much dead, but for completeness.

    “With Atom’s condition, the LCI is easily proven.”

    I don’t think that is true unless you also assume whichever UPD fits your needs (see #72 above).

  19. Mark, yeah, the thread is mostly dead. Where’s Miracle Max when you need him?

    Yes, your function-theoretic higher-order space does not meet the conditions of the measure-theoretic CoI. A measure-theoretic higher-order space would look like this (the set should be infinite, but I’m setting the granularity to 1/3 to make it finite):

    -1- -2- -3-
     1   0   0
     0   1   0
     0   0   1
    2/3 1/3  0
    2/3  0  1/3
    1/3 2/3  0
     0  2/3 1/3
    1/3  0  2/3
     0  1/3 2/3
    1/3 1/3 1/3

    But this set has something in common with your function-theoretic set, namely that the average of the distributions is:

    1/3 1/3 1/3

    In each of Marks and Dembski’s three CoI theorems, the assumptions of the theorem entail a uniform average distribution on the lower-order space. This means that Atom’s condition is met, and the LCI conclusion follows.
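The ten-row table above and its average can be reproduced by enumeration. This is a minimal sketch at the same granularity of 1/3; the names `STEPS`, `grid`, and `average` are assumptions for the illustration.

```python
from fractions import Fraction

STEPS = 3  # granularity 1/3, as in the table above

# Every distribution on three points whose entries are multiples of 1/STEPS.
grid = [(Fraction(i, STEPS), Fraction(j, STEPS), Fraction(STEPS - i - j, STEPS))
        for i in range(STEPS + 1)
        for j in range(STEPS + 1 - i)]

# Column-wise unweighted average over all distributions in the grid.
average = tuple(sum(column) / len(grid) for column in zip(*grid))
```

The grid has 10 distributions and the average is (1/3, 1/3, 1/3). Note that this is the unweighted average, so it presumes each row is counted equally.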

  20. R0b

    It may well be that we agree but I am reluctant to overstate Atom’s position.

    Atom’s condition is that the average value of column one is 1/3. However, this does not necessarily mean that the probability of finding 1 in the lower order set from this set of searches is 1/3. For that to be true you need an additional assumption that each row in the set of searches is equally probable. This is the UPD assumption. It is the combination of this (unreasonable) assumption and Atom’s condition that leads to LCI.

    Maybe that was what you were saying – I just wanted to be clear.

  21. Mark, you’re correct — both assumptions are needed. Marks and Dembski’s one-sentence statement of the LCI doesn’t explicitly state either of them, but elsewhere they state that the comparison is between the lower-order active information and the higher-order endogenous information, which entails the higher-order UPD assumption. Atom’s condition, on the other hand, isn’t stated anywhere, although all of their examples meet it.

  22. Wm. Dembski writes:
    An environment with Karl Marx, paper, and pen in it will output Das Kapital.

    Not necessarily.

    Yet this sort of thinking demonstrates one of my pet peeves with the ID movement’s claims in this area – they must know the outcome (in this case, that Marx wrote Das Kapital) prior to being able to give their equations/claims/analogies/filters a chance at success.

    Sort of like how biblical creation scientists KNOW that the biblical version of history is 100% true, then seek facts and evidence to support their conclusion.
