# CSI Confusion: Remember the Mechanism!

November 24, 2013 | Posted by Winston Ewert under Comp. Sci. / Eng., Complex Specified Information |

A number of posts on Uncommon Descent have discussed issues surrounding specified complexity. Unfortunately, the posts and discussions that resulted have exhibited some confusion over the nature of specified complexity. Specified complexity was developed by William Dembski and deviation from his formulation has led to much confusion and trouble.

I’m picking a random number between 2 and 12. What is the probability that it will be 7? You might say it was 1 in 11, but you’d be wrong, because I chose that number by rolling two dice, and the probability was 1 in 6. The probability of an outcome depends on how that outcome was produced. In order to calculate a probability, you must always consider both a mechanism and an outcome. Any attempt to compute a probability in the absence of a mechanism is wrong.
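The dice example can be checked by brute-force enumeration. The sketch below is only an illustration; the function names `prob_uniform` and `prob_two_dice` are made up here and are not from the post. It compares the two mechanisms for the same outcome, 7.

```python
from fractions import Fraction
from itertools import product

def prob_uniform(target, low=2, high=12):
    """Probability of `target` if every integer in [low, high] is equally likely."""
    if not low <= target <= high:
        return Fraction(0)
    return Fraction(1, high - low + 1)

def prob_two_dice(target):
    """Probability that two fair six-sided dice sum to `target`."""
    rolls = list(product(range(1, 7), repeat=2))  # all 36 ordered rolls
    hits = sum(1 for a, b in rolls if a + b == target)
    return Fraction(hits, len(rolls))

print(prob_uniform(7))   # 1/11
print(prob_two_dice(7))  # 1/6
```

Same outcome, two mechanisms, two different probabilities.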

Specified complexity is essentially probability+. In order for an outcome to exhibit specified complexity, it must be highly improbable while also being specified. That probability, as was just discussed, depends on the mechanism. Consequently, specified complexity also depends on the mechanism. You cannot look at specified complexity in a vacuum. Specified complexity must always be considered in the context of a mechanism.

With that in mind, let’s consider a comment from a recent blog post:

For that matter, take a robot in a room full of coins that have random heads-tail configurations. The robot orders them all heads. The final CSI inside the room (the Robot’s CSI plus the coin’s CSI) is now greater than what we began with!

Remember that the CSI has to be calculated based on the actual mechanism in operation. In this case, we have to calculate the CSI taking into account the actions of the robot. Assuming the robot has no chance of failing, the probability of all coins being heads is 100%. Thus there are zero bits of CSI. The robot has drastically decreased the amount of CSI, not increased it.

The purpose of CSI is not to determine whether an artefact shows signs of being designed. The purpose of CSI is to evaluate whether various proposed mechanisms can explain the artefact. If an artefact exhibits high specified complexity with respect to a mechanism, that mechanism is a poor explanation of the artefact. It would have to be very lucky to produce that artefact. In fact, one can consider the CSI as a measurement of how much luck would be required to produce the artefact.

To see this, let’s consider the example of 2000 heads-up coins. We want to know how they came to be all heads up. A first hypothesis would be that they were all flipped randomly, but all just happened to come up heads. This has a probability of 1 in 2^2000 and a specified complexity of 2000 bits. We conclude that the hypothesis is incorrect. It simply requires way too much luck for all the coins to have come up heads. A second hypothesis would be that a robot or something similar came through and turned all the coins so that they were heads up. The probability of this is 1 in 1, and thus it has 0 bits of specified complexity. Thus we do not reject the hypothesis. This does not mean that the hypothesis is correct; the specified complexity simply gives us no reason to reject it.
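As a rough numerical sketch of the two hypotheses, the snippet below reads "bits" as the negative base-2 logarithm of the probability under each mechanism. That follows the post's simplified usage and assumes the all-heads outcome is already granted as specified; any further terms in Dembski's formal definition are not modeled. The name `bits_required` is a hypothetical helper.

```python
import math
from fractions import Fraction

def bits_required(p):
    """Return -log2(p), the 'amount of luck' in bits, for a probability 0 < p <= 1."""
    p = Fraction(p)
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    # Subtracting integer logs avoids float underflow for values like 2**-2000.
    return math.log2(p.denominator) - math.log2(p.numerator)

N_COINS = 2000

# Hypothesis 1: every coin was flipped independently and fairly.
p_fair_flips = Fraction(1, 2 ** N_COINS)

# Hypothesis 2: an infallible robot turned every coin heads up.
p_robot = Fraction(1)

print(bits_required(p_fair_flips))  # 2000.0
print(bits_required(p_robot))       # 0.0
```

The artefact is identical in both cases. Only the proposed mechanism changes, and the number of bits changes with it.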

One might be inclined to view specified complexity as useless. It seems to basically just be a probability argument. As a recent comment said:

We can simply say, that on the assumption it is a fair coin, it is improbable — 1 out of 2^2000 and it violates expectation value by many standard deviations. We can make the design inference without reference to information theories.

But the question is: what’s so special about 2000 heads? My own coin sequence: TTHHHHHTHTHTTHTHTTHTTHHHHTTTHHHH… is just as improbable. There are various possible justifications, but it comes down to viewing some sequences as special and others as random noise. What does it mean for the sequence to be special? It means that it follows an independent pattern: it is specified.

Specified complexity is nothing more than a probability argument that takes specification into account. Any valid probability argument must explicitly or implicitly have a specification. All probability arguments are specified complexity arguments. All specified complexity arguments are probability arguments. They are one and the same even if you don’t call them by the same name.

Another question raised was whether two copies of War and Peace contained more CSI than one copy. By now you should know the answer: it depends on the mechanism. Let’s assume for the sake of argument that the probability of producing a single copy of War and Peace by some mechanism is 1 in 2^1000 and thus exhibits 1000 bits of specified complexity. How plausible is it that both books were produced independently by the same mechanism? In this case, the probabilities multiply and the pair exhibits 2000 bits of specified complexity. On the other hand, how plausible is it that one book was produced by the mechanism, and the other is a copy? The probability of the first book is 1 in 2^1000, and the copy has a probability of 1 in 1. Thus the total specified complexity is 1000 bits.
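The arithmetic behind these two cases can be written out as a short worked sketch. It assumes, as the paragraph does, that specified complexity is read as $-\log_2$ of the probability under the proposed mechanism. The symbols $P_{\text{indep}}$ and $P_{\text{copy}}$ are labels introduced here for illustration.

$$
\begin{aligned}
P_{\text{indep}} &= 2^{-1000}\cdot 2^{-1000} = 2^{-2000}, & -\log_2 P_{\text{indep}} &= 2000 \text{ bits},\\
P_{\text{copy}} &= 2^{-1000}\cdot 1 = 2^{-1000}, & -\log_2 P_{\text{copy}} &= 1000 \text{ bits}.
\end{aligned}
$$

Probabilities of independent events multiply, so their bit counts add. A certain event (the copy) contributes a factor of 1 and therefore 0 bits.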

Remember, CSI is always computed in the context of a mechanism. Specified Complexity is nothing more than a more rigorous form of the familiar probability arguments. If you try to measure the specified complexity of arbitrary artefacts you will run into trouble because you are trying to use specified complexity for something it was not designed to be. Specified complexity was only intended to provide rigour to probability arguments. Anything beyond that is not specified complexity. It might be useful in its own right, but it is not specified complexity.

### 19 Responses to *CSI Confusion: Remember the Mechanism!*


Thanks Winston Ewert!

Thank you Winston for posting this. Sorry we have some disagreement, but it is worth discussing. As always, I salute the very very fine work you did with Robert Marks and Bill Dembski, and I’m envious you succeeded where I did not at the EIL.

With that in mind, let’s consider a comment from a recent blog post:

Remember that the CSI has to be calculated based on the actual mechanism in operation. In this case, we have to calculate the CSI taking into account the actions of the robot. Assuming the robot has no chance of failing, the probability of all coins being heads is 100%. Thus there are zero bits of CSI. The robot has drastically decreased the amount of CSI, not increased it.

But a human could do the same and order all the coins. Would we still say the CSI in the room decreased? If a quantum atomic random number generator were inside the robot such that there is actual uncertainty as to whether he will invoke his coin-ordering program, can we say the probability of the coins being all heads will be 100%? No, there is still uncertainty in the outcome.

I will say, my view is that there are limits to what the robot can do, and those constraints are in the robot’s software. Although the actual limits of the robot’s abilities might not be exactly known (even to the robot’s builders), the limits exist. We can say, if he does something, it was within the limits of what was front-loaded into the robot.

I’m not trying to cause trouble here, but when I was trying to explain ID to students at Cornell’s ID class (taught by Allen MacNeill), I began to see all these paradoxes.

Ways that I resolved the paradoxes:

1. allow CSI to grow in open systems

2. allow weak AI to be classed as a form of intelligence

How this came up, I’ll just cut and paste a comment.

I’m not putting forward anything that students of ID might not put forward themselves. These concerns are not coming out of the Darwin camp; they have been shared privately by various IDists. We’re coming forward now with some of these concerns.

… because there are only 9 whole numbers between 2 and 12, not 11

Let’s say that the robot has a 25% chance of succeeding in his mission, and a 75% chance of failing for some reason. Thus there is approximately a 25% chance of all the coins being heads up (I neglect the odds that the robot failed, but the coins just happened to be all heads anyway). This means that approximately two bits of CSI are exhibited by the coins.
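The "approximately two bits" figure can be reproduced with a small sketch. It makes one assumption not stated in the comment: when the robot fails, the coins are treated as 2000 independent fair flips. That is the neglected term mentioned in the parenthesis. The function name `all_heads_bits` is hypothetical.

```python
import math
from fractions import Fraction

def all_heads_bits(p_success, n_coins):
    """Bits (-log2 P) for 'all heads' given a robot that succeeds with p_success.

    Assumption: on failure the coins are independent fair flips, so they
    still come up all heads with probability 2**-n_coins.
    """
    p_success = Fraction(p_success)
    p_lucky = Fraction(1, 2 ** n_coins)
    p_all_heads = p_success + (1 - p_success) * p_lucky
    # Subtracting integer logs avoids float underflow for tiny probabilities.
    return math.log2(p_all_heads.denominator) - math.log2(p_all_heads.numerator)

print(all_heads_bits(Fraction(1, 4), 2000))  # approximately 2 bits
print(all_heads_bits(1, 2000))               # infallible robot: 0 bits
print(all_heads_bits(0, 2000))               # robot never succeeds: 2000 bits
```

The failure term is so small that it does not visibly change the two-bit result, which is why the comment can neglect it.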

I’d like to develop the relationship between humans and CSI in a separate post, as well as the notion of weak AI being intelligence.

This is what Dembski has explicitly said; see page 163 of *No Free Lunch*.

Cantor,

When someone asks you to pick a number between 1 and 10, do you think that 1 and 10 are off limits?

See Also: http://english.stackexchange.c.....rom-a-to-b

It’s all a matter of specification!

So CSI is rehashed probability.

Apparently Dr. Dembski in NFL pages 155-156 says specified information of less than 500 bits can be generated by chance, so it is also wrong to assume that new information can’t be created, as many here believe. The Law of Conservation of Information is about the limit of new information.

selvaRajan @7:

I don’t even need to look up the reference to know that is not what Dembski said. There is an important difference between:

- specified information of 500+ bits cannot be generated by chance; and

- specified information of <500 bits can be generated by chance.

Further, no one is disputing that new information can sometimes be created by chance. But the known examples of new information being created are utterly trivial. They don't even approach the 500-bit threshold, and as a practical matter chance doesn't stand a chance (no pun intended) of producing anything even remotely close to that.

The 500-bit threshold was set to take a task that is improbable (generating new information) to the point where it is virtually impossible, given the resources of the known universe. This throws out many instances in which information was purposefully created, but is a reasonable approach in order to avoid false positives.

Thanks Winston Ewert, for an interesting post. We certainly appreciate you taking time to weigh in.

I realize you were directing your comments primarily toward Sal’s posts, but if you’ll forgive me, I’d like to jump in for a moment and think through where your terminology/approach leads.

This is an interesting way to frame the question. Yet in all practical cases the two competing proposed mechanisms we are evaluating are (i) purposeless chance vs. (ii) a purpose-driven mechanism (design). So as a practical matter, if the chance hypothesis is rejected – due to the CSI being beyond the universal probability bound – then design becomes, not a definitive, logical deduction, but a properly accepted inference to the best explanation.

OK, but that assumes we know what a robot is, assumes we know how it works, and assumes we know it was “designed” to do what it did. In other words, your example of identifying CSI as zero bits presupposes that we already *know* design to be the correct mechanism.

In other words, the process of inference that we usually have to go through has been flipped on its head. Instead of looking at the artifact of unknown provenance and determining, based on the CSI in the artifact, what the probable mechanism was, we know the mechanism to be design and then declare that as a result, there is no improbability and, therefore, no CSI.

That seems like a bit of an unusual way to look at the situation. It seems there is value in looking at the artifact to determine whether it manifests the properties of complex specified information. Then, with that as a known, we can evaluate whether the two possible mechanisms (chance or design) are reasonable candidates.

In substance, it is two ways of saying the same thing. Either:

(i) we look at the ‘C’ part of CSI as what exists in the artifact – admittedly under a reasonable chance mechanism – and then see whether we can infer design or chance as the best explanation; or

(ii) we say that we can’t calculate the ‘C’ part of CSI unless we know the mechanism, and therefore run (a) a calculation of ‘C’ under a reasonable chance mechanism as well as (b) a calculation of ‘C’ under design (which, per your description, essentially collapses to a probability of 1), and then see whether we can infer design or chance as the best explanation.

As a result, from a practical standpoint in either case where the mechanism is not known beforehand, the analysis collapses to a calculation of ‘C’ based on a reasonable chance mechanism.

So we can either say that we are calculating the “actual” CSI in an artifact based on a chance mechanism, or we can say that we are calculating the “potential” CSI in an artifact (meaning, again, the CSI that would exist under a chance mechanism).

The calculation is the same. The result of the analysis is the same. The only difference is the use of terminology in whether we say that we view an artifact as actually containing complex specified information or whether we view complex specified information as being just a pro-forma construct that describes possible competing mechanisms.

Up to this point, I think we are fine, and it is just a question of semantics.

However, the thing that seems strange about taking the latter approach, is that, based on your example, known design yields 0 bits of CSI, or in other words has no CSI, which is precisely the opposite of how we would normally think of it in normal language terms.

Anyway, this is too long already. I believe I understand what you are saying in terms of how CSI is used to evaluate whether various proposed mechanisms can explain the artifact. However, using terminology that suggests things that come about by chance can have large amounts of CSI, while those that come from known design have no CSI, seems completely backwards from the normal use of language and is likely to generate much more confusion than light.

Perhaps we are on the same page and it is just Sal’s specific robot example that you felt didn’t result in CSI, while design itself can be said to result in CSI?

I think this is a good point, and one worth pondering.

If I am understanding what you are saying, by “arbitrary” you mean without taking into account either a potential mechanism (again, typically done as a chance mechanism by default), or a proper specification or both. Fair enough.

—–

Thank you again for taking time to share your thoughts.

Winston Ewert:

Thank you for a very important clarification.

I recently wrote here:

First of all, I would point out that dFSCI (I will use my restricted definition from now on) is not a property of an isolated object: it can only be assessed for an object in relation to a specifically defined system, which is the system that we believe generated the object in its present form.

I think that many problems arise from the fact that many imagine CSI as some “absolute” property of the object itself, out of context, while it is a property that we assess in a definite context and with definite assumptions.

When I use my personal definition of a subset of CSI (digital Functionally Specified Complex Information), I always emphasize that the computation of it is relative to a system, to its mechanisms and probabilistic resources, and to the defined function. That’s why the same object can exhibit different values of dFSCI for different defined functions, or for different systems.

Thank you again.

With respect to 2000 fair coins found in a room all heads, I would use the chance hypothesis appropriate for a fair coin, thus the number of bits is 2000. It passes the EF.

The mechanism of how it got in that configuration could be:

1. human

2. human that made a machine that ordered the coins

3. a collection of humans that configured the coins

4. a collection of humans that built the robot that ordered the coins

But the mechanism of how it got there as far as looking at the set of coins in isolation is immaterial to saying the CSI is 2000 bits. This is certainly the spirit of how the EF will calculate the number of bits for 2000 coins.

So prior to the coins being all heads (some random configuration) we judge the CSI to be 0.

After the coins are all heads we judge the CSI to be 2000. Thus the change in CSI is +2000 going from random to all heads; that is, the delta-CSI of the set of coins is +2000 bits.

But if we insist the combined Robot and coin system have no net change in CSI when the coins go from random to all heads we have some inelegant looking numbers.

The CSI for the coins goes up by 2000, then where do we get a -2000 bit decrease in CSI to make the delta-CSI for the combined robot and coin system equal zero? When we redraw the system boundary from merely the coins to include the robot, the numbers don’t have nice elegant addition to them.

It seems to me the information entropy of the combined system should go up as well, meaning we have more bits after the robot orders the coins. Otherwise you get some strange-looking cases where the delta-CSI for the coins goes up, but one has to justify the delta-CSI for the robot+coin system staying the same, which seems bizarre.

One is of course free to adopt any convention one wishes to analyze artifacts, but it doesn’t seem very elegant to say somehow the net CSI of the robot+coins system stayed at zero even though the CSI of the coins by themselves went up by 2000 bits.

How would we treat a human+coins system? Will we say the delta-CSI going from a random coin configuration to all heads is zero, or +2000 bits, for a human+coin system?

If we start changing the bit numbers based on the mechanism design (we give one set of numbers for robots, and another for humans, even though the final artifacts are the same), then we are making the EF incorporate more information than a simple chance hypothesis. One of course is free to do this, but that would mean redoing the EF to distinguish between intelligent mechanisms up front, which sort of defeats the purpose of the EF, which wasn’t supposed to have access to a description of the designing agency.

Further, let us suppose we say the delta-CSI going from 2000 coins in random configuration to all heads is:

1. 0 bits for the robot+coins in a room

2. 2000 bits for human+coins in a room

we’re now affixing bit values to the system based on knowledge about the designers — but that sort of seems the wrong way to go because most design inferences of interest assume we don’t have access to the nature of designer(s).

NOTE #1

Even in thermodynamics, when there is a transfer of heat from the hot to the cold, the entropy of the total system goes up. Energy is conserved, but the number of degrees of freedom is increased.

The problem with conservation of information, is that there aren’t good metrics for the baseline amount of information. To model conservation of information we’d have to say something like algorithmic information remains conserved, but not the Shannon entropy.

If I were to assert some sort of conservation law, that’s the way I’d phrase it. The algorithmic information remains constant, but the Shannon entropy goes up. That is analogous to conservation of energy but increasing entropy. That’s essentially what happens when a ZIP file is decompressed: algorithmic information is conserved, but Shannon entropy goes up.

NOTE #2

My fundamental objection is that CSI is expressed in Shannon entropy. Like thermodynamic entropy, it is not a conserved quantity.

Energy is conserved. Algorithmic information is conserved, but CSI is not the appropriate measure of algorithmic information (at least not the basic form as laid out in the EF) since it is stated in terms of Shannon entropy.

NOTE #3

If we distinguish between Algorithmic Information (AI) versus CSI (measured in Shannon metrics) we could say the AI is conserved in the robot+coin system before and after the coins are ordered, but the CSI (the Shannon entropy of the CSI) goes up. That would parallel the way we view things in physics, and it also seems more sensible.

It would also give a way to describe the evolution of a compressed file going into a decompressed state whereby algorithmic information is conserved but Shannon entropy is increased.
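The compressed-file picture can be illustrated with a toy sketch. It uses zlib's compressed length as a crude, computable stand-in for algorithmic information (true Kolmogorov complexity is not computable in general) and raw length as the "more bits" representation. The fixed seed and the helper name `packed_size` are assumptions made for the illustration, and the sketch does not settle whether any conservation law holds.

```python
import random
import zlib

def packed_size(data):
    """Length in bytes of the zlib-compressed form (a crude proxy only)."""
    return len(zlib.compress(data, 9))

rng = random.Random(0)  # fixed seed so the run is repeatable
all_heads = b"H" * 2000
random_flips = bytes(rng.choice(b"HT") for _ in range(2000))

# Decompression: a short description expands into a long raw representation.
packed = zlib.compress(all_heads, 9)
restored = zlib.decompress(packed)

print(len(packed), "bytes packed ->", len(restored), "bytes restored")
print(packed_size(all_heads), "bytes for all heads vs",
      packed_size(random_flips), "bytes for random flips")
```

The all-heads sequence packs into far fewer bytes than the random one, and unpacking it restores all 2000 symbols without adding anything that was not already implied by the short description.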

Sal:

I don’t agree with you. I would treat your example this way:

a) Define the system and the time span:

Let’s say the system is the room, and the time span is, say, one year.

b) Define the configuration we observe and about which we are assessing dFSCI, and which we define as functional:

In this case, the 2000 heads

c) Verify that this configuration has appeared in the system during the time span, and was not already present in it:

For example, the 2000 coins were at the beginning of the time span in a random configuration, or just in a completely unrelated configuration, and that in some way they were tossed or reordered during the time span.

d) Verify that no known necessity mechanism present in the system can be responsible for that outcome, or can simply strongly favor it.

In the case of a highly compressible configuration, like this one, we have to rule out simple necessity algorithms (for example, the coins are not fair). If a robot programmed to order the coins is already present at the beginning of the time span, again that is a known mechanism that can explain the observed outcome by necessity.

Please note that it is not relevant here whether that mechanism is designed (the robot) or “natural” (unfair coins, not specifically designed to that purpose). The important point is that we are trying to explain the observed configuration in that system and in that time span. We are not trying, here, to explain the robot or the coins.

e) Only if all the previous points are satisfied (and therefore no known mechanism existed in the system that can explain the configuration) can we infer that the coins were in some way ordered in that configuration by a conscious intelligent agent, for some conscious intelligent purpose. IOWs, we can reasonably infer design.

Sal:

Another important point. I think that all your problems with algorithmic information and CSI can be easily solved if you consider Kolmogorov complexity instead of simple complexity.

IOWs, an algorithm generating functional complexity has its own complexity. If the complexity of the algorithm is less than the complexity of the output, then we have to take the complexity of the algorithm as the measure.

I have many times given the example of software computing the digits of pi. If the output has a low number of digits, its complexity is lower than the complexity of the software. IOWs, if we consider the hypothesis of generation from random variation, the output is more probable than the possible algorithm which can compute it.

But if we consider an output of pi with so many digits that it is more complex than the software which can compute it, then the true Kolmogorov complexity of the output becomes that of the software. IOWs, the emergence of the software is more probable, in a random system, than the emergence of its output.

In that sense, algorithms can certainly increase the complexity of an output, but never the Kolmogorov complexity of the whole system. IOWs, algorithms can never generate true new original CSI.

Only conscious intelligence can explain the “miracle” of dFSCI. The reason is simple enough. The generation of new dFSCI is made possible only by the conscious intuitions of meaning and purpose, and those experiences are not algorithmic. They are phenomena that can only take place in a conscious agent.

gpuccio,

Thank you for your comment. And I actually think we are collectively getting somewhere in this discussion, though most may not think so. I thank Winston for bringing this up, and this is important enough that I’m considering writing a formal paper on it.

As Eric points out, there are semantics going on, but maybe a better way of phrasing it is “what metric conventions should the ID community adopt?”

Exactly! Algorithmic information is understood in terms of Kolmogorov complexity. However, a priori Shannon information is not, because that’s not how Shannon conceived of information (especially over a communication channel).

If we adopt the convention that algorithmic information is conserved (unless it is destroyed, essentially leaking out of the system boundary) but not the Shannon entropy, then these complications have a resolution.

It resolves the problems. You have your PI algorithm. Just because you decompress it and it starts actually printing the digits, the data in evidence does not have any more conceptual (algorithmic) information, but clearly once it starts printing, the Shannon information on the paper has increased substantially. Algorithmic information is conserved, but the combined entropy (the delta-S) of the paper plus computer has gone up. This is very analogous to thermodynamics, where you have one conserved quantity (energy) and one non-conserved quantity (thermal entropy).

There is the classic example of a hot brick and a cold brick put in contact and then reaching thermal equilibrium. The energy of the two bricks is conserved but the entropy of the joint system goes up.
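The brick example can be made quantitative with a standard textbook sketch. It assumes two identical bricks with the same constant heat capacity $C$, initial temperatures $T_h > T_c$, and no heat exchange with the surroundings. These symbols are introduced here for illustration and do not appear in the discussion.

$$
\begin{aligned}
T_f &= \frac{T_h + T_c}{2} && \text{(energy conserved)},\\
\Delta S &= C\ln\frac{T_f}{T_h} + C\ln\frac{T_f}{T_c}
          = C\ln\frac{(T_h + T_c)^2}{4\,T_h T_c} \;\ge\; 0 && \text{(entropy increases)}.
\end{aligned}
$$

The inequality holds because $(T_h + T_c)^2 \ge 4\,T_h T_c$, with equality only when the bricks start at the same temperature.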

Here we have an analogous situation: because algorithmic information is K-complex, when it decompresses we have to represent it with more bits in the Shannon sense, but the actual algorithmic information is conserved even though we’re now distributing its representation over more bits (more bits is more Shannon entropy).

In your example, the conserved quantity is algorithmic information (that is, Kolmogorov-compressed in the PI algorithm). That information level never changes, but the Shannon information on the paper that is having the digits of PI written on it can increase.

This is exactly what happens when a decompression algorithm is at work, or we’re making tons of copies of *War and Peace*. It resolves the paradoxes.

This is the more sensible convention. It’s not a matter of who is right or wrong, but which conventions are more workable.

I’m not doing the topic justice here in this informal discussion, I’ll have to write it up more formally.

I’m sure I’m not the first to conceptualize things in this way; it’s probably stated in some journals on compression algorithms.

Hi Winston Ewert,

Thanks very much for a very thought-provoking and stimulating post. Much food for thought here. Thanks again.

If I walk into a room and find 1000 coins heads, I don’t need to have the faintest idea about what mechanism produced the 1000 heads to know that I’m not seeing CSI. 1000 heads in a row is *not* complex. There is no ‘C’.

That doesn’t mean it wasn’t designed. It just means that we can’t use CSI to infer design. If we add the ‘S’ consideration, we might rule out chance. But necessity is still a very realistic possibility, because we have no ‘C’.

This is all easily ascertained by looking at the artifact itself. We need not wring our hands and speculate about potential mechanisms.

Sal @2:

Please, oh please, don’t go down this “open system” route. Nothing but miscommunication and confusion can result, particularly for any listener.

The whole “open system” vs. “closed system” issue is a red herring and, ultimately, is nothing but a semantic game. Worse, the Darwinists have already seized upon the idea that “anything can happen in an open system,” so if you start using that terminology (even if you have a different nuance in mind) you play right into the hands of the materialist “explanation” for how CSI can arise. Gee, after all, the Earth is an “open system,” and so on . . .

There is no CSI paradox here. You are trying to resolve an issue that is not an issue, and as a result are creating confusion in the process. Again, what new CSI do you imagine exists if we make a copy of an artifact that already exists? We are back to the definition of “more” and “new” that I discussed at length in the prior thread. It is nothing but a semantic game.

There is nothing inherent about an “open system” that would allow CSI to increase. Let’s not go there. If we want to talk about intelligent input into a system, fine. If we want to talk about front-loading, fine. But simply talking about purely natural processes and then adding “open system” doesn’t mean that new CSI now becomes possible. It is simply not true.

I apologize for speaking so strongly on this point and please know that I have a great deal of respect for many of your insights.

That’s why I suggest dropping the word “Complex” from CSI — it’s a misnomer. Bill’s alternate “Specified Improbability” was a much more germane term, he would have spared so much agony if he went with his alternate intuition.

Sal, just because ‘C’ does not exist in a particular artifact (1000 heads in a row) does not mean that ‘C’ is irrelevant to the analysis!

Indeed, the lack of ‘C’ in 1000 heads is part of the reason we cannot infer design for 1000 heads. In other words, the CSI analysis worked flawlessly. We checked for ‘C’ and it wasn’t there; thus no definitive inference of design.

Your proposal is like saying we measured the voltage in a particular wire and found none, so we are now going to throw out our voltmeter because it isn’t needed. No, it did what it was supposed to do and gave us the right answer for the particular instance we were looking at.