Wanted: Mathemagician to work with extremely large values of 1 …

About this odd recent job posting (math fix for neo-Darwinism), Doug Axe at Biologic Institute offers “Oxford seeks mathemagician” (May 5th, 2011):

Scientists employ different rhetorical strategies to accomplish different things. That shouldn’t be surprising, perhaps, but for some it is. The reason is that while the public is very familiar with rhetorical shiftiness in some occupations, they tend to see only one side of science—the confident, assertive, authoritative, we-know-what-we’re-talking-about side. Science-speak often comes across with a hint of arrogance, but since science itself depends on the goodwill of the public for its very existence, it usually corrects itself on those occasions when it oversteps its bounds. There are a few peculiar exceptions though, …

But the question has been raised: To what extent is the public so inured to Darwin nonsense that the big ta-da! – we proved there is enough time for Darwinism by building in just a few leetle fixes! – will just roll through the pop science press, dutifully followed by more evolutionary agony aunts, Darwinian brand marketers, and “cre-uh-theist” circuses?

33 Responses to Wanted: Mathemagician to work with extremely large values of 1 …

  1. This is a gem from the job description:

    ‘Grand theories in physics are usually expressed in mathematics. Newton’s mechanics and Einstein’s theory of special relativity are essentially equations. Words are needed only to interpret the terms. Darwin’s theory of evolution by natural selection has obstinately remained in words since 1859. …’

    Perhaps Larry Moran and PZ Myers would care to apply?

  2. Dr. Larry Krauss, Arizona State University likes to wear a shirt that says: “2+2=5 for extremely large values of 2.”

    It’s funny, but not true**. Truth (or fact) doesn’t get in the way for people like Krauss, who, as a member of a leading secular humanist organization, has an agenda beyond scientific activities.

    ** If the value of 2 were 2.5 or more, then it would round to the integer 3.

    This canard also shows up in the book 1984.
    http://en.wikipedia.org/wiki/2_%2B_2_%3D_5

  3. I for one don’t much care for all the publicity this is getting in the ID community.

    The man behind these openings has done a serious amount of work, so why don’t we address that instead of taking pot-shots at the job description?

    http://users.ox.ac.uk/~grafen/cv/

  4. Well, Mung, thanks a million! So when Grafen says...

    ‘natural selection has obstinately remained in words since 1859’...

    ...he knows exactly what he is talking about? I.e., with Fisher, Haldane, and Wright especially thrown in? What this says to me is that the top Darwinists who have worked on the math throughout the last century have always known that there were severe deficiencies in their mathematical models, yet the unwashed masses were always told otherwise.

  5. And then Grafen went out and did something about it and is continuing to do so, which is why he’s looking for assistance. And we’re acting like the situation now is just as it was in 1859.

    The unwashed masses wouldn’t recognize a second-stage derivative of the Quastler dynamic equilibrium if it burbled up in their morning coffee.

  6. Mung: ‘And then Grafen went out and did something about it and is continuing to do so, which is why he’s looking for assistance. And we’re acting like the situation now is just as it was in 1859.

    The unwashed masses wouldn’t recognize a second-stage derivative of the Quastler dynamic equilibrium if it burbled up in their morning coffee.’

    Hmm, Mung? Something tells me that a ‘prescriptive information generating equation’ is never going to be in the works, no matter how many ‘second-stage derivatives of the Quastler dynamic equilibrium’ you throw at it:

    THE GOD OF THE MATHEMATICIANS – DAVID P. GOLDMAN – August 2010
    Excerpt: we cannot construct an ontology that makes God dispensable. Secularists can dismiss this as a mere exercise within predefined rules of the game of mathematical logic, but that is sour grapes, for it was the secular side that hoped to substitute logic for God in the first place. Gödel’s critique of the continuum hypothesis has the same implication as his incompleteness theorems: Mathematics never will create the sort of closed system that sorts reality into neat boxes.
    http://www.faqs.org/periodical.....27241.html

    The Law of Physicodynamic Insufficiency – Dr David L. Abel – November 2010
    Excerpt: “If decision-node programming selections are made randomly or by law rather than with purposeful intent, no non-trivial (sophisticated) function will spontaneously arise.”,,, After ten years of continual republication of the null hypothesis with appeals for falsification, no falsification has been provided. The time has come to extend this null hypothesis into a formal scientific prediction: “No non trivial algorithmic/computational utility will ever arise from chance and/or necessity alone.”
    http://www.scitopics.com/The_L.....iency.html

    The main problem, for the secular model of neo-Darwinian evolution to overcome, is that no one has ever seen purely material processes generate functional ‘prescriptive’ information.

    The Capabilities of Chaos and Complexity: David L. Abel – Null Hypothesis For Information Generation – 2009
    To focus the scientific community’s attention on its own tendencies toward overzealous metaphysical imagination bordering on “wish-fulfillment,” we propose the following readily falsifiable null hypothesis, and invite rigorous experimental attempts to falsify it: “Physicodynamics cannot spontaneously traverse The Cybernetic Cut: physicodynamics alone cannot organize itself into formally functional systems requiring algorithmic optimization, computational halting, and circuit integration.” A single exception of non trivial, unaided spontaneous optimization of formal function by truly natural process would falsify this null hypothesis.
    http://www.mdpi.com/1422-0067/10/1/247/pdf
    Can We Falsify Any Of The Following Null Hypothesis (For Information Generation)
    1) Mathematical Logic
    2) Algorithmic Optimization
    3) Cybernetic Programming
    4) Computational Halting
    5) Integrated Circuits
    6) Organization (e.g. homeostatic optimization far from equilibrium)
    7) Material Symbol Systems (e.g. genetics)
    8) Any Goal Oriented bona fide system
    9) Language
    10) Formal function of any kind
    11) Utilitarian work
    http://mdpi.com/1422-0067/10/1/247/ag

    Dr. Don Johnson explains the difference between Shannon Information and Prescriptive Information, as well as explaining ‘the cybernetic cut’, in the following podcast:

    Programming of Life – Dr. Donald Johnson interviewed by Casey Luskin – audio podcast
    http://www.idthefuture.com/201....._life.html

  7. Dr. Don Johnson explains the difference between Shannon Information and Prescriptive Information

    I knew we were in trouble when I read:

    On this episode of ID the Future, Casey Luskin interviews Dr. Donald E. Johnson about his new book, Programming of Life, which compares the workings of biology to a computer.

    That said, I don’t think he explained the difference between Shannon Information and Prescriptive Information.

    At 4:25 into the podcast Casey asks:

    What is Shannon Information and why is it not a sufficient measure of biological information?

    Dr. Johnson never answers the second half of the question, but here’s what he says about what Shannon Information is:

    Shannon Information is basically probabilistic uncertainty. For example, a completely random string has the very highest Shannon Information because it’s very improbable due to the random sequence, but it has zero functional information.

    The more improbable a thing is, the more Shannon Information it contains?

    The more random a sequence is, the more Shannon Information it contains?

    And the higher the Shannon Information the lower the functional information?

    And you think this guy is a friend of ID?

    At 5:55 into the podcast Casey asks:

    What is Prescriptive Information and how does it help us measure biological information?

    Again, Dr. Johnson never addresses the second half of the question, but here’s what he says about Prescriptive Information:

    Prescriptive information is not only functional but it is a recipe, a sequence of instructions.

    So supposedly Shannon information cannot contain instructions and prescriptive information isn’t a measure of information at all but is rather a particular kind of information.

    Tell me how any of this helps the argument for Intelligent Design?

  8. I’m going to just love using Schneider as a source, lol.

    I’m Confused: How Could Information Equal Entropy?

    Information Is Not Uncertainty

  9. Prescriptive Information (PI) – Abel
    Excerpt: Semantic (meaningful) information has two subsets: Descriptive and Prescriptive. Prescriptive Information (PI) instructs or directly produces nontrivial formal function (Abel, 2009a). Merely describing a computer chip does not prescribe or produce that chip. Thus mere description needs to be dichotomized from prescription. Computationally halting cybernetic programs and linguistic instructions are examples of Prescriptive Information. “Prescriptive Information (PI) either tells us what choices to make, or it is a recordation of wise choices already made.” (Abel, 2009a)

    Not even Descriptive semantic information is achievable by inanimate physicodynamics (Pattee, 1972, 1995, 2001). Measuring initial conditions in any experiment and plugging those measurements appropriately into equations (e.g., physical “laws”) is formal, not physical. Cybernetic programming choices and mathematical manipulations are also formal.

    The specific term PI originated out of a need to qualify the kind of information being addressed in peer-reviewed scientific literature. Shannon measured only probabilistic combinatorial uncertainty. Uncertainty is not information. It is widely recognized that even reduced uncertainty (“R,” poorly termed “mutual entropy”) fails to adequately describe and measure intuitive information. Intuitive information entails syntax, semantics and pragmatics. Syntax deals with symbol sequence, various symbol associations, and related arbitrary rules of grouping. Semantics deals with the meanings represented within any symbol system. Pragmatics addresses the formal function of messages conveyed using that symbol system.

    Most research into the nature of intuitive and semantic information has unfortunately centered primarily around description. But the formal function instructed or actually produced by PI is far more important than mere description. PI prescribes and controls physical interactions so as to create and engineer sophisticated formal function. The latter is the subject of both cybernetics and systems theory.

    Semiosis is the sending and receiving of meaningful messages. PI is often contained within meaningful messages. The sender must choose with intent from among real options at bona fide decision nodes. Letters, for example, must be deliberately selected from an alphabet at each locus in a string of symbols in order to spell words and sentences. In a sense, even description is a subset of prescription. All descriptions must themselves be prescribed.

    Both sender and receiver must be privy to and abide by the same set of arbitrary rules for the message to be understood at its destination. By “arbitrary” we do not mean “random.” Arbitrary means, “Could have been other” despite occurring in a physicodynamically determined world. No random number generator has ever been observed to generate a meaningful message or a non trivial computational program. No physical law can determine each selection either. If selections were dictated by law, all selections would be the same. This would make recording PI impossible. Uncertainty (measurable in bits) is necessary at bona fide decision nodes. But bits of uncertainty cannot measure purposeful choices, the essence of PI. The regularities described by physical laws oppose uncertainty and information potential. Law-like behaviors manifest a probability approaching 1.0, while maximum binary uncertainty approaches a probability of 0.5 in the opposite direction. Maximum quaternary uncertainty (with four independent and equiprobable possibilities) approaches a probability of 0.25. Neither physicodynamic law (necessity) nor random coursing through mere “bifurcation points” can explain the formal semiosis and pragmatic controls of PI.

    Formal choices of mind can be recorded into physicality through the purposeful selection of unique physical objects called “tokens.” A different formal meaning and function is arbitrarily assigned to each token. Formal rules, not laws, govern the combinations and collective meaning of multiple tokens in a Material Symbol system (MSS) (Rocha, 1997). The recordation of successive purposeful choices into a MSS allows formal PI to be instantiated into a physical matrix.
    http://www.scitopics.com/Presc.....on_PI.html

    So, Mung, prescriptive information entails the ability to ‘see’ into the future so as to ‘record wise choices’, something that consciousness is capable of doing, yet unconscious material processes are not.

  10. As far as entropy and information are concerned, there is a deep correlation:

    Molecular Biophysics – Setlow-Pollard
    Addison-Wesley, pp. 66-74

    Information theory. Relation between information and entropy.
    http://www.astroscu.unam.mx/~a.....ecular.htm

    “Bertalanffy (1968) called the relation between irreversible thermodynamics and information theory one of the most fundamental unsolved problems in biology.”
    Charles J. Smith – Biosystems, Vol.1, p259.

    “Gain in entropy always means loss of information, and nothing more.”
    Gilbert Newton Lewis

    Information and entropy – top-down or bottom-up development in living systems? A. C. McIntosh
    Excerpt: It is proposed in conclusion that it is the non-material information (transcendent to the matter and energy) that is actually itself constraining the local thermodynamics to be in ordered disequilibrium and with specified raised free energy levels necessary for the molecular and cellular machinery to operate.
    http://journals.witpress.com/journals.asp?iid=47

    “Is there a real connection between entropy in physics and the entropy of information? ….The equations of information theory and the second law are the same, suggesting that the idea of entropy is something fundamental…” Siegfried, Dallas Morning News, 5/14/90, [Quotes Robert W. Lucky, Ex. Director of Research, AT&T, Bell Laboratories & John A. Wheeler, of Princeton & Univ. of TX, Austin]

    etc., etc.

  11. Mung:

    Re: The more improbable a thing is, the more Shannon Information it contains?

    Indeed, as $I_k = \log(1/p_k) = -\log p_k$

    In short, you are measuring the information of a symbol from a set by its observed relative rarity in messages. You are surprised a lot — learn a lot — when a rare symbol shows up, much less so when a common symbol does. So, the metric weights that and says a highly unusual symbol carries more info than a common one. X is more informational than E in English. Notice how many E’s are in this para, and just two X’s, including the one just put in.
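    To make the E-versus-X point concrete, here is a minimal Python sketch (an illustration, not part of the original comment; the function name is hypothetical). It assumes each letter's probability can be estimated by its relative frequency in a sample, the frequency interpretation used in the next step; a sample this short gives only a rough estimate.

```python
import math
from collections import Counter

def self_information(text: str) -> dict[str, float]:
    """Surprisal I_k = -log2(p_k) for each letter in `text`.

    p_k is estimated as the letter's relative frequency among the
    letters of `text` (case-folded; spaces and punctuation ignored).
    """
    letters = [c for c in text.upper() if c.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    return {ch: -math.log2(n / total) for ch, n in counts.items()}

sample = "Notice how many E's are in this para, and just two X's."
info = self_information(sample)
# The rare letter X carries more bits than the common letter E.
print(f"E: {info['E']:.2f} bits, X: {info['X']:.2f} bits")
```

    With a larger English corpus the estimates would shift, but the ordering (rare X more informative than common E) is the point being illustrated.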

    The Shannon metric is average info per symbol, on the $I_k = -\log p_k$ definition suggested by Hartley.

    So, weighting the average by using the relative frequency of occurrence interpretation of $p_k$:

    $H = -\sum_i p_i \log p_i$

    H is of course Shannon Info, the average info per symbol under a given circumstance that allows us to do that weighted sum. Wiki gives a summary:

    In information theory, entropy is a measure of the uncertainty associated with a random variable. In this context, the term usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message, usually in units such as bits. Equivalently, the Shannon entropy is a measure of the average information content one is missing when one does not know the value of the random variable.
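    As a minimal sketch of the weighted average just described (illustrative Python; the function name is hypothetical), H is each symbol's surprisal $-\log_2 p_i$ weighted by its probability $p_i$, giving bits per symbol:

```python
import math

def shannon_entropy(probs: list[float]) -> float:
    """H = -sum_i p_i * log2(p_i), in bits per symbol.

    Zero-probability symbols contribute nothing (p log p -> 0 as p -> 0).
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

# One common symbol (p = 0.5) and two rarer ones (p = 0.25 each).
print(shannon_entropy([0.5, 0.25, 0.25]))
```

    Here the common symbol has a surprisal of 1 bit and each rarer one 2 bits, so the weighted average is 0.5 × 1 + 0.25 × 2 + 0.25 × 2 = 1.5 bits per symbol.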

    Info removes uncertainty about the state of the source, and may surprise us in so doing. (This is perhaps puzzling but makes sense: think about how knowing bin Laden’s courier’s nickname, on waterboarding KSM, in 2003, led on to being able to trace and track down where he was. But even so, it took a raid then DNA testing to be effectively certain. There sure has been a lot of surprise that he was living under the noses of the Pakistani main military academy.)

    It turns out that H is mathematically the same “shape” as one of the statistical thermodynamic equations for entropy [the Gibbs form, as opposed to the other equation, $S = k \log W$, developed by Boltzmann], and, as Jaynes pointed out about a decade after Shannon’s initial information theory paper, there is a connexion. In the past few years, this link has been confirmed enough to begin to be a lot less controversial. [Cf. my discussion in my always linked note here, and note the onward discussion in Harry S. Robertson.]

    Schneider’s anecdote about Shannon and how the name entropy came to be used is true as far as it goes. But in fact it has been shown that there is a physical connexion. The same wiki article goes on to summarise:

    connections can be made between thermodynamic and informational entropy, although it took many years in the development of the theories of statistical mechanics and information theory to make the relationship fully apparent. In fact, in the view of Jaynes (1957), thermodynamics should be seen as an application of Shannon’s information theory: the thermodynamic entropy is interpreted as being an estimate of the amount of further Shannon information needed to define the detailed microscopic state of the system, that remains uncommunicated by a description solely in terms of the macroscopic variables of classical thermodynamics. For example, adding heat to a system increases its thermodynamic entropy because it increases the number of possible microscopic states that it could be in, thus making any complete state description longer . . .

    This ties back in to Maxwell’s infamous demon, who uses info about the speed of molecules approaching a gate to separate “hotter” from “colder” molecules and do work.

    But, going back: it turns out that the H equation peaks when the $p_i$ are all equal, i.e., for a flat random distribution of symbols. The only way to get that is by the equivalent of a fair die. Real symbol sets that carry real messages will in general not be equiprobable.
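    The claim that H peaks for a flat distribution can be checked numerically. A minimal sketch, assuming a six-sided die; the loaded-die probabilities are made up for illustration:

```python
import math

def shannon_entropy(probs: list[float]) -> float:
    """H = -sum_i p_i * log2(p_i), in bits per throw."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_die = [1 / 6] * 6                         # flat distribution
loaded_die = [0.5, 0.1, 0.1, 0.1, 0.1, 0.1]    # hypothetical bias toward one face

print(f"fair:   {shannon_entropy(fair_die):.3f} bits (log2 6 = {math.log2(6):.3f})")
print(f"loaded: {shannon_entropy(loaded_die):.3f} bits")
```

    The fair die reaches the maximum of $\log_2 6 \approx 2.585$ bits per throw; any unequal distribution over the same six faces scores lower.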

    Information may be functional in many ways, and we can objectively identify that by observation. Prescriptive info is basically program code. It is one class of functional info, and of course it is relevant to computers and to the way DNA and mRNA work in the cell. Start with methionine, add amino acids (AAs) per their codons, terminate. Fold per van der Waals forces, H-bonding, etc. [find a minimum-energy configuration, quite a complex thing to predict given the complexity of the AA chain], then agglomerate and activate if required; put to use.
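    The translation step (start at AUG/methionine, read codon by codon, stop at a terminator) can be sketched as below. This is a simplified illustration, not a model of the cell: the codon table is deliberately partial (a handful of standard genetic-code entries), unknown codons raise a KeyError, and folding, agglomeration and activation are not modelled at all.

```python
# Deliberately partial codon table: a few entries of the standard genetic code.
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "GGC": "Gly", "AAA": "Lys", "GCU": "Ala",
    "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}

def translate(mrna: str) -> list[str]:
    """Read an mRNA string from the first AUG, one codon at a time, until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, nothing to translate
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]  # KeyError for codons not in this partial table
        if amino_acid == "STOP":
            break
        peptide.append(amino_acid)
    return peptide

print(translate("GGAUGUUUGGCAAAUAAGC"))
```

    The leading "GG" is skipped because reading begins at the first AUG, and the trailing "GC" is never read because UAA terminates the chain.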

    That sort of functionality is pretty exacting and easily leads to islands of function; e.g., consider fold domains.

    So, we can see that the basic info metric has some unusual properties and connexions. However, a metric of relative frequency of occurrence does not in itself say anything about the meaningfulness and function of the actual info carried by the symbols, and a flat random distribution of the same set of symbols would go to the peak of the H-metric.

    It is meaningful, appropriate and useful to conceive of FUNCTIONAL information, beyond the Shannon metric, and to then use the observation that such info comes from zones of interest in the field of all possible configs of symbols, to define functionally specific information. Metrics can then be constructed on the degree of isolation of the relevant zones, T, i.e. the islands of function.

    We may then profitably consider the question: are we plausibly able to arrive at observed events E from such zones T by blind chance and necessity?

    Once the T’s are sufficiently isolated, the only known, observed source is intelligence. That is, we are looking at an inference to best, empirically anchored explanation, backed up by the infinite monkeys type analysis.

    As was explored in previous threads (e.g. here), once we are able to define a metric, say,

    $\chi_{500} = I_p - 500$, in bits beyond a threshold

    We may then profitably discuss the inference to design on observed information. [500 bits specifies 48 orders of magnitude more configs than there are Planck time quantum states for the atoms in our solar system since the usual time suggested for the big bang.]
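    A minimal sketch of the threshold arithmetic, assuming $I_p$ has already been obtained by some separate measure (how it is measured is not specified here, so it is simply an input; the function name is hypothetical). It also checks the size of the 500-bit configuration space:

```python
import math

THRESHOLD_BITS = 500

def chi_500(ip_bits: float) -> float:
    """chi_500 = I_p - 500: information in bits beyond the 500-bit threshold.

    A negative result means I_p falls short of the threshold.
    """
    return ip_bits - THRESHOLD_BITS

# Number of configurations spanned by 500 bits, expressed as a power of ten.
print(f"2**500 is about 10**{THRESHOLD_BITS * math.log10(2):.1f}")
print(chi_500(1000), chi_500(400))
```

    Since $500 \log_{10} 2 \approx 150.5$, 500 bits span about $3.27 \times 10^{150}$ configurations (a 151-digit number).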

    GEM of TKI

  12. OOPS: Previous thread.

  13. kairosfocus,

    Sweet. Another thread about information!

    I find it most difficult to believe that you accept Johnson’s definition of Shannon Information as being accurate.

    Shannon Information is basically probabilistic uncertainty. For example, a completely random string has the very highest Shannon Information because it’s very improbable due to the random sequence, but it has zero functional information.

    Not only is it not accurate, it’s misleading.

    As Schneider points out, correctly I believe, uncertainty is not information. And to say that it is, is absurd.

    As you yourself write:

    Info removes uncertainty about the state of the source, and may surprise us in so doing.

    I’m going to say you are correct about having maximal information if the symbol you receive is the least likely symbol of the set. But that doesn’t help Johnson.

    You are surprised a lot — learn a lot — when a rare symbol shows up, much less so when a common symbol does.

    Johnson was talking about the information content of a random string, not how much information could be gleaned from a single symbol.

    It’s amusing to me how Johnson claims that a randomly generated string has “the very highest” Shannon Information while Schneider claims a randomly generated string has zero Shannon Information!

    IMO, they are both wrong, but hey, what do I know, lol!

    Assume two symbols, 0 and 1. Assume a random generator which sends either a 0 or a 1 with equal probability. Which symbol, upon receipt, provides more Shannon Information?

    Say you receive a sequence of those two symbols, a string. Does that string of zeros and ones have “the very highest” Shannon Information or zero Shannon Information?

    Shannon’s equations still apply, right?

    By the way, I’ve ordered some books on information theory and I’m currently reading Information: A History, a Theory, a Flood which I am truly enjoying. So I am trying to educate myself. Your assistance is certainly appreciated.

    You are surprised a lot — learn a lot — when a rare symbol shows up, much less so when a common symbol does.

    How do I know whether a symbol is more or less common, and how rare it is?

    Do we require prior knowledge of the symbol set before receipt?
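    One way to make the two-symbol example concrete is the sketch below. It assumes the receiver already has a probability model of the source (here, a fair binary source, plus a made-up biased one for contrast); without such a model, known or estimated beforehand, the surprisal of a symbol is not defined. Under the fair model, a 0 and a 1 each carry exactly 1 bit, and a string of n such symbols carries n bits in total. The function name is hypothetical.

```python
import math

def surprisal_bits(message: str, model: dict[str, float]) -> float:
    """Total surprisal -sum log2 p(symbol) of `message` under an assumed source model."""
    return -sum(math.log2(model[s]) for s in message)

fair_bits = {"0": 0.5, "1": 0.5}
print(surprisal_bits("0", fair_bits), surprisal_bits("1", fair_bits))
print(surprisal_bits("0110100111", fair_bits))

# Hypothetical biased source: the rare symbol now carries more bits than the common one.
biased = {"0": 0.9, "1": 0.1}
print(surprisal_bits("1", biased) > surprisal_bits("0", biased))
```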

  14. BA77:

    The specific term PI originated out of a need to qualify the kind of information being addressed in peer-reviewed scientific literature. Shannon measured only probabilistic combinatorial uncertainty. Uncertainty is not information.

    That’s pretty much what I’ve been saying, right?

  15. Mung, I think kairos is the man you should talk to about the intricacies since he knows this stuff inside and out, better than anyone else who posts regularly here, and can guide you through the details you seem to continually get hung up on.

  16. kairosfocus:

    H is of course Shannon Info

    I respectfully disagree.

    According to your own source, Wikipedia, H is Shannon entropy.

    I also refer you to section 6 of Shannon’s paper:

    6. CHOICE, UNCERTAINTY AND ENTROPY

    See also Section 12.

  17. Mung, I think kairos is the man you should talk to … and can guide you through the details you seem to continually get hung up on.

    If you’ll take note, that is exactly what I am doing. :)

    You see, I am actually interested in learning. There is much that I do not know, or understand, and when I can I try to address that, especially if it relates to my faith or the subject of Intelligent Design.

    I find that more fulfilling than cutting and pasting and posting links.

    I don’t particularly dislike you, BA77; at times you seem downright intelligent, and you’re not self-aggrandizing. But I really have to wonder how much you actually think for yourself. You post so many links; am I really supposed to believe that you have read them all, understood them, and vetted them for mistakes?

    I’d much rather see you actually discussing matters rather than pasting quotes or posting links and asserting that you are right because your sources say you are right.

    If I can’t make an argument myself, from my own knowledge and capacities, I don’t like to pretend like I know what I’m talking about.

    At one point in this thread you referred me to a podcast starring Dr. Johnson, whose material, as you know from my posts elsewhere, I don’t particularly respect.

    Then, when I critique him, you post yet more material and links that you cut and paste, and those links actually refute Johnson!

    Here’s yet another example:

    “Gain in entropy always means loss of information, and nothing more.”
    – Gilbert Newton Lewis

    Do you even know what that means?

    Do you understand the implications of that statement for what Dr. Johnson claimed?

    …a completely random string has the very highest Shannon Information because it’s very improbable due to the random sequence…

    Can you explain the connection between randomness, improbability, entropy, and information?

    If not, I invite you to come along for the ride! Cause I can’t either, lol!

    But here is what I do recognize.

    The concept of information is central to current arguments for Intelligent Design.

    As such, I’d better understand the concept of information, how it is used, and how it is measured if I ever hope to be an effective advocate for ID.

    Join me.

    God bless.

  18. Mung:

    You are treading into a complex set of issues, at the intersection of several highly technical disciplines. It will take time and repeated reflection — as I can testify on experience — to pull together a coherent picture that answers to the perspectives and points raised by the diverse cluster of fields.

    There are many overlapping sets of terminologies, and different perspectives, all of which are credible in their own right, and all of which have to be carefully balanced and correlated. This will take time, and it requires the basic respect to realise that the different people involved here all earned their stripes the hard way. Men like Gibbs, Boltzmann, Maxwell, Brillouin, Szilard, Jaynes, Robertson, Shannon, Hartley, Hoyle and so on may be hard to follow, but they thought through what they were addressing very carefully, to answer to serious questions; and with a significant measure of success. And, some tragedy, too: Boltzmann was a suicide.

    So, we must beware of the bull in the china shop approach. Which, unfortunately, is exactly the approach of Schneider.

    The relevant issues are partly addressed in information theory, in statistical thermodynamics, in computer science, in robotics and more.

    Schneider, already [in the last thread], was found to be trying to “correct” the longstanding common definition of information suggested by Hartley and used by Dembski [as well as being a commonplace of telecomms thought], substituting a rarer SYNONYM.

    This led to a key confusion on his part.

    He is right that H is often described in ways that are confusing to the uninitiated, but in fact until quite recently it was not in widespread use outside of highly technical fields, where it was assumed that you would not be there unless you had been through the relevant courses in information systems and related mathematics and physics, etc. For instance, and as I excerpt in my always linked note, having been introduced to H as the weighted average discussed above, this is how I first encountered H as “entropy” at the hands of British telecomms author, F R Connor, in his wonderful little book, Signals, from his classic telecomms series:

    “it [H] is often referred to as the entropy of the source.” [p. 81]

    H is indeed AKA Shannon Information, AKA the average information per symbol in a string emitted according to the statistics of the symbols being used (as is shown in my always linked, here):

    $H = -\sum_i p_i \log p_i$

    It turns out that this is also the same in form as the Gibbs formulation of the entropy of the cluster of microstates consistent with a given macrostate.

    Accordingly, it is ALSO called Shannon entropy.
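    The correspondence in form can be written out explicitly. A short derivation, assuming H is measured in bits (base-2 logs), the Gibbs entropy S uses natural logs and Boltzmann’s constant $k_B$, and both are taken over the same probability distribution $p_i$ (for S, the probabilities of microstates):

```latex
\begin{aligned}
H &= -\sum_i p_i \log_2 p_i && \text{(Shannon, bits)} \\
S &= -k_B \sum_i p_i \ln p_i && \text{(Gibbs)} \\
  &= -k_B \ln 2 \sum_i p_i \log_2 p_i = (k_B \ln 2)\, H && \text{(since } \ln p = \ln 2 \cdot \log_2 p\text{)} \\
p_i = \tfrac{1}{W} \;\Rightarrow\; H &= \log_2 W, \qquad S = k_B \ln W && \text{(Boltzmann)}
\end{aligned}
```

    So the two differ only by the constant factor $k_B \ln 2$, and in the equiprobable case the Gibbs form reduces to Boltzmann’s $S = k \log W$.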

    And, in one of those odd twists of terminology, it is also called Shannon Uncertainty. This last name arises, as highlighted above, because once you receive the info stream, H is in effect a measure of the uncertainty about the source that has been removed. (Recall that H is an expected or average value, a weighted average over the code elements. Shannon, after all, was interested in how much info would flow down a channel and how fast, and he ended up with the understanding that a theoretically ideal code, one that squeezes out all redundancy and so has the statistical properties of a random string of bits, would achieve the maximum rate of communication. But that is not practical, as we need redundancy to correct for noise and corruption. Hence error-correcting codes, starting with the conceptually simplest one: send the message three times and take a vote on the most likely value. That works in cases where the statistics are such that a double error in a given place is utterly unlikely to be observed.)
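    The triple-send scheme is the repetition code. A minimal Python sketch (function names are illustrative) that sends each bit three times and decodes by majority vote; it corrects any single flipped bit per group of three but, as noted, a double error in the same group defeats the vote:

```python
def encode_repeat3(bits: str) -> str:
    """Send each bit three times: '101' -> '111000111'."""
    return "".join(b * 3 for b in bits)

def decode_repeat3(received: str) -> str:
    """Majority vote over each group of three received bits."""
    if len(received) % 3:
        raise ValueError("length must be a multiple of 3")
    out = []
    for i in range(0, len(received), 3):
        block = received[i:i + 3]
        # Three copies of a binary symbol can never tie, so the vote is always decisive.
        out.append(max("01", key=block.count))
    return "".join(out)

sent = encode_repeat3("101")
print(sent)
print(decode_repeat3("110000111"))  # one flipped bit in the first group: corrected
print(decode_repeat3("100000111"))  # two flipped bits in the first group: decoded wrongly
```

    The price is a code rate of 1/3, three channel bits per message bit: redundancy deliberately added back to buy reliability at the cost of speed.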

    So, when Johnson used the term uncertainty, he was correct regarding how the term is actually widely used; whether or not it is a somewhat backwards usage, we need to recognise that if you want to speak with the people who use this term, you have to recognise and respect how they use it, and why. Schneider is wrong (AGAIN! bull in the china shop) to suggest a “correction” as though those who use the common term are incorrect and ill-informed.

    His “corrections” without acknowledgement of the common usage are doing little more than triggering confusion and contempt.

    Which, given the context, may unfortunately well reflect a rhetorical agenda. What it looks like to me is that he found a 1961 source that said the sort of things he was inclined to hear, and then looked no further to hear what others were saying and why.

    Johnson and Durston et al. are correct in their terminology [in terms of how the language is actually commonly used], but we need to recognise that the terminology is a bit backwards. Once you understand that, until there is a communication event, the specific, fine-grained state of the source is uncertain, you can understand the relationships.

    For instance, in the normal case, we only have access to the macrostate of a thermodynamic system, and must work up models of its internal behaviour at the micro level consistent with that macrostate. At a very crude level, look at the marbles-in-a-box model of Maxwell-Boltzmann statistics here in my always linked appendix 1.

    Jaynes applied these ideas to statistical thermodynamics thusly, as cited by Harry Robertson:

    “. . . The entropy of a thermodynamic system is a measure of the degree of ignorance [= uncertainty] of a person whose sole knowledge about its microstate [i.e. specific "snapshot" config of the microscopic masses and lumps of energy in a body or system of interest] consists of the values of the macroscopic quantities [i.e. the lab scale observable properties like temperature, pressure, etc, which are consistent with a great many specific distributions of mass and energy] . . . which define its [macroscopic] thermodynamic state. This is a perfectly ‘objective’ quantity . . . it is a function of [those variables] and does not depend on anybody’s personality. There is no reason why it cannot be measured in the laboratory.”

    Notice the significance of the judging semiotic agent, acting as observer, in the theory.

    Notice also how, in a materialistically oriented world of science, this was plainly a subject of controversy, now fast fading as further work has confirmed the heart of Jaynes’ view.

    This view was actually anticipated by the famous chemist G N Lewis in 1930: “Gain in entropy always means loss of information, and nothing more.” Similarly, the well-respected physicist Brillouin had spoken of information in the context of molecular-scale systems as negentropy, on a similar understanding.

    So, M, when you ask:

    Can you explain the connection between randomness, improbability, entropy, and information? . . .

    . . . please take time to work through the explanations that have already been offered.

    In summary:

    1: Randomness is a particular condition that in a symbol string [assuming a flat random distribution] would actually lead to the peak value of the H-metric.

    2: What is happening is that even though actual communication in a language is not random, the distribution of symbols [think ASCII characters or alphanumeric symbols] has statistical properties that are amenable to analysis as a random variable that may take a range of values with diverse probabilities, $s_i \to p_i$.

    3: So, we can use a random-variable, statistical model to construct a metric of information, whereby we mathematically match the property that the more unlikely [more improbable] message element is more informative, more reducing of our uncertainty about the state of the source. In typical English, X is far rarer than E, and so is much more informative. [An algebra textbook is going to be a very different story. BTW, I suspect printers knew the statistical distribution of letters centuries ago, as they had an economic incentive to know this, to lay out pages of type. Cryptanalysts also knew of that distribution, and it is the reason why simple substitution codes have long been abandoned: once you reduce a code to a form where you can spot symbol frequencies, you can decode very rapidly thereafter. So, in this relative-frequency sense of probability, randomness and probability are important components of information measures. Remember, a flat random distribution is such that there is an equal chance of any one of the set of symbols appearing in any position in a string [like the way letters are clicked together in the old child's toy, or words in text on a page]. And in turn, strings can be elaborated to represent any complex data structure.]

    4: Add in the requirement of additivity, whereby $I_1$ and $I_2$ together should give $I = I_1 + I_2$ [save, of course, for pathological cases like q + u in English, i.e. the symbols there are NOT statistically independent but are highly correlated to be in that order in immediate succession], and the Hartley-suggested metric drops out:

    $$I_k = \log(1/p_k) = -\log p_k$$

    5: It turns out that we can look across the set of symbols, $s_i$, and assess their relative frequency in typical messages using a given code. From this we can deduce an average-information-per-symbol metric:

    $$H = -\sum_i p_i \log p_i$$

    6: This metric is in the same FORM, mathematically, as the Gibbs formulation of statistical thermodynamics entropy, which was a puzzle back in 1948 or so.

    7: So the term entropy was attached to H, making it a synonym for Shannon information[-carrying capacity] and for Shannon uncertainty [as removed by the emission, reception and understanding of a message from the source].

    8: It turns out that the odd result was in fact insightful. There is indeed a link to entropy, as summed up by Jaynes and others since. Wiki has a useful snippet summary:

    the thermodynamic entropy is interpreted as being an estimate of the amount of further Shannon information needed to define the detailed microscopic state of the system, that remains uncommunicated by a description solely in terms of the macroscopic variables of classical thermodynamics

    9: And, again, elsewhere, Wiki aptly sums up:

    in the discrete case using base two logarithms, the reduced Gibbs entropy is equal to the minimum number of yes/no questions that need to be answered in order to fully specify the microstate, given that we know the macrostate.

    10: The strange tale of Maxwell’s Demon may help bring these together. Wiki again:

    Maxwell’s demon is a thought experiment created by the Scottish physicist James Clerk Maxwell to “show that the Second Law of Thermodynamics has only a statistical certainty.” The thought experiment demonstrates Maxwell’s point by describing how to violate the Second Law. In the experiment, an imaginary container is divided into two parts by an insulated wall, with a door that can be opened and closed by what came to be called “Maxwell’s Demon”. The hypothetical demon opens the door to allow only the “hot” molecules of gas to flow through to a favored side of the chamber, causing that side to gradually heat up while the other side cools down.

    11: The trick is that the demon has to have some means of distinguishing fast from slow molecules, i.e. he must have MICROSTATE-level knowledge of the system. His uncertainty has to be reduced to do the separation, and that in real time, with molecules rushing about at hundreds of m/s on a scale of a few cm, i.e. we need to detect, process and react in microseconds. He needs information, an algorithm, and an effecting mechanism that can do all this within the required timeframe, and probably in parallel too. When all of this is factored in, voila, there is no free lunch.

    12: Robertson is apt, in his Statistical thermophysics:

    . . . the standard assertion that molecular chaos exists is nothing more than a poorly disguised admission of ignorance [= uncertainty!], or lack of detailed information about the dynamic state of a system . . . . If I am able to perceive order, I may be able to use it to extract work from the system, but if I am unaware of internal correlations, I cannot use them for macroscopic dynamical purposes. On this basis, I shall distinguish heat from work, and thermal energy from other forms [pp. vii-viii]

    13: So, the concepts are all bound up together in a weird sort of Hungarian Goulash fashion.

    14: And they begin to make sense as you look back across the field of how such a strange cluster of elements got together to form a whole.
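    Several of the summary points can be checked numerically. Below is a small Python sketch using base-2 logs, so results come out in bits. The E and X probabilities are rough illustrative values assumed for the example, not measured English letter frequencies.

```python
import math
from typing import Sequence


def self_info(p: float) -> float:
    """Hartley-style information of one symbol: I = log2(1/p) = -log2(p), in bits."""
    return -math.log2(p)


def entropy(probs: Sequence[float]) -> float:
    """Average information per symbol: H = -sum p_i log2 p_i (p_i = 0 terms contribute 0)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Point 3: a rarer symbol carries more information.
# Illustrative (assumed) frequencies, roughly those of E and X in English text.
p_e, p_x = 0.12, 0.0015
print(f"I(E) = {self_info(p_e):.2f} bits, I(X) = {self_info(p_x):.2f} bits")

# Point 4: for independent symbols, probabilities multiply and information adds.
assert math.isclose(self_info(p_e * p_x), self_info(p_e) + self_info(p_x))

# Points 1 and 5: for n symbols, H peaks at log2(n) when all p_i = 1/n.
n = 4
flat = [1 / n] * n
skewed = [0.7, 0.1, 0.1, 0.1]
print(entropy(flat), entropy(skewed))  # 2.0 bits vs. about 1.36 bits
```

    The additivity check only holds because the two symbols are treated as independent; a correlated pair like q followed by u would break it, as point 4 notes.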

    ________________

    GEM of TKI

  19. F/N: M:

    How do I know whether a symbol is more or less common, and how rare it is?

    Do we require prior knowledge of the symbol set before receipt?

    PREZACTLY!

    The analysis presumes that you are in a position to assess the relative frequency of symbols [or message elements more generally], on a statistical basis.

    And, that was known when, say, Morse constructed his code [ever wondered why some letters take up less time than others in the code?]. The properties of English text have been known for a long time, so the knowledge of the general behaviour of the symbols involved was in fact “there.” And, remember, Shannon had in mind things like teletype machines as communicating systems.

    That is why we see the quantitative definition of information as summarised by Taub and Schilling:

    Let us consider a communication system in which the allowable messages are $m_1, m_2, \ldots$, with probabilities of occurrence $p_1, p_2, \ldots$. Of course $p_1 + p_2 + \cdots = 1$. Let the transmitter select message $m_k$ of probability $p_k$; let us further assume that the receiver has correctly identified the message [My nb: i.e. the a posteriori probability in my online discussion is 1]. Then we shall say, by way of definition of the term information, that the system has communicated an amount of information $I_k$ given by

    $$I_k \stackrel{\text{def}}{=} \log_2 \frac{1}{p_k} \tag{13.2-1}$$

    [Principles of Communication Systems, 2nd edn, Taub and Schilling (McGraw-Hill, 1986), p. 512, Sect. 13.2]

    In the case of DNA, and of amino acids (AAs) in proteins, we can study the patterns based on tabulations of findings from those who have analysed the observed patterns in nature, and then generate a priori symbol probabilities. Just as, a long time ago, it was known how much of typical English text was an E or an X, etc.
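    Here is a minimal Python sketch of that procedure: estimate a priori symbol probabilities as relative frequencies in a tabulated sample, apply the Taub and Schilling definition symbol by symbol, then take the probability-weighted average to get H. The twelve-letter DNA string is a made-up toy sample, not real sequence data; a serious analysis would tabulate far larger samples.

```python
import math
from collections import Counter
from typing import Dict


def symbol_probabilities(sample: str) -> Dict[str, float]:
    """Relative-frequency estimate of p_k for each symbol seen in the sample."""
    counts = Counter(sample)
    total = sum(counts.values())
    return {sym: c / total for sym, c in counts.items()}


def info_bits(probs: Dict[str, float]) -> Dict[str, float]:
    """I_k = log2(1/p_k) for each symbol, in bits."""
    return {sym: math.log2(1 / p) for sym, p in probs.items()}


def avg_info_per_symbol(probs: Dict[str, float]) -> float:
    """H = sum over k of p_k * I_k, the probability-weighted average of I_k."""
    return sum(p * math.log2(1 / p) for p in probs.values())


sample = "AAAATTTTGGCC"  # toy sample: A and T each 1/3, G and C each 1/6
p = symbol_probabilities(sample)
print(p)
print(info_bits(p))  # the rarer G and C carry more bits than A and T
print(avg_info_per_symbol(p))
```

    With only four symbols, H can never exceed 2 bits per symbol; the uneven toy frequencies pull it a little below that ceiling.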

    GEM of TKI

  20. PS: Of course, one does not need to know the relative frequencies of E, X, S, Z, etc. to read English text, but if you are going to statistically analyse such text to see how much info is in each symbol, you do have to know that as part of the background for analysis.

  21. PPS: You will also note that I am ever careful to define H as average info per symbol, the primary and clearest understanding. There are other terms, which different people use in overlapping and sometimes in different ways [and I suspect the same people do so at different times or points in an argument]. H is ALSO used as entropy, and ALSO in the sense of uncertainty that I pointed out and excerpted on. If you looked at the Wiki article’s URL, you would see that an article entitled “Entropy (information theory)” was accessed on a URL for “Shannon information.” In short, there is no “gotcha” there. Instead, there are varying usages, some more and some less preferred; some more and some less acceptable or “correct.” To clear it all up, I simply prefer to use the most descriptive term, which you should have noticed: average info per symbol, on the Hartley negative log probability metric. This correlates to the reduction in uncertainty as to the state of a source on receiving a message from it. And it correlates to the gap in info between the macro- and the micro-state of a body: where the number of possible microstates is larger, the entropy is higher. I find it frustrating to deal with those who insist on a given usage as the “correct” one, instead of recognising that there are several dialects out there. The point of language is to communicate, not to play bull-in-the-china-shop one-upmanship games. And when one has triumphalistically shattered the fine china in the shop, do you think it is a triumph, save of destruction and imposing loss?

  22. PPPS: To get a picture of how that variety of dialects came to be, notice this from Shannon in Section 6 of his famous paper:

    Quantities of the form $H = -K \sum_{i=1}^{n} p_i \log p_i$ (the constant $K$ merely amounts to a choice of a unit of measure) play a central role in information theory as measures of information, choice and uncertainty. The form of $H$ will be recognized as that of entropy as defined in certain formulations of statistical mechanics [fn. 8], where $p_i$ is the probability of a system being in cell $i$ of its phase space. [i.e. being in a specific microstate distribution of mass and energy, position and momentum, etc.] $H$ is then, for example, the $H$ in Boltzmann’s famous H theorem. We shall call $H = -\sum p_i \log p_i$ the entropy of the set of probabilities $p_1, \ldots, p_n$. If $x$ is a chance variable we will write $H(x)$ for its entropy; thus $x$ is not an argument of a function but a label for a number, to differentiate it from $H(y)$ say, the entropy of the chance variable $y$ . . . .

    The quantity H has a number of interesting properties which further substantiate it as a reasonable measure of choice or information [Hence, the term Shannon Info!] . . . .

    2. For a given $n$, $H$ is a maximum and equal to $\log n$ when all the $p_i$ are equal (i.e., $1/n$) [the flat random situation]. This is also intuitively the most uncertain situation . . . .

    3. Suppose there are two events, $x$ and $y$, in question with $m$ possibilities for the first and $n$ for the second. Let $p(i, j)$ be the probability of the joint occurrence of $i$ for the first and $j$ for the second. The entropy of the joint event is

    $$H(x, y) = -\sum_{i,j} p(i, j) \log p(i, j)$$

    . . . .

    The uncertainty of a joint event is less than or equal to the sum of the individual uncertainties.

    In short, we can see in this section of the paper, Shannon himself using the terms as closely related and nearly synonymous. Yes, fine distinctions can be and are made, but they will be made differently by different people in different contexts, and a lot of people are going to use the terms as near-synonyms. So, we have to live with that.
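    Shannon’s property 3 can be illustrated with a short Python sketch. It computes H(x,y), H(x) and H(y) in bits from a joint probability table; the two tables are toy examples chosen for illustration, the second being an exaggerated stand-in for strongly correlated symbols such as q followed by u.

```python
import math
from typing import Iterable, List, Tuple


def entropy(probs: Iterable[float]) -> float:
    """H = -sum p log2 p over the nonzero probabilities, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def joint_and_marginal_entropies(joint: List[List[float]]) -> Tuple[float, float, float]:
    """Return (H(x,y), H(x), H(y)) for a joint table joint[i][j] = p(i, j)."""
    h_xy = entropy(p for row in joint for p in row)
    h_x = entropy(sum(row) for row in joint)        # p(i) = sum over j of p(i, j)
    h_y = entropy(sum(col) for col in zip(*joint))  # p(j) = sum over i of p(i, j)
    return h_xy, h_x, h_y


# Independent fair bits: p(i, j) = p(i) p(j).
independent = [[0.25, 0.25], [0.25, 0.25]]
# Perfectly correlated bits: y always repeats x.
correlated = [[0.5, 0.0], [0.0, 0.5]]

for name, table in [("independent", independent), ("correlated", correlated)]:
    h_xy, h_x, h_y = joint_and_marginal_entropies(table)
    print(f"{name}: H(x,y) = {h_xy:.2f}, H(x) + H(y) = {h_x + h_y:.2f}")
```

    For the independent table the two sides come out equal at 2 bits; for the correlated table H(x,y) is 1 bit against H(x) + H(y) = 2 bits, since once x is known, y adds nothing.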

  23. You are treading into a complex set of issues, at the intersection of several highly technical disciplines. It will take time and repeated reflection — as I can testify on experience — to pull together a coherent picture that answers to the perspectives and points raised by the diverse cluster of fields.

    You make an excellent point and I agree completely.

    I’ve made my way through complex issues before after much reading, study and reflection (and a little revelation!), so I am hoping to do the same here.

    As I mentioned earlier, I’ve ordered some texts on information theory. I’ve also ordered a few books on entropy, because I think that might actually be the best place to start.

    For example:
    Discover Entropy and the Second Law of Thermodynamics: A Playful Way of Discovering a Law of Nature

    He is right that H is often described in ways that are confusing to the uninitiated, but in fact until quite recently it was not in widespread use outside of highly technical fields, where it was assumed that you would not be there unless you had been through the relevant courses in information systems and related mathematics and physics, etc.

    Again I agree with you. But all of a sudden “information” has become the term du jour of Intelligent Design, and there are increasing references to it here and elsewhere with a decided lack of clarity.

    I just want to be able to talk about it and know what I’m talking about and know how to explain it to others. I’m the first to admit that I’m starting out on the ground floor.

    ME:

    Can you explain the connection between randomness, improbability, entropy, and information?

    If not, I invite you to come along for the ride! Cause I can’t either, lol!

    BA77 and I don’t have to argue. We can try to work together to the mutual understanding of both and anyone else who may be reading.

    Do we require prior knowledge of the symbol set before receipt?

    So we can’t just try to reconstruct or guess at the symbol set based on what we are receiving and use that?

    That’s sort of another way of saying, for there to be Shannon Information does there need to be a sender and a receiver?

    I’m trying to understand when and where it makes sense to speak of Shannon Information and when and where it makes no sense to speak of Shannon Information.

    What are the minimal requirements for there to be a presence of Shannon Information?

  24. BTW, I suspect printers knew the statistical distribution of letters centuries ago, as they had an economic incentive to know this, to lay out pages of type.

    And, that was known when, say, Morse constructed his code [ever wondered why some letters take up less time than others in the code?]. The properties of English text have been known for a long time, so the knowledge of the general behaviour of the symbols involved was in fact “there.”

    It’s interesting you should mention these so close together, because I think I recently read (probably in Gleick’s book) that this is exactly what happened.

    One of Morse’s assistants went to the local printer and looked at how many of each character he had.

  25. Mung,
    I have to say that I really respect your dedication to learning the core concepts that seem to be vital. I’ve noticed that one of the main complaints raised against ID proponents is that they don’t actually grasp the ideas they spout off. Of course, the same can be said for a good portion of Neo-Darwinist cheerleaders, but I feel those in the ID community have a duty to hold the science and math to a higher standard. That’s why I like you. I’m certain you’re not part of some ND sleeper-cell. Lol. I believe you understand the importance of “burning the chaff” in what those in ID present as evidence, and that those against ID are going to put EVERYTHING through the wringer and pick apart any inconsistencies or ill-informed arguments.
    Those in the ID realm cannot insulate themselves in towers built of vague ideas. They cannot be like young Christians who have been raised entirely on teen Bibles and Max Lucado, and who are at a loss when they go to college and come face to face with Sartre, Fromm and Derrida.
    So I hope to piggyback a little on your learning process. Lol. If you’re on the ground floor, I’m in the basement. I know you have an engineer’s background… I have nothing of the sort. I am utterly outside of the scientific community. But I know the NCSE states they want the average person to have more science education; well, I plan on doing just that. And they are not going to like it. ;p

  26. Mung:

    Now, that is a snippet of history for you!

    (I was guessing, on economics, that printers would empirically discover the patterns, and that it would be worth their while to recognise them. Like the Erlang rule (which, I think now, USED to hold) that traffic basically never got above 16% of a phone network, so switching resources were figured accordingly.)

    Beyond that, Shannon’s work was set in the context of his system model, i.e. the system is a system, and the coding part of a digital comms system is a part of it.

    You could try to reconstruct a code pattern from what you receive, but that is possibly distorted by the second probability issue: a posteriori probabilities influenced by noise. What you receive and what was sent are not necessarily the same, and every inference to message, not noise, is in effect an inference to design.
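    The point about noise-distorted a posteriori statistics can be made concrete with a minimal Python sketch. It assumes a binary source and a binary symmetric channel (each bit flipped independently with some probability), a standard textbook noise model chosen here for illustration; the 10% source statistic is hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a two-symbol source emitting 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def received_ones_fraction(p_sent: float, flip_prob: float) -> float:
    """Fraction of 1s a receiver sees after a binary symmetric channel that flips
    each bit independently with probability flip_prob (assumed noise model)."""
    return p_sent * (1 - flip_prob) + (1 - p_sent) * flip_prob


p_sent = 0.1  # hypothetical source statistics: 10% of sent bits are 1
for flip_prob in (0.0, 0.05, 0.2):
    p_seen = received_ones_fraction(p_sent, flip_prob)
    print(f"flip={flip_prob:.2f}: seen p(1)={p_seen:.3f}, "
          f"H from received={binary_entropy(p_seen):.3f} bits "
          f"(true source H={binary_entropy(p_sent):.3f})")
```

    As the flip probability rises, the received fraction of 1s drifts toward 50%, so an H estimated purely from what was received overstates the source’s own H.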

    H, as noted, has several closely related meanings.

    Information and communication systems are a dominant technology today, indeed. The connexion to statistical thermodynamics and to the fairly fierce statistics involved, make me a bit doubtful on just how well it will be commonly understood.

    But then, when I first studied electronics, it was a pretty fierce experience. I remember the shock at discovering my first really easy learning electronics textbooks.

    I remember when I was teaching my first classes and decided I was going to chuck the common-base-amp-first approach and go straight to the common-emitter amp, and that I would not bother about saturation currents. Then I decided to drastically simplify the h-parameter model to h_ie and beta.

    I remember it felt like cheating, but it worked, and later on students told me that they found that they really understood electronics for the first time. (We could then add back in complexity one step at a time, all the way up to the hybrid pi or a simplification to get the high frequency rolloff effects. I remember being touched when a student came to me years later and said how he still kept notes from his first t/comms course I taught him.)

    So, comms theory can doubtless be simplified, but it ain’t going to be easy to do. Especially if there is bleed-through from stat thermodynamics to think about.

    My own thought is that the best focus is on functionally specific complex organisation and associated information, in the context of the islands of function concept. The log-reduced Chi metric or my own X-metric will allow a fairly simple break-apart of what is going on.

    I cannot shake the impression that Schneider misled himself by dismissing the quantitative definition:

    $$I_k = \log(1/p_k) = -\log p_k$$

    His attempts to make overmuch of the average info per symbol metric, H, run into its peculiarities.

    Meaningfulness and functionality of info based on string configs and/or things reducible to strings, is key. That has to do with messages, languages and the use of instructions and data structures.

    We are going to have to get over the point that the semiotic agent, the judging observer, is very much a part of the information and communication process. For that matter, science is done by much the same, and we need to recognise that. Subjectivity is not the opposite of objectivity. The observing, warranting, knowing subject, and the possibility of agreement on warrant, make it possible for subjects to achieve objective knowledge.

    GEM of TKI

  27. MR:

    I hear your point, but have a concern on the level of math required.

    You will see that there has been an obvious struggle over a simple log reduction of the Dembski metric.

    What happens when we start tossing around random variables, statistical weights of macrostates and the like, probability analyses, etc.?

    My experience is that the level that is sound enough and objective enough, while being intuitive enough, is FSCI, if it is allowed to be discussed at all [notice, MG targeted CSI, not FSCI!!!].

    G

  28. KF: Oh, I don’t plan on jumping in on topics that I only nominally understand. I feel that approach is utterly detrimental to ID as a whole. I plan on starting from the very bottom foundation and doing my educating puzzle piece by puzzle piece. If you have any suggestions of where I should start, I welcome them with open arms.

  29. PS: In my experience, the objectors to ID are generally NOT picking apart serious arguments. They are using red herrings, led away to strawman caricatures of arguments, soaking them in denigratory ad hominems, then igniting them. And they become particularly intense when their impositions of question-begging materialistic a prioris on science are exposed, or when the linked issue is raised: that they are projecting into a deep and unobserved past, and so cannot properly censor out any reasonable candidate causal factor. Hardly less contentious is when they find themselves challenged on principles of right reason; and when the inherent amorality of evolutionary materialism is highlighted, they often explode.

  30. MR: Why not look here and the onward linked?

  31. KF @ 29,
    Oh, fully agreed. More than half the time I see arguments based on nit-picking of completely minor matters of grammar or subtle semantics.
    @30,
    Thank you. Bookmarked. I shall consider it a class and take it seriously as such.

  32. Okay, let me know your progress.

  33. F/N: On the 2 + 2 = 5 claim

    My note is that you round once, usually at the end, to give the appropriate number of significant figures.

    So, 2.4 + 2.3 = 4.7 ~ 5 (+/- 0.5)

    So the procedure of rounding 2.3 and 2.4 down first, then adding, and then rounding up to 5 is wrong. If you were to round down at the early stage, you would get:

    2 (+/- 0.5) + 2 (+/- 0.5) = 4 (+/- 1)

    Much less precise.

    So the 2 + 2 = 5 tee shirt is a joke on a mistaken way to round that throws away precision in answers.
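    The T-shirt arithmetic can be reproduced in a few lines of Python, using the 2.4 and 2.3 values from the note above. This is a minimal sketch showing the two orders of operation side by side, and how the half-unit rounding error on each input can add up to +/- 1.

```python
def round_once_at_end(a: float, b: float) -> int:
    """Keep full precision through the sum; round only the final answer."""
    return round(a + b)


def round_inputs_first(a: float, b: float) -> int:
    """Round each input before adding, throwing away precision early."""
    return round(a) + round(b)


a, b = 2.4, 2.3
print(round_once_at_end(a, b))   # 5
print(round_inputs_first(a, b))  # 4

# The joke: display the rounded inputs next to the properly rounded sum.
print(f"{round(a)} + {round(b)} = {round(a + b)}")  # 2 + 2 = 5

# Each integer-rounded input may be off by up to 0.5, so the worst-case
# error on a sum of two such inputs is 0.5 + 0.5 = 1, i.e. 4 (+/- 1).
max_error_per_input = 0.5
print(2 * max_error_per_input)  # 1.0
```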

    G

    PS: For fairly simple calcs, the safest thing is to work to one or two sig figs beyond what you need in the answer, then round. HP calculators, notoriously, used to work to 15 sig figs internally. Citing to 3 or 4 sig figs (typical for most engineering work) would be very safe, and it was very nice for those of us coming from slide rules, which would give you maybe 3 sig figs, maybe 4 if you pushed hard.
