How much information is needed to construct a human?

A commenter in another thread prompted this. I didn’t approve the comment because it was so impoverished but thought the discussion warranted a thread of its own. The commenter basically said that 30,000 proteins w/regulatory regions is enough – a mere fraction of the DNA in a human egg – implying that plenty of DNA can be functionless junk.

While that number of regulated proteins might possibly be enough to define myriad cell types and tissue types there is an awful lot more required. The list of things I can think of (which is likely not complete) includes:

1) cell types
2) tissue types
3) organs
4) body plan
5) autonomic control system
6) instinctive behaviors

Since complex system design is what I did for a living, I usually think in terms analogous to human-designed systems and the information required in their specification. Let’s take the space shuttle for comparison. Proteins would be equivalent to simple basic raw materials – plastics, metals, ceramics, and the like. Cell types would be equivalent to nuts, bolts, fabrics, tiles, and other formed, milled, & molded parts. Tissue types would be equivalent to basic functional assemblages – tubes, pipes, wires, nozzles, panels, tanks, transistors, batteries, and things of that nature. Organs would be even larger assemblages like computers, control surfaces, engines, pressure locks, hatches, windows, atmosphere controls, et cetera. Body plan would be the precise arrangement of those larger assemblies into a specific functional whole. Autonomic controls would be mostly electro-mechanical regulators for gas flows, fuel flows, hydraulics, electrics, and other simple automated functions. Instinctive behaviors would be analogous to flight/mission command & control software.

Anyone who’s done any complex system design knows the materials are just a small part of the specification. Even basic assemblies are more or less standardized parts and don’t require comparatively much specification. The real complexity lies in the precise arrangement of the parts and how they all work together. Anyone familiar with the hundreds of volumes of specifications for a complex system like the space shuttle has a feel for how much information it takes. Even if every last base pair of DNA in the human genome were utilized, I still don’t think it’s nearly enough. It isn’t anywhere near enough for a space shuttle, and a human is far more complex than a space shuttle. I suspect far more of the cell structure, often called epigenetic information, is required for the complete specification. That makes trillions more atoms potentially usable for information storage, and it’s all heritable, as each daughter cell is a more or less faithful copy of its parent cell.

30 Responses to How much information is needed to construct a human?

  1. Dave,

    I too have an engineering background, and as a result of that discipline’s way of thinking I often wonder why we let biologists ‘interpret’ biological machinery; it seems to be more in the domain of physicists and engineers than biologists.

    Please correct me if I’m wrong. Also tell me what is it about biology training that helps in detecting design.

    I have an evolutionist PhD friend who once told me he is not very skilled when it comes to designing things. I was a little surprised but concluded that design ability was not really a requirement for his work.

    Cheers.

  2. You know Clemenceau’s famous statement: “War is too important to be left to the generals.” A good analogy could be: “Biology is too important to be left to the biologists.”

  3. I have long concluded that the total amount of information required to specify the human body must be much more than the protein coding portion of the genome, about 2% of the total 3 billion, or about 60 million nucleotides. To be conservative, doubling this would get to about 120 million nucleotides and 40 million specified amino acid “letters”. This would be equivalent to about 30 volumes of 400 pages each with 40 lines per page and 80 characters per line. Multiple frame translation and other data compression techniques may increase this, but not by orders of magnitude.

    It’s hard to believe that for years biologists have subscribed to the “central dogma” that the entire information in the genome consisted of the protein coding portion. At most this amount of information might be enough to specify just the internal developmental programs and structures for the many different cell types – the organelles and other intracellular parts like the mitochondria, ribosomes, Golgi apparatus, cytoskeleton, etc. This would include specifications for all the different proteins. Just this intracellular organized system has been likened to a vast automated factory complex.

    A data storage capacity equivalent to most of the rest of the genome must be used to encode the information to build and operate the body and brain. There are about 10 trillion cells total, working in innumerable specialized interconnected organs and tissues. The brain is supposed to be the most complicated single object known, with 100 billion neurons each of which is connected to several thousand others in a very ordered structure.

    The rest of the human genome (2.88 billion nucleotides) would be equivalent to another 720 volumes of the same size – one room in a library. I don’t know of any quantitative estimation of the information equivalent of building and maintaining this incredible system of systems of endless subsystems, but intuition says that it probably needs more than even the rest of the genome.
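
    A quick sketch to check this arithmetic (Python here purely for illustration; the volume size is the one assumed above):

```python
# Rough check of the "library volumes" arithmetic above. Python is used
# purely for illustration; the volume size is the one the comment assumes.
genome = 3_000_000_000             # base pairs
coding = int(genome * 0.02)        # ~2% protein-coding: 60 million nt
doubled = coding * 2               # conservative doubling: 120 million nt
amino_letters = doubled // 3       # 3 nucleotides per amino-acid "letter"

chars_per_volume = 400 * 40 * 80   # 400 pages x 40 lines x 80 characters

print(amino_letters)                     # 40000000
print(amino_letters / chars_per_volume)  # 31.25 -> "about 30 volumes"

rest_letters = (genome - doubled) // 3   # remaining 2.88 billion nt
print(rest_letters / chars_per_volume)   # 750.0 (the 720 above comes
                                         # from scaling the rounded 30)
```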

  4. Granville Sewell

    I’m surprised this hasn’t been brought up before. I received an e-mail from an acquaintance, who has a PhD in biology, saying he calculated that the human genome could contain about 6.25 Mb of information, “about the same as a couple of digital photos” (uncompressed, admittedly, but still…) and he was wondering if that could be right. I don’t know why he thought I would know, but I have always found it impossible to believe that DNA could really contain enough information to reconstruct a human or any complex animal. I think this will be confirmed someday.

    On a completely different topic, Dave, you (I believe it was you) had a post many months ago where you suggested that if evolutionary simulators wanted to better simulate reality, they should subject everything to random errors, their entire program, the compiler and OS and the hardware (well, that was the idea anyway). I thought that was one of the most significant points ever made at UD (though as I recall no one else seemed to). The fact that we assume the only random errors are in the DNA, is what makes us forget what a fantastically absurd violation of the second law the whole process of evolution is. Can you provide me with a link to that post (if you remember it); or better yet, reproduce it for everyone to reconsider?

  5. Dave,

    I believe in Sean Carroll’s book he says that the information to encode a human would take about 10,000 pages of printed instructions, single-spaced. These represent the switches that would have to be turned off and on in order to lay out the human body during gestation.

    In Behe’s new book, he spends most of a chapter on Carroll’s ideas on evo devo and has a chart of one aspect of a sea urchin (p 196). It looks like a logic circuit for some electronic device. It is incredible how anyone could think that such complexity just developed by chance.

    It also had to develop before the Cambrian Explosion, 520 mya because similar instructions are in different phyla.

  6. I don’t think anyone here understands the magnitude of what 20,000 (or 30,000 – whatever) different proteins affords by way of possibilities.

    With even a simple, “on-off” system, 20,000 proteins provides for 10^6000 different states. I think that’s plenty more than the cells in a human body, and even the many different states each and every cell might be expected to assume.

    Factor in lots of gray with these 20,000 proteins (low vs high concentration, localization, etc.) and the possibilities explode (as if 10^6000 isn’t already an astronomical number).

    As an aside, something I see in design-friendly engineers is almost an aversion to thinking in terms of chemistry. Once one grasps the role of chemistry in biology, and leaves behind the tinker-toy way of thinking that comes with engineering, then things become clearer (and much, much more interesting and challenging).

    IMO, at least.

    I’m glad to elaborate, because the disconnect that comes from leaving chemistry at the door of this room becomes a serious impediment for ID proponents. But I’m reluctant to write more until I see this comment posted, and the questions that it elicits.
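
    (The 10^6000 figure is easy to verify with a couple of lines – with an on/off state for each of 20,000 proteins there are 2^20000 combinations:)

```python
# Verifying the state count above: an on/off switch for each of
# 20,000 proteins yields 2**20000 distinct combinations.
digits = len(str(2 ** 20000))
print(digits)   # 6021 -- so 2**20000 is roughly 10**6020
```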

  7. Granville,

    I think the human genome contains about 700 megabytes of information. Here’s the calculation based on information theory:

    In the following equation, B represents number of Bits and P represents number of possible different codes given the number of distinct characters and the length of the sequence of characters in the code.

    2^B=P

    So to count information in the binary number 11011 P has a value of 2^5. Therefore, B=5 and we say the code contains 5 bits of information.

    To count the information in the human genome, there are 4 characters (A,T,G,C) and the length of the character sequence is 3.1 billion.

    So starting with
    2^B=P

    we replace P with 4^(3.1 billion)

    2^B = 4^(3.1 billion)

    which can be rewritten as
    2^B = 2^(6.2 billion)

    Therefore, B=6.2 billion bits

    The human genome contains 6.2 billion bits of information. Divide by 8 to get bytes, by 1024 to get Kb and by 1024 again to get Mb.

    6,200,000,000/8/1024/1024 is about 700.

    Therefore, there are 700 megabytes of information in the human genome.

    I’m indebted to the work of William Dembski for understanding this calculation.

    Neal Roys
    Math Teacher
    Stevenson High School
    Lincolnshire, IL
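
    The same calculation in a few lines of code (illustrative only):

```python
import math

# Each nucleotide is one of 4 symbols, so it carries log2(4) = 2 bits;
# a 3.1-billion-base genome therefore holds at most 6.2 billion bits
# (an upper bound that ignores compression or non-uniform base usage).
genome_length = 3_100_000_000
bits = genome_length * math.log2(4)   # 6.2 billion bits
megabytes = bits / 8 / 1024 / 1024
print(round(megabytes))               # 739 -> "about 700 MB"
```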

  8. Granville Sewell

    OK, 6Mb or 700Mb, that still seems WAY too small.

  9. art

    Please explain how you think the human suckling instinct is encoded in DNA keeping in mind how many nerves and muscles are all in coordinated action during this simple process. Since humans are rather short on instinct I really like using birds for an example. Anyone who has raised a bird from an egg or featherless nestling without exposure to other birds to learn from should realize how complex their instinctive behaviors are – everything from pecking their way out of the egg to eating to nest building to song and flight. None of them are learned and all emerge characteristic to the particular species.

    I don’t think you have an appreciation for the engineering in living systems. Chemistry is one method of implementation. There is also electrical and mechanical. All three are intricately entwined and interdependent in organic machinery, just as in the space shuttle. Physics is the ultimate science that wraps them all.

  10. Granville

    I think it was a comment not an article I’d written talking about needing to introduce errors into the hardware and O/S in any simulation of evolution. Gil Dodgen wrote more than I did in that regard.

    It was probably a typo in the 6.25 megabyte number you got from the acquaintance. That’s suspiciously close to the actual figure of about 6.2 gigabits.

    nroys

    This is a very simple calculation. DNA is encoded in base-4 (4 possible bases ACTG at each locus corresponds to digits 0123). Converting from one number base to another is basic arithmetic learned in pre-algebra which IIRC was 7th grade for me but I was in an accelerated math program so it might be 8th grade for most students. Sadly I suspect a large fraction of adults would draw a total blank if you asked them to define number bases. For people in computer science thinking in bases 2,4,8, and 16 (powers of 2) becomes second nature. Base-10 is awkward. Proteins (coding genes) are encoded in base-64 (20 amino acids plus start/stop codes and much redundancy) in triplets of base-4 numerals called codons. This paradigm is not entirely accurate though as frameshifts are often used to encode additional functional proteins using the same sequence and sometimes reading a sequence in reverse (frameshifted or not) yields yet another different biologically active protein.
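
    A toy illustration of the base-4 view (the digit assignment below is an arbitrary choice for the example, not a biological convention):

```python
# Map each base to a digit 0-3, so a DNA string reads as a base-4
# number -- 2 bits per base, four bases per byte. Toy example only.
DIGIT = {'A': 0, 'C': 1, 'T': 2, 'G': 3}

def encode(seq: str) -> int:
    """Interpret a DNA string as a base-4 integer."""
    n = 0
    for base in seq:
        n = n * 4 + DIGIT[base]
    return n

print(encode('ACTG'))   # 0*64 + 1*16 + 2*4 + 3 = 27
print(encode('GGGG'))   # 255 -- four bases fill exactly one byte
```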

    Similar things are done in electronic engineering with regard to multiple methods of data encoding. The nucleic acid sequence can be likened to what’s called a carrier wave. In radio the carrier is the frequency of any particular station. Encoding information on the carrier is called modulation. Two principal ways, which can be used simultaneously with the same carrier, are frequency modulation and amplitude modulation. In broadcast television the black&white video is encoded using amplitude modulation while the sound uses frequency modulation. When color was added to black&white a new modulation method had to be invented that didn’t interfere with the other modulation methods. The new invention was phase modulation. All these are utilized to cram more information onto the same carrier. I suspect there are modulation methods on the DNA carrier that are yet to be discovered.

    One recently discovered modulation method helps to explain what’s called codon bias. There are 20 different amino acids but many of them are coded for by multiple codons. Certain codons for the same acid tend to be preferred. As it turns out this has to do in part with the ribosome and the speed with which it processes different codons for the same acid. Think of the ribosome like a grease gun. As the stream of grease emerges it folds. By varying the rate or speed at which the grease comes out it folds differently. The codons are not all translated at the same speed thus even though the same acid is translated from as many as 6 different codons they each have the potential of producing a different fold due to different processing speeds.

    This turned out to have very important implications for genetic engineering. In order to produce certain human proteins artificially we insert the gene for that protein along with promoter regions into bacteria. The bacteria then express that protein, which can be harvested and used in medicine. There’s a catch though. The ribosomes in human cells and bacterial cells don’t have the same codon bias. Thus the human gene often produces an insoluble, unusable product when expressed in bacteria. The human gene sequence has to be modified so that it reflects the different codon bias of bacteria.

  11. Granville,

    “OK, 6Mb or 700Mb, that still seems WAY too small.”

    Well let’s see. The Bible contains about 3 Mb of information. 700 Mb is the amount of information storage capacity of a typical CD. Seems like quite a “bit.”

  12. Granville Sewell

    nroys

    Yes, if you think all the information required to construct a human being could fit on a CD; I certainly don’t.

    It’s not a critical point in the ID debate, certainly if it were 1 Kb of information I would never believe such a program could be constructed by unintelligent forces. But I am convinced humans (and other animals) are much more complex than anything we have ever constructed, and I don’t believe you could reconstruct a human from information stored on a CD, and I believe history will prove me right. Until then, there’s not much point arguing it, I don’t see any way to prove it one way or the other right now.

  13. Granville:

    On a completely different topic, Dave, you (I believe it was you) had a post many months ago where you suggested that if evolutionary simulators wanted to better simulate reality, they should subject everything to random errors, their entire program, the compiler and OS and the hardware (well, that was the idea anyway). I thought that was one of the most significant points ever made at UD (though as I recall no one else seemed to). The fact that we assume the only random errors are in the DNA, is what makes us forget what a fantastically absurd violation of the second law the whole process of evolution is. Can you provide me with a link to that post (if you remember it); or better yet, reproduce it for everyone to reconsider?

    I believe you are referring to the brief essay below that I posted back in September. If the moderators consider it worth reposting for further discussion, that would be fine.

    Gil

    http://www.uncommondescent.com.....n-biology/

  14. “700mb”

    The above discussion, of course, assumes that all the information is present in the DNA, compressed or otherwise. We must keep in mind, a child inherits not only chromosomes from both parents, but an entire cellular factory from her mother.

    How much additional information is needed to build that factory, with associated control mechanisms and components?

  16. Ah, at last! This is a really interesting problem which should be, in my opinion, at the very center of the ID debate.

    First of all, the numbers. Nroys’ calculations are, I believe, perfectly correct. The whole quantity of digital information in the human genome is approximately 700 Mbytes, little more than a CD.
    But that applies to the whole genome. The protein coding part, the part responsible for the approximately 20,000 protein genes in our genome, is only 1–1.5% of that, and so about 7–10.5 Mbytes. The rest is the famous non coding DNA, or junk if you prefer.
    Well, I have said many times that I really believe in the importance of non coding DNA. But that does not mean that we understand its role, at least at present. While the 10 Mb of protein coding DNA work in a known way, according to a known code (the so called genetic code), that is certainly not true of non coding DNA. We know it is transcribed, we know it is conserved, we know it certainly has regulatory functions. But, for most of it (especially pseudogenes, transposons, and various repetitive sequences) it is really difficult, at present, to understand if and how it works.

    But even supposing that 700 Mbytes of information are available, and that they are highly efficient and very compressed information, is that enough? I don’t think so. Absolutely not.

    I would like to add some aspects of the complex information which should find some explanation in the human genetic program:

    1) Development of the embryo from a single cell to a completely formed individual, a multicellular community of about 10^13 – 10^14 individual cells. Embryonic development is still a complete enigma, in spite of all the evo-devo interest and research. It implies not only a definite plan of orderly differentiation of the cells, but also a specific control of their three-dimensional arrangement, according to a very complex spatial plan. It implies also specific control of very constrained developmental timing, and of many intermediate states of the organism which bear really little resemblance to the final form of the human body. What controls all that, we have no idea.

    2) Specific information for each kind of differentiated cell, which allows a different transcriptome to be realized in each cell type from a genome which is always the same. In other words, each cell type has to “choose” which of the 20,000 protein genes it must implement, in what quantity, and in what sequence. There are probably thousands of different cell types, and even more different cell states, and each one must work in a very specific way, with a very specific transcriptome. If you think that the genome is always the same, not only the protein coding part, but also the regulatory part, in each cell (with the sole exception of the immune system), you can understand that we are facing a very big problem, both quantitatively and qualitatively. In other words, where is all that information, and, wherever it is, how can it express itself so variously in each specific cell, if the information content is the same in all cells?

    3) The inter-cell regulatory network. Thousands of cytokines, that is, specific molecules, usually proteins, connect all the cells of the organism through a very nonspecific medium, the blood-extracellular fluid system. Although the connecting system may seem very gross (cytokines have to travel throughout the body, at great distances from where they are secreted, carried by the blood, and reach more or less equally all available tissues), the results are extremely precise and powerful. So, 10^14 cells are kept in connection mainly by a biochemical network whose origin is completely dispersed in the body. What controls all that?

    4) The CNS (Central Nervous System) has been cited. That’s perfectly right. The CNS is the supreme example of organized complexity in the body. The number of neurons and of their ordered connections is really beyond any conception. Does that come about by chance? Each time? Or is there a plan for neurons and their connections somewhere? In our 700 Mbytes? How? Where? Think, just to stay simple, that each motor neuron in the cortex has to be exactly connected to the right second motor neuron in the spine, which has to be connected to the right muscular fiber. And nobody knows how the higher connections in the brain can implement functions like memory, or the calculations necessary to realize the complex movements of the body in three-dimensional space, just to cite a couple of problems amongst many.

    5) Just a final comment about the statement made by Art2 (post #6):

    “I don’t think anyone here understands the magnitude of what 20,000 (or 30,000 – whatever) different proteins affords by way of possibilities. With even a simple, “on-off” system, 20,000 proteins provides for 10^6000 different states. I think that’s plenty more than the cells in a human body, and even the many different states each and every cell might be expected to assume.”

    Well, I can’t speak for all the others, but I think that I perfectly understand “the magnitude of what 20,000 (or 30,000 – whatever) different proteins affords by way of possibilities”. And that’s exactly the problem.
    I simply ask, of Art2 or anybody else: how do you think each cell knows exactly which of the many, almost infinite, possibilities should be implemented in its specific case? How is the “simple on-off system” supposed to work? With the same logic, we could say: “I don’t think anyone here understands the magnitude of what 20,000 (or 30,000 – whatever) different words affords by way of possibilities. With even a simple, “on-off” system, practically infinite literary works of extreme beauty and complexity can easily be written!”

    Everybody should realize that the only explanation provided by science for the different implementation of so many transcriptomes from the same genome in different cells is that a “lucky” confluence of external influences and feedbacks allows all that. I cannot comment on that, no more than I can comment on total folly. For these people, the problem of information and complexity simply does not exist. They happily believe that information grows from sheer magic, and they probably expect that a new three-dimensional rendering program may come about in their computer by chance during the night, if only the adaptive landscape of the bits in their RAM allows…

  17. Another enigma about the development of life: A single cell divides into two identical cells, which divide into four, etc. How do the cells, since they are identical copies of the original cell, know when and how to differentiate? Where does this information come from?

  18. Acknowledging that I’m out of my element in this discussion, I would like to offer a point of view anyway.

    700MB of data can be a pitiful amount, or it can be vastly sufficient, depending on the nature and application of that data.

    If we consider digital images for a moment: a 32-bit 640×480 image requires around 1.2MB of storage. Without compression, you may well fit the snapshots from your Hawaii vacation onto one CD. (Of course, lossy formats like JPG allow terrific compression, as long as we’re not bothered by some minor imprecision.)

    However, if our images are procedurally generated, say with fractals and color maps, or with vectors, the storage requirements drop through the floor. We can generate our 1.2 MB bitmap with a few KB of code, and so only store a few KB worth of program code and data representing our two-dimensional image, which contains much more actual data (although I know of no algorithms capable of generating images of the family dog catching a frisbee).

    Anyone who has played a modern video game has observed gigabytes per second of bitmapped pixels flying across their screen. The vast majority of these 60 frames per second of 32-bit images is programmatically generated (with the main exception being compressed 32-bit textures for terrain and models). The data for one video game can theoretically fit onto one 80-minute, 700 MB CD, and can generate hours upon hours of simulated three-dimensional interaction projected two-dimensionally, with trillions of bits, onto a computer screen.

    I expect the program that runs in living organisms is capable of tremendous compression, contains algorithmic genius, and is capable of extraordinary feats with only megabytes of data.

    I’ll speak above my pay grade and note that DNA code appears multidimensional in the sense that the same coding sequences are being discovered to serve more than one purpose depending on how they’re read. I remember seeing an article noting a histone binding site code superimposed upon protein coding regions of the DNA. I’m sure there will be other examples of this type of macro code compression found in DNA.

    Personally I’m more impressed with lower actual data storage requirements for the specification of living organisms than with vast requirements. If we find that much of the complex specification for our bodies is procedural in nature, and thus not requiring vast actual storage of every detail, it will only serve to advance the ID inference.
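
    The input/output disparity described above can be made concrete with a few lines of rule-based code (a cellular automaton here – an arbitrary example of procedural expansion, not a claim about any biological mechanism):

```python
# A one-line rule (rule 110) "unpacks" into an arbitrarily large
# pattern: 640x480 cells generated from a single seed and a tiny rule.
# The point is only the input-vs-output ratio.
WIDTH, ROWS = 640, 480

def step(cells):
    # Each new cell depends on its three neighbors (wrap-around edges);
    # rule 110 maps neighborhoods 111, 100, 000 to 0 and the rest to 1.
    return [0 if (cells[i - 1], cells[i], cells[(i + 1) % WIDTH]) in
            ((1, 1, 1), (1, 0, 0), (0, 0, 0)) else 1
            for i in range(WIDTH)]

row = [0] * WIDTH
row[WIDTH // 2] = 1        # single seed cell
output_bits = 0
for _ in range(ROWS):
    output_bits += len(row)
    row = step(row)

print(output_bits)         # 307200 bits of pattern from a dozen lines
```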

  19. Apollos

    First of all I think you’re missing the point mentioned by gpuccio that coding genes and regulatory regions occupy only a small fraction of human DNA.

    Second, most of our anatomy definitely isn’t fractal in nature, even if some of it could be. Some things that could conceivably be fractal are parts of the vascular system, alveoli branching in the lungs, placement of hair follicles, fingerprints, retinal patterns, and neuron pathways.

    Consider just the skeletomuscular system. There’s nothing fractal about it. The only break you get in compression is bilateral symmetry. Think about the complexity of just this system alone. The precise shape and position of every bone, every bit of cartilage, ligament, tendon, and muscle and the precise attachment points of all of them. There’s some flexibility in scale but little to none in shape and position. I doubt you could compress just that alone into the storage space occupied by genes and regulatory regions and as far as anatomy goes the skeletomuscular system is one of the less complex systems requiring architectural specification in some form.

    P.S. I spent several years of my career developing vector graphic engines and a few more years coding video games. I’m well aware of what procedural encoding methods buy you. It ain’t that much except in a limited case for linear shapes, and it becomes downright inefficient as a means of defining non-linear curves. As far as I can tell human anatomy is almost all non-linear.
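
    To make the tradeoff concrete: a single cubic Bézier segment is specified by just four control points, but matching a freeform contour takes many stitched segments – which is the limitation described above. A minimal sketch:

```python
# One cubic Bezier segment: four stored control points expand into as
# many sampled curve points as desired. Freeform (non-linear) shapes
# need many such segments, which is where the efficiency argument bites.
def bezier(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * p1[0] + 3 * u * t**2 * p2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * p1[1] + 3 * u * t**2 * p2[1] + t**3 * p3[1]
    return (x, y)

# Four control points (the stored data) -> 101 sampled points (the output).
ctrl = ((0, 0), (1, 2), (3, 2), (4, 0))
pts = [bezier(*ctrl, i / 100) for i in range(101)]
print(len(pts), pts[0], pts[-1])   # 101 (0.0, 0.0) (4.0, 0.0)
```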

  20. Dave, I didn’t mean to suggest that fractal algorithms could account for human design, or any part of it. I was only using fractals (or vectors) as an example of how apparently large data requirements (640x480x32 bits) could be accounted for procedurally, using only a fraction of the storage. I’m regretfully offering no suggestions as to what specific algorithms might be incorporated for generating living organisms, only suggesting the possibility that the data requirements for specifying human design would be greatly reduced if procedural methods were used in our construction, as opposed to raw storage of positional data and properties for every cell or organelle in our bodies, along with raw storage of every interconnecting element.

    The connectivity of neurons might be a prime candidate for procedural logic. Although doubtfully fractal in nature, the data requirements for specifying the interconnection of every neuron in the human brain would be greatly reduced if programmable logic was used during the construction, as opposed to having to follow a map (compressed or otherwise) for every neuron and every interconnecting branch.

    I also wanted to note that DNA design seems to allow it to specify more than one thing at once. I recently read this article on Evolution News that mentioned dual coding regions of DNA.

    If the same coding regions in DNA can be used for more than one protein, then we can imagine that data limitations begin to look a little less restrictive, and we can at least expect it possible to find other things that DNA “codes” for besides proteins.

    That is not to say that all the information required to construct and maintain a human being is present in the DNA. I expect other sources of information may well be discovered, or discovered to be necessary. I just wanted to point out that there is a whole lot that can be done with 700MB (or even 7-10MB) of data with clever enough methods, some of which may very well be employed in our design; and that apparent raw data requirements (like the gigabits* of data streamed to a computer monitor during a video game session) can be accounted for procedurally when the proper algorithms are employed.

    If I’m mucking up the thread, I apologize.

  21. suggesting the possibility that the data requirements for specifying human design would be greatly reduced if procedural methods were used in our construction

    I don’t recall suggesting that raster based images of bone structure are utilized as guides so I’m not really sure what you’re arguing with. Perhaps you can give me some idea of how the shapes of bones can be guided by any method whatsoever. The most efficient means I can think of is that certain proteins fold into the shape of various bits of anatomy and these are somehow amplified into large structures of the same shape. The bottom line is there’s an awful lot of 3 dimensional architectural information that must be stored in some manner. Things don’t just magically assume precise predefined shapes and arrangements without some means of specification.

    The analogy with video games is lame on several counts. First of all not even the most complex video game comes even close to approaching the complexity of a human body. Second the program & data storage requirements for the most complex games vastly exceed the handful of megabytes in coding genes and regulatory regions. Third you’ve totally forgotten that there’s a vast specification required for the hardware that executes the procedural code in the game software. You can’t come anywhere near getting the hardware design of an X-box or similar device into the space taken up by functional DNA in the human genome.

  22. Someone once noted: “Yet another parallel with human-engineered systems is discovered in the living cell…”

    I didn’t think I was drawing a direct comparison between video games and the human body. I don’t think there is one, so if the requirement for any sort of analogue is direct one-for-one comparison, complexity or storage wise, we’re all out of luck.

    I used the video game analogy because the data input is vastly smaller than the data output. The data required to store a model and animation, for instance, is far less than the data that is output displaying it on screen.

    I think it is apparent that human intellectual capacity still leaves something to be desired when it comes to rivaling the engineering feats of living organisms.

    I don’t think I overlooked the specification for the hardware at all. The body itself is the hardware, is it not? And although I can’t account for the source of the specification of this hardware, besides the proteins themselves, neither can anyone else, as far as I can tell. I never claimed to.

    I only attempted to suggest that as limited as the storage capacity for all DNA appears to be, “junk” or otherwise, it most likely has more to its coding properties than we can decipher at this point, and that it might make sense to consider human engineering parallels, such as algorithmic models, as candidates for how a great deal of specification might be bundled, at least in part, into a relatively small package.

  23. Apollos

    I used the video game analogy because the data input is vastly smaller than the data output.

    But it isn’t vastly smaller. It’s the same data presented differently. It’s like steam coming out of a teapot: it only looks like more is coming out than went in, when in fact the amount of water that comes out is exactly the amount that went in.

    It’s the same with an egg cell that turns itself into a newborn baby. All the information in the baby was in the egg; it’s just presented differently. The question is whether coding DNA, or even every last bit of DNA, is able to store all that information. We get a feel for how much information it is when it’s expanded out into a whole human being and we start reviewing the architectural complexity in all the aspects I delineated in the article. As a systems engineer it looks quite impossible to me, and I really can’t think of anyone better qualified to gauge the amount of design information required for complex systems than a systems engineer.

    Other portions of the egg cell, which dwarf DNA in their potential for information storage, must be utilized to a great extent, and there’s no earthly reason why they shouldn’t be, since it’s all heritable information if we presume that each cell had a parent cell stretching back in time to some kind of organic big bang that produced the first cell.

  24. The question is whether coding DNA, or even every last bit of DNA, is able to store all that information.

    I don’t think it is, and I suppose I could have emphasized this more to avoid misunderstanding. I certainly understand that what we have with coding DNA is specifications for proteins, and that there’s vastly more to the picture than having the constituent parts.

    As Atom noted earlier:

    We must keep in mind, a child inherits not only chromosomes from both parents, but an entire cellular factory from her mother.

    I think there’s more to understanding the specification of this factory than what can be found in coding DNA, so no arguments there. And I’ll refrain from a soliloquy on why game data for models and animations is incomparable to the raster data output in the video card memory. =D

    While obviously far from complete, Sean Carroll’s book (Endless Forms) discusses in detail how various body parts are constructed during gestation, mainly using arthropods, but it also covers such things as zebra stripes. He is certainly not aware of what all the DNA does, since he apparently still thinks much of it is junk.

    I am not sure how much of his book is based on research and how much is just hypothesis, since I only read it once and the book gets very technical in places. Behe spends a chapter discussing Carroll’s ideas in his new book, which I am sure is part of the reason Carroll was chosen to review Behe’s book in Science.

    Carroll’s book attempts to do what people have been speculating on here, namely lay out the information in the genome that controls the shape and extent of body parts and the sequence of construction during gestation. Apparently there are proteins that mark various sections of the embryo for specific development and I am not sure if this is part of the origin of different cell types.

    Based on Behe’s discussion of this, they are still a long way from understanding it all, but they do know something. I am not sure how much of the genome is involved in the construction, but Carroll talks about myriads of switches controlling the gestation of an organism. The proteins are the nuts and bolts, but the switches are the blueprints that control how much, where, and when the proteins get used.

    I have not read Carroll’s latest book, so I do not know what is in it about this subject, but I would guess there would be additional information. This book, based on the reviews, is more a defense of Darwinism than a book on evo-devo.

  26. Wonderful discussion!

    A few comments:
    Apollos’ observations are very stimulating, and in general I can agree that the human genome probably has levels of informational efficiency that we cannot yet conceive. I also agree that this simple fact, if true, would be further evidence, very strong evidence, for design. Indeed, such great informational efficiency could be realized only by a great designer, let’s say a genius of design.
    But there are some facts we must be aware of, which suggest some limitations to the possibilities discussed by Apollos. First of all, it is possible that the information in DNA is, at some level, compressed and/or procedurally super-efficient. But if we look at the part of the genome that we understand, that is, the protein coding genes, we see that the information is not compressed at all; indeed, it is redundant. Three nucleotides to code for one of twenty amino acids is redundant: 64 possible values to code for 20 (or a few more, counting the stop codons). Besides, not only is no compression of the data possible, but errors in the code are potentially very dangerous, so no compression involving loss of information can be tolerated.
    This last concept should also be valid for the hidden regulatory code. I can’t easily conceive that a regulatory code, which should correspond to procedural code in normal software, can be compressed beyond certain limits, because that would mean loss of information, and the consequences would be critical. The same is true for fractal algorithms: can you imagine a fractal regulation of complex, critical functions? We are not speaking here of images, textures, sounds, or other formal data, but of specific functional algorithms which must guide and control a network of 10^14 different cells.
    In the same way, could the neural network be fractal? A fractal network which can perceive, elaborate, calculate, decide? In ways we cannot yet conceive?
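    The redundancy of the codon table can be quantified with a quick back-of-envelope calculation (standard figures: 64 possible triplets addressing 20 amino acids plus stop):

```python
import math

# Back-of-envelope check of codon redundancy: 64 triplets address only
# 21 outcomes (20 amino acids plus "stop"), so each codon carries more
# raw symbol space than the message strictly requires.
codons = 4 ** 3                      # 64 possible nucleotide triplets
outcomes = 21                        # 20 amino acids + stop
bits_per_codon = math.log2(codons)   # 6.0 bits of raw capacity per codon
bits_needed = math.log2(outcomes)    # ~4.39 bits actually required
print(bits_per_codon, round(bits_needed, 2), round(bits_per_codon - bits_needed, 2))
# -> 6.0 4.39 1.61  (about 1.6 "spare" bits per codon)
```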

    Another topic: the famous “switches”. I haven’t read Carroll’s books, but I am not aware of any convincing explanation in the evo-devo literature for the problems which have been cited in this thread: cell differentiation, embryo differentiation, and so on. Yes, I have read the fuss about cellular gradients of homeobox gene products in lower animals, but nothing which begins to explain how all that happens.
    Just a few more problems waiting for an answer: what is “stemness”? Why can the zygote give birth to a new organism, while the inner cell mass cells (embryonic stem cells) cannot? What is lost, apparently irreversibly, in a few cell divisions? And what continues to be lost as we pass to fetal stem cells, umbilical cord stem cells, or adult stem cells? Or to any differentiated cell, which has lost even the self-renewal power that is characteristic of all stem cell compartments?
    And, finally, another question about protein genes. When the real number of protein genes in the human genome (about 20,000) was finally revealed a few years ago, we all read the embarrassed explanations of most biologists (if you want an explanation for some paradox, just ask a biologist; you will not be disappointed…): it is not a real problem, the proteins in reality are much more numerous, we just have to discard the old dogma, one gene-one protein.
    Well, for once they had it right: the old dogma is wrong; one gene can certainly code for many different proteins. We know various possible mechanisms, from differential transcription, to differential maturation of the mRNA by differential removal of introns (alternative splicing), to differential post-translational maturation of the protein. So it is one gene-many proteins (although nobody really knows how many).
    But, again, the problem which is not stated, the question which is not asked, behind the arrogant explanations, is always the same, and is hidden in that small word: differential. And the right question is: how? How is the difference achieved? How is the difference intelligently achieved? How does the cell, or the gene, or the transcription factor, or the switch if you prefer, know when possibility A, or B, or C, or D has to be implemented? For each gene? For each gene harmoniously, in each different cell type, in each different cell state? One gene-many proteins, OK, but which protein? Will our lucky feedbacks be enough to guide us?

  27. hidden in that small word: differential. And the right question is: how? How is the difference achieved?

    Right. We have a process that reduces uncertainty and differentiates (chooses). Thus, there must be a corresponding amount of information, since according to Shannon, information is measured as the reduction of uncertainty.

    This is why our intuition tells us that more information must be necessary. The more options there are, the greater number of bits needed to choose any one specific option (differentiate).
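    Shannon’s measure can be worked out in miniature (the cell-type count of ~200 is a rough, commonly cited figure used here only for scale):

```python
import math

# Choosing one option among N equally likely alternatives resolves
# log2(N) bits of uncertainty -- so more options to differentiate among
# means more information is needed to make the choice.
def bits_to_choose(n_options):
    return math.log2(n_options)

print(bits_to_choose(2))               # -> 1.0 bit: a single yes/no switch
print(bits_to_choose(4))               # -> 2.0 bits: e.g. one of four splice variants
print(round(bits_to_choose(200), 1))   # -> 7.6 bits per choice among ~200 cell types
```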

  28. gpuccio,

    Thanks for considering and indulging my comments. I’m understanding your point that the coding regions of DNA, as opposed to exhibiting some sort of compression, exhibit redundancy, the apparent polar opposite of compressional efficiency.

    In my first comment I suggested DNA might exhibit efficiency through multi-use, specifically the “histone code.” I found an abstract to an article at sciencemag.org on the subject (I couldn’t access the entire article. No loss, I probably couldn’t have understood much of it anyway.) From the abstract I read:

    The combinatorial nature of histone amino-terminal modifications thus reveals a “histone code” that considerably extends the information potential of the genetic code.

    I also linked to this article at EvolutionNews.org that briefly discusses dual coding genes.

    Would it make sense to consider that the apparent redundancy of coding DNA serves to allow its multidimensional properties? (Could we have a histone code without this redundancy?) Might we be able to elucidate an estimate on how many other “codes” exist in DNA by analyzing the extent of the redundancy?
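    As a purely hypothetical sketch of how redundancy permits multi-use, synonymous codons can carry a second message in the choice of codon, without altering the protein spelled out by the first message. The codon families below are real; the embedding scheme is invented for illustration and is not a claim about any actual biological code.

```python
import math

# Hypothetical toy: hide extra bits in WHICH synonymous codon is chosen.
# A few real codon families (leucine shown as a 4-codon subset of its 6):
SYNONYMS = {
    "F": ["TTT", "TTC"],                # phenylalanine: 2 codons -> 1 spare bit
    "L": ["TTA", "TTG", "CTT", "CTG"],  # leucine subset: 4 codons -> 2 spare bits
    "K": ["AAA", "AAG"],                # lysine: 2 codons -> 1 spare bit
}

def encode(protein, bits):
    """Pick synonymous codons so the codon choices spell out `bits`."""
    dna, i = [], 0
    for aa in protein:
        codons = SYNONYMS[aa]
        width = int(math.log2(len(codons)))       # spare bits at this position
        chunk = bits[i:i + width].ljust(width, "0")
        dna.append(codons[int(chunk, 2)])
        i += width
    return "".join(dna)

def decode_hidden(dna):
    """Recover the hidden bits from which synonym was used at each position."""
    rev = {c: (aa, i) for aa, cs in SYNONYMS.items() for i, c in enumerate(cs)}
    bits = ""
    for j in range(0, len(dna), 3):
        aa, idx = rev[dna[j:j + 3]]
        width = int(math.log2(len(SYNONYMS[aa])))
        bits += format(idx, f"0{width}b")
    return bits

dna = encode("FLK", "1010")
print(dna)                  # -> TTCTTGAAA: still translates to F-L-K...
print(decode_hidden(dna))   # -> 1010: ...yet the codon choices carry the hidden bits
```

    The same DNA string satisfies two constraints at once, which is exactly why a fully non-redundant code could not host a second layer: with one codon per amino acid there would be no spare choices left to carry anything else.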

    On the subject of neurons and synapses, I understand that there are vast difficulties encountered when trying to imagine a procedural regulatory system to build such a network (I never meant to suggest that it was necessarily fractal in nature). However prohibitive this might seem, is it any less prohibitive to imagine a complete specification for this network, considering an estimate of 10^11 neurons and 10^14 synapses in humans? Is there enough informational capacity in a human cell to account for this magnitude of necessary information? My questions are not rhetorical, I’m genuinely curious.
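    A rough, hypothetical back-of-envelope comparison using the estimates above (and granting an absurdly generous one bit per synapse) shows why the question bites:

```python
# Rough back-of-envelope only, using the figures quoted in the comment
# above: even one bit per synapse dwarfs the raw storage of the genome.
synapses = 10 ** 14                # estimated human synapse count
bits_per_synapse = 1               # deliberately minimal lower bound
network_bits = synapses * bits_per_synapse

base_pairs = 3.2e9                 # approximate human genome size
genome_bits = base_pairs * 2       # 2 bits per base (4 possible bases)

print(network_bits / genome_bits)  # -> 15625.0: ~15,000x the genome's raw capacity
```

    On these assumptions an explicit wiring list cannot fit in DNA, which is consistent with the suggestion that the specification, wherever it resides, must be largely procedural or developmental rather than enumerated.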

    If DNA doesn’t contain the complete specification for the human creature, then we might expect to find it elsewhere in the cell. If the cell itself is inadequate to contain this entire specification, then we need to look outside the cell. I can’t imagine where this would take the debate about science and ID, nor where we might begin to look for the source of the information. However if we hold to the Privileged Planet hypothesis, then we can at least assume that these questions are answerable by persistent and intrepid inquiry, without consulting the supernatural.

  29. Apollos,

    Again you raise very pertinent points. You say:

    “On the subject of neurons and synapses, I understand that there are vast difficulties encountered when trying to imagine a procedural regulatory system to build such a network (I never meant to suggest that it was necessarily fractal in nature). However prohibitive this might seem, is it any less prohibitive to imagine a complete specification for this network, considering an estimate of 10^11 neurons and 10^14 synapses in humans? Is there enough informational capacity in a human cell to account for this magnitude of necessary information? My questions are not rhetorical, I’m genuinely curious.”

    Indeed, I am curious too! We have very good questions here, but unfortunately not yet the answers. That’s no big problem, anyway, because one of my deepest convictions is that questions are by far more important than answers. Still, we can try to speculate about possible lines of answer, while we wait for experimental data to show the way.

    First of all, I think that we should emphasize more the role of the cytoplasm. We have the model of animal cloning by nuclear transfer, which seems to demonstrate that the cytoplasm of the ovum can reverse the differentiation of an adult cell, which is not even a stem cell, and make the genome available again for the full performance of a zygote. That’s a really stunning fact, if we consider that it is, as far as I know, the only example of complete “de-differentiation” of a genome. Obviously, nobody has the faintest idea of how that happens, but whatever the cause is, it must be related to the ovum cytoplasm.
    In general, the mystery of stem cell condition, of how cells retain or lose the property of self-renewal and differentiation, is in my opinion a central problem, and any new acquisition of knowledge in this field will be welcome. Whatever it is that has the full potentiality of directing the embryo development can probably tell us much also about evolution (indeed, I think that the evo-devo approach is probably correct, provided that it is not interpreted in a purely reductionist, no-design frame). And, whatever it is, it is apparently and gradually lost (reversibly or irreversibly?) in the course of cell differentiation.

    About other codes in DNA. I have heard of potential “second genetic code(s)” for a long time, and the idea that the known genetic code valid for protein coding genes may not be the only “code” implemented in DNA has surfaced many times. The two articles you cite are two different recent approaches to the problem: the “histone code” refers to post-translational modifications of DNA-associated proteins (histones), which in turn regulate DNA access and transcription. The dual coding article instead refers to the possibility that a single protein coding gene may be read in a shifted reading frame, and so code for two different proteins. Another similar approach I read recently (I don’t remember the reference) concerns recurring sequences throughout DNA (both coding and non-coding) which would regulate DNA folding, and therefore transcription.
    In all these cases, interesting possibilities are open for new levels of information, and indeed all these examples are strong new evidence against NDE and in favour of design. However, each of these examples refers to regulatory processes which are apparently represented equally in all cells, and so the problem remains open of how the differentiation of the myriad of cell types and cell states happens.

    Finally, I think we could also take into consideration the possibility that the hidden levels of information which could explain the fine regulation of biological processes lie deeper than the conventional biochemical plane, involving for instance phenomena that only biophysics could begin to understand, or even the quantum level. In the case of DNA, it seems quite obvious that the regulation of transcription should be linked to the complex, and still poorly understood, physical conformational variations of DNA. Maybe biophysics, and not biochemistry, is the key to understanding the informational mysteries of the cell.

  30. Fascinating post as usual, gpuccio. I finally found the histone code article I had originally read. Here are some tidbits:

    Researchers believe they have found a second code in DNA in addition to the genetic code. The genetic code specifies all the proteins that a cell makes. The second code, superimposed on the first, sets the placement of the nucleosomes, miniature protein spools around which the DNA is looped. The spools both protect and control access to the DNA itself.

    The nucleosomes frequently move around, letting the DNA float free when a gene has to be transcribed. Given this constant flux, Dr. Segal said he was surprised they could predict as many as half of the preferred nucleosome positions. But having broken the code, “We think that for the first time we have a real quantitative handle” on exploring how the nucleosomes and other proteins interact to control the DNA, he said.

    In the genetic code, sets of three DNA units specify various kinds of amino acid, the units of proteins. A curious feature of the code is that it is redundant, meaning that a given amino acid can be defined by any of several different triplets. Biologists have long speculated that the redundancy may have been designed so as to coexist with some other kind of code, and this, Dr. Segal said, could be the nucleosome code.

    I’m not sure if this has yet been confirmed or not.

    Stephen Meyer mentions histone proteins in The Origin of Life and the Death of Materialism:

    The proteins histone 3 and 4, for example, fold into very well-defined three-dimensional shapes with a precise distribution of positive charges around their exteriors. This shape and charge distribution enables them to form part of the spool-like “nucleosomes” that allow DNA to coil efficiently around itself and to store information. Indeed, the information storage density of DNA, thanks in part to nucleosome spooling, is several trillion times that of our most advanced computer chips.

    This is part of the reason I was thinking the way I was. These sorts of articles catalyze something in my imagination, and I am then capable of all sorts of fanciful extrapolations.

    You said:

    That’s no big problem, anyway, because one of my deepest convictions is that questions are by far more important than answers.

    I agree. I might say it this way: that we ask questions is far more important than any single answer we receive.

    I think that we should emphasize more the role of cytoplasm. We have the model of animal cloning with nuclear transfer, which seems to demonstrate that the cytoplasm of the ovum can reverse the differentiation of an adult cell, which is not even a stem cell, and make the genome again available for a zygote performance.

    Cytoplasm is introduced on Wikipedia as “…a gelatinous, semi-transparent fluid that ‘fills’ most cells.” It is described elsewhere as a watery fluid inside the cell, or as a jelly-like substance that fills the cell. It would not offend my sensibilities to find that this humble material is more than it seems.

    Maybe byophysics, and not biochemistry, is the key to understanding the informational mysteries of the cell.

    Perhaps we can put a couple of cells in a super collider, and see what pops out? :lol:

    Thanks again for your consideration. As important as the questions are, you have provided some answers, and I’ll further restrain my curiosity for the short-term.
