Extra Characters to the Biological Code

Even allowing for compression, I’ve always thought that the known informational content of the genome was not enough data. This makes sense from an engineering point of view, because there doesn’t seem to be enough data storage space in a few billion base pairs of nuclear DNA to specify all the detail in a mammal or similarly complex animal. It’s enough room to store a component library of the nuts and bolts required to build individual cells of different types, but not the whole animal.

Obviously no one can argue against the assertion that we do not fully comprehend the biological code. Unlike with computer code, we cannot simply determine at a glance which informational content defines which biological function. The title of geneticist Sermonti’s book is “Why a Fly is not a Horse”. In it he writes that the only thing we know for certain about why a horse is a horse and not a fly is that its mother was a horse.

Thus, based on our current level of knowledge, any calculations that quantify biological informational content are going to be rough estimates. Personally, when measuring the functional sequence complexity of code encoding proteins I’ve long biased any calculations I do by rounding up to several extra informational bits. And this action seems justified by this recent news:

“Anyone who studied a little genetics in high school has heard of adenine, thymine, guanine and cytosine–the A, T, G and C that make up the DNA code. But those are not the whole story. The rise of epigenetics in the past decade has drawn attention to a fifth nucleotide, 5-methylcytosine (5-mC), that sometimes replaces cytosine in the famous DNA double helix to regulate which genes are expressed. And now there’s a sixth: 5-hydroxymethylcytosine.

In experiments to be published online April 16 by Science, researchers reveal an additional character in the mammalian DNA code, opening an entirely new front in epigenetic research.

The work, conducted in Nathaniel Heintz’s Laboratory of Molecular Biology at The Rockefeller University, suggests that a new layer of complexity exists between our basic genetic blueprints and the creatures that grow out of them. “This is another mechanism for regulation of gene expression and nuclear structure that no one has had any insight into,” says Heintz, who is also a Howard Hughes Medical Institute investigator. “The results are discrete and crystalline and clear; there is no uncertainty. I think this finding will electrify the field of epigenetics.”

Genes alone cannot explain the vast differences in complexity among worms, mice, monkeys and humans, all of which have roughly the same amount of genetic material. Scientists have found that these differences arise in part from the dynamic regulation of gene expression rather than the genes themselves. Epigenetics, a relatively young and very hot field in biology, is the study of nongenetic factors that manage this regulation.”

Go to Science Daily for more.

81 Responses to Extra Characters to the Biological Code

  1. gpuccio,
    Sorry, don’t know what key I hit to cut me off. Thanks for your explanations and links. I read the Durston paper and will read the discussions, but that will take a while. Since we are here on the thread, I’d like to ask how the comparisons of completely different molecules with the same functions can help quantitatively. For example, how can you really compare a typewriter quantitatively with a computer-printer system? Even though they both print letters, they are so different that I don’t see how you narrow down the space of possible letter-printing machines.

    BTW, “Fit” is a wonderful term for functional bit.

  2. 62

    gpuccio,

    Thanks for the links. They work. I will read them and try to understand the issues.

    In the interim, I’ve been reading and re-reading the Durston et al. paper and I have to confess that I don’t follow the math. Any help you’d care to give in explaining the authors’ argument would be welcome.

    I would especially appreciate an explanation of how the measure they term FSC relates to FSCI as measured by you or your colleagues here. Table 1 lists FSC (in Fits) for 35 protein families, in values ranging from 46 Fits to 2,416 Fits. What are we to make of those numbers?

    (How do those numbers relate to the argument from design?)

    Please excuse any apparent delays in responses by me. I’ve been placed in moderation for a perceived insult to you in an earlier post.

    I didn’t intend my reference to ignorance as an insult, and I hope that you didn’t take it that way. We are all ignorant of most things. I enjoy this site as a way to reduce my ignorance.

  3. 63

    Joseph,

    Your conclusion is unwarranted.

    By the way, where did you study marine biology?

    Do you have any publications?

  4. womanatwell:

    In the model we are interested in, that is proteins, the problem you suggest should not be so serious. Protein function is usually tied to a specific 3D structure and active site conformation. Usually, if two proteins have the same function in different species, it is very likely that their 3D structure is similar. So, we could define a function as connected to a 3D structure. If there are proteins with similar function, but completely different structure, they could be treated separately.

    Essentially, the functional information is necessary to get the correct folding “and” the correct active site. It is interesting that the relationship between primary structure and tertiary structure is very complex, and difficult to compute. For instance, myoglobins and related molecules have almost the same structure (and function) in very distant species, and yet the primary structure is sometimes very different. The Durston method has the great value of easily assigning an “average” value to each amino acid in terms of H reduction, but it is obviously an approximation. Sometimes, an amino acid can change without influencing the function only if many other coordinated changes occur at the same time.

    It is interesting that what we observe in protein families is conservation of function, and sometimes adaptation of it to different environments (what we could call “fine tuning” of the function), rather than “evolution” of the function. One of the great surprises of recent sequencing of genomes is that many proteins are very old, and are already present in “simple” organisms, where their function is difficult to understand (see for instance the paper “Sea Anemone Genome Reveals Ancestral Eumetazoan Gene Repertoire and Genomic Organization”). And, at the same time, practically all species also reveal species-specific proteins, which apparently have no known homologues. So, we have two different and serious problems for which the current darwinian paradigm has really no convincing answer:

    1) How could so many different proteins with so many different functions and structures “evolve” so efficiently as to be already present even in the first stages of life? Darwinian theory can only avoid that problem by seeking refuge in the misty mythologies of OOL “theories”.

    2) How could so many species “evolve” specific new proteins, without a trace of homologues in similar species? Darwinists can only hope that, in time, such homologues will be found. I believe they will not.

  5. Adel:

    Durston’s FSC is a measure of FSCI, only the method of measurement is different.

    In the traditional approach, to measure FSCI in a protein, you have to know both the search space (which is simple) and the target space (which is difficult), and then you have to calculate the ratio of the second to the first.

    In the Durston approach, you consider not a single protein, but a big family of proteins with the same function and similar structure. Then you align all the primary structures, and compute the H (uncertainty) for each position, according to how much that position varies in the family. So, if an amino acid is always the same in all the proteins, the H will be the lowest, and the reduction of uncertainty with respect to the ground state will be the highest. IOW, that position can only host that specific amino acid, if the function has to be conserved, and contributes very much to the total functional information. On the contrary, if one position is occupied preferentially by 2 or 3 amino acids, and rarely by a few others, its informative power will be less. Finally, if a position can be occupied with the same frequency by any of the 20 amino acids, its H will be as high as in the ground state, and therefore its contribution to the reduction of uncertainty will be null. IOW, that position has no functional informative value. In the ground state (a random, non functional sequence of the same length) H will be the highest.

    So, the highest value of H per position is log 20 (in base 2), that is 4.32 bits. If a position always bears the same amino acid, its H will be 0, and so the uncertainty reduction will be 4.32 bits, and that is the Fit value for that position. If a position changes more, its Fit value will be lower. If a position changes randomly, its H will be 4.32, and its Fit value 0.

    The total Fit value for a molecule is obtained by summing the individual Fit values per position. The average Fit value per position is obtained by dividing that total by the number of positions.
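
    As a rough illustration of the per-position calculation described above, here is a minimal Python sketch. The toy alignment and the function names (column_entropy, fits_per_position) are hypothetical and are not taken from the Durston paper; the sketch also assumes a gap-free alignment and uses raw observed frequencies, which only become meaningful estimates with a large family of sequences.

```python
import math
from collections import Counter

AMINO_ACIDS = 20                    # size of the amino acid alphabet
GROUND_H = math.log2(AMINO_ACIDS)   # about 4.32 bits per position


def column_entropy(column):
    """Shannon entropy (in bits) of the residues seen at one aligned position."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def fits_per_position(alignment):
    """Fit value of each position: ground-state H minus observed H."""
    columns = zip(*alignment)  # turn sequences (rows) into positions (columns)
    return [GROUND_H - column_entropy(col) for col in columns]


# Hypothetical toy alignment of four 3-residue sequences (not real data).
toy_alignment = ["MKA", "MRC", "MKD", "MRE"]

fits = fits_per_position(toy_alignment)
total_fits = sum(fits)                 # total Fit value of the "molecule"
average_fits = total_fits / len(fits)  # average Fit value per position

print([round(f, 2) for f in fits], round(total_fits, 2), round(average_fits, 2))
```

    In the toy alignment the first position never varies, so it gets the full 4.32 Fits; the second alternates between two residues (H = 1 bit) and the third is split evenly among four (H = 2 bits), so they score lower.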

    So, let’s see the example of Ribosomal S12 protein family, cited in the paper. The protein is 121 AAs long. So, the ground state (a random sequence of that length) has an H value of about 523 bits, corresponding to the size of the whole search space of 20^121 sequences.

    The Fit value of the protein family is 359 bits (not 379: there is an error in the text). That means that the H value of the functional state (the protein family) is about 164 bits. So, the reduction of H from the ground state is 523 – 164 = 359 bits. That’s the Fit value for the protein family.

    What does that mean?

    a) 523 bits is the H of the ground (random) state, which corresponds to the whole search space of 20^121 (about 10^157)

    b) The H of the protein family (the functional state) is much lower: only 164 bits, which corresponds to about 10^49. IOW, only 10^49 sequences of that length are expected to express that function. That is an “indirect” way of measuring the target space, and the truly wonderful insight of the method.

    c) The difference, 359 bits, expresses the functional information of the molecule in Fits. Please note that it corresponds to the ratio of the target space (10^49) to the search space (10^157): 10^-108 (the -log in base 2 of that is 359 bits). So, the value in Fits expresses exactly the probability of finding the target space in the search space by a random search. For this molecule, that probability according to the above method is 1:10^108. As I have arbitrarily set my threshold to reject any random hypothesis in the biological context at 1:10^30 – 1:10^50, with my criteria such a molecule is 78 – 58 orders of magnitude beyond the threshold. IOW, unless a credible necessity mechanism is offered for its emergence (that is, a detailed series of selectable sub modifications starting from another previously existing protein with another completely different function), the best explanation at present is that it is designed.
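
    The arithmetic in points a) to c) can be checked with a few lines of Python. This is only a sketch: the length (121) and the Fit value (359) are the figures quoted above, and bits_to_log10 is a helper name introduced here for illustration.

```python
import math

LENGTH = 121   # residues in the Ribosomal S12 example, as quoted above
FITS = 359     # Fit value quoted above for the protein family

# Ground state: a random sequence, log2(20) bits of uncertainty per position.
ground_h = LENGTH * math.log2(20)   # about 523 bits
# Functional state: what is left after the reduction of uncertainty.
functional_h = ground_h - FITS      # about 164 bits


def bits_to_log10(bits):
    """Convert a size expressed in bits (log base 2) to its base-10 exponent."""
    return bits * math.log10(2)


search_exp = bits_to_log10(ground_h)      # search space is about 10^157
target_exp = bits_to_log10(functional_h)  # target space is about 10^49
ratio_exp = target_exp - search_exp       # exponent of target / search

print(round(ground_h), round(functional_h))
print(round(search_exp), round(target_exp), round(ratio_exp))
```

    The last number printed is the base-10 exponent of the ratio of target space to search space, which is just the Fit value re-expressed in powers of ten.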

    Is everything clear? This is a method of analysis. It is simple. It is quantitative. It can be easily applied to what we know.

    Is it perfect? Certainly not. It is obviously based on many assumptions. What I believe is that, if and when we have all the data to calculate the target space “directly”, IOW to know with certainty how many sequences of a certain length can express a specific function, the Fit value of those proteins will be shown to be higher (there is a reason for that belief, but for the moment I will not debate it).

    But the method is here, and it can be applied, and it definitely measures, although with some approximation and probably error, the informational content of known proteins, which, as you can see, is not a myth or a vague argument, but a precise reality.

  6. Adel:

    By the way: no insult taken. Ignorance is more something of a compliment for me :-)

  7. Adel,

    My conclusion is spot on. Otherwise you would just put up the data.

    So until you answer my questions don’t be asking anything of me.

  8. What does ID have to offer?

    1- That living organisms are NOT reducible to matter, energy, chance and necessity

    2- That the DNA sequence is NOT the information

    3- That like all other designs we can study and figure out this one so that we can better maintain it.

  9. 69

    gpuccio,

    I have seen on the Poofery thread that Mr Nakashima has been banned.

    I am sorry to learn that, because I had hoped that more personalities could be engaged in this discussion.

    I hope that womanatwell will come back.

    Anyway, your explanation of Durston et al. was most lucid and helpful. I will defer further questions and comments about that contribution to FSCI because I want to focus for the moment on a closely related issue. You said in #56:

    The size of the target space for a specific function is the most difficult variable to assess, even as an order of magnitude. Indeed, at present no one can define it with certainty. That’s where the opinions of IDists and darwinists necessarily diverge: we do believe that the target space, however big, is anyway a tiny fraction of the search space. Darwinists do hope in huge functional spaces, and profit as much as they can of the present partial ignorance about the relationship between protein structure and function. But one thing is certain: this is an issue which is going to be clarified, and in a relatively short time. So, this particular “gap” in our knowledge will be filled, and we will see who is right.

    (My emphasis)

    What is the source of your belief about the hopes of Darwinists? (References, please.) I don’t remember when I learned that the fraction of possible protein sequences and protein domains that are functional was a very small fraction of the total search space, just as the fraction of viable life forms is a fraction of the total conceivable search space pertaining thereunto. But I’ve known those things for quite a while, and I’m not especially perceptive.

    So that issue seems already to have been clarified, and both sides are right.

  10. What part of transcription and translation, complete with proof-reading, error-correction and editing, strikes you as being cobbled together via an accumulation of genetic accidents?

    And

    How can we test the premise that a bacterial flagellum, for example, arose from a population that never had one via an accumulation of genetic accidents?

  11. Adel DiBagno,
    I’m honored that my presence is requested. An important aspect to consider in the possible usefulness of proteins is their ability to fold into usable shapes. The ATP synthase is a collection of at least 8 types of proteins that are perfectly shaped to work together. Some are used once, some 3 times, some more than 3 times. They act together to capture an ADP molecule, and add a phosphate to it to produce ATP, the energy molecule of the cell. In one of the simplest, that of E. coli, there are a total of 6000 amino acids. This energy source had to be there pretty early on. You can read from the abstract of this paper: http://www.pnas.org/content/10.....0/23/13270
    that protein folds are not easy to come by. In nature, collections of amino acids that fold are rare, much less folding into just the right shape.

  12. Here’s a picture of ATP Synthase from RCSB Protein Data Bank:

    http://www.rcsb.org/pdb/static.....b72_1.html

  13. Adel:

    I will be away for a couple of days.

    I am very sorry that Nakashima has been banned. I was not aware of that.

    You ask:

    “What is the source of your belief about the hopes of Darwinists?”

    The hope for “huge functional spaces” has been expressed many times by darwinists here in the course of discussions. It is usually expressed as the conviction that big “slopes” exist which easily allow passage by random variation from one island of functionality to another. There are even papers in the literature trying to support that (I will give you the reference to the most important one later, but you probably know it, it’s the one about generating functional calcium binding proteins from a random set of sequences). Those papers have been quoted many times against ID, and against my personal arguments in particular. So, I don’t think I am making that up.

    But if you agree with me that:

    “the fraction of possible protein sequences and protein domains that are functional is a very small fraction of the total search space”

    then I am very happy about that. It confirms my idea that you are a reasonable guy :-)

  14. 74

    gpuccio [73]:

    Hasta la vista.

    Thanks for considering me reasonable. Can’t fight the facts. That would be unprofessional.

  15. 75

    womanatwell,

    Nice references. Rotary motors!

    You have made me and gpuccio happy.

  16. AD, thanks. The thing about ATP synthase is that it’s in all three domains–Archaea, Bacteria and Eukaryotes, so would have been there before any branching. It needs a working membrane so that there is an osmotic/electrochemical pull on the hydrogen protons from one side to the other. The energy is converted to the high-energy phosphate bond of ATP. It is used in just about all the cell’s metabolism, including the construction of DNA, RNA and proteins.

  17. womanatwell:

    Very good arguments. And thank you for the links. ATP synthase is one wonderful example of functional complexity, but it’s only one of the many available.

    At present, it seems that a lot of fundamental proteins had to “be there” very early. I am convinced that life started very complex and organized. All OOL theories are absolute myths: I can only pity darwinists who have to try to explain what cannot be explained (at least, not their way). OOL is an example of sudden emergence of complexity. The Ediacaran and Cambrian explosions are two more. While in general speciation can be thought of as more gradual, even from an ID point of view, for these three great steps graduality is practically prohibited by the facts themselves, as we know them.

    Some time ago I was in favor of a completely gradualistic design implementation, except for OOL. But the data about the two “explosions”, and possible others, have convinced me that probably design has been implemented with different modalities in natural history: sometimes more gradually, sometimes more suddenly. The transition from prokaryotes to eukaryotes is another good candidate for “acute” design implementation. All these are issues which can be partially clarified as our understanding of natural history improves.

  18. 78

    gpuccio,

    Good to see that you are back.

    To avoid distraction, I’ll reserve further comment until you have come up with support for your claim that

    There are even papers in the literature trying to support that (I will give you the reference to the most important one later, but you probably know it, it’s the one about generating functional calcium binding proteins from a random set of sequences).

    Sorry, I don’t recognize that reference.

  19. gpuccio,

    Some time ago I was in favor of a completely gradualistic design implementation, except for OOL. But the data about the two “explosions”, and possible others, have convinced me that probably design has been implemented with different modalities in natural history: sometimes more gradually, sometimes more suddenly.

    I agree. I even am starting to wonder about microevolution, since they are finding species-specific unique genes, as in:
    http://www.pubmedcentral.nih.g.....id=2586386

    As Behe says in TEOE, HIV continually mutates, but remains HIV.

  20. 80

    womanatwell, you may be applying the term microevolution too loosely. How “closely related” were the two Hydra species examined in the paper? Take a look at the phylogenetic trees in reference 34 (Hemmrich G, et al., Molecular phylogenetics in Hydra, a classical model in evolutionary developmental biology. Mol Phyl Evol. 2007;44:281–290) and you will see that H. oligactis and H. magnipapillata are not sister species. (Not so very closely related.)

    Incidentally, your reference to Behe and TEOE reminded me of gpuccio’s statement at #48:

    But there is another approach which gives us a more realistic idea of where we are with darwinian explanations. Behe in TEOE has suggested that, in natural models like malaria, random mutations can, at best, provide two coordinated necessary mutations under a very strong selective pressure.

    I have been looking into Behe’s claims about the chloroquine resistance data and how they relate to his “edge,” and I find them questionable…

  21. AD, I will look that up. Today I checked out the April 10 issue of Science from the library. I haven’t gotten to Ingolia yet, but will. I’ve been wanting to read more about protein signaling, and there’s an article about it on p.198 (Smock & Gierasch).
