Over at the Skeptical Zone, Petrushka has written a post arguing that “DNA is a template, not a code.” In today’s post, I’d like to briefly review the reasons why we claim that the genetic code is a literal reality, not a metaphor, and explain exactly what a code is.
But before I do that, I’d like to critique Petrushka’s short post, titled, What Is A Code? (October 20, 2015):
Lots of heat surrounding this question.
My take is that a code must be a system for conveying meaning.
In my view, an essential feature of a code is that it must be abstract and and able to convey novel messages.
DNA fails at he level of abstraction. Whatever “meaning” it conveys cannot be translated into any medium other than chemistry. And not just any abstract chemistry, but the chemistry of this universe.
Without implementing in chemistry, it is impossible to read a DNA message. One cannot predict what a novel DNA string will do.
DNA is a template, not a code.
Go to it.
Petrushka’s three reasons for denying that the code in our DNA is a genuine code are that: (i) its meaning can only be translated into chemistry; (ii) it cannot convey new messages; and (iii) it is impossible to predict what a novel DNA string will do.
Regarding Petrushka’s argument that whatever “meaning” DNA conveys “cannot be translated into any medium other than chemistry”: scientists have developed computer models of DNA (see also here), so the objection that DNA’s meaning is confined to the domain of chemistry is factually incorrect.
As for DNA’s ability to convey new messages: I’d like to ask Petrushka if he believes that genes have evolved over the course of time, and that one gene (call it A), which codes for a certain protein, can evolve into another gene (call it B), that codes for a different protein. Since Petrushka writes for the Skeptical Zone, I presume his answer would be yes to both questions. In that case, Petrushka’s objection that DNA cannot convey new messages falls to the ground: he himself would admit that DNA can, over the course of time, convey new messages.
Finally, Petrushka’s claim that “it is impossible to predict what a novel DNA string will do” is incorrect. There is a whole field of science, known as gene prediction, whose aim is to identify the regions of genomic DNA that encode genes. Indeed, there is a Web server that “takes a sequence, either RNA or DNA, and creates a highly probable, probability annotated group of secondary structures, starting with the lowest free energy structure and including others with varied probabilities of correctness.” For an overview of the various software tools that predict DNA structure, the reader is invited to go here.
So when Petrushka argues (in a comment in his post) that whereas “grammar, syntax and spelling make it possible to construct novel sentences that have a predictable meaning,” “you cannot construct novel sequences of DNA and know what they mean or what they will do,” he is overstating his point. It is indeed true, as we will see, that we cannot deduce with certainty what a given piece of DNA will do, but that’s because the genetic code is but one of several codes that controls what goes on inside the cell. Despite this fact, scientists can deduce quite a lot about what a piece of DNA will do, simply by examining its sequence.
Why the genetic code is a literal reality
In today’s post, I’d like to discuss four pieces of evidence that support my claim that talk of a genetic code is no mere metaphor: it is a literal reality.
My first piece of evidence is a letter by Francis Crick to his son in March 1953 (h/t kairosfocus):
I could quote Crick at further length, but I see that Mung has already done so, in an excellent post over at the Skeptical Zone, titled, Code Denialism Pt. 1 – Crick, which was written in reply to Petrushka’s post. I highly commend Mung’s post to readers, as well as his follow-up post, Code Denialism Pt. 2 – Nirenberg, in which he quotes further scientific testimony, concluding with a quote from theoretical biologist Howard H. Pattee’s paper, “Causation, Control, and the Evolution of Complexity”: “…the fact is that present life requires semiotic control by coded gene strings.”
My second piece of evidence is the sheer prevalence of usage of the term “genetic code” among scientists today. If the reader goes over to Pub Med and types “genetic code” in quotes, he/she will get nearly 9,000 matches. That number speaks for itself.
Even Intelligent Design’s most ardent foes readily admit the legitimacy of the term, “genetic code.” Larry Moran, Professor of biochemistry at the University of Toronto and a long-time critic of Intelligent Design, was asked by commentator Virgil Cain whether he accepted the reality of the genetic code, in a recent post on Uncommon Descent:
Larry Moran- Do you think the genetic code is a real code (like Morse Code is a real code)?
Professor Moran replied:
Yes. That’s how I describe it in my textbook.
Well, there you have it. That’s what an eminent professor of biochemistry and a leading critic of Intelligent Design says.
Dr. Stephen Meyer on why protein-based life requires a code
My third, and most vital. piece of evidence comes from a passage in Dr. Stephen Meyer’s best-selling book, Signature in the Cell (HarperOne, 2009, pp. 114-118), in which he explains, in pellucid prose, exactly what a code is, and what’s wrong with Petrushka’s assertion that “DNA is a template, not a code.” Meyer exposes the error in the “template” metaphor for DNA (which was first formulated by George Gamow) by contrasting it with Francis Crick’s proposal that DNA contained a code:
Deducing a Code
By the late 1950s the relationship between DNA and protein was coming into focus. By then leading molecular biologists understood that the three-dimensional specificity of proteins depended on the one-dimensional specificity of their amino acid sequences. They also suspected that the specific arrangements of amino acids in protein chains derived in turn from specific sequences of nucleotide bases on the DNA molecule. Yet the question remained: How does the sequence of bases on the DNA direct the construction of protein molecules? How do specific sequences in a four-character alphabet generate specific sequences in a twenty-character alphabet?
Francis Crick anticipated the answer: the cell is using some kind of a code. Crick first began to suspect this as he reflected on a proposal by George Gamow, a Russian-born theoretical physicist and cosmologist who had, in the post-war years, turned some of his prodigious intellectual powers to reflect on new discoveries in molecular biology.
Gamow had immigrated to the States in the 1930s to take an appointment at George Washington University after working at the famed Theoretical Physics Institute in Copenhagen. In 1953 and 1954 he proposed a model to explain how the specific sequences in DNA generate the specific sequences in proteins. According to Gamow’s “direct template model,” as it was called, protein assembly occurred directly on the DNA strand. Gamow proposed that proteins formed as amino acids attached directly to the DNA molecule at regularly spaced intervals. Gamow thought the amino acids could nestle into diamond-shaped cavities in DNA that formed in the space between the two backbones (see Fig. 5.1). In this model, a group of four nucleotide bases from two parallel strands of DNA made a specifically shaped hollow into which one and only one amino acid could fit. As each group of nucleotides acquired an amino acid partner, the amino acids would also link with each other to form a chain. According to Gamow, the nearly identical spacing of amino acids in protein and of bases in DNA enabled the direct “matching” of amino acids to nucleotide groups on the DNA template. This matching occurred because of the supposed fit between the cavities produced by the bases and the shape of the side chain of the amino acids and the chemical affinity between the bases and amino acids.
The notion that “DNA is a template” isn’t Petrushka’s; it’s George Gamow’s. But it rests on bad science. To quote Meyer again:
Francis Crick first recognized the futility of this scheme. In a famous communication to members of the “RNA Tie Club,” Crick explained that there was nothing about the chemical properties or shapes of the bases to ensure that one and only one amino acid would fit into, or attach to, the cavities created by a group of bases. In the first place, many of the amino acids were difficult to differentiate structurally, because they had similar side chains. Second, the bases themselves did not necessarily create shapes that either matched these shapes or, still less, differentiated one from another. As Crick put it, “Where are the nobby hydrophobic surfaces to distinguish valine from leucine from isoleucine? Where are the charged groups, in specific positions, to go with the acidic and basic amino acids?” As Juddson explains, “Crick was a protein crystallographer, and knew of no reason to think that Gamow’s holes in the helix could provide the variety or precision of shapes necessary to differentiate a score or more of rather similar objects.”
Yet in the absence of such spatial matching and the formation of corresponding chemical attachments, there could be no reliable transmission of information. For the direct template needed to explain the irregularity and specificity of the amino acid sequences in proteins, individual bases (or groups of bases) needed to manifest discriminating spatial geometries. Figuratively speaking, DNA bases not only needed to “zig” in a way that matched the specific “zags” of each amino acid, but DNA needed to do so at irregular intervals in order to produce the irregular sequencing of amino acids that characterizes proteins. Yeet as Crick realized, both the individual bases themselves and their various combinations lacked distinguishing physical features that could account for the specificity of amino-acid sequencing. Further, the geometry of the DNA molecule as a whole, at the level of its gross morphology, presents a highly repetitive sequence of major and minor grooves (see Fig. 5.2). Therefore, it could not function as a direct template for protein synthesis. As Crick explained, “What the DNA structure does show … is a specific pattern of hydrogen bonds, and very little else.”
If the chemical features and shapes of the DNA bases do not directly account for the specific sequencing of proteins, what does? Crick remained adamant that the specific arrangement of the nucleotide bases, not anything about their physical or chemical features per se, dictated amino acid sequencing.
Crick’s insight had profound implications. If a single protein could not copy the information in DNA directly, as the direct template model suggested, then as Jacques Monod would later explain, “you absolutely needed a code.” And so Crick postulated a third factor consistent with his original sequence hypothesis. He proposed the existence of a genetic code – a means of translating information from one chemical domain into another.
So there we have it. A code is a means of translating information from one domain into another – in this case, a chemical domain. That’s a broad, non-trivial definition.
Meyers then proceeds to describe how the genetic code works:
To envision what Crick had in mind, imagine having a human alphabet that uses four and only four distinct shapes that combine in various specific ways to form not just words, but words that correspond to individual letters in a larger alphabet roughly the size of the English alphabet; this larger alphabet then uses its letters (each of which is one of those words composed of the four shapes) to build sentences. Of course, we have a symbol system that does much the same thing. The binary code that computer programmers use has a translation key that enables programmers to produce English text from sequences of binary digits. Each letter in the English alphabet is represented by a unique combination of two character types, O’s and 1’s. For example, in ASCII code the letter A is represented by the sequence 100 0001, the letter B by the sequence 100 0010, and so on. Each of the letters of the twenty-six letter English alphabet has a corresponding representation in the two-digit numeric alphabet of this binary system (see Fig. 5.3). Crick realized that if his sequence hypothesis were true, then there must be some similar translation system in the cell – one that determined how sequences written in the four-character alphabet of DNA are converted into sequences that use a twenty-letter amino-acid alphabet. The DNA detective was now in the business of code breaking as well.
Yet in a physical system, as opposed to a social or linguistic one, a code must have a <b?physical expression. Crick postulated the existence of a third molecule, an adapter molecule functioning as a translation device that could recognize and convert the information in the sequential arrangements of the bases into specific amino-acid sequences. More specifically, he proposed the existence of twenty separate adapter molecules corresponding to each of the twenty protein-forming amino acids. Each adapter would, by the familiar mechanism of complementary base pairing, bind to a sequence of DNA text at one end and to a specific amino acid at the other. Crick also proposed the existence of specific enzymes (one for each of the twenty adapter-amino acid pairs) to connect the specific amino acids and their corresponding adapters. The set of correspondences between sections of genetic text, on the one hand, and the specific amino acid, on the other, constituted a genetic code (see Fig. 4.6).
Though these correspondences were mediated physically by adapter molecules and enzymes, this complex system, as conceived by Crick, would be governed by the functional requirements of information transfer as by rules of chemical affinity – as much by a set of chemically arbitrary conventions as by the necessary relations of physical-chemical law. Indeed, as Crick imagined this system, nothing about the physical or chemical features of the nucleotides or amino acids directly dictated any particular set of assignments between amino acids and bases in the DNA text. It had to be cracked…
That last phrase encapsulates a vital feature of the genetic code: its chemical arbitrariness: “nothing about the physical or chemical features of the nucleotides or amino acids directly dictated any particular set of assignments between amino acids and bases in the DNA text.”
So, how did Crick’s proposal withstand the test of time? Remarkably well, as Meyer describes it:
Crick’s proposal was striking in its sheer theoretical audacity. Biochemistry had not a shred of evidence for the existence of adapter molecules or their corresponding enzymes. Crick simply deduced the need for a code by thinking about what would be needed to make the cell’s communication system work…
What Crick would postulate on the grounds of functional necessity took nearly five years of intensive research and many transatlantic communications and conferences to verify and calculate…
…Indeed, Crick’s adapter molecules and their corresponding enzymes functioned much as he had originally envisaged, albeit as part of a far more complex process than even he has foreseen.
Finally, I should point out that strictly speaking, DNA itself is not a code. As Meyer puts it, “The set of correspondences between sections of genetic text, on the one hand, and the specific amino acid, on the other, constituted a genetic code .”
Dr. Jonathan Wells: In addition to the genetic code, there are other codes in the cell
My fourth and final piece of evidence comes from Dr. Jonathan Wells, who has two Ph.D.s – one in Molecular and Cell Biology from the University of California at Berkeley, and one in Religious Studies from Yale – and who is the author of Icons of Evolution: Science or Myth?: Why Much of What We Teach About Evolution is Wrong, The Politically Incorrect Guide to Darwinism and Intelligent Design and The Myth of Junk DNA as well as being the co-author of The Design of Life with Dr. William A. Dembski.
Dr. Wells contends that not only is the genetic code real, but there are at least six different codes inside the cell.
In a recent podcast on ID The Future, Dr. Jonathan Wells defined a code in biology as follows:
Well, a code, generally speaking, is a pattern that conveys information. Now, what information does is it limits whatever thing that promotes a huge range of possibilities to a specific set of those possibilities. For example, if we have a long string of letters that really make no sense – they don’t delimit anything, except the letters themselves – but if the sentence actually means something in English, then it specifies the thought, out of the infinite universe of thoughts that are out there, and that’s how information works. It can be in the form of letters, it can be in the form of digital information on a CD that specifies music – say, one song – out of the infinite number of possible songs that are out there, and in biology, the code specifies, among other things, the form of an organism. And a code in biology is generally carried, not by digital information or by letters, but by molecules. So biological code is a pattern in molecules that carries information to specify features of the organism.
Dr. Wells then proceeded to describe not only the genetic code, but five other codes inside the cell:
Well, a genetic code can mean two things. To a biologist, it technically means a set of instructions whereby information in DNA and RNA is translated into protein. A certain set of sub-units in RNA specifies a certain subset of amino acids and proteins. But that’s not the only meaning of code in biology – genetic code, that is. The other meaning – somewhat less technical – is that the sequence of sub-units in DNA and RNA can carry information that specifies proteins. So, I’ll use “genetic code” in the second sense, and it certainly is such a code. It’s not as powerful as it’s sometimes made out to be, as I’ll explain in a minute, because there are other codes besides it.
One code we could call the epigenetic code. It turns out that the DNA molecules in cells are chemically altered in many cases – not in the sense that the sequences of the sub-units changes, but the sub-units are decorated, so to speak, with other molecules that affect which parts of the DNA get transcribed into RNA and therefore translated into protein. So that’s another code in the organism that’s independent of the sequence of DNA it sequences.
On top of that, there’s still another code: the RNA splicing code. It turns out that in organisms such as ourselves, when the DNA is transcribed into RNA, the RNA is then cut up and spliced back together. And it can be spliced back together in many ways. We know of one stretch of DNA in fruit flies, for example, that after RNA splicing, can produce over 18,000 different proteins – a single stretch of DNA! So there’s the RNA splicing code, about which we don’t know a whole lot. It might be partly influenced by RNAs that have been transcribed from non-protein-coding RNA, but we don’t fully understand the nature of the RNA splicing code.
On top of that is the sugar code. It turns out that almost every protein made in the cell is further modified by the addition of sugar molecules. Now sugar molecules are far more complex than DNA and RNA. DNA and RNA sequences are linear – one-dimensional – but sugar molecules are three-dimensional, and so they carry a whole lot more information than in the RNA and in the DNA. So proteins are modified by the addition of sugar molecules that add a whole new level of complexity to what’s going on here.
Now on top of the sugar code is the membrane code. It turns out that there’s a pattern in the membrane of the living cell that’s there before the DNA is transcribed, and can be inherited independently of the DNA. And the membrane pattern is that it determines[?? – audio unclear] the spatial gradient for things in the cell, and the spatial ranking is all-important – especially in the embryo.
And finally, on top of the membrane code, there’s something called the bio-electric code. There are molecules in the membranes of all our cells that generate electric fields, and these fields, once generated, form a three-dimensional pattern that is known to affect development. For example, in a frog embryo, researchers have modified the electric field that they generate, without modifying the molecules or the membrane. And just by electrically modifying the bio-electric field, they can alter development.
So there are at least six codes in biology: the genetic code and five others.… They all carry information. They all affect cellular activity, and mostly embryo development.
Readers who would like to learn more about codes in biology will enjoy perusing Dr. Wells’s latest paper, “Membrane Patterns Carry Ontogenetic Information That Is Specified Independently of DNA”. The “Summary and Implications” section on page 14 provides a helpful overview.
A final objection?
In a subsequent comment on his post, Petrushka writes:
My understanding of the word “code” — which may not be the only possible one — is a system of transferring information such that an arbitrary reader can discover the meaning of the message.
This applies to human languages, including encrypted ones or extinct ones.
I think it also applies to musical notation, including mechanical notations, such as music box cylinders.
One thing that distinguishes such symbol systems is grammar and syntax. The notation has elements that can be rearranged to form alternate messages, and such alternate messages can be understood.
DNA does not have anything analogous to grammar and syntax….
Contrary to what Petrushka asserts, scientists can indeed legitimately speak of the language of the cell. Dr. Don Johnson is a scientist who has worked hard to build a link between the biological sciences and information technology over the past two decades. In addition to being the author of Programming of Life and Probability’s Nature and Nature’s Probability: A Call to Scientific Integrity, Dr. Johnson has both a Ph.D. in chemistry and a Ph.D. in computer and information sciences. He has spent 20 years teaching in universities in Wisconsin, Minnesota, California, and Europe. On April 8, 2010, Dr. Johnson gave a presentation entitled Bioinformatics: The Information in Life for the University of North Carolina Wilmington chapter of the Association for Computer Machinery. Dr. Johnson’s presentation is now on-line here. Both the talk and accompanying handout notes can be accessed from Dr. Johnson’s Web page. Here’s an excerpt from his presentation blurb:
Each cell of an organism has millions of interacting computers reading and processing digital information using algorithmic digital programs and digital codes to communicate and translate information.
On a slide entitled “Information Systems In Life,” Dr. Johnson points out that:
- the genetic system is a pre-existing operating system;
- the specific genetic program (genome) is an application;
- the native language has a codon-based encryption system;
- the codes are read by enzyme computers with their own operating system;
- each enzyme’s output is to another operating system in a ribosome;
- codes are decrypted and output to tRNA computers;
- each codon-specified amino acid is transported to a protein construction site; and
- in each cell, there are multiple operating systems, multiple programming languages, encoding/decoding hardware and software, specialized communications systems, error detection/correction systems, specialized input/output for organelle control and feedback, and a variety of specialized “devices” to accomplish the tasks of life.
Petrushka’s claim that DNA lacks grammar and syntax is also incorrect, as author, business consultant and electrical engineer Perry Marshall points out on his Cosmic Fingerprints.com Website, in a thought-provoking article titled, Language and Design: Product of a Mental Process:
…DNA is not just a molecule, DNA is a language. It is actually very comparable to English and human languages in the way that it is structured. Here is a little chart and it shows the comparison between human languages and DNA. The nucleotide is the A, T, C, G.
Marshall uses a chart to illustrate his point:
DNA Language Human Language Nucleotide Character Codon Letter Gene Word Operon Sentence Regulon Paragraph
Marshall continues:
DNA is encoding, decoding mechanism that stores and transmits the message of the living organism. Biologists have actually been using linguistic analysis to decode the human genome. Tools that we must use to analyze languages are continually being used to figure out what all of those genes actually mean.
So if you read some article in the newspaper it says we found a gene that causes Spina Bifida or something like that, some kind of linguistic analysis was used to help figure that out.
Perry Marshall’s Intelligent Design argument
Marshall also proposes an update to Paley’s design argument, which he believes makes it logically unassailable:
Perry Marshall’s Update to Paley’s Design Argument
But now I have an improvement to Paley’s design argument. This sounds audacious but I’m serious. I have an improvement that makes Paley’s argument airtight:
- Element common to both watches and life is language
- The essential distinction between pattern and design is language
- Fundamental Property of all Designs: Idea precedes Implementation
- Idea must be represented by language
- All language comes from a mind
Life is preceded by DNA, and a watch is preceded by a plan where a blueprint or at least an idea in somebody’s mind that preceded the building of the watch.
That is true of all things that are designed, an idea comes first.
The essential distinction betweens patterns and designs is language. Patterns don’t have languages, but designs do. So the fundamental property of all designs is that an idea precedes the implementation of the idea.
The idea exists in a symbolic form before it’s physically built. An idea, in order to exist, has to be represented by a language. Even to have an idea in your mind you have to talk to yourself and have images in your mind of what you want to do before you do it. So we know this:
- Ideas always precede implementation, always, no exceptions.
- All languages come from a mind. No exceptions.
- There are no languages that do not come from a mind.
- So we know that DNA was designed.
- A mind designed DNA, therefore God exists.
Can this be refuted? Yes, if any exceptions to this can be found. But a lot of people have tried to refute it, unsuccessfully. It’s an airtight inductive proof that life was designed by a mind. If anyone can find a flaw in the logic, it fails. Until that happens, it stands. It’s just like the laws of thermodynamics, or gravity, or conservation of matter and energy. If anyone can find an exception, the law fails to hold.
This leads to what I call The Atheist’s Riddle:
“Show me a language that does not come from a mind. ”
It’s so simple and a child can understand, but so complex no atheist can solve.
Conclusion
It seems to me that Petrushka’s point would be better expressed by saying that whereas the language of the cell is an artificial language, the language we use in our everyday speech is a natural language. (See here for a short exposition of these terms.) Natural languages do indeed have a richness which cannot be exhausted by their grammatical and syntactical rules. Nevertheless, it remains true that to the best of our knowledge, only a mind is capable of designing a language, whether natural or artificial. I conclude that the language of the cell warrants an inference to the existence of an Intelligent Designer.
But regardless of what readers may think of this Intelligent design argument, one point is undeniable: the genetic code is real. It is not a metaphor, but a literal reality – a point I elaborated in my 2013 post, Is the genetic code a real code?
What do readers think?