Junk DNA — is it really?

Junk DNA May Not Be So Junky After All
3/23/2006

Researchers at the McKusick-Nathans Institute of Genetic Medicine at Johns Hopkins have invented a cost-effective and highly efficient way of analyzing what many have termed “junk” DNA and identified regions critical for controlling gene function. And they have found that these control regions from different species don’t have to look alike to work alike.

16 Responses to Junk DNA — is it really?

  1. For years we were told that junk DNA was evidence for RM+NS because Darwinian theory would predict left-behind junk formed by random mutations. It is turning out that this prediction was wrong. ID would predict exactly the opposite: If the information in DNA is the product of design, we should expect to find function for what was once thought to be junk.

    In the future, perhaps we should expect to find more analogs between human-engineered software and genetic software. Genetic information is incredibly compact and efficient. For example, we know that some nucleotide sequences can code for more than one protein, depending on where one starts and stops in the sequence. In other words, the information overlaps, which represents a form of data compression.

    Perhaps some of the “junk” DNA appears to be junk because of other data-compression techniques like run-length encoding or more sophisticated algorithms like LZW. This might represent an area for future research, inspired by an ID perspective.
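
    For readers unfamiliar with run-length encoding, here is a minimal sketch of the technique applied to a made-up nucleotide string. The function names and the sample sequence are hypothetical, and nothing here claims that DNA actually uses this scheme; it only shows what the algorithm does.

```python
from itertools import groupby


def rle_encode(seq):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(symbol, len(list(run))) for symbol, run in groupby(seq)]


def rle_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)


sample = "AAAATTTGCC"          # hypothetical sequence, for illustration only
encoded = rle_encode(sample)   # runs of A, T, G, C
restored = rle_decode(encoded)
```

    Run-length encoding only pays off when the input has long runs of repeated symbols; on a sequence with few repeats the encoded form can be larger than the original.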

  2. Gil

    For me it’s just intuition based on decades of working with information storage systems. It would be a real miracle if 3 billion base pairs is enough digital storage capacity to specify the construction of a human being including brain organization and bootstrap code. To posit that most or even much of it is excess baggage is preposterous. It’s less than a gigabyte of storage! How is it possible? There must be spatially encoded information as well as digitally encoded information. I just don’t think it’s possible to pack all that’s needed into a gigabyte using mere digital storage akin to a paper tape.

    John Davison doesn’t know jack diddly squat about information storage systems yet his intuition as a biologist informs him that position effect is the name of the game. I think he must be right.

    Consider: information on a paper tape is stored in one dimension – the x axis. If you lay the tape on a flat surface and bend it you can get added storage capacity by encoding information in the bends – the y axis. If you bend it inside a 3 dimensional space you get a third dimension for encoding – the z axis.

    This is stretching far beyond my information storage expertise as computers only exploit one dimension for data storage. The DNA molecule exists in 3 dimensions. It has length, width, and height. Moreover, the distances between points along the y and z axes are not digital; they’re analog values, which probably multiplies the potential storage capacity by a factor I can’t even guess at. I wonder if our fearless leader can tell us the theoretical maximum storage of the human DNA molecule when position in the 2 additional dimensions can be utilized.
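
    The “less than a gigabyte” figure in this comment can be checked with a few lines of arithmetic. This is a rough sketch that assumes 2 bits per base (four possible bases) and ignores compression and any spatial encoding; the variable names are only illustrative.

```python
BASE_PAIRS = 3_000_000_000   # figure quoted in the comment
BITS_PER_BASE = 2            # assumption: 4 possible bases -> log2(4) = 2 bits

total_bits = BASE_PAIRS * BITS_PER_BASE
total_bytes = total_bits // 8
megabytes = total_bytes / 1_000_000   # decimal megabytes
mebibytes = total_bytes / 2**20       # binary mebibytes

print(f"{total_bytes:,} bytes = {megabytes:.0f} MB = {mebibytes:.0f} MiB")
```

    Under that assumption the linear sequence comes to about 750 MB, which is consistent with the comment’s estimate.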

  3. Is Junk DNA evidence of encryption?
    DNA is often compared to digital signal information. Digital signals are often encrypted in order to provide error-correcting capabilities (from my hometown university – go Gaels!)

    http://post.queensu.ca/~forsdy.....0Sequences

    This adds significant “junk” to the original signal.

    Encryption has nothing to do with error checking and correction (ECC), per se, although the encryption technique can also serve to detect transmission errors and spoofing attempts. I have a patent on an application of ECC and have read a few books on data compression. This was probably the most informative overall for me as I’m fluent in C and can glean the details left out of the text from the source code on disk that comes with the book. It is a survey of the most widely used techniques with enough detail to implement most of them: The Data Compression Book. I believe it covered up through MPEG1. MP3 would be absent as it came along after the book was written. -ds
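
    The distinction drawn in the reply above can be illustrated with the simplest error-detecting code, a single even-parity bit. It adds redundancy to the signal without hiding anything, which is what separates it from encryption. This is a minimal sketch with made-up function names; a lone parity bit only detects a single flipped bit and cannot correct it.

```python
def add_parity(bits):
    """Append one even-parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]


def parity_ok(codeword):
    """Return True if the codeword still has an even number of 1s."""
    return sum(codeword) % 2 == 0


data = [1, 0, 1, 1]            # hypothetical 4-bit message
codeword = add_parity(data)    # message plus one redundant bit

corrupted = codeword.copy()
corrupted[2] ^= 1              # simulate a single-bit transmission error
```

    Real error-correcting codes add more redundancy than this so that the position of the error can be located as well as detected.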

  4. Intelligent Design logic: “Since biological structures are the result of Intelligence (as opposed to unguided forces/events/circumstances) then this so-called ‘junk DNA’ must have function. Let’s study it until we discover its function.”

    Darwinian logic: “We don’t know what its function is, therefore it’s vestigial/junk DNA. Don’t bother too much about it.”

  5. DaveScot, there is all manner of compression technology already identified in DNA. The best quick description of it that I have seen is in Denton’s “Evolution, a Theory in Crisis” Chapter 14. From what I have learned, more DNA codes to multiple proteins, or protein variants than codes to a single protein. In other words, most “coding” DNA is compressed. As you and I know, however, data compression produces incredible mutation resistance. The extent of the mutation resistance can be seen in the histone H4 gene which shows, like, 4 aminos difference between the bovine and the pea.

    Yup, organic life is more complex, WAY more complex, than it was thought 50 years ago. It is most likely WAY more complex than scientists currently realize. I.e., “it just happened” is a religious tale far more far-fetched than Noah’s flood.

    From what I have learned, more DNA codes to multiple proteins, or protein variants than codes to a single protein.

    This varies with species. The process is called alternative splicing and doesn’t exist in prokaryotes because they don’t have introns in their genes which mark the splice points. In humans alternative splicing raises the number of protein products threefold which is the largest increase in any sequenced genome. I’m not sure what you mean by mutation resistance. Alternative splicing makes mutations worse. A mutation in an exon which is involved in the manufacture of three different proteins affects three proteins instead of just one. This is a bit like dictionary compression in software. A mistake in the dictionary is disastrous. However, you may be right in that this leads to greater conservation as a mistake in an exon involved in the making of one protein is three times less likely to be fatal than an exon used in three proteins. In that light I guess you could fairly say it adds mutation resistance in that mutations are less likely to become fixed in the gene pool. -ds

  6. Someone should point out that this study occurred in zebrafish, which don’t have much junk DNA.

    This bit of ignorance is going to put your name on the moderation list. A quick search in the genome size database shows that zebrafish have a genome size (2.28pg) two thirds the size of a human genome (3.50pg). 98% of the human genome is “junk” or, put another way, 0.07 picograms of the human genome is coding genes (non-junk). Assuming that zebrafish have roughly the same number of coding genes (they probably have fewer) they would also have 0.07pg of coding genes or 97% junk compared to 98% junk in a human. That’s not much difference in amount of junk DNA. -ds
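
    The arithmetic in the reply above can be reproduced directly. This sketch uses only the figures quoted there (genome sizes in picograms and a 2% coding fraction for humans) and keeps the reply’s own assumption that zebrafish carry the same mass of coding DNA; the variable names are illustrative.

```python
HUMAN_PG = 3.50                # human genome size quoted in the reply
ZEBRAFISH_PG = 2.28            # zebrafish genome size quoted in the reply
HUMAN_CODING_FRACTION = 0.02   # "98% junk" implies 2% coding

# Mass of coding DNA, assumed (as in the reply) to be the same in both species.
coding_pg = HUMAN_PG * HUMAN_CODING_FRACTION

human_noncoding_pct = 100 * (1 - coding_pg / HUMAN_PG)
zebrafish_noncoding_pct = 100 * (1 - coding_pg / ZEBRAFISH_PG)
size_ratio = ZEBRAFISH_PG / HUMAN_PG

print(f"coding DNA: {coding_pg:.2f} pg")
print(f"human non-coding: {human_noncoding_pct:.0f}%")
print(f"zebrafish non-coding: {zebrafish_noncoding_pct:.0f}%")
print(f"zebrafish/human genome size: {size_ratio:.2f}")
```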

  7. Let me ask the obvious question. How much “junk DNA” is there in human compared to chimp? or assumed in our predecessors?

    Pretty much equal amounts. “Junk DNA” is highly misleading. At a minimum it means “no known function”. In previous decades it was widely believed that DNA with no known function had no actual function at all and was baggage left over by past evolution, retrovirus infections, etc. It used to include any DNA that didn’t code for proteins but now some of that non-coding DNA has been found to be functional but it’s still called junk under the “non-coding DNA” definition.

    What’s very intriguing to me is the C-value paradox. The size of the genome isn’t very predictive of the complexity of the organism. The largest known genome belongs to an amoeba (200x the human genome) and things as unlikely as some pine trees and water lilies have genomes many times larger than human genomes. Some amphibians have really big genomes too. Most scientists believe the excesses in these organisms are truly useless junk DNA. As far as I’m concerned at a minimum it means that the front-loaded hypothesis of evolution is viable as it proves that organisms can carry around immense genomes with little of it actually required to be expressed to produce the organism in question. A genome 200x as large as a human’s could contain a template library that defines the characteristics of at least 200 different phyla and probably many, many more than that as the genomes of many phyla are substantially smaller than human genomes and also there exists a lot of genomic commonality across phylum boundaries. -ds

  8. There is an old saying that predictions are risky, especially when they concern the future, but I’ll take my chances.

    I predict that in the not-too-far-distant future, the integration of biochemistry, molecular biology, and computer science will reveal that we have only scratched the surface when it comes to the elegance and power of biological information processing. These discoveries will set in motion a revolution in human-engineered computer technology that will totally dwarf everything seen until now, by countless orders of magnitude.

    We will discover that much of “junk” DNA not only has a purpose, but that it has a time-control release mechanism that surpasses countless generations, and expresses itself when the time is appropriate.

    All of this, and much more, will point inexorably to design that transcends our wildest imaginations, and, in the future, those who are currently, desperately hanging on to 19th century materialistic/stochastic explanations for all of this marvelous engineering will be looked upon as those who worked diligently to prevent scientific progress in our time.

  9. DaveScot
    now that you say it, it seems obvious, yet what you say seems also incomplete somehow, there should be something more, something everyone is missing, analog data, digital data, dimension positioning, 3D information, these are what I got from your comment, my stab in the dark trying to relieve the nagging thought that something is staring us in the face that is being overlooked, something to do with junctions, or constructional joints of some type, an intersection, perhaps the shape of the molecule itself? there is something there…frustrating…

  10. GilDodgen – great points. A design paradigm will prove to be a much more fruitful way of looking at cellular processes IMHO.

  11. Dave, thanks. I’m trying to understand if the 2-4% difference tops out to oh, 70 Mb of base-pair difference. It seems misleading to focus on a 4% ratio while not explaining the significant fact of 70 Mb of nucleotides unless of course one assumes “junk”.

    If it is proven that the long ‘seemingly’ random, non-coded genes actually do lead to distinct developments, then the FL hypothesis is favored I guess as you say, but I’m not familiar enough with the subject of FL. Certainly, if progress continues forward in this path, it could revolutionize the field of complex pattern recognition.

    As a coder in production environments, we were always driven to 99.99 percent up time and a goal of more efficient process each mod cycle. I realize this seems highly off base, but whenever code changes were migrated, the new program modules compiled show percentage changes based upon improvement. Oftentimes, programs become more compact, and what appear to be random deletions of code are really improvements. Similarly, a patchwork of 2 codes pulled together, with the difference being the particular output goal, shows up significantly.

    We could see variations of a 100K program reduced by .5 percent, leading to large differences or insignificant difference depending on how important the line-by-line mods were. If redundant code is stripped, then you have the non-code scenario which could just as easily stay in place and not harm anything except memory hogging charges and slower run time due to larger search space. But if the coded lines are significantly different, the runtime and behavior could be profound to memory access and cost savings.

    The question then becomes what of hardware/software error check mechanisms and do they really compare as internal progs correcting possible external force mutations. This brings us to tables and database assignments, arrays and compression algorithms. Bad data allowed as input, percolating through, could alter dynamic tables meant to vary within certain zones that allow for growth or varied selection – thus causing them to trip outside or to mutate the key sequences within. In a snap dump in time, looking at a particular mod level, table and input, one could misinterpret the dynamic data (hex dumps) as significant wherein the real importance is in the maintenance modules that supply data input, retrieval and storage in the tables as well.

    I guess this is a long confused way to say other processes not mapped yet which interact with DNA are more important than we possibly know? So the significance is not just in the DNA difference, but the simple folds which might do end to end – connection routines for different sub-functions of recognition and interaction of cell communication or table-like read/write functions, not read only, but store/retrieve. So, not only is the ‘non-coded’ DNA important, but also the external routines interacting with it.

    When looking at Chromosome 2, that’s so much like patchwork code, with unrelated extra base pairs to chimps. Being a code designer, the small, seemingly insignificant lines make all the difference between two programs and the actual new function. In fact, the two programs could be quite insignificant alone. But together, with additional code, voila!

    Or am I wrong and all molecular structures are mapped fully between Chimp and Humans? I feel like we’re missing something in the signaling structure related to heck, maybe Chromosome 2 and our brain function as opposed to lower order mammals.

    If I’m not thinking typical evolution, instead like a designer who put in error correction, table functions for variable input/output ratios, then I patchwork it, add the required code and send it off its merry way to QA. But there’s something unique why I patchworked it where I did, not a happy happenstance. There was reason and order to it as optimal placed code. Anywhere else, it does not function.

    Well…. my imagination runs away with me at times. Einstein, thanks.

  12. The idea of “junk” DNA doesn’t even make sense in a Darwinian framework. DNA replication doesn’t come free – presumably, any organism that can function with less DNA will be at an advantage over those who are carrying around extra baggage – you’d presume the “extra” would naturally be deleted over time.

    My bold prediction: just as Darwinians currently use the supposed existence of junk to “prove” Darwin, if it is shown to not be junk, they will use that fact to “prove” Darwin, without blinking an eye. We have always been at war with Eurasia.

    Micheals7 –

    You might want to check out some of Rupert Sheldrake’s work, particularly “The Presence of the Past”. He has some intriguing ideas about the relationship (or rather the non-relationship) of DNA to biological form. He’s considered a crackpot, of course, but his ideas are not all that easy to dismiss…

  13. Whoever coined the words “junk DNA” would kindly do well to remember that this code has been around for several billion years. Therefore, this code deserves way more respect than what such dismissive words imply.

  14. Moderator: Encryption has nothing to do with error checking and correction.

    As a software developer who has worked with data compression a lot, let me suggest that virtually all data compression techniques have an inherent error amplifying effect. In the world of organisms and DNA, where natural selection happily deletes that which is destructive to the organism, the net effect is error correction.

    Let’s consider this in the context of DNA. Denton reports that a gene will often encode for a protein, then will divide up and encode for sub-proteins. He also reports that the start and stop points of a gene will overlap. In both cases, individual codons code to two very separate proteins. If a particular codon now codes for two separate proteins, a change in that codon must be at least non-destructive in two variants or it will be rejected by natural selection. This is a simple example of error amplification. If a mutation happens which is non-destructive to one of the proteins, but is destructive to the other, and the organism is therefore removed from the gene pool, haven’t we effectively achieved an error correction system?

    Agreed. If I didn’t say essentially the same thing in my comment I at least meant to say the same thing. -ds
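
    The overlapping-gene argument in this comment can be sketched with a toy example: one sequence read in two reading frames, and a single substitution that lands in a codon of each frame. The sequence, the mutation position, and the function names are all hypothetical, and no real codon table or gene is implied.

```python
def codons(seq, frame):
    """Split seq into complete triplets starting at offset `frame` (0, 1, or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def changed_codons(original, mutated, frame):
    """Indices of codons that differ between two equal-length sequences in one frame."""
    pairs = zip(codons(original, frame), codons(mutated, frame))
    return [i for i, (a, b) in enumerate(pairs) if a != b]


original = "ATGGCATTAGCC"                     # made-up 12-base sequence
mutated = original[:4] + "T" + original[5:]   # single substitution at index 4

frame0_hits = changed_codons(original, mutated, frame=0)
frame1_hits = changed_codons(original, mutated, frame=1)
```

    Because the one substitution alters a codon in both frames, it has to be tolerable in both products to survive selection, which is the “error amplification” the comment describes.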

  15. Every life form is a system and every system that we build has three major phases: construction, operation, and maintenance. Books on embryology address the CONSTRUCTION phase. They provide excellent descriptions of what happens during embryonic development but do not explain how each cell knows where to go, and what to do when it arrives. It is clear from observation of our own bodies that a vast data base specifying all of our exquisite detail is being utilized. Moreover, an equally vast set of detailed procedures which operate on this data base must also be accessed and used. The question is: Where in our genome is this located? Most likely in the 98.5% of our DNA known as ‘junk’ DNA.

    It takes far less data and procedure to OPERATE a system after it is built. This could be the small part of our DNA which has been decoded by the Human Genome project which appears to mainly involve functions necessary for day to day living.

    Our MAINTENANCE phase is starting to receive more attention. Adult stem cells are obviously part of our maintenance phase. For example, the few hematopoietic stem cells divide to produce new blood cells when they receive a signal indicating that more are needed.

    The CONSTRUCTION, OPERATION, AND MAINTENANCE phases have some overlap. For example, if a newt loses part of a limb it is able to replace the exact part that is missing. Cells located at the injury must know their exact position so that they can re-invoke the exact same CONSTRUCTION procedures that were used in the original construction in the embryo. Adult stem cells which are located in various tissues including the bone marrow, brain and organs, must have been left over from the CONSTRUCTION phase for use by the MAINTENANCE phase.

    No doubt, it is the ‘junk’ DNA that contains the huge 3D coordinate system necessary to accurately place almost every cell in our body. There probably is a set of master coordinates which define the body plan and many sets of relative coordinates to define finer and finer detail. Maybe cells differentiate depending upon where they are in the local coordinate set. Possibly fractal-like processing is used to reduce the amount of needed data and procedure. There is no doubt that our coordinate system is accurate. Our limbs can grow independently for years and yet remain the same length.

  16. Embryo development, or construction, must be a digital process in order to explain the exquisite detail which we observe in our three dimensional bodies. Indeed the vast amount of embryonic research contains descriptions of digital signaling between cells, and on/off control of genes. An analog process such as cells following point to point chemical gradients is extremely unlikely because of difficulties with gradient path dispersion and accurately sensing/generating gradient strength.

    Industrial systems programming makes heavy use of an object oriented language called C++. For example, the top level object can encapsulate all the data and procedures to build a house. A room object can be derived from the house object and can inherit all of the data and procedures, use what it wants of these, and then add data and procedures which specify general room data such as doors, windows, power outlets. Types of rooms are derived from the main room and then add detail pertaining to the type of room, such as bathroom, bedroom, kitchen. One great value of an object oriented programming language is that the data and procedures of any object can be hidden from every other object.

    No doubt embryo development is also object oriented. Textbooks say as much when they describe buds for appendages which appear on the early embryo. These buds are groups of cells which encapsulate all of the data and procedure needed to grow the complete appendage. If such a bud is transplanted to any other part of the embryo, it will grow a complete appendage there (arm, leg, wing, eye, or whatever). One can imagine the levels of objects descending from a bud, each one differentiating according to its relative position in the appendage. Most likely there is also an encapsulated hierarchical 3D coordinate system which is being applied to each object level. With object encapsulation, only local (relative) coordinates would be needed.

    One big mystery is where in the genome all this data and procedure is located. There can be no question that this is located somewhere, most likely in our ‘junk’ DNA. The data must contain sets of exact 3D coordinates to account for exact placement of detail which we observe. Some form of coordinate reference frame must be established so that cells can apply their coordinate data. Perhaps cells count the number of divisions since the first. Possibly this count can be applied to differentiation or building up a coordinate reference frame using cell to cell communication between anchor cells.

    As described previously, at least some cells must know their exact 3 D position in order for us to explain how the newt can grow the exact replacement of part or all of a leg that has been lost.

    Excellent comment. -ds
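
    The house, room, and room-type hierarchy described in this comment can be sketched in a few lines. The comment names C++; this sketch uses Python for brevity, and every class, attribute, and method name is made up for illustration. It shows inheritance and by-convention data hiding only, and makes no claim about how embryos work.

```python
class House:
    """Top-level object: data and procedures shared by everything derived from it."""

    def __init__(self, name):
        self.name = name
        self._materials = ["timber", "nails"]  # leading underscore: hidden by convention

    def describe(self):
        return f"{self.name}: built from {len(self._materials)} materials"


class Room(House):
    """Inherits from House (as in the comment) and adds general room data."""

    def __init__(self, name, doors=1, windows=1):
        super().__init__(name)
        self.doors = doors
        self.windows = windows

    def describe(self):
        return f"{super().describe()}, {self.doors} door(s), {self.windows} window(s)"


class Bathroom(Room):
    """A specific type of room that adds its own detail."""

    def __init__(self, name):
        super().__init__(name, doors=1, windows=1)
        self.fixtures = ["sink", "toilet"]

    def describe(self):
        return f"{super().describe()}, fixtures: {', '.join(self.fixtures)}"


summary = Bathroom("bathroom").describe()
print(summary)
```

    Each level reuses what it inherits and adds only its own detail, which is the property the comment leans on when it compares limb buds to derived objects.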
