Uncommon Descent | Serving The Intelligent Design Community

Non-Randomness of DNA as a whole


Stretches of DNA that code for proteins are considered non-random, but what about DNA as a whole?

DNA Densely Packed without Knots

“‘We’ve long known that on a small scale, DNA is a double helix…But if the double helix didn’t fold further, the genome in each cell would be two meters long. Scientists have not really understood how the double helix folds to fit into the nucleus of a human cell, which is only about a hundredth of a millimeter in diameter…’

“The researchers report two striking findings. First, the human genome is organized into two separate compartments, keeping active genes separate and accessible while sequestering unused DNA in a denser storage compartment. Chromosomes snake in and out of the two compartments repeatedly as their DNA alternates between active, gene-rich and inactive, gene-poor stretches….

“Second, at a finer scale, the genome adopts an unusual organization known in mathematics as a ‘fractal.’ The specific architecture the scientists found, called a ‘fractal globule,’ enables the cell to pack DNA incredibly tightly — the information density in the nucleus is trillions of times higher than on a computer chip — while avoiding the knots and tangles that might interfere with the cell’s ability to read its own genome. Moreover, the DNA can easily unfold and refold during gene activation, gene repression, and cell replication.” (EurekAlert! 2009)

“We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free, polymer conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model.” (Lieberman-Aiden et al. 2009:289)
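To get a feel for the packing problem these quotes describe, here is a minimal back-of-the-envelope sketch in Python. The inputs are rough illustrative figures (about 2 m of DNA per nucleus, a roughly 2 nm wide double helix, and a nucleus about a hundredth of a millimeter across), not numbers taken from the papers quoted above.

```python
import math

# Rough illustrative inputs, not figures from the quoted papers
dna_length_m = 2.0          # total DNA per human nucleus, ~2 m
helix_diameter_m = 2e-9     # double helix is ~2 nm wide
nucleus_diameter_m = 1e-5   # "about a hundredth of a millimeter" = 10 micrometers

# Linear compaction: how many times longer the DNA is than its container is wide
linear_ratio = dna_length_m / nucleus_diameter_m

# Volume check: treat the DNA as a thin cylinder and the nucleus as a sphere
dna_volume = math.pi * (helix_diameter_m / 2) ** 2 * dna_length_m
nucleus_volume = (4 / 3) * math.pi * (nucleus_diameter_m / 2) ** 3

print(f"DNA is ~{linear_ratio:,.0f} times longer than the nucleus is wide")
print(f"DNA occupies only ~{100 * dna_volume / nucleus_volume:.1f}% of the nuclear volume")
```

The point of the fractal-globule result is that this roughly 200,000-fold linear compaction is achieved without knots, while keeping any given locus easy to unfold and read.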

Comments
groovamos, First off, again apologies for losing my temper at you last year. I'm now entering a phase in my exploration of ID where I can ill afford not to consider criticism from qualified engineers and scientists.

Error-correction schemes, as they became increasingly sophisticated, required computation at both the encoding and the decoding end, but especially at the decoding end, where inferences had to be made as to which bits, if any, had been corrupted over the noisy channel and what the intended bits were. The very early digital communications (telegraph-like systems) probably had no error correction. The CRC wasn't in play until after about 1961. Even though CRC is important for internal communication (like reading computer memory), that is still considered a digital communication, and it apparently has a role in Ethernet communications as well. Prior to the '60s there was PCM over the telephone/telegraph wires, and I doubt there was much in the way of error correction back then, certainly not any that required the level of real-time computation performed by the "modems" of that era. Even if there had been a theoretical method to do appropriate error correction and increase speed, the hardware technology to do it was not available, or at least not economical.
In the history of electrical communications, the earliest reason for sampling a signal was to interlace samples from multiple telegraphy sources, and convey them over a single telegraph cable. Telegraph time-division multiplexing (TDM) was conveyed as early as 1853, by the American inventor Moses G. Farmer. Electrical engineer W. M. Miner, in 1903, used an electro-mechanical commutator for time-division multiplex of multiple telegraph signals, and also applied this technology to telephony. He obtained intelligible speech from channels sampled at a rate above 3500–4300 Hz; lower rates were unsatisfactory. This was TDM, but pulse-amplitude modulation (PAM) rather than PCM.

In 1920, the Bartlane cable picture transmission system, named for its inventors Harry G. Bartholomew and Maynard D. McFarlane, used telegraph signaling of characters punched in paper tape to send samples of images quantized to 5 levels; whether this is considered PCM or not depends on how one interprets "pulse code", but it was transmission of quantized samples. In 1926, Paul M. Rainey of Western Electric patented a facsimile machine which transmitted its signal using 5-bit PCM, encoded by an opto-mechanical analog-to-digital converter. The machine did not go into production.

British engineer Alec Reeves, unaware of previous work, conceived the use of PCM for voice communication in 1937 while working for International Telephone and Telegraph in France. He described the theory and advantages, but no practical use resulted. Reeves filed for a French patent in 1938, and his US patent was granted in 1943. By this time Reeves was working at the Telecommunications Research Establishment (TRE).
http://en.wikipedia.org/wiki/Pulse_code_modulation
scordova, April 28, 2014, 5:30 PM PDT
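Since the comment above mentions the CRC, here is a minimal sketch of the idea using Python's built-in CRC-32 (zlib.crc32): the sender appends a checksum derived from the data, and the receiver recomputes it to detect corruption. Note that a plain CRC only detects errors; it does not locate or correct them, which is why the more elaborate coding schemes discussed in the comments below are needed for correction.

```python
import zlib

def frame(payload: bytes) -> bytes:
    """Append a CRC-32 checksum so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def check(received: bytes) -> bool:
    """Recompute the CRC over the payload and compare it with the trailer."""
    payload, trailer = received[:-4], received[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == trailer

msg = frame(b"error detection, not correction")
print(check(msg))                                # True: intact frame
corrupted = bytes([msg[0] ^ 0x01]) + msg[1:]     # flip one bit "in transit"
print(check(corrupted))                          # False: corruption detected
```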
OK, error correction: Ethernet uses various types of PAM. Below is a link to a technical paper showing the wedding of trellis coding to Multiple Phase Shift Keying (modulation). I have not worked in the field, so I am ignorant of the details of how this works, but the takeaway is that to approach the Shannon limit, multiple techniques are wedded and are closely chosen to work together. The modulation scheme here is Multiple Phase Shift Keying. The error-correction coding is adapted to the trellis code, which fits in with the modulation scheme. There is also a randomizing function typically upstream from the error correction, and the modern version of this randomization is two concatenated randomizing algorithms which together are called Turbo Codes, which, historically speaking, pushed performance on POTS from 14.4 kbps to 34.4 kbps, roughly a 2.4-fold speedup. Thus, if I may venture, you could say that the trellis code is an interface between the error-correction and the modulation operations. http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=293682&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D293682
groovamos, April 28, 2014, 4:20 PM PDT
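The sketch below is not the scheme in the linked (login-protected) paper; it is only a toy illustration of the structure the comment describes, in which an error-correcting encoder drives the symbol mapper. It uses a standard rate-1/2 convolutional encoder (constraint length 3, generators 7 and 5 octal) feeding a Gray-coded QPSK mapper; real trellis-coded modulation instead partitions a larger PSK or QAM constellation into subsets, but the encoder-to-mapper coupling is the same idea.

```python
import cmath

G1, G2 = 0b111, 0b101      # generator polynomials (7, 5 in octal)

def conv_encode(bits):
    """Rate-1/2 convolutional encoder; returns one (c1, c2) pair per input bit."""
    state = 0
    out = []
    for b in bits:
        reg = (b << 2) | state                  # newest bit plus 2-bit state
        c1 = bin(reg & G1).count("1") % 2       # parity against generator 1
        c2 = bin(reg & G2).count("1") % 2       # parity against generator 2
        out.append((c1, c2))
        state = (reg >> 1) & 0b11               # advance the shift register
    return out

# Gray-coded QPSK mapper: coded bit pair -> phase (degrees) on the unit circle
QPSK_PHASE = {(0, 0): 45, (0, 1): 135, (1, 1): 225, (1, 0): 315}

def modulate(bits):
    """Encode the bits, then map each coded pair to a QPSK symbol."""
    return [cmath.exp(1j * cmath.pi * QPSK_PHASE[pair] / 180)
            for pair in conv_encode(bits)]

symbols = modulate([1, 0, 1, 1, 0])
print([f"{s.real:+.2f}{s.imag:+.2f}j" for s in symbols])
```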
Thanks Barb, Wow! Great info.
scordova, April 28, 2014, 9:12 AM PDT
Another interesting factoid: there's a new way to edit DNA, as seen by researchers at Emory University in Atlanta, GA: http://www.emoryhealthsciblog.com/crispr-way-edit-dna/
Barb, April 28, 2014, 9:08 AM PDT
“Scientists have not really understood how the double helix folds to fit into the nucleus of a human cell, which is only about a hundredth of a millimeter in diameter…”

Interesting factoid: stretched out, the DNA in one cell of your body is about six feet (2 m) long. If you were to extract the DNA from all your body’s trillions of cells and put the strands end to end, the total length according to some estimates would be nearly 670 times the distance from the earth to the sun and back. To travel that distance at the speed of light would take about 185 hours.
Barb, April 28, 2014, 5:49 AM PDT
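A quick arithmetic check of the figures in the comment above, assuming roughly 2 m of DNA per cell and the commonly quoted round figure of about 100 trillion cells in a human body (both assumptions for illustration, not measurements):

```python
dna_per_cell_m = 2.0        # ~2 m of DNA per cell (assumed)
cells = 1e14                # ~100 trillion cells (commonly quoted rough estimate)
earth_sun_m = 1.496e11      # mean Earth-Sun distance (1 AU) in meters
light_speed_mps = 2.998e8   # speed of light in m/s

total_m = dna_per_cell_m * cells                 # all strands laid end to end
round_trips = total_m / (2 * earth_sun_m)        # Earth-to-Sun-and-back units
hours_at_c = total_m / light_speed_mps / 3600    # light-travel time in hours

print(f"total length: {total_m:.1e} m")              # 2.0e+14 m
print(f"~{round_trips:.0f} Earth-Sun round trips")   # roughly 670
print(f"~{hours_at_c:.0f} hours at light speed")     # roughly 185
```

With these round inputs the script gives about 668 round trips and 185 hours, matching the "nearly 670 times" and "about 185 hours" figures to within rounding.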
The issue here is also one of navigation and addressing. If DNA is memory, how are its locations addressed? Non-randomness might be important in addressing schemes.
scordova, April 28, 2014, 12:27 AM PDT
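Purely as an engineering analogy to the question above, and not a claim about how cells actually work: man-made memories are addressed either by location (an index into an array) or by content (a recognizable key retrieves the data). A sequence motif acting as a recognition site is closer to the second style. A hypothetical sketch, with made-up motifs:

```python
# Hypothetical illustration only; the sequence and motifs below are invented.

# Location-addressed: the position in the sequence is the address
genome = "ATGACCGTTTATAAAGGCTAGCATCGG"
print(genome[3:9])            # read 6 bases starting at offset 3 -> "ACCGTT"

# Content-addressed: a recognizable pattern (key) locates the data
binding_sites = {
    "TATAAA": "start reading here",     # made-up key/value pairs
    "GGCTAG": "regulatory element",
}
motif = "TATAAA"
if motif in genome:
    print(genome.index(motif), binding_sites[motif])   # 9 start reading here
```

A location scheme depends on everything staying in a fixed order, while a content scheme only needs the keys to remain distinctive, which is one way non-randomness would matter for any addressing story.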
I think it must be irritating to read that it is not the modulation technique nor error correction technique that pushed communication capacities to near the Shannon limit, but rather the ability to almost perfectly randomize the intelligible data stream.
The modulation technique I had in mind was of this variety:
http://en.wikipedia.org/wiki/Trellis_modulation

In telecommunication, trellis modulation (also known as trellis coded modulation, or simply TCM) is a modulation scheme which allows highly efficient transmission of information over band-limited channels such as telephone lines. Trellis modulation was invented by Gottfried Ungerboeck working for IBM in the 1970s, and first described in a conference paper in 1976; but it went largely unnoticed until he published a new detailed exposition in 1982 which achieved sudden widespread recognition.

In the late 1980s, modems operating over plain old telephone service (POTS) typically achieved 9.6 kbit/s by employing 4 bits per symbol QAM modulation at 2,400 baud (symbols/second). This bit rate ceiling existed despite the best efforts of many researchers, and some engineers predicted that without a major upgrade of the public phone infrastructure, the maximum achievable rate for a POTS modem might be 14 kbit/s for two-way communication (3,429 baud × 4 bits/symbol, using QAM). However, 14 kbit/s is only 40% of the theoretical maximum bit rate predicted by Shannon's Theorem for POTS lines (approximately 35 kbit/s). ....

A flurry of research activity ensued, and by 1990 the International Telecommunication Union had published modem standards for the first trellis-modulated modem at 14.4 kilobits/s (2,400 baud and 6 bits per symbol). Over the next several years further advances in encoding, plus a corresponding symbol rate increase from 2,400 to 3,429 baud, allowed modems to achieve rates up to 34.3 kilobits/s (limited by maximum power regulations to 33.8 kilobits/s). Today, the most common trellis-modulated V.34 modems use a 4-dimensional set partition which is achieved by treating two 2-dimensional symbols as a single lattice. This set uses 8, 16, or 32 state convolutional codes to squeeze the equivalent of 6 to 10 bits into each symbol sent by the modem (for example, 2,400 baud × 8 bits/symbol = 19,200 bit/s).
scordova, April 28, 2014, 12:12 AM PDT
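A quick check of the "approximately 35 kbit/s" Shannon figure quoted above, using C = B·log2(1 + S/N). The bandwidth and signal-to-noise ratio below are typical textbook values for a POTS voice channel, assumed here for illustration rather than taken from the article:

```python
import math

bandwidth_hz = 3100      # assumed usable voice-channel bandwidth (~300-3400 Hz)
snr_db = 35              # assumed signal-to-noise ratio on a good line

snr_linear = 10 ** (snr_db / 10)
capacity_bps = bandwidth_hz * math.log2(1 + snr_linear)
print(f"Shannon capacity: {capacity_bps / 1000:.1f} kbit/s")   # ~36 kbit/s

# The pre-trellis plateau quoted above, for comparison
print(f"4 bits/symbol at 2,400 baud = {4 * 2400} bit/s")       # 9,600 bit/s
```

With these assumptions the capacity comes out around 36 kbit/s, in line with the quoted ~35 kbit/s ceiling that the trellis-coded modems eventually approached.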
Speaking of codes, and determinism vs randomness, remember this: Modems were slow because of lack of the ability to implement sophisticated modulation and demodulation and error correction, contrary to what groovamos suggested by focusing on the relatively recent past. It's irritating I have to point this out to defend my points.

?? I think it must be irritating to read that it is not the modulation technique nor the error-correction technique that pushed communication capacities to near the Shannon limit, but rather the ability to almost perfectly randomize the intelligible data stream. This essentially "fools" (maybe not so good a term) nature into treating the data stream as perfectly random. You did not acknowledge that I had introduced you to this crucial point.

As far as error correction is concerned, all modern systems are able to communicate error-rate measurements back and forth between the endpoints, and to adjust the amount of redundant information inserted into the data streams to bring the error rates into a negligible range, which varies the data rate to make room for the redundant data. Shannon discussed this quite a bit (haha) and stated that a system operating at the optimum data rate in a given, constant signal power/bandwidth/noise power scenario, with perfectly optimized error correction, would manifest an error immediately if the data rate were slightly increased. So when your 4G phone loads a webpage slowly at low signal power, now you know it is because the effective data rate AND the compound data rate have been slowed, because the latter includes the redundant information inserted by the error-correction coding. The latter has been slowed because of the low signal/noise ratio, and the former is doubly slowed, by both the S/N ratio and the need to make room for the bits making up the redundant information inserted by the error-correcting code.

Now for some reason you balked at my using the '90s as a timeline for the asymptotic approach to Shannon-optimal performance. It's real simple - and I explained - that's because it was when it happened, when the advances occurred in the RANDOMIZING coding techniques. I think there is a good chance I earned the first EE degree before you were born, and maybe the MS also, so I don't need to be hammered with the history of modern data communications.

One thing I won't go into here is the ability of QAM modulation implementations to adjust the constellation point-set size under control of the error-correction algorithm, but this is also a crucial facet of modern data communications. It gives the error-correction scheme finer granularity, but this is a complex subject, actually a DSP-implemented function. And as far as rudimentary FSK being the reason for slower communications back in the day, as they say, it 'ain't necessarily so'. Ethernet has been based on FSK for decades, even though FSK is not as flexible as QAM for finely granular error-correction stepping. But since Ethernet maybe doesn't need this granularity, being a more controlled, robust environmental scenario than that of telecommunications, the use of FSK seems not to matter a whit to Ethernet data rates.

BTW many people in my field were aware of Philips choosing the Cross-Interleave code back in the '80s for music CD error correction. Anyone wondering what this is about can see the discussion here: https://uncommondescent.com/philosophy/when-designed-errors-are-the-perfect-design/
groovamos, April 27, 2014, 11:56 PM PDT
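Since both commenters stress the role of randomizing (scrambling) the data stream before it hits the channel, here is a minimal sketch of an additive LFSR scrambler of the general kind such systems use. The 7-bit register and tap positions below are chosen for illustration, not taken from any particular modem standard; the key property is that the receiver, running the same generator from the same seed, XORs the keystream off again and recovers the original bits exactly.

```python
def lfsr_keystream(seed):
    """Pseudo-random bit stream from a 7-bit LFSR (taps at stages 7 and 4)."""
    state = seed & 0x7F                           # 7-bit register; seed must be non-zero
    while True:
        bit = ((state >> 6) ^ (state >> 3)) & 1   # feedback from the two taps
        state = ((state << 1) | bit) & 0x7F       # shift and insert feedback
        yield bit

def scramble(bits, seed=0b1010101):
    """XOR the data with the keystream; applying it twice restores the data."""
    ks = lfsr_keystream(seed)
    return [b ^ next(ks) for b in bits]

data = [1] * 8 + [0] * 8                 # a highly non-random input pattern
scrambled = scramble(data)               # whitened by the pseudo-random keystream
recovered = scramble(scrambled)          # same seed, same keystream: undone exactly
print(scrambled)
print(recovered == data)                 # True
```

The scrambler makes a repetitive input look much more random to the channel while remaining perfectly reversible at the receiver, which is what distinguishes deliberate randomization from noise.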
Scientists have not really understood how...
I thought they had figured out everything by now. At least that's the impression one gets when reading what some popular science media publishes every now and then. Unfortunately many people seem unaware of the real situation in science and the pseudo-science bluffing :(
Dionisio, April 27, 2014, 8:49 PM PDT
