Gauss’ Ghost

Robert Sheldon

March 17, 2010

Intelligent Design

Johann Carl Friedrich Gauss was a polymath of no mean skill. Mathematicians bemoan the fact that he spent his later years doing physics, and physicists wish he had started earlier. One of his contributions was the derivation of, and proofs for, the bell-shaped curve known as a “Gaussian” or “normal” distribution. It is the result of a random process in which small steps are taken in any direction. So universal is the “Gaussian” in all areas of life that it is taken to be prima facie evidence of a random process.
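
A minimal simulation of the small-step random walk described above follows. It assumes unit steps to the left or right with equal probability, which is an illustrative choice. The helper name walk_endpoints is hypothetical. After n steps, the walkers' positions have a mean near 0 and a variance near n. A coarse histogram shows the bell shape that the central limit theorem predicts.

```python
import random
import statistics

def walk_endpoints(n_steps, n_walkers, rng):
    """Final positions of walkers that each take n_steps unit steps, left or right with equal probability."""
    return [sum(rng.choice((-1, 1)) for _ in range(n_steps)) for _ in range(n_walkers)]

rng = random.Random(2010)  # fixed seed so the run is reproducible
ends = walk_endpoints(n_steps=100, n_walkers=5000, rng=rng)

# For unit steps the mean should be near 0 and the variance near n_steps (= 100):
# the spread grows like sqrt(n), the signature of diffusion.
print(f"mean = {statistics.fmean(ends):.2f}, variance = {statistics.pvariance(ends):.1f}")

# Coarse text histogram of final positions (bins 10 steps wide): a bell shape.
for lo in range(-40, 40, 10):
    count = sum(lo <= x < lo + 10 for x in ends)
    print(f"[{lo:4d}, {lo + 10:4d}) {'#' * (count // 50)}")
```

Only the step rule is assumed here. Any step distribution with a finite variance would give the same bell shape after enough steps.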

Only in recent years have people addressed situations that can deviate from a Gaussian. For example, one of the criteria that produce a Gaussian is that the probability of a “small” step must be greater than the probability of a “big” step. Consider the random walk of the proverbial drunk near a lightpole: if he staggers in small steps most of the time, a histogram of his positions, sampled, say, every other second, would be a Gaussian. But if he staggers in big steps with a few small ones thrown in, then the plot begins to look peculiar. Instead of being a Gaussian, it develops a “fat tail”, with many locations far from the lightpole.
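
The effect of big steps can be sketched by changing only the step-length distribution. In this sketch the "big steps" walker draws step lengths from a Pareto law with P(length > x) = x^(-1.5) for x ≥ 1. That choice is hypothetical; any step law with infinite variance behaves qualitatively similarly. The sketch then compares how often each kind of walker ends up more than five times its own median distance from the lightpole. The helper names are illustrative.

```python
import random
import statistics

def endpoints(step_length, n_steps, n_walkers, rng):
    """Final positions; each step has a random sign and a length drawn by step_length(rng)."""
    ends = []
    for _ in range(n_walkers):
        x = 0.0
        for _ in range(n_steps):
            x += rng.choice((-1.0, 1.0)) * step_length(rng)
        ends.append(x)
    return ends

def far_fraction(ends, k=5.0):
    """Fraction of walkers farther than k times the median distance from the start."""
    dists = [abs(x) for x in ends]
    cutoff = k * statistics.median(dists)
    return sum(d > cutoff for d in dists) / len(dists)

rng = random.Random(17)
small = endpoints(lambda r: 1.0, 100, 5000, rng)                  # every step has length 1
heavy = endpoints(lambda r: r.paretovariate(1.5), 100, 5000, rng)  # hypothetical heavy-tailed lengths
print(f"far from the lightpole: small steps {far_fraction(small):.4f}, "
      f"heavy-tailed steps {far_fraction(heavy):.4f}")
```

Measuring distance relative to each walker's own median keeps the comparison about the shape of the tail rather than the overall scale.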

Now why is this important? Because many people predict that Darwinian evolution is driven by random processes of small steps. This implies that there must be some Gaussians there if we knew where to look.

Comments

#33
But the point of the post was to consider what Pagel had discovered, which I think has been admirably accomplished.
Robert, oddly, at the end of it all I am not sure what you think he discovered. I think he found some evidence that a lot more speciation is the result of single events (rather than accumulated change) than was previously thought.
Mark Frank, March 22, 2010, 02:58 PM PDT

@Robert Sheldon:
And no, I rarely use Wikipedia, esp. in regard to ID
*LOL* One gets another impression, as your article contains at least four links to Wikipedia. Only your link for "fat tails" isn't directed to their article on this subject, but to the one on the Lévy distribution...
DiEb, March 22, 2010, 07:27 AM PDT

Heinrich, I am enlightened. And no, I rarely use Wikipedia, esp. in regard to ID ;) If you read my 1st comment, you would see the connection between intermittency and spatial diffusion. That, of course, is the weak point in the argument, but on the other hand, it is often assumed. You may be right about the exponential having a well-behaved diffusion coefficient. I'm afraid that my classes only discussed power laws, and I expanded the exponential as 1 - x + x^2/2! - ... and concluded it had a power law -1, which would give an infinite diffusion coefficient. If there is a more elegant way to do it, I will have to read up on it. As for the definition of "fat tails", I was evidently using it in a non-mathematical way, for distributions whose tails have a smaller power law than the Gaussian. Clearly not the Wikipedia definition. But the point of the post was to consider what Pagel had discovered, which I think has been admirably accomplished.
Robert Sheldon, March 22, 2010, 06:56 AM PDT

Thanks, Allen. The technical details relate to getting correct what I have called the cropped phylogenetic tree -- a history of extant species and their ancestral species. My concern is with the species that emerged, but that do not have descendant species among extant species. To be concrete, I would expect that many bumblebee species have emerged and gone extinct.
Sooner Emeritus, March 20, 2010, 03:51 PM PDT

Robert, you've done a superb job of presenting this rather vexing (for evolutionists) result of Meade and Pagel. I, too, saw the article in New Scientist and was thinking about starting a thread along the same line you have; but you're much more in command of the mathematics than I could ever have been, so I'm happy I didn't post before you. I think Pagel's result is rather devastating to Darwinism, and for all the reasons you've pointed out. And your connection between this finding and the work of Behe is, IMHO, straightforward and solid. Keep up the good work.
PaV, March 20, 2010, 11:57 AM PDT

The authors do look at removing short branches (see the supplemental info.), and find no difference. The problem Sooner Emeritus raises won't affect the exponential distribution, and I'm not sure what will happen with the rest. I guess it's only a problem if extinctions make the distribution more exponential-like. But I don't understand phylogenetic trees well enough to have a clear understanding.
Heinrich, March 20, 2010, 11:03 AM PDT

Sorry, the line breaks were not visible in the preview.
Allen_MacNeill, March 20, 2010, 08:16 AM PDT

Here's their methodology (copied directly from Venditti, C., Meade, A., and Pagel, M., Nature, 21 January 2010):
We studied the frequency distributions of these branch lengths in 101 phylogenies inferred from gene-sequence data, and selected for including a well-characterized and narrow taxonomic range of species. This reduces background differences in life histories, morphology and ecology that might affect rates of speciation. Our data sets include bumblebees, cats, turtles and roses (Supplementary Information). For each of the gene-sequence alignments, we inferred a Bayesian posterior probability sample of 750 phylogenetic trees using our phylogenetic mixture model [8] (Supplementary Information). We used uniform (0–10) priors on branch lengths to avoid biasing towards short or long branches, although exponential priors gave the same results. The mixture model improves on conventional single rate-matrix models and on partitioned models, more accurately recovers branch lengths and reduces artefacts of phylogeny reconstruction [8, 9]. Accurate reconstruction of branch lengths is crucial, as, for example, systematically underestimating the true lengths of long branches would bias the branch-length distribution away from long-tailed distributions. We excluded any data sets in which the inferred trees suffered from node-density artefacts [10, 11]. We characterized the frequency distributions of the phylogenetic branches using statistical models that make differing assumptions about the expected amount of divergence or waiting times between successive speciation events. We suppose there are many potential causes of speciation, including environmental and behavioural changes, purely physical factors such as the uplifting of a mountain range that divides two populations, or genetic and genomic changes. If many independent factors combine additively to produce a speciation event, the distribution of branch lengths will conform to a normal probability density; if they combine multiplicatively, a lognormal density of lengths will arise.
Suppose the factors are rare but large in number, where ‘rare’ means occurring at a rate less than the rate of speciation. Then their distribution over long periods spanning many speciation events will follow a Poisson density [12]. If these factors have the potential on their own to cause a speciation, the branch length distribution will follow an exponential density [12], that being the waiting time between successive events of a Poisson process. This is also the density that arises if there is a constant probability of speciation. A variant of the exponential model allows the multiple rare factors to affect species differently such that they have different constant rates [13] (hereafter the variable-rates model), as might be expected of a species radiation [3]. Another variation of the exponential—the Weibull density—can accommodate the probability of speciation changing according to the amount of divergence from the ancestral species. This model will fit the data if, for example, species are either more or less likely to speciate the older they get.
Allen_MacNeill, March 20, 2010, 08:15 AM PDT
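
The quoted rule of thumb says that additive factors give a normal density and multiplicative factors give a lognormal one. A short simulation can illustrate it. The choice of 50 factors, each drawn uniformly from [0.5, 1.5], is a hypothetical stand-in, since the paper does not specify factor sizes. The skewness helper is likewise illustrative.

```python
import math
import random
import statistics

def skewness(xs):
    """Sample skewness: the mean cubed standardized deviation (0 for a symmetric density)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return statistics.fmean([((x - m) / s) ** 3 for x in xs])

rng = random.Random(463)
N_FACTORS, N_SAMPLES = 50, 20_000
additive, multiplicative = [], []
for _ in range(N_SAMPLES):
    factors = [rng.uniform(0.5, 1.5) for _ in range(N_FACTORS)]  # hypothetical factor sizes
    additive.append(sum(factors))             # factors combine additively
    multiplicative.append(math.prod(factors))  # factors combine multiplicatively

log_of_product = [math.log(x) for x in multiplicative]
print(f"skew(sum) = {skewness(additive):.2f}")
print(f"skew(product) = {skewness(multiplicative):.2f}")
print(f"skew(log of product) = {skewness(log_of_product):.2f}")
```

The product comes out strongly right-skewed, while its logarithm is nearly symmetric. The logarithm of a product is a sum of logarithms, so the central limit theorem makes it close to normal, which is exactly what "lognormal" means.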

Allen MacNeill:
Notice once again that “descent with modification” does not require (or even imply) progress.
Thank you, Allen. That helps prove my point that descent with modification does not expect a nested hierarchy, because nested hierarchies require a progression.
Joseph, March 20, 2010, 07:01 AM PDT

Allen MacNeill, I've read only the New Scientist fluff and the abstract, but I have some doubts about the methodology. Please set me straight if I'm misunderstanding something. From the abstract:
Phylogenetic branch lengths record the amount of time or evolutionary change between successive events of speciation.
This statement should be qualified by noting that a phylogenetic tree represents beliefs about evolutionary history. Most species of the past are not ancestors of any species for which we may observe a genome. So we're unable to infer most speciation events from genomes. The species for which genomes are observed constitute the leaf nodes of the inferred tree. Only branches on paths leading from the root (common ancestor) to these leaves can be inferred. There is implicit cropping of all branches in the actual tree (the tree we might construct if we had a time machine) that lead only to extinct species. It seems to me that the further we go back in time, the more branches in the actual tree are missing from the inferred tree. Thus I doubt that the lengths of old branches in inferred phylogenetic trees accurately indicate interarrival times of speciation events. There are generally more young branches than old branches in a tree. If young branches are short and old branches are long, then the exponential distribution of branch lengths might be an artifact of phylogenetic tree inference.
Sooner Emeritus, March 20, 2010, 12:43 AM PDT

In comment #20 Robert Sheldon wrote:
"...progress is not just impossible, but prohibited."
Oddly enough, this is exactly right. As many evolutionary biologists (most notably Stephen J. Gould) have pointed out, evolution is not necessarily progressive, nor does it have a predetermined direction. This is particularly the case for speciation, which (unlike natural selection) doesn't necessarily involve the evolution of functional adaptations.
"Remember, the whole point of Darwin was to explain progress as an appearance of design, but actually random." [emphasis added]
Wrong again. One of Darwin's two main goals in the Origin of Species was to provide a "naturalistic" explanation for the origin of adaptation (i.e. not "progress"). Darwin's other goal was to show that "descent with modification" (his preferred term for what we now refer to as "evolution") had occurred. Notice once again that "descent with modification" does not require (or even imply) progress.
"So the exponential destroys the whole point of the exercise, it destroys progress."
Exactly right. As the foregoing should make clear, the finding that the changes that result in the divergence of new clades approximate an exponential distribution does indeed destroy the whole concept of progress in speciation. As I hope I have now made clear, this is exactly what most evolutionary biologists have also asserted about phylogenetic evolution: that unlike a "designed process", it is not necessarily progressive. I would also like to take this opportunity to concede that in my first response to Robert Sheldon's original post 1) I over-reacted, and 2) I misunderstood the point he was trying to make about Gaussian distributions. My thanks to Mark Frank, Nakashima-san, and Robert Sheldon for helping me to come to clarity on these issues. Now that I have had time to think about Venditti, Meade, and Pagel’s analysis and how it relates to the evolutionary model of speciation (and, more generally, cladogenesis), I realize that their crucial finding was that the overwhelming majority of the observed data fit an exponential distribution, and that this is indeed very strong evidence for the hypothesis that speciation is a result of single events, happening rarely and essentially at random:
"...the causes of speciation are many and rare, not necessarily limited to biotic interactions, and each individually having the potential to cause a speciation event....Speciation is freed from the gradual tug of natural selection, there need not be an ‘arms race’ between the species and its environment, nor even any biotic effects."
Furthermore, the proximate causes of cladogenesis cited by Venditti, Meade, and Pagel (all of which have been cited by evolutionary biologists as the causes of speciation):
"...polyploidy, altered sex determination mechanisms, chromosomal rearrangements, accumulation of genetic incompatibilities, sensory drive, hybridization, and the many physical factors included in the metaphor of mountain range uplift..."
most definitely qualify as the rare (and essentially random) events that have the effect of producing an exponential distribution of cladogenetic "forks". Finally, and as I suggested in an earlier comment, the genetic changes noted in Venditti, Meade, and Pagel's analysis may have happened after the reproductive isolation between the branching clades, rather than causing the branching:
"...the gradual genetic and other changes that normally accompany speciation [listed above] may often be consequential to the event that promotes the reproductive isolation, rather than causal themselves."
Reference cited: Venditti, C., Meade, A., and Pagel, M., Nature (21 January 2010), vol. 463, pg. 351.
Allen_MacNeill, March 19, 2010, 06:45 PM PDT

Robert Sheldon, A fat-tailed (heavy-tailed) distribution has a tail that is not exponentially bounded. You demonstrate above that the exponential and Gaussian distributions have exponentially bounded tails, and thus are not fat-tailed. The variance of the exponential distribution is finite, and I don't understand your concern about "infinite energy." It's a bit ironic that the log-normal distribution, which is the best fit for 8% of the data sets, is fat-tailed.
Sooner Emeritus, March 19, 2010, 02:09 PM PDT
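
Sooner Emeritus's definition can be checked directly. A tail is exponentially bounded when P(X > x) ≤ C·exp(-λx) for some constants C and λ > 0. The sketch below multiplies each tail by exp(λx), using λ = 0.5 as an arbitrary illustrative choice, and works in logs to avoid underflow. For the exponential with rate 1 this product shrinks. For the standard lognormal it keeps growing, as it does for every λ > 0. The sketch also checks numerically that the second moment of the rate-1 exponential is 2, so its variance (2 - 1^2 = 1) is finite. The function names are illustrative.

```python
import math

def log_sf_exponential(x, rate=1.0):
    """log P(X > x) for an exponential distribution with the given rate."""
    return -rate * x

def log_sf_lognormal(x, mu=0.0, sigma=1.0):
    """log P(X > x) when log X is Normal(mu, sigma)."""
    z = (math.log(x) - mu) / (sigma * math.sqrt(2.0))
    return math.log(0.5 * math.erfc(z))

def log_tilted_tail(log_sf, x, lam):
    """log of exp(lam * x) * P(X > x), computed in logs to avoid underflow."""
    return lam * x + log_sf(x)

lam = 0.5  # illustrative choice of lambda
xs = [5, 10, 20, 40, 80]
exp_tilted = [log_tilted_tail(log_sf_exponential, x, lam) for x in xs]
lognormal_tilted = [log_tilted_tail(log_sf_lognormal, x, lam) for x in xs]
for x, a, b in zip(xs, exp_tilted, lognormal_tilted):
    print(f"x = {x:3d}  exponential: {a:8.2f}  lognormal: {b:8.2f}")

# Second moment of Exp(1) by the midpoint rule on [0, 60]; the exact value is 2,
# so the variance 2 - 1**2 = 1 is finite.
h = 1e-3
second_moment = sum(((i + 0.5) * h) ** 2 * math.exp(-(i + 0.5) * h) * h for i in range(60_000))
print(f"E[X^2] for Exp(1) ~ {second_moment:.6f}")
```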

It is these concerns that prompted me to say that Darwinian diffusion of traits does not work, and hence the idea of random variation does not work, irregardless of whether natural selection operates or not.
Diffusion of traits has (almost) nothing to do with the Pagel paper. It's about the distribution of speciation times. Most of what you have written is, quite frankly, irrelevant to the Pagel paper you're discussing:
Point 1 is about trait evolution, not speciation times.
Point 2 seems to be about trait evolution as well.
Your first point 3 also seems to be about trait evolution: the infinite energy argument applies just as well to the Gaussian distribution, so I've no idea what you're on about. Incidentally, the exponential distribution is positive, so requiring the absolute value is redundant.
Your second point 3 is about entropy, but I can't see the connection to trait distributions any more.
Point 4 seems to be about diffusion of something through a population. But what's diffusing isn't clear. Also, the infinite speed may be irrelevant (depending on what exactly you're on about): if you're talking about spatial spread, then an exponential dispersal kernel gives a constant wave of advance. Mollison proved this in the 80s: the precise condition for a wave of advance with a constant rate is that the tail is exponentially bounded (which the exponential obviously is).
Point 5 is about Pagel, but is, simply, wrong. The paper is about times between speciation events; they didn't discuss genotypes in any meaningful sense (the only sense they did was in terms of sequence identity).
Point 6 implies that physicists aren't able to use Google, or Wikipedia. :-)
Heinrich, March 19, 2010, 10:11 AM PDT

Thanks Mark, I agree that the lack of a small step size is the key to the exponential. But there are numerous other problems with the exponential, in contrast to the Gaussian, that cause concern. It is these concerns that prompted me to say that Darwinian diffusion of traits does not work, and hence the idea of random variation does not work, irregardless of whether natural selection operates or not.
1) This, BTW, is the essential point of ID and Behe, that random variation is an insufficient source of the information actually observed in evolution. Now lest you think I am overinterpreting this result, remember how Dawkins tried to counter Behe with his analogy of "climbing Mt Improbable", taking small steps in the right direction. Dembski's critique is that Darwin prohibits speciation from "knowing" the right direction. Pagel's paper says it didn't even happen in small steps. So now the analogy becomes "teleporting to the top of Mt Improbable". That, as Behe reminds us, was not observed in Plasmodium in over a century of quinine challenge, and from all the statistics we know, is utterly improbable. It is this conclusion that I refer to when I say, "it wasn't random".
2) Now let's go back to the "random" component of the exponential. When a radioactive atom decays, it is independent of its previous history or the history of all the other radioactive atoms in its vicinity. It has no "memory". In contrast, in the random walk problem, each step is in a random direction (i.e., Markovian), but each step proceeds from the location of the last step. So there is a spatial memory, even if there is no temporal memory. If the drunk were to create an exponential distribution with his random walk, then before each step, he would have to be teleported back to the lightpost. So if Darwinian evolution were to produce an exponential, it would have to be teleported back to its "pristine" condition. In other words, progress is not just impossible, but prohibited. Remember, the whole point of Darwin was to explain progress as an appearance of design, but actually random. So the exponential destroys the whole point of the exercise, it destroys progress.
3) But the exponential is worse than that. The "fat tail" means that there are numerous events that are very far from the mean. If you look at the Gaussian, the tail goes as exp(-x^2/s), but the exponential goes as exp(-|x|/s). The tail persists much further than the Gaussian's. Now if this were an energy distribution, we would say that fat tails produce a distribution with infinite energy. By analogy, large steps away from the mean correspond to highly improbable events which nonetheless survive in the genome because they do something. Following Dembski, we'll call that information. Then the exponential has nearly infinite information, because no matter how far out you go, you still find events. (Of course, there are other limits to the integral which keep it from being infinite, but mathematically it is an odd beast.) Once again, if information were being added randomly, this is not the distribution that would result. Another way of saying that is entropy.
3) Again, if this were an energy distribution, then the condition that the energy be constant and the entropy maximized gives the Gaussian. Now the entropy is maximized when the distribution wins the lottery of which distribution can be shuffled the largest number of ways. Briefly, if you have two dice, there is only one way to roll snake eyes, but 6 ways to roll a "7", so "7" has maximum entropy. So since we have made the analogy between energy and information, the Gaussian is the most probable way to achieve information X. The same information may be in the exponential, but it becomes much less likely, since so much of it is in highly improbable "fat-tail" locations.
4) Finally we come to the time evolution of distributions. How does information pass through the population? Random step processes that obey Gaussian statistics progress through a diffusion "wave" with a distinct velocity. But "fat-tail" distributions do not have a finite diffusion coefficient, and therefore have potentially infinite "speed". Just as an exponential has no "spatial" memory, it also has no "temporal" memory. This means that the entire population, independent of islands, breeding populations, sub-groups, sexual preferences etc., is behaving similarly. Or if you prefer, behaving globally. Then the information is being transmitted identically to each member, and must be the result of some internal law or external design. We're right back to ID again.
5) Now to respond to the phenotype vs genotype or the "separate breeding population = species" objections: neither of these options is what Pagel addressed. He looked only at genotypes, and he looked only at distinct (one hundred and one) species. So your objections may indeed be correct, but irrelevant to Pagel's paper.
6) The mathematical definition of a "fat tail"? I have no idea; I was trained as a physicist. Perhaps someone can enlighten me.
Robert Sheldon, March 19, 2010, 09:41 AM PDT
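
The "no memory" property of radioactive decay invoked in point 2 can be written as P(T > s + t | T > s) = P(T > t). Among continuous lifetime distributions, only the exponential has this property. The small check below compares an exponential lifetime with a normally distributed one. The mean of 10 and standard deviation of 2 are chosen only for illustration, and the function names are hypothetical.

```python
import math

def sf_exponential(t, mean=10.0):
    """P(T > t) for an exponential lifetime (constant decay probability per unit time)."""
    return math.exp(-t / mean)

def sf_normal(t, mean=10.0, sd=2.0):
    """P(T > t) for a normally distributed lifetime (a process that 'ages')."""
    return 0.5 * math.erfc((t - mean) / (sd * math.sqrt(2.0)))

def survive_another(sf, age, t):
    """P(T > age + t | T > age): chance of lasting t more time units, given survival to `age`."""
    return sf(age + t) / sf(age)

for age in (0.0, 5.0, 10.0):
    print(f"age {age:4.1f}: exponential {survive_another(sf_exponential, age, 2.0):.3f}  "
          f"normal {survive_another(sf_normal, age, 2.0):.3f}")
```

The exponential column shows exp(-0.2) ≈ 0.819 at every age. The normal column drops from essentially 1 at age 0 to roughly 0.32 at age 10, because that lifetime "remembers" how old it is.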

The exponential distribution may have a fatter tail than the Gaussian, but it isn't fat-tailed according to the usual definition. And when trying to talk about math, we should stick to the usual concepts. Words mean things and all that.
DiEb, March 19, 2010, 08:54 AM PDT

Robert, thank you for your polite and thoughtful responses. I too hope to learn something from this. I guess if your battery is running low you will not read this for some time. I dispute that a Gaussian distribution of speciation requires a random distribution of underlying events, but in part that depends on what you mean by "random". The important result is that the significant difference between the exponential result and the Gaussian result is not whether either is "random" but the step size. Do you accept that this conflicts with much of your original post?
Mark Frank, March 19, 2010, 08:21 AM PDT

Mark and Allen, I think we are converging on the point of the paper. So these discussions are very profitable! (A point I've often wondered about.) I unnecessarily confused the debate by lumping two concepts into one word. There is the matter of small steps, and the matter of random direction. A Gaussian requires both. An exponential negates the first, and demands the second. Several other distributions modify the first, without negating the second. The link is the central limit theorem, which says that if the probability of step size decreases with size by at least a power -2, then the distribution will be Gaussian. This is a powerful theorem, and means that lots of probability distributions have a Gaussian as their equilibrium. A Cauchy distribution, BTW, has only a power -1. An exponential, if I remember my math correctly, also has a power -1. On a log-log plot, which shows power laws as straight lines, the exponential lies above the Gaussian, hence "fat tail". (My battery is dying, so I'll try to draw out the consequences of the "small-step" hypothesis in the next post. As Mark suggests, it is related to Eldredge and Gould's hypothesis.)
Robert Sheldon, March 19, 2010, 07:32 AM PDT
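
The classical central limit theorem assumes steps with a finite variance. The Cauchy distribution mentioned above has no finite variance, and not even a finite mean. A quick numerical check uses uniform steps on [-1, 1] as an illustrative finite-variance example. The spread (interquartile range) of the average of n uniform steps shrinks roughly like 1/√n. The average of n standard Cauchy draws, by contrast, is itself standard Cauchy, so its interquartile range stays near 2 however large n is. The helper names are illustrative.

```python
import math
import random
import statistics

def iqr(xs):
    """Interquartile range: third quartile minus first quartile."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    return q3 - q1

def cauchy(rng):
    """Standard Cauchy variate by the inverse-CDF method: tan(pi * (U - 1/2))."""
    return math.tan(math.pi * (rng.random() - 0.5))

def spread_of_means(draw, n, trials, rng):
    """Interquartile range of the sample mean of n draws, over many independent trials."""
    return iqr([statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)])

rng = random.Random(1)
results = {}
for n in (10, 400):
    results[("uniform", n)] = spread_of_means(lambda r: r.uniform(-1.0, 1.0), n, 1000, rng)
    results[("cauchy", n)] = spread_of_means(cauchy, n, 1000, rng)
    print(f"n = {n:4d}  IQR of the mean: uniform {results[('uniform', n)]:.3f}, "
          f"cauchy {results[('cauchy', n)]:.3f}")
```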

#15 JDH
I agree that Allen's first comment misses the mark and have made a comment on his own blog. But let's be clear: Robert Sheldon's post also misses the mark. In the fresh light of the morning I think I can put it more simply than in my earlier comments. As I understand it, Robert is trying to argue that the exponential distribution is evidence that the speciation process is not random. In fact it is only evidence that the process is not composed of small steps. If anything, it is evidence that the process is random.
First let's define what we mean by a random distribution of events. A fair definition might be a class of events where the probability of one event happening in a given time is not conditional on the occurrence of other events in the same class or any other known condition (feel free to suggest a better one). If this probability is constant, then it follows that the time between events is exponentially distributed.
The Gaussian distribution arises because the steps were "small" in the sense of requiring many steps to achieve speciation. The small steps do not have to be random. Almost any probability distribution of small steps will result in a Gaussian distribution of large steps through the central limit theorem. On the other hand, the exponential distribution is random. So the exponential distribution shows that the steps were large and random, while the Gaussian shows that they were small and may or may not be random.
Mark Frank, March 18, 2010, 11:48 PM PDT
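
Mark Frank's definition can be checked with a quick simulation. Suppose each time step independently has the same probability p of an event. Then the gaps between events follow the geometric distribution, the discrete counterpart of the exponential, with mean 1/p and P(gap > k) = (1 - p)^k. The value p = 0.01 and the run length are illustrative choices, not values from the paper, and the helper name is hypothetical.

```python
import random
import statistics

def waiting_times(p, n_ticks, rng):
    """Gaps (in ticks) between events when each tick independently has probability p of an event."""
    gaps, last = [], 0
    for t in range(1, n_ticks + 1):
        if rng.random() < p:
            gaps.append(t - last)
            last = t
    return gaps

rng = random.Random(42)
p = 0.01  # illustrative constant per-tick probability
gaps = waiting_times(p, 1_000_000, rng)
mean_gap = statistics.fmean(gaps)
frac_long = sum(g > 100 for g in gaps) / len(gaps)  # should be close to (1 - p)**100
print(f"{len(gaps)} gaps, mean {mean_gap:.1f} (1/p = {1 / p:.0f}), "
      f"P(gap > 100) = {frac_long:.3f} vs (1 - p)^100 = {(1 - p) ** 100:.3f}")
```

As p shrinks with p·k held fixed, (1 - p)^k approaches exp(-p·k), which is the exponential waiting time of a Poisson process.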

Allen, I mostly find your comments on this site relevant and intelligent. That being the case, I think you should withdraw your comments from this thread. It is obvious you did not understand the point of the article. The whole point of the article and the underlying paper is to examine the distribution of the various step sizes in proposed speciation. Admittedly this is a hard problem, but I think it can be grasped. If, as species evolve and differentiate through a series of unknown events, the step sizes are small and random, then the inevitable Gaussian distribution should occur. The fact that it did not occur immediately implies the step sizes were not both small and random. No guess is made as to what the other processes were, and what is the source of variation. Your example of human height is totally missing the point. I hope you will reread the article and try and get by the points you obviously missed.
JDH, March 18, 2010, 10:54 PM PDT

Mr Sheldon, Several points relating to your article. Mendel found the distributions he did because he was looking at single locus traits. In such phenotypic traits, the digital genotype shows clearly. However, a phenotypic trait that depended on multiple loci would show a Gaussian distribution that resulted from the interaction of several uniform distributions. (For example, rolling two dice, both uniformly random, results in a normally random sum.) The important thing to note here is the distinction of phenotype and genotype. Darwin may have mistakenly assumed a continuous and Gaussian genotype from the observation of Gaussian phenotypes. Whatever his reasoning, we do know that Darwin's hypothesizing about the process of heritability was very wrong. That, however, is somewhat to the side of any discussion of speciation. We know that there are models, such as sand pile models, where small changes are delivered at a relatively stable and constant rate, but the larger behavior of the system can follow a temporal distribution that is exponential or power law distributed. Such a sand pile model might be a more appropriate way to think about speciation. Accumulating small changes in a variety of species in an ecosystem may eventually lead to an avalanche of changes across the whole ecosystem. But it should also be recognised that small genotypic changes can lead to large phenotypic variations, gradualism at one level driving saltations at the other. This can be the result of changing traits that function very early in development. In a model that takes into account the developmental hierarchies of genes, the uniform distribution of gradual changes creates phenotypic change according to a power law or exponential distribution, depending on where in the hierarchy the change occurs. Such a variation in an individual may lead to speciation, depending on a large number of other factors.
Nakashima, March 18, 2010, 01:42 PM PDT
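
Strictly, two dice give a triangular distribution of totals: 1, 2, ..., 6, ..., 2, 1 ways out of 36. These are the same counts behind the "snake eyes" versus "7" comparison earlier in the thread. Adding more dice brings the shape close to a Gaussian, which is the central limit theorem at work. The sketch counts the ways exactly by repeated convolution and measures the largest gap from a normal curve with the same mean and variance. The function names are illustrative.

```python
import math

def dice_sum_counts(k):
    """Ways to roll each total with k fair six-sided dice; the list index is the total."""
    counts = [1]  # with zero dice there is exactly one way to get a total of 0
    for _ in range(k):
        new_counts = [0] * (len(counts) + 6)
        for total, ways in enumerate(counts):
            for face in range(1, 7):
                new_counts[total + face] += ways
        counts = new_counts
    return counts

def max_gap_from_normal(k):
    """Largest difference between the exact probability of each total and a normal density
    with the same mean (3.5 k) and variance (35/12 k)."""
    counts = dice_sum_counts(k)
    mean, var = 3.5 * k, 35.0 / 12.0 * k
    gap = 0.0
    for total in range(k, 6 * k + 1):
        exact = counts[total] / 6 ** k
        approx = math.exp(-(total - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        gap = max(gap, abs(exact - approx))
    return gap

print("two dice, ways per total 2..12:", dice_sum_counts(2)[2:])
for k in (2, 5, 10):
    print(f"{k:2d} dice: largest gap from the normal curve = {max_gap_from_normal(k):.4f}")
```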

Robert, you seem to be trying to argue that in some sense a Gaussian distribution is the result of a truly random process, whereas an exponential distribution is not, or is “random with added laws”. You also write that “However, unlike a Gaussian, lots of other non-random things make exponentials.” These are such strange statements it is hard to know where to start without writing an essay. Here are a few points:
1) Some random processes lead to Gaussian distributions. Some lead to exponential or other distributions. Some processes lead to both, depending on which variable you are measuring, e.g. a Bernoulli process will approximate a Gaussian distribution for the number of successes in N trials as N gets large, and an exponential distribution for the number of trials before success. To say that one distribution in some sense is more random than another is, to put it simply, rubbish.
2) All stochastic processes have a random element, but it may be worth pointing out that a random walk where the step size is always the same leads to a Gaussian distribution, whereas if the step size is itself a random variable then the distribution may be different. Which is the more random process?
3) I am not sure where you get the idea that the exponential distribution lacks maximum entropy. To quote Wikipedia: Among all continuous probability distributions with support [0, ∞) and mean μ, the exponential distribution with λ = 1/μ has the largest entropy.
4) The definition of “random events” is ambiguous, but if you were to ask a group of statisticians what is the distribution of time between random events, I would hope the majority would reply “depends what you mean”; I am sure the second largest group would say “exponential distribution”. I doubt that any would say Gaussian. For example, the time between clicks on a Geiger counter is exponentially distributed.
Now let’s come to the main point. We are comparing two models:
a) Speciation is the result of the accumulation of “random” events.
b) Speciation is the result of single “random” events.
I don’t dispute that (a) is expected to lead to a Gaussian distribution of time between speciation events and (b) to an exponential distribution. But that is only because (a) requires accumulation and (b) does not. As far as the constituent events are concerned, in both cases we would expect them to arrive with an exponential distribution (although model (a) would lead to a Gaussian distribution of speciation events even if the arrival times were not “random”). In all of your reply, the only coherent argument I can see for model (b) not being random is that large events are just too improbable to happen (because they require multiple simultaneous mutations). Pagel provides examples of such large events which do not require multiple simultaneous mutations:
Factors apart from biotic interactions that can cause speciation include polyploidy, altered sex determination mechanisms [22], chromosomal rearrangements, accumulation of genetic incompatibilities, sensory drive, hybridization and the many physical factors included in the metaphor of mountain range uplift.
I don’t pretend to understand all of these; I am not a biologist, but he provides references.
Mark Frank, March 18, 2010, 01:39 PM PDT
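
The maximum-entropy property quoted in point 3 can be checked numerically. The sketch fixes the mean at 1 and compares the exponential with three other densities on [0, ∞) that also have mean 1: a half-normal, a uniform on [0, 2] and a gamma with shape 2. These comparison densities are illustrative choices. Their exact entropies are about 0.952, ln 2 ≈ 0.693 and about 0.884 respectively, against exactly 1 for the exponential. The Gaussian plays the same role on the whole real line when the mean and variance are both fixed. The function names are hypothetical.

```python
import math

def entropy(pdf, upper=40.0, n=100_000):
    """Differential entropy -∫ p(x) ln p(x) dx over [0, upper], by the midpoint rule."""
    h = upper / n
    total = 0.0
    for i in range(n):
        p = pdf((i + 0.5) * h)
        if p > 0.0:
            total -= p * math.log(p) * h
    return total

SIGMA = math.sqrt(math.pi / 2.0)  # half-normal scale that gives a mean of 1

def half_normal(x):
    return math.sqrt(2.0 / math.pi) / SIGMA * math.exp(-x * x / (2.0 * SIGMA ** 2))

candidates = {  # every density here has support [0, inf) and mean 1
    "exponential": lambda x: math.exp(-x),
    "half-normal": half_normal,
    "uniform[0,2]": lambda x: 0.5 if x < 2.0 else 0.0,
    "gamma(k=2)": lambda x: 4.0 * x * math.exp(-2.0 * x),
}
entropies = {name: entropy(pdf) for name, pdf in candidates.items()}
for name, value in sorted(entropies.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {value:.4f}")
```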

Allen MacNeill:
Wrong. Species don’t necessarily differ from each other at all, either genetically or phenotypically.
Then what makes them different species?
What makes them different species is that members of each species do not interbreed with members of the other species.
So if members of my family refuse to interbreed with members of another family, that means the two families are different species? Are you serious?
Why they don’t interbreed can be very complex, from phenotypically silent chromosomal mutations (especially inversions, polyploidies, and translocations) to ecological/temporal/spatial separation to behavioral differences that may be almost entirely learned (i.e. not genetic).
So it's either genetic, a choice or not even being able to choose that "makes" a species? Can the concept of species be any more ambiguous?Joseph_{March 18, 2010
01:35 PM PDT}

@Robert Sheldon 1. Every distribution has a mean, it is not a distinguishing feature of the Gaussian. That's just wrong, especially as you mentioned the Cauchy distribution. 2. The exponential distribution isn't fat tailed. 3. In mathematics, the heat kernel is really ubiquitous, and just a Gaussian distribution in disguise: it isn't limited to randomness...DiEb_{March 18, 2010
01:11 PM PDT}
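DiEb's second point can be checked numerically under the common definition of a fat (heavy) tail as one that decays more slowly than any exponential. The sketch below compares the exact tail probabilities P(X > t) of a rate-1 exponential and a standard Cauchy distribution; the helper names are hypothetical. The Cauchy tail falls off like 1/(πt), so its ratio to the exponential tail grows without bound.

```python
import math


def exponential_tail(t, rate=1.0):
    # P(X > t) for an exponential distribution: decays like e^(-rate * t).
    return math.exp(-rate * t)


def cauchy_tail(t, scale=1.0):
    # P(X > t) for a Cauchy distribution centred at 0: decays like scale / (pi * t).
    return 0.5 - math.atan(t / scale) / math.pi


for t in (1, 5, 10, 50):
    ratio = cauchy_tail(t) / exponential_tail(t)
    print(f"t={t:>3}  exponential={exponential_tail(t):.3e}  cauchy={cauchy_tail(t):.3e}  ratio={ratio:.3e}")
```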

In comment #8 Robert Sheldon wrote:
"1) species differ from each other by multiple mutations or variations. I don’t think this is a controversial point."
Wrong. Species don't necessarily differ from each other at all, either genetically or phenotypically. Indeed, one of the most interesting discoveries in evolutionary biology of the past 20 years has been the finding that what appeared to be one species is actually a whole aggregation of many closely related species. What makes them different species is that members of each species do not interbreed with members of the other species. Why they don't interbreed can be very complex, from phenotypically silent chromosomal mutations (especially inversions, polyploidies, and translocations) to ecological/temporal/spatial separation to behavioral differences that may be almost entirely learned (i.e. not genetic). Furthermore, Darwin himself didn't suggest any genetic mechanism underlying speciation at all. Indeed, Darwin almost completely disregarded genetics, as the dominant "theory" of genetics at the time was blended inheritance. Darwin did emphasize that sterility between species was an important cause of the divergence of species, but he was emphatic that natural selection couldn't be the cause of such sterility:
"The importance of the fact that hybrids are very generally sterile, has, I think, been much underrated by some late writers. On the theory of natural selection the case is especially important, inasmuch as the sterility of hybrids could not possibly be of any advantage to them, and therefore could not have been acquired by the continued preservation of successive profitable degrees of sterility. I hope, however, to be able to show that sterility is not a specially acquired or endowed quality, but is incidental on other acquired differences." [Origin of Species, 1st ed., ch. 8, pg. 245; see http://darwin-online.org.uk/content/frameset?itemID=F373&viewtype=side&pageseq=263]
Furthermore, it is not necessarily the case that the divergence of one species into another is the result of "large jumps", even if the genomic evidence suggests that the changes correlated with such divergence are not additive or multiplicative. As the quotation from Wolfram emphasizes, exponential distributions (not Gaussian distributions) are the hallmark of purely random (i.e. memoryless) events happening at random moments during an otherwise extended period of "stasis" (i.e. no change). This is why the production of decay products as the result of the radioactive decay of an aggregation of radioactive nuclei produces an exponential distribution, not a Gaussian distribution. Once again, Venditti, Meade, and Pagel’s analysis showed that the pattern of cladogenesis in eukaryotes approximated an exponential distribution in 78% of the phylogenies studied, strong evidence that the branch points in the phylogenies are both purely random and also fundamentally unpredictable in the same way that the decay of a single radioactive nucleus is unpredictable. It is also not the case that the changes that result in cladogenesis are necessarily "large" changes. Rather, they are single (i.e. non-cumulative) changes, meaning that they only have to happen once to produce a branch in a phylogeny. The fact that the branch points in 78% of the phylogenies studied by Venditti, Meade, and Pagel were the result of single changes does not necessarily mean that such changes were "large". It only means that they were sufficient to cause the splitting of the lineage (i.e. sufficient to result in reproductive isolation). Personally, I take issue with the assumption underlying Venditti, Meade, and Pagel's analysis that genetic changes are necessarily markers of the branch points in cladogenesis. 
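The radioactive-decay analogy can be sketched in a few lines. Assuming each nucleus decays independently at a constant rate (0.5 per unit time here, a hypothetical value), each lifetime is exponential, and the fraction of a large sample still undecayed at time t tracks e^(-rate * t). The helper name is hypothetical; the point is only that independent, constant-rate events produce an exponential, not a Gaussian.

```python
import math
import random


def surviving_fraction(n_nuclei, decay_rate, t, seed=1):
    """Fraction of nuclei that have not yet decayed at time t.

    Each nucleus decays independently at a constant rate, so its
    lifetime is exponentially distributed (the memoryless case).
    """
    rng = random.Random(seed)
    lifetimes = [rng.expovariate(decay_rate) for _ in range(n_nuclei)]
    return sum(1 for life in lifetimes if life > t) / n_nuclei


rate = 0.5  # hypothetical decay rate, per unit time
for t in (1.0, 2.0, 4.0):
    print(t, surviving_fraction(50000, rate, t), math.exp(-rate * t))
```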
Yes, it is the case that certain types of genetic changes can result in reproductive isolation (allopolyploidy and autopolyploidy in plants come immediately to mind), but it isn't clear that Venditti, Meade, and Pagel have distinguished whether genetic changes are a cause or an effect of cladistic divergence. Indeed, it is quite possible that the genetic changes observed in the phylogenies that Venditti, Meade, and Pagel studied happened after reproductive isolation had already taken place. This disjunction between reproductive isolation and genetic incompatibility is quite common among eukaryotes. There are numerous examples of reproductively isolated species (such as wolves and dogs) which are nonetheless fully inter-fertile, especially under artificial conditions. In sum, the evidence provided by Venditti, Meade, and Pagel supports the hypothesis that speciation occurs as the result of single, isolated, "memoryless", and mostly random events, and supports Eldredge and Gould's theory of "punctuated equilibrium", a widely accepted and empirically supported model for the origin of species according to current evolutionary theory. For much more on all of this, see: http://evolutionlist.blogspot.com/2006/03/origin-of-specious.html Allen_MacNeill_{March 18, 2010
12:58 PM PDT}

This may well be true. However, unlike a Gaussian, lots of other non-random things make exponentials.
Such as...?
Thus Pagel’s data support the idea that speciation occurred by large jumps, which are not occurring randomly,
No, they suggest that speciation events are memoryless, but they could still be small jumps. His model isn't about trait differences; it's about the time between speciation events.Heinrich_{March 18, 2010
12:37 PM PDT}

Mark, thank you for your long comment. Once more, here's my reply to your reply. One caveat: in my blog and in my comment, I said repeatedly that Gaussians imply randomness, whereas non-Gaussians usually imply non-randomness. Your point is that exponentials can be the result of a random process, so they would be one of the "unusual" examples of a random process producing non-Gaussians. This may well be true. However, unlike a Gaussian, lots of other non-random things make exponentials. So since you produced a mechanism, let's pull it apart and see what it means that speciation events are NOT cumulative. 1) species differ from each other by multiple mutations or variations. I don't think this is a controversial point. 2) Darwin suggested that species gradually mutate until something keeps the two sub-populations apart and we have a new species. Island populations were thought to do this; lots of population genetics solutions have been proposed. However, no matter which of the solutions you subscribe to, they all involve multiple, cumulative changes. This is not controversial. 3) Multiple, cumulative changes that proceed randomly are also known as a "random walk". Again, this is not controversial. 4) The "distance" traversed in a random walk is described by a diffusion equation, $\partial f/\partial t = D\,\partial^2 f/\partial x^2$, which has solutions $f(x,t)$ that look like Gaussians. Not controversial. 5) Then the expectation was that cumulative small mutations should be distributed (using some sort of linear mapping between the mutation time-differences and a spatial coordinate x) as a Gaussian. Let me say this again. In our random walk example, we take each mutation event as a "step", and the separation between the initial position (lightpole) and the final position (new species) is the "distance" coordinate. Pagel didn't have any information about steps that were reversed, so instead he looked at the "time between steps". This is an added sophistication to our diffusion model, and has to do with "intermittency". 
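Points 3) and 4) can be illustrated with a direct simulation of the lightpole walk. The heat-kernel solution of the diffusion equation, $f(x,t) = (4\pi D t)^{-1/2} e^{-x^2/(4Dt)}$, has variance $2Dt$; for unit steps of ±1 per unit time, $D = 1/2$, so after n steps the variance should be n. The sketch below assumes exactly that setup (walker count, step count, and helper names are illustrative choices): the endpoints should have mean near 0, variance near 200, and excess kurtosis near 0, as a Gaussian does.

```python
import random
import statistics


def walk_endpoints(n_walkers, n_steps, seed=2):
    """Final positions of independent +/-1 random walks that all start at 0 (the lightpole)."""
    rng = random.Random(seed)
    return [sum(rng.choice((-1, 1)) for _ in range(n_steps)) for _ in range(n_walkers)]


def excess_kurtosis(xs):
    # 0 for a Gaussian; clearly positive for fat-tailed samples.
    m = statistics.fmean(xs)
    v = statistics.pvariance(xs)
    return sum((x - m) ** 4 for x in xs) / (len(xs) * v * v) - 3.0


n_steps = 200
ends = walk_endpoints(4000, n_steps)
print(statistics.fmean(ends), statistics.pvariance(ends), excess_kurtosis(ends))
```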
There's a lot of literature on the topic which I am not that conversant in, but it is my general impression that this temporal measurement maps back into the spatial definition through the diffusion equation. (if you would like to direct me to the literature, I'd be happy to reconsider.) So that means a cumulative diffusive approach to a new species should demonstrate a Gaussian distribution in "speciation-space" and a concurrent Gaussian distribution in "delta-time" space. But Pagel found an exponential. What does this mean? One of our assumptions about speciation is wrong. 6) Pagel, and apparently your blog, think that one-step speciation events are the answer. That is, there is no accumulation of small steps to produce speciation, but sudden, big steps, occurring at very infrequent intervals. Okay, let's pull that apart. a) Does this require randomness? No. Lots of causes can have this effect. Does this require non-randomness? No, because the big jumps can be random too. But what it does require is that there NOT be a cumulative small step = diffusion. b) I alluded to these "fat-tail" solutions in the original blog. And yes, exponentials are a "fat-tail" along with Cauchy, Levy and Poisson distributions. Fat-tail distributions have a number of disturbing properties: i) they do not exhibit maximum entropy. There are rearrangements of the members that are more probable than themselves. This means that they are inherently improbable, all other things being equal. In other words, some law other than random chance is producing this distribution, because random chance ought to produce the most probable (maximum entropy) distribution. ii) they do not exhibit minimum energy. There are rearrangements that minimize the "work" (which we arbitrarily claim is proportional to the "size" or "number of single codon replacements" of the mutation.) iii) they do not have a finite diffusion coefficient. This means they exhibit infinite diffusional velocity. 
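Point iii) concerns how fast a walk with fat-tailed steps spreads. The sketch below uses standard Cauchy steps as a stand-in for a Lévy flight (an assumption; other Lévy-stable laws behave similarly) and compares the median distance from the start against a walk with Gaussian steps. With Gaussian steps the median displacement grows like sqrt(n); with Cauchy steps the sum of n steps is again Cauchy with scale n, so the median displacement grows like n. This shows only the scaling difference; it says nothing about whether such steps are purposeful or random. Helper names are hypothetical.

```python
import math
import random
import statistics


def gaussian_step(rng):
    # Thin-tailed step: standard normal.
    return rng.gauss(0.0, 1.0)


def cauchy_step(rng):
    # Fat-tailed step: standard Cauchy via the inverse CDF, tan(pi * (u - 1/2)).
    return math.tan(math.pi * (rng.random() - 0.5))


def median_displacement(n_walkers, n_steps, step, seed=3):
    """Median distance from the start after n_steps steps drawn by `step`."""
    rng = random.Random(seed)
    return statistics.median(
        abs(sum(step(rng) for _ in range(n_steps))) for _ in range(n_walkers)
    )


for n in (10, 100):
    print(n, median_displacement(2000, n, gaussian_step), median_displacement(2000, n, cauchy_step))
```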
In other words, they get places faster than chance. For example, bacteria searching for food use a Levy-flight search pattern, since it is more efficient than diffusive search. This is an example of purpose-driven search, which in our case would correspond to something law-like that is driving the mutation rate. c) All this non-Gaussian difficulty could be avoided if the delta-t distribution were Gaussian. But since Pagel and your blog find otherwise, they claim that big steps can still be random. Perhaps, but it now becomes a "random-with-added-laws" mechanism, which is a bit disturbing. d) ID argues that large jumps are probabilistically forbidden. For example, 3 simultaneous mutations in the Plasmodium that causes malaria are required to confer quinine resistance. Mike Behe calculates this in his book "The Edge of Evolution". It took something like 10^20 generations of Plasmodium to acquire this mutation. Larger jumps, like those seen in Pagel's data, would then be impossible by random chance. Thus Pagel's data support the idea that speciation occurred by large jumps, which are not occurring randomly, and indeed support Behe's contention that speciation is NOT driven by Darwinian mechanisms. What drives it, neither Behe nor Pagel knows, but one thing we can be certain of: it isn't Darwinian diffusion.Robert Sheldon_{March 18, 2010
11:10 AM PDT}
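The arithmetic behind point d) can be made explicit under one loudly stated assumption: that the required changes occur independently, each with the same probability per trial, and must all occur together. Then the joint probability is p^k and the expected number of trials is 1/p^k, so each extra required change multiplies the wait by 1/p. The value p = 1e-8 below is purely hypothetical and is not Behe's figure; whether speciation actually requires such simultaneous changes is exactly what Mark Frank and Allen MacNeill dispute above.

```python
import math


def expected_trials(p_single, k):
    """Expected number of independent trials until k specific changes occur together.

    Assumes (hypothetically) that each change occurs independently with
    probability p_single per trial, so the joint probability is p_single ** k
    and the waiting time is geometric with mean 1 / p_single ** k.
    """
    return 1.0 / p_single ** k


p = 1e-8  # hypothetical per-trial probability of each single change
for k in (1, 2, 3):
    print(k, f"{expected_trials(p, k):.1e}")
```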

Robert Sheldon: "So for you to say that the exponential distribution is “random” is the assumption, not the data to support it. " Statistics fail. From Wolfram: The exponential distribution is the only continuous memoryless random distribution. pilkington_{March 18, 2010
11:04 AM PDT}
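The memoryless property behind the Wolfram quotation can be shown in one line. Let T be the waiting time to the next event and λ a constant event rate (symbols introduced here for illustration). The first equality uses the fact that the event {T > s + t} is contained in {T > s}. This derivation shows only that the exponential is memoryless; the uniqueness part of Wolfram's statement (that it is the only continuous such distribution) needs a separate argument.

```latex
\begin{aligned}
P(T > t) &= e^{-\lambda t}, \qquad t \ge 0, \\
P(T > s + t \mid T > s) &= \frac{P(T > s + t)}{P(T > s)}
  = \frac{e^{-\lambda (s + t)}}{e^{-\lambda s}}
  = e^{-\lambda t} = P(T > t).
\end{aligned}
```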

In what sense is the exponential distribution "not random"? Exponential is just what you would expect if events happen at a roughly constant rate but independently of each other. If you think about it, the "small events" which comprise the normal distribution model may well also be exponentially distributed - in fact that is what I would expect. The Gaussian element only arises because of the necessity for them to accumulate - it has nothing to do with their "randomness". The only important difference in the distribution of the events is that in one case they accumulate to cause speciation and in the other they are large enough to cause speciation by themselves. Neither model implies design and both models have been accepted for decades - the controversy is the frequency of the "large event" model. (I have written a longer piece about this but it was too long for a comment.)Mark Frank_{March 18, 2010
10:12 AM PDT}
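The claim that independent events at a roughly constant rate give exponential gaps can be sketched by scattering events uniformly at random over a long interval, which approximates a Poisson process. Under that assumption (20000 events over 1000 time units, both arbitrary), the gaps should have mean near 0.05, a coefficient of variation near 1, and about e^(-1) ≈ 0.368 of them should exceed the mean gap, all signatures of an exponential. The helper name is hypothetical.

```python
import math
import random
import statistics


def event_gaps(n_events, horizon, seed=4):
    """Gaps between events scattered independently and uniformly over [0, horizon].

    Independent events at a constant average rate form (approximately) a
    Poisson process, whose inter-event gaps are exponentially distributed.
    """
    rng = random.Random(seed)
    times = sorted(rng.uniform(0.0, horizon) for _ in range(n_events))
    return [later - earlier for earlier, later in zip(times, times[1:])]


gaps = event_gaps(20000, 1000.0)
mean_gap = statistics.fmean(gaps)
cv = statistics.stdev(gaps) / mean_gap
frac_long = sum(1 for g in gaps if g > mean_gap) / len(gaps)
print(mean_gap, cv, frac_long, math.exp(-1))
```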

DLH, thanks for the defence! Allen, your response is longer than the post! So if you will humor me, I will try to address your points. 1) If you hadn't noticed, I avoided most of the math concerning the mean value theorem and the conditions that need to be fulfilled in order for the Gaussian to be a valid solution to the random walk problem. To say it several other ways, non-Markovian random walks do not produce a Gaussian, but that is because most people would say they aren't random. Likewise, Levy-flight and Levy-stable distributions do not produce Gaussians, because the probability of the step sizes does not decrease fast enough (power law -2). So when one analyzes an observed distribution, one inductively determines the rules that produced it, and the rules that produce non-Gaussians are generally (though not always) considered non-random. This is the point of the blog, and it isn't that controversial. One more time: Gaussian --> random, random --> Gaussian. Your objection that Gaussian --> "probability distribution clustered about a mean" is of course true in a very general way, but doesn't distinguish between, say, Gaussian, Cauchy, Levy, and Poisson distributions. In other words, if you understood statistics, you wouldn't be making the accusation. 2) Every distribution has a mean, it is not a distinguishing feature of the Gaussian. Nor must the mean value of a Gaussian be random. It can be any value you like, because it is the _distribution_ that is the Gaussian, not the mean. Again, read those Wikipedia entries and try to absorb the statistics. 3) You said, "While it is the case that Gaussian distributions are the result of random deviations, they are random deviations from a mean value, which is assumed to be the result of a determinative process." You are making my point for me. You just said "Gaussian" --> random deviations. I said nothing about mean values, and don't know if they are determinative or not. 
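The contrast in point 1) between Gaussian and Lévy-stable steps can be seen in how the average of n draws behaves. The sketch below (helper names and sample sizes are illustrative assumptions) estimates the interquartile range of the mean of n draws. For standard Gaussian draws it shrinks like 1/sqrt(n), about 1.35 at n = 1 and 0.135 at n = 100, as the central limit theorem predicts. For standard Cauchy draws the mean of n draws is again standard Cauchy, so the interquartile range stays near 2. This also bears on DiEb's remark about point 2): the Cauchy distribution has no mean for the average to settle on.

```python
import math
import random
import statistics


def gaussian_draw(rng):
    return rng.gauss(0.0, 1.0)


def cauchy_draw(rng):
    # Standard Cauchy via the inverse CDF.
    return math.tan(math.pi * (rng.random() - 0.5))


def iqr_of_sample_mean(n, draw, trials=2000, seed=5):
    """Interquartile range of the average of n draws, estimated over many trials."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]
    q1, _, q3 = statistics.quantiles(means, n=4)
    return q3 - q1


for n in (1, 100):
    print(n, iqr_of_sample_mean(n, gaussian_draw), iqr_of_sample_mean(n, cauchy_draw))
```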
So why are you so upset if you restate my main conclusion as fact? 4) In the next few paragraphs you state that I claimed the mean value is random, when it is just the deviation that is random. I said no such thing. In fact, I was deliberately vague. I said "there must be some Gaussians there if we knew where to look," with which you agreed when you said the deviations would be random. So we are still agreeing. Stop hyperventilating. 5) You go on to say I am "profoundly wrong" in the following quote, "Sheldon goes on to state in the OP that “[s]o universal is the “Gaussian” in all areas of life that it is taken to be prima facie evidence of a random process.” This is simply wrong; " and then you give your own definition of a process that produces a Gaussian. "A Gaussian distribution is evidence of random deviation from a determined value" Great. Now tell me why "a random deviation from a determined value" is NOT a process? Perhaps you think processes are not deviations, or perhaps you think processes are not random? I'm really having trouble with your analysis here, because in the first paragraph you give a more general definition of a Gaussian than I do, and now you give a narrower definition of a Gaussian than I do, all the while saying that I am wrong, wrong, wrong. Frankly, you can't do both. Either I am too general or too specific; which is it? 6) You say, "Sheldon goes on to strongly imply that such Gaussian distributions are not found in nature, and that instead most or all variation in nature is “discontinuous”. " You are free to draw any inferences you want, but I'm afraid I didn't imply any such thing. What I actually said was that Gaussians are not found where they are expected. Exponentials, Cauchy, Levy distributions are all continuous distributions, but they aren't Gaussian. 7) You also say that I was wrong about Darwin's garden and then talk about pigeons. Pigeons don't grow in gardens. 
Give me a vegetable example of something Darwin grew in his garden that exemplified his theory. Just one example. Mendel did it, why couldn't Darwin? 8) I do not equate "forks" with spatial distribution of genes, I equate them with temporal distribution of genes. So the next few paragraphs are torching a straw man. 9) You say, "In other words (and in direct and complete contradiction to Sheldon’s assertions in the OP), Venditti, Meade, and Pagel’s fully support the assumption that the events that cause speciation (i.e. macroevolution) are random:" No. If you had read the article in New Scientist, you would know that Pagel's interpretation of the exponential distribution of deltas (time differences) was assumed to be the result of a very rare, random occurrence. This interpretation, said New Scientist (which was undoubtedly Pagel's comment), is: "Like the bell curve, the exponential has a straightforward explanation - but it is a disquieting one for evolutionary biologists...So far, other evolutionary biologists have been reluctant to accept Pagel's idea wholeheartedly. Some regard it as interesting but in need of further testing. "The single, rare events model is brilliant as an interpretation - as a potential interpretation," says Arne Mooers at Simon Fraser University in Vancouver, Canada." So for you to say that the exponential distribution is "random" is the assumption, not the data to support it. Furthermore, many biologists disagree with Pagel that he is interpreting it correctly. Hence you and Pagel are supporting my one and only point: non-Gaussians ---> non-random. It's a mathematical point. Not particularly ideological. You can add more random things in there if you like, but you have to _add_ them, they are not supported by the observation of the distribution, they are _assumed_. So you are not correct in saying that Pagel's data support randomness. They do no such thing. 
10) You say, "And stochastic events are not what Sheldon tried (and failed) to assert they are: they are not regular, determinative events resulting from either the deliberate intervention in nature by a supernatural “designer” nor are they the result of a regular, determinative process such as “natural selection”. I neither asserted nor tried to assert what the stochastic events were. I merely pointed out that stochasticity (=random) would result in a Gaussian somewhere, and we didn't find a Gaussian, but an exponential. 11) But you actually undermine your own theory when you say, "Most evolutionary biologists have assumed that this also meant that the rate of formation of new species would not only be continuous, but that it would also be regular, with new species forming at regular, widely spaced intervals as the result of the accumulation of relatively small genetic differences that eventually resulted in reproductive incompatibility. This assumption was neither rigorously derived from first principles nor empirically derived, but rather was based on the assumption that “continuous variation” is the overwhelming rule in both traits and the genes that produce them." You are correct that this assumption wasn't derived. This is because it was part of the definition of evolution Darwin used. One more time, when Darwin assumed evolution occurred by random variation, he meant exactly what you wrote--small steps in random directions. Then, when enough steps had been taken, Voila! a new species. In other words, these small steps were cumulative. Putting that mathematically, it means that the distribution of steps, or their outcome, species, should be Gaussian because it is the result of an accumulation of small steps. I'm not making this up, this is Darwinist Doctrine, believed by Darwin in his garden and Neo-Darwinian population geneticists who posit islands and isolation as the driver for speciation. And this is precisely what Pagel did _not_ find. 
So what did Pagel find? An exponential, which says that speciation is NOT cumulative. Therefore it is NOT the result of small steps; it is the result of big steps. One day you find a fish, and the next day it's a reptile. No small steps in between. The word Gould preferred for such jumps is "saltation". Call it what you like, it says that Gaussians are not involved. Now you see why Pagel's colleagues were afraid to agree with his conclusions?Robert Sheldon_{March 18, 2010
09:16 AM PDT}

One of the greatest articles at Uncommon Descent ever. Well done!Kyrilluk_{March 18, 2010
01:38 AM PDT}
