﻿<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: New Dembski-Marks Paper</title>
	<atom:link href="http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/</link>
	<description>Serving The Intelligent Design Community</description>
	<lastBuildDate>Mon, 13 Feb 2012 10:04:28 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
	<item>
		<title>By: DiEb</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342679</link>
		<dc:creator>DiEb</dc:creator>
		<pubDate>Sat, 12 Dec 2009 00:43:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342679</guid>
		<description>Dear Dr. Dembski,

you didn&#039;t define what you understand by these &lt;i&gt;some-to-many&lt;/i&gt; mappings, and it&#039;s not a common term. Could you do so, please? To me, they seem to be just reverse images under a mapping from &#937;&#039; to &#937;....</description>
		<content:encoded><![CDATA[<p>Dear Dr. Dembski,</p>
<p>you didn&#8217;t define what you understand by these <i>some-to-many</i> mappings, and it&#8217;s not a common term. Could you do so, please? To me, they seem to be just reverse images under a mapping from &Omega;&#8217; to &Omega;&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Prof_P.Olofsson</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342624</link>
		<dc:creator>Prof_P.Olofsson</dc:creator>
		<pubDate>Fri, 11 Dec 2009 15:32:19 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342624</guid>
		<description>Mystic[79],
Good point. It&#039;s even an uncountable space if you choose probabilities in [0,1]. This error has also been pointed out &lt;a href=&quot;http://boundedtheoretics.blogspot.com/2009/12/blunder-in-new-dembski-marks-paper.html&quot; rel=&quot;nofollow&quot;&gt;by Tom English&lt;/a&gt;.</description>
		<content:encoded><![CDATA[<p>Mystic[79],<br />
Good point. It&#8217;s even an uncountable space if you choose probabilities in [0,1]. This error has also been pointed out <a href="http://boundedtheoretics.blogspot.com/2009/12/blunder-in-new-dembski-marks-paper.html" rel="nofollow">by Tom English</a>.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Zachriel</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342526</link>
		<dc:creator>Zachriel</dc:creator>
		<pubDate>Thu, 10 Dec 2009 12:44:25 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342526</guid>
		<description>This may have been lost above due to the long moderation delay. (Why are Zachriel&#039;s comments being moderated?)

&lt;blockquote&gt;&lt;b&gt;Dembski &amp; Marks&lt;/b&gt;: Prior knowledge about the smoothness of a search landscape required for gradient based hill-climbing, is not only common but is also vital to the success of some search optimizations. Such procedures, however, are of little use when searching to find a sequence of, say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker …&lt;/blockquote&gt;

Does that mean an evolutionary algorithm can&#039;t navigate the wordscape of the dictionary from shorter to longer words because there are no hills to climb?</description>
		<content:encoded><![CDATA[<p>This may have been lost above due to the long moderation delay. (Why are Zachriel&#8217;s comments being moderated?)</p>
<blockquote><p><b>Dembski &amp; Marks</b>: Prior knowledge about the smoothness of a search landscape required for gradient based hill-climbing, is not only common but is also vital to the success of some search optimizations. Such procedures, however, are of little use when searching to find a sequence of, say, 7 letters from a 26-letter alphabet to form a word that will pass successfully through a spell checker …</p></blockquote>
<p>Does that mean an evolutionary algorithm can&#8217;t navigate the wordscape of the dictionary from shorter to longer words because there are no hills to climb?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mystic</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342510</link>
		<dc:creator>Mystic</dc:creator>
		<pubDate>Thu, 10 Dec 2009 08:49:32 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342510</guid>
		<description>My first application of evolutionary programming was to optimization of the connection weights of recurrent neural nets. Now, I could have exploited the mathematical analysis that yielded partial derivatives of the objective function for the weights. But I knew also that the computation of the partial derivatives was very slow, and I had to believe that &quot;quick and dirty&quot; EP would go faster, by the clock on the wall, than &quot;neat and clean&quot; gradient descent. And I was right.

Dembski and Marks will say that the gradient descent algorithm did better than EP because it required fewer trials than EP to obtain an acceptable solution. But I say that EP outperformed gradient descent because it found an acceptable solution with much less computational work than gradient descent did. The &quot;information gain&quot; in knowing the gradient was not worth what it cost.</description>
		<content:encoded><![CDATA[<p>My first application of evolutionary programming was to optimization of the connection weights of recurrent neural nets. Now, I could have exploited the mathematical analysis that yielded partial derivatives of the objective function for the weights. But I knew also that the computation of the partial derivatives was very slow, and I had to believe that &#8220;quick and dirty&#8221; EP would go faster, by the clock on the wall, than &#8220;neat and clean&#8221; gradient descent. And I was right.</p>
<p>Dembski and Marks will say that the gradient descent algorithm did better than EP because it required fewer trials than EP to obtain an acceptable solution. But I say that EP outperformed gradient descent because it found an acceptable solution with much less computational work than gradient descent did. The &#8220;information gain&#8221; in knowing the gradient was not worth what it cost.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mystic</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342509</link>
		<dc:creator>Mystic</dc:creator>
		<pubDate>Thu, 10 Dec 2009 08:16:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342509</guid>
		<description>The argument of Appendix B is invalid. For a fixed representation of randomized algorithms (e.g., as probabilistic Turing machines), the space of search algorithms, Omega_2, is countably infinite, not finite.

Consider a search algorithm that obtains i.i.d. uniform bits from a random source, and simulates a toss of a biased coin (probability of heads is p) to decide which of two deterministic search algorithms to run on a search problem. There are at least as many randomized algorithms for simulating the coin toss as there are rational probabilities p = n / d, and the set of rational probabilities is countably infinite. Note also that the algorithms must obtain &#124;d&#124; random bits in the worst case. Thus there is no upper bound on running time for algorithms of this form.

Vanishingly few random search processes have exact implementations as randomized search algorithms. And many randomized search algorithms require impractical time and space. When we observe a success for a search algorithm, the algorithm is much more likely to be small and fast than big and slow. It is very important to consider this observational bias of ours.</description>
		<content:encoded><![CDATA[<p>The argument of Appendix B is invalid. For a fixed representation of randomized algorithms (e.g., as probabilistic Turing machines), the space of search algorithms, Omega_2, is countably infinite, not finite.</p>
<p>Consider a search algorithm that obtains i.i.d. uniform bits from a random source, and simulates a toss of a biased coin (probability of heads is p) to decide which of two deterministic search algorithms to run on a search problem. There are at least as many randomized algorithms for simulating the coin toss as there are rational probabilities p = n / d, and the set of rational probabilities is countably infinite. Note also that the algorithms must obtain |d| random bits in the worst case. Thus there is no upper bound on running time for algorithms of this form.</p>
<p>Vanishingly few random search processes have exact implementations as randomized search algorithms. And many randomized search algorithms require impractical time and space. When we observe a success for a search algorithm, the algorithm is much more likely to be small and fast than big and slow. It is very important to consider this observational bias of ours.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: R0b</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342477</link>
		<dc:creator>R0b</dc:creator>
		<pubDate>Wed, 09 Dec 2009 23:46:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342477</guid>
		<description>I&#039;ll make one more comment on the paper, regarding Appendix B: The LCI actually follows immediately from the second sentence of the appendix.  That sentence says that blindly finding the target has the same probability as finding the target with a blindly selected algorithm.  From this it follows that blindly finding the target is &lt;i&gt;at least as&lt;/i&gt; probable as finding the target with a blindly selected algorithm AND the selected algorithm being as good as it is.  That&#039;s the LCI.</description>
		<content:encoded><![CDATA[<p>I&#8217;ll make one more comment on the paper, regarding Appendix B: The LCI actually follows immediately from the second sentence of the appendix.  That sentence says that blindly finding the target has the same probability as finding the target with a blindly selected algorithm.  From this it follows that blindly finding the target is <i>at least as</i> probable as finding the target with a blindly selected algorithm AND the selected algorithm being as good as it is.  That&#8217;s the LCI.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Prof_P.Olofsson</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342461</link>
		<dc:creator>Prof_P.Olofsson</dc:creator>
		<pubDate>Wed, 09 Dec 2009 21:02:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342461</guid>
		<description>GradStudent[75],
Sorry, that&#039;s just way too much for me to read right now. If they assume maximum entropy based on PrOIR alone, and draw conclusions from that assumption alone, without analyzing any data or invoking any other information, sure, it&#039;s an application. I doubt that&#039;s what is done though. 

I agree, we should probably quit. Let me just finish like I started and quote myself:

Consider D&amp;M’s Formula (1). On the one hand they claim that it assumes the PrOIR due to “no prior knowledge.” But before Formula (1) they state that the deck is well shuffled which is a lot of prior knowledge and precisely the prior knowledge that warrants the uniform distribution. One must not confuse prior knowledge of the distribution of cards (which we do have) with prior knowledge of the location of the ace of spades (which we don’t have).</description>
		<content:encoded><![CDATA[<p>GradStudent[75],<br />
Sorry, that&#8217;s just way too much for me to read right now. If they assume maximum entropy based on PrOIR alone, and draw conclusions from that assumption alone, without analyzing any data or invoking any other information, sure, it&#8217;s an application. I doubt that&#8217;s what is done though. </p>
<p>I agree, we should probably quit. Let me just finish like I started and quote myself:</p>
<p>Consider D&amp;M’s Formula (1). On the one hand they claim that it assumes the PrOIR due to “no prior knowledge.” But before Formula (1) they state that the deck is well shuffled which is a lot of prior knowledge and precisely the prior knowledge that warrants the uniform distribution. One must not confuse prior knowledge of the distribution of cards (which we do have) with prior knowledge of the location of the ace of spades (which we don’t have).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Prof_P.Olofsson</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342459</link>
		<dc:creator>Prof_P.Olofsson</dc:creator>
		<pubDate>Wed, 09 Dec 2009 20:55:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342459</guid>
		<description>UP[72].
No, it&#039;s not the same thing and we&#039;re not nitpicking. The NFLT says that &lt;b&gt;if&lt;/b&gt; the distribution is uniform, then certain conclusions hold. If you can verify uniformity, the NFLT applies. It&#039;s got nothing to do with the PrOIR, just like your other examples don&#039;t.  Opinion polls obviously have nothing at all to do with PrOIR as they are based on collecting data from which one gets information.

You are right, I&#039;d rather be painting my nails!</description>
		<content:encoded><![CDATA[<p>UP[72].<br />
No, it&#8217;s not the same thing and we&#8217;re not nitpicking. The NFLT says that <b>if</b> the distribution is uniform, then certain conclusions hold. If you can verify uniformity, the NFLT applies. It&#8217;s got nothing to do with the PrOIR, just like your other examples don&#8217;t.  Opinion polls obviously have nothing at all to do with PrOIR as they are based on collecting data from which one gets information.</p>
<p>You are right, I&#8217;d rather be painting my nails!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: GradStudent</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342444</link>
		<dc:creator>GradStudent</dc:creator>
		<pubDate>Wed, 09 Dec 2009 20:00:59 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342444</guid>
		<description>Hi Prof P.Olofsson,

Well, thanks for the discussion.  It seems this thread has been booted from the front page and so maybe we should wrap it up.

Regarding applying the PrOIR repeatedly at each level, actually, I don&#039;t think it is necessary to assume PrOIR at each level if one has a lot of levels.  It seems that all you need is a prior that has positive probability over the whole space.  But the key idea is that each prior adds a little more variance, and with many levels, that variance could drown out everything else and thus make the lowest prior look uniform.  

Regarding the &quot;guess my prior&quot; game, if you were absolutely *forced* to make a choice based on no knowledge, my guess is that you would guess uniform rather than placing high probability on a subset of the space.   

Of course I just made up these arguments offhand and there could be serious flaws in them.  

Regarding applications, there is a lot of work in maxent modeling as I suggested earlier.  They are assuming maximum entropy and seem to be getting great results:

http://homepages.inf.ed.ac.uk/lzhang10/maxent.html

Is this what you mean by application of PrOIR?</description>
		<content:encoded><![CDATA[<p>Hi Prof P.Olofsson,</p>
<p>Well, thanks for the discussion.  It seems this thread has been booted from the front page and so maybe we should wrap it up.</p>
<p>Regarding applying the PrOIR repeatedly at each level, actually, I don&#8217;t think it is necessary to assume PrOIR at each level if one has a lot of levels.  It seems that all you need is a prior that has positive probability over the whole space.  But the key idea is that each prior adds a little more variance, and with many levels, that variance could drown out everything else and thus make the lowest prior look uniform.  </p>
<p>Regarding the &#8220;guess my prior&#8221; game, if you were absolutely *forced* to make a choice based on no knowledge, my guess is that you would guess uniform rather than placing high probability on a subset of the space.   </p>
<p>Of course I just made up these arguments offhand and there could be serious flaws in them.  </p>
<p>Regarding applications, there is a lot of work in maxent modeling as I suggested earlier.  They are assuming maximum entropy and seem to be getting great results:</p>
<p><a href="http://homepages.inf.ed.ac.uk/lzhang10/maxent.html" rel="nofollow">http://homepages.inf.ed.ac.uk/lzhang10/maxent.html</a></p>
<p>Is this what you mean by application of PrOIR?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Frank</title>
		<link>http://www.uncommondescent.com/intelligent-design/new-dembski-marks-paper/comment-page-3/#comment-342443</link>
		<dc:creator>Mark Frank</dc:creator>
		<pubDate>Wed, 09 Dec 2009 19:58:34 +0000</pubDate>
		<guid isPermaLink="false">http://www.uncommondescent.com/?p=10268#comment-342443</guid>
		<description>#73

&lt;em&gt;Let&#039;s go do something useful. Bake a cake. Shop. Do our nails. Make up Richard Dawkins jokes. Read more Dembski books.&lt;/em&gt;

Is this in decreasing order of usefulness?</description>
		<content:encoded><![CDATA[<p>#73</p>
<p><em>Let&#8217;s go do something useful. Bake a cake. Shop. Do our nails. Make up Richard Dawkins jokes. Read more Dembski books.</em></p>
<p>Is this in decreasing order of usefulness?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

