<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: A great read &#8211; an expose of a bunch of standard pitfalls of econometrics (done in an ever so slightly dodgy way)</title>
	<atom:link href="http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/feed/" rel="self" type="application/rss+xml" />
	<link>http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/</link>
	<description></description>
	<lastBuildDate>Sun, 12 Feb 2012 10:49:39 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3</generator>
	<item>
		<title>By: conrad</title>
		<link>http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/#comment-357419</link>
		<dc:creator>conrad</dc:creator>
		<pubDate>Tue, 12 May 2009 06:27:36 +0000</pubDate>
		<guid isPermaLink="false">http://clubtroppo.com.au/?p=8394#comment-357419</guid>
		<description>I&#039;m not into political science enough to be sure, but I would guess that a lot of the studies that are creating the graph from political science journals given above are not &quot;data mining&quot;. I would think that many are essentially experimental and are looking at differences across groups based on questions derived a priori from different theories. For example, I could test something like &quot;do labor voters love their pets more than liberal voters&quot; and base it on some a priori theory of altruism. When I didn&#039;t find a difference, the result would go into my silly-experiments-I-ran pile. However, when the 50th person does essentially the same experiment and gets a Type I error with a p value of .0499, then they think it&#039;s a great thing to publish. So the effect may be coming not from &quot;data mining&quot; but from the fact that many people run lots and lots of little experiments.</description>
		<content:encoded><![CDATA[<p>I&#8217;m not into political science enough to be sure, but I would guess that a lot of the studies that are creating the graph from political science journals given above are not &#8220;data mining&#8221;. I would think that many are essentially experimental and are looking at differences across groups based on questions derived a priori from different theories. For example, I could test something like &#8220;do labor voters love their pets more than liberal voters&#8221; and base it on some a priori theory of altruism. When I didn&#8217;t find a difference, the result would go into my silly-experiments-I-ran pile. However, when the 50th person does essentially the same experiment and gets a Type I error with a p value of .0499, then they think it&#8217;s a great thing to publish. So the effect may be coming not from &#8220;data mining&#8221; but from the fact that many people run lots and lots of little experiments.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bruce Bradbury</title>
		<link>http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/#comment-357415</link>
		<dc:creator>Bruce Bradbury</dc:creator>
		<pubDate>Tue, 12 May 2009 03:54:14 +0000</pubDate>
		<guid isPermaLink="false">http://clubtroppo.com.au/?p=8394#comment-357415</guid>
		<description>My explanation for the figure is that it represents a mixture of two types of studies. Type 1 are studies which are only interesting (and hence published) if the result is significant. Type 2 are studies where the result is of interest even if it is not significant.

As an aside, the term &#039;data mining&#039; is not entirely negative. There is a large industry of software and practitioners who use data mining in a statistically respectable manner. The trick is to have a very large dataset, do your &#039;mining&#039; in one half of the dataset, then test the estimated model in the other half.</description>
		<content:encoded><![CDATA[<p>My explanation for the figure is that it represents a mixture of two types of studies. Type 1 are studies which are only interesting (and hence published) if the result is significant. Type 2 are studies where the result is of interest even if it is not significant.</p>
<p>As an aside, the term &#8216;data mining&#8217; is not entirely negative. There is a large industry of software and practitioners who use data mining in a statistically respectable manner. The trick is to have a very large dataset, do your &#8216;mining&#8217; in one half of the dataset, then test the estimated model in the other half.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Invig</title>
		<link>http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/#comment-357393</link>
		<dc:creator>Invig</dc:creator>
		<pubDate>Tue, 12 May 2009 01:36:00 +0000</pubDate>
		<guid isPermaLink="false">http://clubtroppo.com.au/?p=8394#comment-357393</guid>
		<description>Great article!

Something that annoys me is the unwillingness of proponents of econometrics to adopt a more visceral and honest approach to research.

To admit that the crafting of variables is probably THE MOST important part of the process, and that everything else (the relationships between them) can flow from there.

Sadly, variables are restricted to those which have data available.

NB I asked &lt;a href=&quot;http://andrewleigh.com/?p=2053&quot; rel=&quot;nofollow&quot;&gt;Andrew Leigh&lt;/a&gt; about this via blog comments (I decline to use private correspondence in a principled attempt to make these discussions public, but I guess people of status have got there because they care about such things, and discussions with the likes of me may undermine that status) but received no response.</description>
		<content:encoded><![CDATA[<p>Great article!</p>
<p>Something that annoys me is the unwillingness of proponents of econometrics to adopt a more visceral and honest approach to research.</p>
<p>To admit that the crafting of variables is probably THE MOST important part of the process, and that everything else (the relationships between them) can flow from there.</p>
<p>Sadly, variables are restricted to those which have data available.</p>
<p>NB I asked <a href="http://andrewleigh.com/?p=2053">Andrew Leigh</a> about this via blog comments (I decline to use private correspondence in a principled attempt to make these discussions public, but I guess people of status have got there because they care about such things, and discussions with the likes of me may undermine that status) but received no response.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: derrida derider</title>
		<link>http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/#comment-357379</link>
		<dc:creator>derrida derider</dc:creator>
		<pubDate>Tue, 12 May 2009 01:11:24 +0000</pubDate>
		<guid isPermaLink="false">http://clubtroppo.com.au/?p=8394#comment-357379</guid>
		<description>&lt;i&gt;&quot;[humans] usually want to do better and predict when red will come up too, engaging in reasoning like after three straight greens, we are due for a red.&quot;&lt;/i&gt;

Of course if it&#039;s sampling without replacement and the population size is small, this is perfectly rational.  

Economists aren&#039;t the only ones who data mine - in fact, they&#039;re far from the worst (marketing people IME are the worst).  It&#039;s widespread wherever statistics are used and a risk in all frequentist approaches.  So maybe the best remedy is to be more Bayesian.

But ultimately we have to understand that there are no shortcuts to truth - definitively establishing whether A causes B is often far harder than it seems.</description>
		<content:encoded><![CDATA[<p><i>&#8220;[humans] usually want to do better and predict when red will come up too, engaging in reasoning like after three straight greens, we are due for a red.&#8221;</i></p>
<p>Of course if it&#8217;s sampling without replacement and the population size is small, this is perfectly rational.  </p>
<p>Economists aren&#8217;t the only ones who data mine &#8211; in fact, they&#8217;re far from the worst (marketing people IME are the worst).  It&#8217;s widespread wherever statistics are used and a risk in all frequentist approaches.  So maybe the best remedy is to be more Bayesian.</p>
<p>But ultimately we have to understand that there are no shortcuts to truth &#8211; definitively establishing whether A causes B is often far harder than it seems.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Don Arthur</title>
		<link>http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/#comment-357362</link>
		<dc:creator>Don Arthur</dc:creator>
		<pubDate>Mon, 11 May 2009 23:18:10 +0000</pubDate>
		<guid isPermaLink="false">http://clubtroppo.com.au/?p=8394#comment-357362</guid>
		<description>Great post!

As a non-economist I&#039;ve always been puzzled by the rituals of the discipline (how to get published, win peer respect etc). Each discipline seems to have its own.

Psychologists like experiments. But they tend to run a lot of them with American college students and under conditions that make it difficult to generalise the results outside the laboratory.

For a while it was fashionable in psychology to practice a crude kind of operationalism where it was forbidden to appeal to any unobservable process (eg thoughts, feelings or neural processes).

Anthropologists like ethnography and are very impressed when a researcher immerses themselves for years in an interesting foreign culture (an economics department perhaps?).

Economists love regressions and seem to hover like seagulls around organisations with large data sets.

If an economist was researching unemployment and you offered to let them spend a week watching people in a welfare to work office do their job, they&#039;d probably think you were trying to contaminate the research process (or waste their valuable time).

The odd thing is, if there are any good theories in social science, they&#039;d have to apply across these disciplinary boundaries.

Sorry if this is off topic.</description>
		<content:encoded><![CDATA[<p>Great post!</p>
<p>As a non-economist I&#8217;ve always been puzzled by the rituals of the discipline (how to get published, win peer respect etc). Each discipline seems to have its own.</p>
<p>Psychologists like experiments. But they tend to run a lot of them with American college students and under conditions that make it difficult to generalise the results outside the laboratory.</p>
<p>For a while it was fashionable in psychology to practice a crude kind of operationalism where it was forbidden to appeal to any unobservable process (eg thoughts, feelings or neural processes).</p>
<p>Anthropologists like ethnography and are very impressed when a researcher immerses themselves for years in an interesting foreign culture (an economics department perhaps?).</p>
<p>Economists love regressions and seem to hover like seagulls around organisations with large data sets.</p>
<p>If an economist was researching unemployment and you offered to let them spend a week watching people in a welfare to work office do their job, they&#8217;d probably think you were trying to contaminate the research process (or waste their valuable time).</p>
<p>The odd thing is, if there are any good theories in social science, they&#8217;d have to apply across these disciplinary boundaries.</p>
<p>Sorry if this is off topic.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: conrad</title>
		<link>http://clubtroppo.com.au/2009/05/11/a-great-read-an-expose-of-a-bunch-of-standard-pitfalls-of-econometrics-done-in-an-ever-so-slightly-dodgy-way/#comment-357353</link>
		<dc:creator>conrad</dc:creator>
		<pubDate>Mon, 11 May 2009 21:24:35 +0000</pubDate>
		<guid isPermaLink="false">http://clubtroppo.com.au/?p=8394#comment-357353</guid>
		<description>&quot;Theres a pretty obvious conclusion here, and it has nothing to do with publication bias: data is being massaged on wide scale.&quot;
.
I&#039;m not sure how you infer that. The graph is entirely unsurprising to me, and would occur even if people didn&#039;t massage their data (although I&#039;m sure that happens too). The reason is that many people (perhaps most in some areas) are often looking for tiny effects with low power that are supposed to be interesting theoretically, whereas other people are running more quantitative stuff where you do have big effects with lots of power. This means that what you are seeing is basically noise from the fiddly little effects, where someone has collected a data set that many other people have, but they just happened to have sampled from the tail of the distribution. This means it will be very much harder to get to the next percentile (since the z = 1.96 data sets might already be coming from a sample a few SDs away from the real population mean), so you should see a huge drop off -- the data sets that were just significant were outliers in the first place. If you then mix the fiddly-little-effects with the big effects, you will end up with a normal curve generated by these plus the 1.96 blip from the fiddly little effects people. 

I think the real test would be for someone else to re-run a pile of other people&#039;s experiments from the same journal (which would be possible in some areas but not others). That way you could see to what extent results really are getting exaggerated by everybody trying to find the magical .05 value. Unfortunately, in many areas, this is an impossible strategy, and failures to replicate almost never get published. This is why we end up with a massive proliferation of little effects that don&#039;t really matter.</description>
		<content:encoded><![CDATA[<p>&#8220;Theres a pretty obvious conclusion here, and it has nothing to do with publication bias: data is being massaged on wide scale.&#8221;<br />
.<br />
I&#8217;m not sure how you infer that. The graph is entirely unsurprising to me, and would occur even if people didn&#8217;t massage their data (although I&#8217;m sure that happens too). The reason is that many people (perhaps most in some areas) are often looking for tiny effects with low power that are supposed to be interesting theoretically, whereas other people are running more quantitative stuff where you do have big effects with lots of power. This means that what you are seeing is basically noise from the fiddly little effects, where someone has collected a data set that many other people have, but they just happened to have sampled from the tail of the distribution. This means it will be very much harder to get to the next percentile (since the z = 1.96 data sets might already be coming from a sample a few SDs away from the real population mean), so you should see a huge drop off &#8212; the data sets that were just significant were outliers in the first place. If you then mix the fiddly-little-effects with the big effects, you will end up with a normal curve generated by these plus the 1.96 blip from the fiddly little effects people. </p>
<p>I think the real test would be for someone else to re-run a pile of other people&#8217;s experiments from the same journal (which would be possible in some areas but not others). That way you could see to what extent results really are getting exaggerated by everybody trying to find the magical .05 value. Unfortunately, in many areas, this is an impossible strategy, and failures to replicate almost never get published. This is why we end up with a massive proliferation of little effects that don&#8217;t really matter.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

