Why most published research findings are false

19 August, 2018 at 10:45 | Posted in Statistics & Econometrics | Leave a comment

Instead of chasing statistical significance, we should improve our understanding of the range of R values — the pre-study odds — where research efforts operate. Before running an experiment, investigators should consider what they believe the chances are that they are testing a true rather than a non-true relationship. Speculated high R values may sometimes then be ascertained … Large studies with minimal bias should be performed on research findings that are considered relatively established, to see how often they are indeed confirmed. I suspect several established “classics” will fail the test.

Nevertheless, most new discoveries will continue to stem from hypothesis-generating research with low or very low pre-study odds. We should then acknowledge that statistical significance testing in the report of a single study gives only a partial picture, without knowing how much testing has been done outside the report and in the relevant field at large. Despite a large statistical literature for multiple testing corrections, usually it is impossible to decipher how much data dredging by the reporting authors or other research teams has preceded a reported research finding. Even if determining this were feasible, this would not inform us about the pre-study odds. Thus, it is unavoidable that one should make approximate assumptions on how many relationships are expected to be true among those probed across the relevant research fields and research designs.

John P. A. Ioannidis


Collider bias (wonkish)

8 August, 2018 at 11:51 | Posted in Statistics & Econometrics | Leave a comment

 

Why data is NOT enough to answer scientific questions

7 August, 2018 at 10:50 | Posted in Statistics & Econometrics | 3 Comments

Ironically, the need for a theory of causation began to surface at the same time that statistics came into being. In fact, modern statistics hatched out of the causal questions that Galton and Pearson asked about heredity and out of their ingenious attempts to answer them from cross-generation data. Unfortunately, they failed in this endeavor and, rather than pause to ask “Why?”, they declared those questions off limits, and turned to develop a thriving, causality-free enterprise called statistics.

This was a critical moment in the history of science. The opportunity to equip causal questions with a language of their own came very close to being realized, but was squandered. In the following years, these questions were declared unscientific and went underground. Despite heroic efforts by the geneticist Sewall Wright (1889-1988), causal vocabulary was virtually prohibited for more than half a century. And when you prohibit speech, you prohibit thought, and you stifle principles, methods, and tools.

Readers do not have to be scientists to witness this prohibition. In Statistics 101, every student learns to chant: “Correlation is not causation.” With good reason! The rooster crow is highly correlated with the sunrise, yet it does not cause the sunrise.

Unfortunately, statistics took this common-sense observation and turned it into a fetish. It tells us that correlation is not causation, but it does not tell us what causation is. In vain will you search the index of a statistics textbook for an entry on “cause.” Students are never allowed to say that X is the cause of Y — only that X and Y are related or associated.

A popular idea in quantitative social sciences is to think of a cause (C) as something that increases the probability of its effect or outcome (O). That is:

P(O|C) > P(O|-C)

However, as is also well-known, a correlation between two variables, say A and B, does not necessarily imply that one is a cause of the other, or the other way around, since they may both be effects of a common cause, C.

In statistics and econometrics, we usually solve this confounder problem by controlling for C, i.e. by holding C fixed. This means that we actually look at different populations – those in which C occurs in every case, and those in which C doesn’t occur at all – so that, within each population, knowing the value of A does not influence the probability of C [P(C|A) = P(C)]. If there still exists a correlation between A and B in either of these populations, there has to be some other cause operating. But if all other possible causes have been controlled for too, and there is still a correlation between A and B, we may safely conclude that A is a cause of B, since by controlling for all other possible causes, the correlation between the putative cause A and all the other possible causes (D, E, F …) is broken.
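To make the point concrete, here is a small simulation sketch in Python (my own illustrative example, not anything from the statistics literature): A and B have no direct causal link but are both effects of a common cause C, and the correlation between them all but vanishes once we condition on C by looking only at cases where C is roughly held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Common cause C; A and B are both effects of C, with no direct link
# between them.
C = rng.normal(size=n)
A = 2.0 * C + rng.normal(size=n)
B = -1.5 * C + rng.normal(size=n)

# Unconditionally, A and B are strongly correlated ...
print("corr(A, B):", np.corrcoef(A, B)[0, 1])

# ... but within a narrow stratum of C (holding C roughly fixed),
# the correlation essentially disappears.
stratum = np.abs(C) < 0.05
print("corr(A, B | C held fixed):", np.corrcoef(A[stratum], B[stratum])[0, 1])
```

The trick only works because in the simulation we know what C is and can measure it.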

This is, of course, a very demanding prerequisite, since we may never actually be sure to have identified all putative causes. Even in scientific experiments, the number of uncontrolled causes may be innumerable. Since nothing less will do, we all understand how hard it is to actually get from correlation to causality. This also means that relying on statistics or econometrics alone is not enough to deduce causes from correlations.

Some people think that randomization may solve the empirical problem. By randomizing we get different populations that are homogeneous with regard to all variables except the one we think is a genuine cause. In that way, we are supposedly able to do without actually knowing what all these other factors are.

In an ideal randomization with different treatment and control groups, that is attainable. But it presupposes that you really have been able to establish — and not just assume — that all causes other than the putative one (A) have the same probability distribution in the treatment and control groups, and that assignment to treatment or control is independent of all other possible causal variables.

Unfortunately, real experiments and real randomizations seldom or never achieve this. So, yes, we may do without knowing all causes, but it takes ideal experiments and ideal randomizations to do that, not real ones.
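Here is a similarly stylized sketch (again my own, with made-up numbers) of what ideal randomization buys us: a background cause Z influences the outcome, the true effect of the ‘treatment’ A is zero, and only when assignment is genuinely independent of Z does the simple comparison of group means get the answer right.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Background cause Z affects the outcome Y; the 'treatment' A has no
# effect on Y at all.
Z = rng.normal(size=n)

# Observational world: whether you are 'treated' depends on Z.
A_obs = (Z + rng.normal(size=n) > 0).astype(float)
Y_obs = 3.0 * Z + rng.normal(size=n)
naive = Y_obs[A_obs == 1].mean() - Y_obs[A_obs == 0].mean()
print("observational 'effect':", round(naive, 2))     # far from zero

# Ideal randomization: assignment is independent of Z (and of everything
# else we did not think of).
A_rct = rng.integers(0, 2, size=n).astype(float)
Y_rct = 3.0 * Z + rng.normal(size=n)
rct = Y_rct[A_rct == 1].mean() - Y_rct[A_rct == 0].mean()
print("randomized estimate:", round(rct, 2))          # close to zero
```

In the simulation we can randomize perfectly by construction; the point above is precisely that real experiments can rarely guarantee this.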

That means that in practice we do have to have sufficient background knowledge to deduce causal knowledge. Without old knowledge, we can’t get new knowledge, and — no causes in, no causes out.

Econometrics is basically a deductive method. Given the assumptions (such as manipulability, transitivity, Reichenbach probability principles, separability, additivity, linearity, etc., etc.) it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Real target systems are seldom epistemically isomorphic to axiomatic-deductive models/systems, and even if they were, we still have to argue for the external validity of the conclusions reached from within these epistemically convenient models/systems. Causal evidence generated by statistical/econometric procedures may be valid in closed models, but what we are usually interested in is causal evidence about the real target system we happen to live in.

Advocates of econometrics want to have deductively automated answers to fundamental causal questions. But to apply ‘thin’ methods we have to have ‘thick’ background knowledge of what’s going on in the real world, and not in idealized models. Conclusions can only be as certain as their premises — and that also applies to the quest for causality in econometrics.

The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many — falsely — think that they can get away with analysing real-world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.

Clever data-mining tricks are never enough to answer important scientific questions. Theory matters.

The connection between musical taste and intelligence

30 July, 2018 at 16:26 | Posted in Statistics & Econometrics | 3 Comments

 

Regression analysis — a case of wishful thinking

13 July, 2018 at 18:34 | Posted in Statistics & Econometrics | Leave a comment

The impossibility of proper specification is true generally in regression analyses across the social sciences, whether we are looking at the factors affecting occupational status, voting behavior, etc. The problem is that as implied by the conditions for regression analyses to yield accurate, unbiased estimates, you need to investigate a phenomenon that has underlying mathematical regularities – and, moreover, you need to know what they are. Neither seems true. I have no reason to believe that the way in which multiple factors affect earnings, student achievement, and GNP have some underlying mathematical regularity across individuals or countries. More likely, each individual or country has a different function, and one that changes over time. Even if there was some constancy, the processes are so complex that we have no idea of what the function looks like.

Researchers recognize that they do not know the true function and seem to treat, usually implicitly, their results as a good-enough approximation. But there is no basis for the belief that the results of what is run in practice is anything close to the underlying phenomenon, even if there is an underlying phenomenon. This just seems to be wishful thinking. Most regression analysis research doesn’t even pay lip service to theoretical regularities. But you can’t just regress anything you want and expect the results to approximate reality. And even when researchers take somewhat seriously the need to have an underlying theoretical framework – as they have, at least to some extent, in the examples of studies of earnings, educational achievement, and GNP that I have used to illustrate my argument – they are so far from the conditions necessary for proper specification that one can have no confidence in the validity of the results.

Steven J. Klees

The theoretical conditions that have to be fulfilled for regression analysis and econometrics to really work are nowhere even closely met in reality. Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science and economics. Although regression analysis and econometrics have become the most used quantitative methods in social sciences and economics today, it is still a fact that the inferences made from them are — strictly speaking — invalid.

The main reason why almost all econometric models are wrong

13 July, 2018 at 09:33 | Posted in Statistics & Econometrics | 3 Comments

How come econometrics and statistical regression analyses still have not taken us very far in discovering, understanding, or explaining causation in socio-economic contexts? That is the question yours truly has tried to answer in an article published in the latest issue of World Economic Association Commentaries:

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot be maintained that it even should be mandatory to treat observations and data — whether cross-section, time series or panel data — as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes — at least outside of nomological machines like dice and roulette-wheels — are not self-evidently best modelled with probability measures.

When economists and econometricians — often uncritically and without arguments — simply assume that one can apply probability distributions from statistical theory to their own area of research, they are really skating on thin ice. If you cannot show that data satisfies all the conditions of the probabilistic nomological machine, then the statistical inferences made in mainstream economics lack sound foundations.

Statistical — and econometric — patterns should never be seen as anything other than possible clues to follow. Behind observable data, there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.

What instrumental variables analysis is all about

9 July, 2018 at 18:11 | Posted in Statistics & Econometrics | 2 Comments

 

The randomistas revolution

7 July, 2018 at 10:54 | Posted in Statistics & Econometrics | 2 Comments

In his new history of experimental social science — Randomistas: How radical researchers are changing our world — Andrew Leigh gives an introduction to the RCT (randomized controlled trial) method for conducting experiments in medicine, psychology, development economics, and policy evaluation. Although the book mentions that there are critiques that can be levelled against the method, the author does not let that overshadow his overwhelmingly enthusiastic view of RCTs.

Among mainstream economists, this uncritical attitude towards RCTs has become standard. Nowadays many mainstream economists maintain that ‘imaginative empirical methods’ — such as natural experiments, field experiments, lab experiments, RCTs — can help us to answer questions concerning the external validity of economic models. In their view, they are more or less tests of ‘an underlying economic model’ and enable economists to make the right selection from the ever-expanding ‘collection of potentially applicable models.’

When looked at carefully, however, there are in fact few real reasons to share this optimism about the alleged ’empirical turn’ in economics.

If we see experiments or field studies as theory tests or models that ultimately aspire to say something about the real ‘target system,’ then the problem of external validity is central (and was for a long time also a key reason why behavioural economists had trouble getting their research results published).

Assume that you have examined how the performance of a group of people (A) is affected by a specific ‘treatment’ (B). How can we extrapolate/generalize to new samples outside the original population? How do we know that any replication attempt ‘succeeds’? How do we know when these replicated experimental results can be said to justify inferences made in samples from the original population? If, for example, P(A|B) is the conditional density function for the original sample, and we are interested in doing an extrapolative prediction of E[P(A|B)], how can we know that the new sample’s density function is identical with the original? Unless we can give some really good argument for this being the case, inferences built on P(A|B) do not really say anything about the target system’s P'(A|B).

External validity/extrapolation/generalization is founded on the assumption that we can make inferences based on P(A|B) that are exportable to other populations for which P'(A|B) applies. Sure, if one can convincingly show that P and P' are similar enough, the problems are perhaps surmountable. But arbitrarily just introducing functional specification restrictions of the type invariance/stability/homogeneity is, at least for an epistemological realist, far from satisfactory. And often it is – unfortunately – exactly this that I see when I look at mainstream economists’ RCTs and ‘experiments.’
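A stylized illustration (my own, with hypothetical numbers) of why this matters: suppose the effect of the ‘treatment’ B depends on a background characteristic W, and W is distributed differently in the study population and in the target population. The effect estimated in the study then tells us little about the effect where we actually want to apply it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

def outcome(B, W, rng):
    # Effect heterogeneity: the treatment works strongly for high-W units
    # and not at all for the rest.
    return np.where(W > 0, 2.0 * B, 0.0) + rng.normal(size=B.size)

B = rng.integers(0, 2, size=n).astype(float)   # randomized 'treatment'

# Study population: background characteristic W centred at +1.
W_study = rng.normal(loc=1.0, size=n)
Y_study = outcome(B, W_study, rng)
print("effect in study population :",
      round(Y_study[B == 1].mean() - Y_study[B == 0].mean(), 2))

# Target population: same treatment, but W centred at -1.
W_target = rng.normal(loc=-1.0, size=n)
Y_target = outcome(B, W_target, rng)
print("effect in target population:",
      round(Y_target[B == 1].mean() - Y_target[B == 0].mean(), 2))
```

Nothing in the study-population estimate itself signals that the export will fail; that knowledge has to come from somewhere other than the experiment.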

Many ‘experimentalists’ claim that it is easy to replicate experiments under different conditions and therefore a fortiori easy to test the robustness of experimental results. But is it really that easy? Population selection is almost never simple. Had the problem of external validity only been about inference from sample to population, this would be no critical problem. But the really interesting inferences are those we try to make from specific labs/experiments/fields to specific real-world situations/institutions/structures that we are interested in understanding or (causally) explaining. And then the population problem is more difficult to tackle.

In randomized trials the researchers try to find out the causal effects that different variables of interest may have by changing circumstances randomly — a procedure somewhat (‘on average’) equivalent to the usual ceteris paribus assumption.

Besides the fact that ‘on average’ is not always ‘good enough,’ it amounts to nothing but hand waving to simpliciter assume, without argumentation, that it is tenable to treat social agents and relations as homogeneous and interchangeable entities.

Randomization is used to basically allow the econometrician to treat the population as consisting of interchangeable and homogeneous groups (‘treatment’ and ‘control’). The regression models one arrives at by using randomized trials tell us the average effect that variations in variable X has on the outcome variable Y, without having to explicitly control for effects of other explanatory variables R, S, T, etc., etc. Everything is assumed to be essentially equal except the values taken by variable X.

In a usual regression context one would apply an ordinary least squares estimator (OLS) in trying to get an unbiased and consistent estimate:

Y = α + βX + ε,

where α is a constant intercept, β a constant ‘structural’ causal effect and ε an error term.

The problem here is that although we may get an estimate of the ‘true’ average causal effect, this may ‘mask’ important heterogeneous effects of a causal nature. Even if we get the right answer that the average causal effect is 0, those who are ‘treated’ (X=1) may have causal effects equal to -100 and those ‘not treated’ (X=0) may have causal effects equal to 100. Contemplating being treated or not, most people would probably be interested in knowing about this underlying heterogeneity and would not consider the OLS average effect particularly enlightening.
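A minimal sketch (my own numbers, in the spirit of the ±100 example above): half the population responds with +100, the other half with -100, and the OLS regression on the randomized treatment dutifully reports an average effect of roughly zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Individual causal effects: +100 for half the population, -100 for the
# rest, so the average causal effect is exactly zero.
tau = np.where(rng.random(n) < 0.5, 100.0, -100.0)

X = rng.integers(0, 2, size=n).astype(float)   # randomized 'treatment'
Y = rng.normal(size=n) + tau * X               # observed outcome

# The OLS slope of Y on X recovers the *average* causal effect ...
beta = np.polyfit(X, Y, 1)[0]
print("OLS estimate of beta:", round(beta, 1))           # close to 0

# ... while the individual effects it averages over are anything but zero.
print("mean |individual effect|:", np.abs(tau).mean())   # 100.0
```

Anyone actually contemplating the treatment would presumably care more about the second number than the first.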

Most ‘randomistas’ underestimate the heterogeneity problem. It does not just turn up as an external validity problem when trying to ‘export’ regression results to different times or different target populations. It is also often an internal problem to the millions of regression estimates that economists produce every year.

‘Ideally controlled experiments’ tell us with certainty what causes what effects — but only given the right ‘closures.’ Making appropriate extrapolations from (ideal, accidental, natural or quasi) experiments to different settings, populations or target systems, is not easy. “It works there” is no evidence for “it will work here”. Causes deduced in an experimental setting still have to show that they come with an export-warrant to the target population/system. The causal background assumptions made have to be justified, and without licenses to export, the value of ‘rigorous’ and ‘precise’ methods — and ‘on-average-knowledge’ — is despairingly small.

RCTs have very little reach beyond giving descriptions of what has happened in the past. From the perspective of the future and for policy purposes they are as a rule of limited value since they cannot tell us what background factors were held constant when the trial intervention was being made.

RCTs usually do not provide evidence that the results are exportable to other target systems. RCTs cannot be taken for granted to give generalizable results. That something works somewhere for someone is no warranty for us to believe it to work for us here or even that it works generally. RCTs are simply not the best method for all questions and in all circumstances. And insisting on using only one tool often means using the wrong tool.

Econometrics cannot establish the truth value of a fact. Never has. Never will.

6 July, 2018 at 14:08 | Posted in Statistics & Econometrics | 1 Comment

There seems to be a pervasive human aversion to uncertainty, and one way to reduce feelings of uncertainty is to invest faith in deduction as a sufficient guide to truth. Unfortunately, such faith is as logically unjustified as any religious creed, since a deduction produces certainty about the real world only when its assumptions about the real world are certain …

Assumption uncertainty reduces the status of deductions and statistical computations to exercises in hypothetical reasoning – they provide best-case scenarios of what we could infer from specific data (which are assumed to have only specific, known problems). Even more unfortunate, however, is that this exercise is deceptive to the extent it ignores or misrepresents available information, and makes hidden assumptions that are unsupported by data …

Econometrics supplies dramatic cautionary examples in which complex modelling has failed miserably in important applications …

Sander Greenland

Yes, indeed, econometrics fails miserably over and over again. One reason why it does is that the error term in the regression models used is thought of as representing the effect of the variables that were omitted from the models. The error term is somehow thought to be a ‘cover-all’ term representing omitted content in the model, necessary to include in order to ‘save’ the assumed deterministic relation between the other random variables included in the model. Error terms are usually assumed to be orthogonal (uncorrelated) to the explanatory variables. But since they are unobservable, they are also impossible to test empirically. And without justification of the orthogonality assumption, there is, as a rule, nothing to ensure identifiability:

With enough math, an author can be confident that most readers will never figure out where a FWUTV (facts with unknown truth value) is buried. A discussant or referee cannot say that an identification assumption is not credible if they cannot figure out what it is and are too embarrassed to ask.

Distributional assumptions about error terms are a good place to bury things because hardly anyone pays attention to them. Moreover, if a critic does see that this is the identifying assumption, how can she win an argument about the true expected value of the level of aether? If the author can make up an imaginary variable, “because I say so” seems like a pretty convincing answer to any question about its properties.

Paul Romer
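To see in miniature what burying an assumption in the error term can do, here is a small simulation sketch (my own, not Romer’s or Greenland’s): a variable Z drives both the regressor and the outcome; leave it out and it ends up in the error term, the orthogonality assumption silently fails, and OLS reports roughly twice the true effect.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000

# Omitted variable Z drives both the regressor X and the outcome Y.
Z = rng.normal(size=n)
X = Z + rng.normal(size=n)
Y = 1.0 * X + 2.0 * Z + rng.normal(size=n)   # true effect of X is 1.0

# Regress Y on X alone: Z is swept into the error term, which is then
# correlated with X, so the orthogonality assumption fails ...
print("OLS on X only :", round(np.polyfit(X, Y, 1)[0], 2))   # roughly 2.0

# ... whereas including Z restores orthogonality and the true effect.
design = np.column_stack([np.ones(n), X, Z])
coefs, *_ = np.linalg.lstsq(design, Y, rcond=None)
print("OLS on X and Z:", round(coefs[1], 2))                 # roughly 1.0
```

In the simulation we can check the assumption only because we generated Z ourselves; with real data the error term is unobservable, which is the whole point.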

Nowadays it has almost become a self-evident truism among economists that you cannot expect people to take your arguments seriously unless they are based on or backed up by advanced econometric modelling. So legions of mathematical-statistical theorems are proved — and heaps of fiction are produced, masquerading as science. But the rigour of the econometric modelling and the far-reaching assumptions the models are built on are frequently not supported by data.

Econometrics is basically a deductive method. Given the assumptions, it delivers deductive inferences. The problem, of course, is that we almost never know when the assumptions are right. Conclusions can only be as certain as their premises — and that also applies to econometrics.

Econometrics cannot establish the truth value of a fact. Never has. Never will.

The illusion of certainty

4 July, 2018 at 22:37 | Posted in Statistics & Econometrics | 3 Comments

 

