Significant results are not always significant

3 March, 2015 at 16:15 | Posted in Statistics & Econometrics | Leave a comment

The journal Basic and Applied Social Psychology recently decided to ban p-values from published articles. There is much to be said about this rather drastic measure … but it is quite clear that there are problems with focusing too heavily on p-values. p-values are especially problematic when statistical power is low in combination with publication bias, i.e. when it is mainly statistically significant results that get published (see earlier posts on publication bias). If statistical power is low, low p-values can in some cases be more a guarantee of misleading results than a sign of reliability.

Robert Östling

Well worth reading!

Yours truly has himself touched on the problems with the excessive fixation on p-values here, here, here and here.
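The mechanism is easy to demonstrate with a small simulation. The sketch below (Python, assuming NumPy and SciPy are available; the effect size, noise level and sample size are invented purely for illustration) runs many under-powered studies of a small true effect and "publishes" only those with p < 0.05. Among the published results, the estimated effect is wildly exaggerated and occasionally even has the wrong sign.

```python
# A minimal sketch (invented numbers): a small true effect studied with low
# power, where only "statistically significant" results get published.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
true_effect, sigma, n, n_studies = 0.1, 1.0, 20, 10_000   # low-power setup

published = []
for _ in range(n_studies):
    sample = rng.normal(true_effect, sigma, n)
    t_stat, p_value = stats.ttest_1samp(sample, 0.0)       # H0: effect = 0
    if p_value < 0.05:                                      # publication bias: keep only significant results
        published.append(sample.mean())

published = np.array(published)
print(f"share of studies reaching significance: {len(published) / n_studies:.2f}")
print(f"true effect:               {true_effect}")
print(f"mean published estimate:   {published.mean():.2f}")    # badly exaggerated
print(f"published with wrong sign: {(published < 0).mean():.2f}")
```

With numbers like these, the few studies that clear the significance bar overstate the true effect several times over, which is exactly the sense in which a low p-value from an under-powered study can be a warning sign rather than a seal of reliability.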

Forecasting time series data in Gretl

2 March, 2015 at 20:38 | Posted in Statistics & Econometrics | 2 Comments

 

Thanks to Allin Cottrell and Riccardo Lucchetti, today we have access to a high-quality tool for doing and teaching econometrics — Gretl. And, best of all, it is totally free!

Gretl is up to the tasks you may have, so why spend money on expensive commercial programs?

The latest snapshot version of Gretl can be downloaded here.

Econom(etr)ic fictions masquerading as rigorous science

27 February, 2015 at 09:15 | Posted in Statistics & Econometrics | 2 Comments

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great — but first you have to put the rabbit in the hat. And this is where assumptions come into the picture.

As social scientists — and economists — we have to confront the all-important question of how to handle uncertainty and randomness. Should we define randomness in terms of probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and, strictly speaking, do not exist at all – without specifying such system-contexts.

Accepting a domain of probability theory and a sample space of “infinite populations” — which is legion in modern econometrics — also implies that judgments are made on the basis of observations that are actually never made! Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

In his great book Statistical Models and Causal Inference: A Dialogue with the Social Sciences David Freedman touched on this fundamental problem, arising when you try to apply statistical models outside overly simple nomological machines like coin tossing and roulette wheels:

Lurking behind the typical regression model will be found a host of such assumptions; without them, legitimate inferences cannot be drawn from the model. There are statistical procedures for testing some of these assumptions. However, the tests often lack the power to detect substantial failures. Furthermore, model testing may become circular; breakdowns in assumptions are detected, and the model is redefined to accommodate. In short, hiding the problems can become a major goal of model building.

Using models to make predictions of the future, or the results of interventions, would be a valuable corrective. Testing the model on a variety of data sets – rather than fitting refinements over and over again to the same data set – might be a good second-best … Built into the equation is a model for non-discriminatory behavior: the coefficient d vanishes. If the company discriminates, that part of the model cannot be validated at all.

Regression models are widely used by social scientists to make causal inferences; such models are now almost a routine way of demonstrating counterfactuals. However, the “demonstrations” generally turn out to depend on a series of untested, even unarticulated, technical assumptions. Under the circumstances, reliance on model outputs may be quite unjustified. Making the ideas of validation somewhat more precise is a serious problem in the philosophy of science. That models should correspond to reality is, after all, a useful but not totally straightforward idea – with some history to it. Developing appropriate models is a serious problem in statistics; testing the connection to the phenomena is even more serious …

In our days, serious arguments have been made from data. Beautiful, delicate theorems have been proved, although the connection with data analysis often remains to be established. And an enormous amount of fiction has been produced, masquerading as rigorous science.

Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science.

Econometrics and the difficult art of making it count

26 February, 2015 at 20:43 | Posted in Statistics & Econometrics | Leave a comment

Modern econometrics is fundamentally based on assuming — usually without any explicit justification — that we can gain causal knowledge by considering independent variables that may have an impact on the variation of a dependent variable. This is however, far from self-evident. Often the fundamental causes are constant forces that are not amenable to the kind of analysis econometrics supplies us with. As Stanley Lieberson has it in his modern classic Making It Count:

One can always say whether, in a given empirical context, a given variable or theory accounts for more variation than another. But it is almost certain that the variation observed is not universal over time and place. Hence the use of such a criterion first requires a conclusion about the variation over time and place in the dependent variable. If such an analysis is not forthcoming, the theoretical conclusion is undermined by the absence of information …

Moreover, it is questionable whether one can draw much of a conclusion about causal forces from simple analysis of the observed variation … To wit, it is vital that one have an understanding, or at least a working hypothesis, about what is causing the event per se; variation in the magnitude of the event will not provide the answer to that question.

Causality in social sciences — and economics — can never solely be a question of statistical inference. Causality entails more than predictability, and really explaining social phenomena in depth requires theory. Analysis of variation – the foundation of all econometrics – can never in itself reveal how these variations are brought about. Only when we are able to tie actions, processes or structures to the statistical relations detected can we say that we are getting at relevant explanations of causation. Too much in love with axiomatic-deductive modeling, neoclassical economists especially tend to forget that accounting for causation — how causes bring about their effects — demands deep subject-matter knowledge and acquaintance with the intricate fabrics and contexts involved. As Keynes already argued in his A Treatise on Probability, statistics and econometrics should not primarily be seen as means of inferring causality from observational data, but rather as descriptions of patterns of associations and correlations that we may use as suggestions of possible causal relations.

Model assumptions and reality

9 February, 2015 at 13:57 | Posted in Statistics & Econometrics | Leave a comment

In a previous article posted here — What are the key assumptions of linear regression models? — yours truly tried to argue that since econometrics doesn’t content itself with only making optimal predictions, but also aspires to explain things in terms of causes and effects, econometricians need loads of assumptions — and that the most important of these are additivity and linearity.

Let me take the opportunity to cite one of my favourite introductory statistics textbooks on one further reason these assumptions are made — and why they ought to be much more argued for on both epistemological and ontological grounds when used (emphasis added):

In a hypothesis test … the sample comes from an unknown population. If the population is really unknown, it would suggest that we do not know the standard deviation, and therefore, we cannot calculate the standard error. To solve this dilemma, we have made an assumption. Specifically, we assume that the standard deviation for the unknown population (after treatment) is the same as it was for the population before treatment.

Actually this assumption is the consequence of a more general assumption that is part of many statistical procedures. The general assumption states that the effect of the treatment is to add a constant amount to … every score in the population … You should also note that this assumption is a theoretical ideal. In actual experiments, a treatment generally does not show a perfect and consistent additive effect.

Additivity and linearity are the two most important of the assumptions that most applied econometric models rely on, simply because if they are not true, your model is invalid and descriptively incorrect. It’s like calling your house a bicycle. No matter how you try, it won’t move you an inch. When the model is wrong — well, then it’s wrong.
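The point can be illustrated with a small sketch (Python with NumPy assumed; the data-generating process and all numbers are invented). The outcome below is driven entirely by a non-additive interaction, yet the additive, linear specification reports essentially zero effects for both regressors:

```python
# Sketch (invented numbers): y is generated by an interaction between x1 and x2,
# but we fit a purely additive, linear model by OLS.
import numpy as np

rng = np.random.default_rng(0)
n = 500
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 * x2 + rng.normal(size=n)     # effect works only through the interaction

X = np.column_stack([np.ones(n), x1, x2])        # additive, linear specification
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # ordinary least squares
print("estimated coefficients (const, x1, x2):", np.round(beta, 2))
# The additive model reports (roughly) zero 'effects' of x1 and x2; the term
# that actually drives y is invisible to it. The assumptions, not the data,
# dictate the answer.
```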

Markov’s Inequality Theorem (wonkish)

4 February, 2015 at 12:33 | Posted in Statistics & Econometrics | 2 Comments

One of the most beautiful results of probability theory is Markov’s Inequality Theorem (after the Russian mathematician Andrei Markov (1856–1922)):

If X is a non-negative stochastic variable (X ≥ 0) with a finite expectation value E(X), then for every a > 0

P{X ≥ a} ≤ E(X)/a

If, e.g., the production of cars in a factory during a week is assumed to be a stochastic variable with an expectation value (mean) of 50 units, we can – based on nothing else but the inequality – conclude that the probability that the production in a given week is at least 100 units cannot exceed 50% [P(X ≥ 100) ≤ 50/100 = 0.5 = 50%].

I still feel a humble awe at this immensely powerful result. Without knowing anything else but the expected value (mean) of a probability distribution, we can deduce upper limits for probabilities. The result strikes me as just as surprising today as it did thirty years ago, when I first ran into it as a student of mathematical statistics.
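For readers who want to see why the bound holds, the standard one-line argument (ordinary textbook material, not part of the original post) uses nothing but the non-negativity of X:

```latex
% Markov's inequality: for X >= 0 and a > 0,
\[
E(X) \;\ge\; E\bigl(X \cdot \mathbf{1}_{\{X \ge a\}}\bigr)
      \;\ge\; E\bigl(a \cdot \mathbf{1}_{\{X \ge a\}}\bigr)
      \;=\; a \, P\{X \ge a\},
\qquad\text{hence}\qquad
P\{X \ge a\} \;\le\; \frac{E(X)}{a}.
\]
```

In the car-factory example this is just E(X) = 50 ≥ 100 · P(X ≥ 100), so P(X ≥ 100) ≤ 0.5.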

Wasserman on Bayesian religion

30 January, 2015 at 18:10 | Posted in Statistics & Econometrics | 4 Comments

There is a nice YouTube video with Tony O’Hagan interviewing Dennis Lindley. Of course, Dennis is a legend and his impact on the field of statistics is huge.


At one point, Tony points out that some people liken Bayesian inference to a religion. Dennis claims this is false. Bayesian inference, he correctly points out, starts with some basic axioms and then the rest follows by deduction. This is logic, not religion.

I agree that the mathematics of Bayesian inference is based on sound logic. But, with all due respect, I think Dennis misunderstood the question. When people say that “Bayesian inference is like a religion,” they are not referring to the logic of Bayesian inference. They are referring to how adherents of Bayesian inference behave.

(As an aside, detractors of Bayesian inference do not deny the correctness of the logic. They just don’t think the axioms are relevant for data analysis. For example, no one doubts the axioms of Peano arithmetic. But that doesn’t imply that arithmetic is the foundation of statistical inference. But I digress.)

The vast majority of Bayesians are pragmatic, reasonable people. But there is a sub-group of die-hard Bayesians who do treat Bayesian inference like a religion. By this I mean:

They are very cliquish.
They have a strong emotional attachment to Bayesian inference.
They are overly sensitive to criticism.
They are unwilling to entertain the idea that Bayesian inference might have flaws.
When someone criticizes Bayes, they think that critic just “doesn’t get it.”
They mock people with differing opinions …

No evidence you can provide would ever make the die-hards doubt their ideas. To them, Sir David Cox, Brad Efron and other giants in our field who have doubts about Bayesian inference are not to be taken seriously because they “just don’t get it.”

So is Bayesian inference a religion? For most Bayesians: no. But for the thin-skinned, inflexible die-hards who have attached themselves so strongly to their approach to inference that they make fun of, or get mad at, critics: yes, it is a religion.

Larry Wasserman

For some more thoughts on the limits of the Bayesian approach, Stephen Senn’s You May Believe You Are a Bayesian But You Are Probably Wrong is a good read.

The Lady Tasting Tea

29 January, 2015 at 15:20 | Posted in Statistics & Econometrics | Leave a comment

One of my absolute favourites on my statistics shelf is David Salsburg’s insightful history of statistics, The Lady Tasting Tea. The book is full of deep and valuable reflections on the role of statistics in modern science. Salsburg, like Keynes before him, is sceptical of how many social scientists – not least economists – often simply assume, uncritically and without argument, that the probability distributions of statistical theory can be applied to their own fields of inquiry. In the final chapter he writes:

Kolmogorov established the mathematical meaning of probability: Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

Wise words for econometricians and other number crunchers to ponder!
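For reference, the Kolmogorov formalization Salsburg alludes to can be stated compactly (standard textbook notation, not a quotation from the book): a probability is a measure P on a σ-algebra F of subsets of a space of events Ω satisfying

```latex
% Kolmogorov's axioms for a probability space (Omega, F, P):
\[
P(A) \ge 0 \;\;\text{for all } A \in \mathcal{F}, \qquad
P(\Omega) = 1, \qquad
P\Bigl(\,\bigcup_{i=1}^{\infty} A_i\Bigr) = \sum_{i=1}^{\infty} P(A_i)
\;\;\text{for pairwise disjoint } A_i \in \mathcal{F}.
\]
```

Salsburg’s complaint is precisely that, in many observational studies, nobody can say what Ω is supposed to be, and without that the probabilities being reported hang in the air.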

The big problem with randomization (wonkish)

22 January, 2015 at 09:54 | Posted in Statistics & Econometrics | Leave a comment

But when the randomization is purposeful, a whole new set of issues arises — experimental contamination — which is much more serious with human subjects in a social system than with chemicals mixed in beakers … Anyone who designs an experiment in economics would do well to anticipate the inevitable barrage of questions regarding the valid transference of things learned in the lab (one value of z) into the real world (a different value of z) …

Absent observation of the interactive compounding effects z, what is estimated is some kind of average treatment effect which is called by Imbens and Angrist (1994) a “Local Average Treatment Effect,” which is a little like the lawyer who explained that when he was a young man he lost many cases he should have won but as he grew older he won many that he should have lost, so that on the average justice was done. In other words, if you act as if the treatment effect is a random variable by substituting βt for β0 + β′zt, the notation inappropriately relieves you of the heavy burden of considering what are the interactive confounders and finding some way to measure them …

If little thought has gone into identifying these possible confounders, it seems probable that little thought will be given to the limited applicability of the results in other settings. This is the error made by the bond rating agencies in the recent financial crash — they transferred findings from one historical experience to a domain in which they no longer applied because, I will suggest, social confounders were not included.

Ed Leamer
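Leamer’s point about substituting βt for β0 + β′zt can be made concrete with a small simulation sketch (Python with NumPy assumed; all numbers invented). When the treatment effect interacts with a context variable z, the average effect estimated under one distribution of z says little about what happens under another:

```python
# Sketch (invented numbers): the treatment effect is beta0 + beta1 * z, i.e. it
# depends on a context variable z. A randomized experiment in a 'lab' setting
# (z around 0) and one in a 'field' setting (z around 2) both estimate a valid
# average effect, but not the same one.
import numpy as np

rng = np.random.default_rng(42)
beta0, beta1 = 1.0, 2.0

def average_treatment_effect(z_mean, n=100_000):
    """Difference in mean outcomes, treated minus control, for a given z distribution."""
    z = rng.normal(z_mean, 0.5, n)
    treated = rng.integers(0, 2, n)                        # randomized assignment
    y = (beta0 + beta1 * z) * treated + rng.normal(0.0, 1.0, n)
    return y[treated == 1].mean() - y[treated == 0].mean()

print("estimated effect, 'lab'   (z ~ 0):", round(average_treatment_effect(0.0), 2))
print("estimated effect, 'field' (z ~ 2):", round(average_treatment_effect(2.0), 2))
# Unless the interactive confounder z is observed and modelled, transferring the
# 'lab' number to the 'field' setting is exactly the mistake Leamer describes.
```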

Econometrics made easy — Gretl

21 January, 2015 at 21:53 | Posted in Statistics & Econometrics | Leave a comment

 

Thanks to Allin Cottrell and Riccardo Lucchetti, today we have access to a high-quality tool for doing and teaching econometrics — Gretl. And, best of all, it is totally free!

Gretl is up to the tasks you may have, so why spend money on expensive commercial programs?

The latest snapshot version of Gretl – 1.9.92 – can be downloaded here.

So just go ahead. With a program like Gretl econometrics has never been easier to master!

[And yes, I do know there’s another fabulously nice and free program — R. But R hasn’t got as nifty a GUI as Gretl — and, at least for students, it is more difficult to learn to handle and program. When students are going to learn some basic econometrics, I do think it is preferable to use Gretl so that they can concentrate more on “content” than on “technique.”]
