## How to think about statistics

29 May, 2019 at 09:32 | Posted in Statistics & Econometrics | 1 Comment

If anything, Gelman’s talk underlines how important it is not to equate science with statistical calculation. All science entail human judgement, and using statistical models doesn’t relieve us of that necessity. Working with misspecified models, the scientific value of statistics is actually zero — even though you’re making valid statistical inferences! Statistical models are no substitutes for doing real science. Or as a famous German philosopher famously wrote 150 years ago:

There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.

We should never forget that the underlying parameters we use when performing statistical tests are *model constructions*. And if the model is wrong, the value of our calculations is nil. As ‘shoe-leather researcher’ David Freedman wrote in *Statistical Models and Causal Inference*:

I believe model validation to be a central issue. Of course, many of my colleagues will be found to disagree. For them, fitting models to data, computing standard errors, and performing significance tests is “informative,” even though the basic statistical assumptions (linearity, independence of errors, etc.) cannot be validated. This position seems indefensible, nor are the consequences trivial. Perhaps it is time to reconsider.

All of this, of course, does also apply when we use statistics in economics. Most work in econometrics and regression analysis is — still — made on the assumption that the researcher has a theoretical model that is ‘true.’ Based on this belief of having a correct specification for an econometric model or running a regression, one proceeds as if the only problem remaining to solve have to do with measurement and observation.

When things sound too good to be true, they usually aren’t. And that goes for econometrics too. The snag is that there is pretty little to support the perfect specification assumption. Looking around in social science and economics we don’t find a single regression or econometric model that lives up to the standards set by the ‘true’ theoretical model — and there is pretty little that gives us reason to believe things will be different in the future.

To think that we are being able to construct a model where all relevant variables are included and correctly specify the functional relationships that exist between them is not only a belief without support, but a belief *impossible* to support.

The theories we work with when building our econometric regression models are insufficient. No matter what we study, there are always some variables missing, and we don’t know the correct way to functionally specify the relationships between the variables.

*Every* regression model constructed is misspecified. There is always an endless list of possible variables to include, and endless possible ways to specify the relationships between them. So every applied econometrician comes up with his own specification and ‘parameter’ estimates. The econometric Holy Grail of consistent and stable parameter-values is nothing but a dream.

In order to draw inferences from data as described by econometric texts, it is necessary to make whimsical assumptions. The professional audience consequently and properly withholds belief until an inference is shown to be adequately insensitive to the choice of assumptions. The haphazard way we individually and collectively study the fragility of inferences leaves most of us unconvinced that any inference is believable. If we are to make effective use of our scarce data resource, it is therefore important that we study fragility in a much more systematic way. If it turns out that almost all inferences from economic data are fragile, I suppose we shall have to revert to our old methods …

A rigorous application of econometric methods in economics really presupposes that the phenomena of our real world economies are ruled by stable causal relations between variables. Parameter-values estimated in specific spatio-temporal contexts are *presupposed* to be exportable to totally different contexts. To warrant this assumption one, however, has to convincingly establish that the targeted acting causes are stable and invariant so that they maintain their parametric status after the bridging. The endemic lack of predictive success of the econometric project indicates that this hope of finding fixed parameters is a hope for which there really is no other ground than hope itself.

The theoretical conditions that have to be fulfilled for regression analysis and econometrics to really work are nowhere even closely met in reality. Making outlandish statistical assumptions does not provide a solid ground for doing relevant social science and economics. Although regression analysis and econometrics have become the most used quantitative methods in social sciences and economics today, it’s still a fact that the inferences made from them are invalid.

Econometrics — and regression analysis — is basically a deductive method. Given the assumptions (such as manipulability, transitivity, separability, additivity, linearity, etc) it delivers deductive inferences. The problem, of course, is that we will never completely know when the assumptions are right. Conclusions can only be as certain as their premises — and that also applies to econometrics and regression analysis.

## 1 Comment

Sorry, the comment form is closed at this time.

Blog at WordPress.com.

Entries and comments feeds.

“There is no royal road to science, and only those who do not dread the fatiguing climb of its steep paths have a chance of gaining its luminous summits.” ― Karl Marx, Capital, Vol. 1: A Critical Analysis of Capitalist Production

I learned something new today Lars. Thank you.

Comment by Rob— 29 May, 2019 #