A quick refresher on ergodicity (student stuff)

18 June, 2013 at 13:10 | Posted in Statistics & Econometrics | Leave a comment

 

The book I wish I had written

17 June, 2013 at 08:37 | Posted in Statistics & Econometrics | Leave a comment

Mathematical statistician David A. Freedman’s Statistical Models and Causal Inference (Cambridge University Press, 2010) is a marvellous book. It ought to be mandatory reading for every serious social scientist – including economists and econometricians – who doesn’t want to succumb to ad hoc assumptions and unsupported statistical conclusions!

Introduction

How do we calibrate the uncertainty introduced by data collection? Nowadays, this question has become quite salient, and it is routinely answered using well-known methods of statistical inference, with standard errors, t-tests, and P-values … These conventional answers, however, turn out to depend critically on certain rather restrictive assumptions, for instance, random sampling …

Thus, investigators who use conventional statistical technique turn out to be making, explicitly or implicitly, quite restrictive behavioral assumptions about their data collection process … More typically, perhaps, the data in hand are simply the data most readily available …

The moment that conventional statistical inferences are made from convenience samples, substantive assumptions are made about how the social world operates … When applied to convenience samples, the random sampling assumption is not a mere technicality or a minor revision on the periphery; the assumption becomes an integral part of the theory …

Regression Models

In particular, regression and its elaborations … are now standard tools of the trade. Although rarely discussed, statistical assumptions have major impacts on analytic results obtained by such methods.

Consider the usual textbook exposition of least squares regression. We have n observational units, indexed by i = 1, …, n. There is a response variable yi, conceptualized as μi + εi, where μi is the theoretical mean of yi while the disturbances or errors εi represent the impact of random variation (sometimes of omitted variables). The errors are assumed to be drawn independently from a common (gaussian) distribution with mean 0 and finite variance. Generally, the error distribution is not empirically identifiable outside the model; so it cannot be studied directly – even in principle – without the model. The error distribution is an imaginary population and the errors εi are treated as if they were a random sample from this imaginary population – a research strategy whose frailty was discussed earlier.

Usually, explanatory variables are introduced and μi is hypothesized to be a linear combination of such variables. The assumptions about the μi and εi are seldom justified or even made explicit – although minor correlations in the εi can create major bias in estimated standard errors for coefficients …

Why do μi and εi behave as assumed? To answer this question, investigators would have to consider, much more closely than is commonly done, the connection between social processes and statistical assumptions …

Conclusions

We have tried to demonstrate that statistical inference with convenience samples is a risky business. While there are better and worse ways to proceed with the data at hand, real progress depends on deeper understanding of the data-generation mechanism. In practice, statistical issues and substantive issues overlap. No amount of statistical maneuvering will get very far without some understanding of how the data were produced.

More generally, we are highly suspicious of efforts to develop empirical generalizations from any single dataset. Rather than ask what would happen in principle if the study were repeated, it makes sense to actually repeat the study. Indeed, it is probably impossible to predict the changes attendant on replication without doing replications. Similarly, it may be impossible to predict changes resulting from interventions without actually intervening.
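The remark in the Regression Models excerpt about correlated errors can be made concrete with a small simulation. What follows is only a sketch under my own assumptions, not anything taken from Freedman's text: the AR(1) design for both regressor and errors, the parameter values, and the helper names `ar1`, `ols_slope_and_se` and `se_comparison` are all hypothetical choices made for illustration.

```python
import math
import random
import statistics

def ar1(n, rho, rng):
    """Stationary AR(1) series with unit variance and lag-1 correlation rho."""
    scale = math.sqrt(1 - rho**2)
    value = rng.gauss(0, 1)
    series = []
    for _ in range(n):
        series.append(value)
        value = rho * value + scale * rng.gauss(0, 1)
    return series

def ols_slope_and_se(x, y):
    """Slope of y on x (with intercept) and its textbook i.i.d.-error standard error."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(rss / (n - 2) / sxx)

def se_comparison(rho, n=100, reps=500, seed=0):
    """Return (average nominal SE, actual spread of the slope estimates)."""
    rng = random.Random(seed)
    slopes, nominal = [], []
    for _ in range(reps):
        x = ar1(n, rho, rng)
        errors = ar1(n, rho, rng)  # errors correlated across units; true slope is 1
        y = [xi + ei for xi, ei in zip(x, errors)]
        slope, se = ols_slope_and_se(x, y)
        slopes.append(slope)
        nominal.append(se)
    return statistics.fmean(nominal), statistics.stdev(slopes)

for rho in (0.0, 0.5, 0.8):
    nominal, actual = se_comparison(rho)
    print(f"rho={rho}: nominal SE {nominal:.3f}, actual SD of slope {actual:.3f}")
```

With independent errors the two columns should roughly agree. As the correlation grows, the textbook standard error tends to understate how much the estimated slope really varies from sample to sample.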

A quick refresher on mathematical induction (student stuff)

15 June, 2013 at 10:28 | Posted in Statistics & Econometrics | Leave a comment

 

Chebyshev’s Inequality Theorem (student stuff)

12 June, 2013 at 16:21 | Posted in Statistics & Econometrics | Leave a comment

Chebyshev’s Inequality Theorem – named after Russian mathematician Pafnuty Chebyshev – states that for a population (or sample) at most 1/k² of the distribution’s values can be more than k standard deviations away from the mean. The beauty of the theorem is that although we may not know the exact distribution of the data – e.g. whether it is normally distributed – we may still say with certitude (since the theorem holds for any distribution with a finite variance) that there are bounds on probabilities!
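For students who like to see the bound at work, here is a rough numerical check (a sketch, not a proof). The helper name `tail_fraction` and the three sample distributions are my own assumptions, picked only for illustration.

```python
import random
import statistics

def tail_fraction(data, k):
    """Fraction of values more than k (population) standard deviations from the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mean) > k * sd for x in data) / len(data)

random.seed(1)
samples = {
    "normal": [random.gauss(0, 1) for _ in range(20_000)],
    "exponential": [random.expovariate(1.0) for _ in range(20_000)],
    "uniform": [random.random() for _ in range(20_000)],
}

for name, data in samples.items():
    for k in (1.5, 2, 3):
        frac = tail_fraction(data, k)
        # Chebyshev: this fraction cannot exceed 1/k^2, whatever the distribution.
        print(f"{name:12s} k={k}: observed {frac:.4f}, bound {1 / k**2:.4f}")
```

The observed fractions are usually far below 1/k². The bound is crude precisely because it has to hold for every distribution.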

Markov’s Inequality

12 June, 2013 at 12:04 | Posted in Statistics & Econometrics | Leave a comment

One of the most beautiful results of probability theory is Markov’s inequality (after the Russian mathematician Andrei Markov (1856-1922)):

If X is a non-negative stochastic variable (X ≥ 0) with a finite expectation value E(X), then for every a > 0

P{X ≥ a} ≤ E(X)/a

If, e.g., the production of cars in a factory during a week is assumed to be a stochastic variable with an expectation value (mean) of 50 units, we can – based on nothing else but the inequality – conclude that the probability that the production for a week would be at least 100 units cannot exceed 50% [P(X ≥ 100) ≤ 50/100 = 0.5 = 50%].
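To get a feeling for how the bound behaves, one can try it on a few made-up output distributions that all have a mean of about 50. This is a minimal sketch: the three distributions and the helper names `markov_bound` and `share_at_least` are hypothetical and not part of the factory example itself.

```python
import random
import statistics

def markov_bound(mean, a):
    """Markov's upper bound on P(X >= a) for a non-negative X with the given mean."""
    return min(1.0, mean / a)

def share_at_least(data, a):
    """Observed share of values that are at least a."""
    return sum(x >= a for x in data) / len(data)

random.seed(2)
n = 50_000
weekly_output = {
    # Three hypothetical output distributions, all non-negative with mean about 50.
    "exponential": [random.expovariate(1 / 50) for _ in range(n)],
    "uniform(0, 100)": [random.uniform(0, 100) for _ in range(n)],
    "0 or 100, equally likely": [random.choice((0, 100)) for _ in range(n)],
}

for name, data in weekly_output.items():
    bound = markov_bound(statistics.fmean(data), 100)
    share = share_at_least(data, 100)
    print(f"{name:26s} share >= 100: {share:.3f}, Markov bound: {bound:.3f}")
```

The last case, where output is either 0 or 100, sits right at the bound. So without further information about the distribution, 50% cannot be improved upon.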

I still feel a humble awe at this immensely powerful result. Without knowing anything else but the expected value (mean) of a non-negative stochastic variable we can deduce upper limits for probabilities. The result strikes me as equally surprising today as thirty years ago when I first ran into it as a student of mathematical statistics.

[For a derivation of the inequality, see e.g. Sheldon Ross, Introduction to Probability and Statistics for Engineers and Scientists, Academic Press, 2009, p. 129]
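For those without the book at hand, the argument is short enough to sketch. I assume here that X is continuous with density f, which is my simplification; in the discrete case the integrals are replaced by sums.

```latex
E(X) = \int_0^\infty x f(x)\,dx
     \;\ge\; \int_a^\infty x f(x)\,dx
     \;\ge\; a \int_a^\infty f(x)\,dx
     = a\,P\{X \ge a\}
```

Dividing both sides by a > 0 gives P{X ≥ a} ≤ E(X)/a. The first step uses X ≥ 0 (dropping the part of the integral below a can only reduce it), the second that x ≥ a on the remaining range.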

A quick refresher on Cumulative Distribution Functions (student stuff)

11 June, 2013 at 09:44 | Posted in Statistics & Econometrics | Leave a comment

 

Fun with statistics

10 June, 2013 at 15:25 | Posted in Statistics & Econometrics | 1 Comment

Yours truly gave a PhD course in statistics for students in education and sports this semester. And between teaching them all about Chebyshev’s Theorem, Beta Distributions, Moment-Generating Functions and the Neyman-Pearson Lemma, I tried to remind them that statistics can actually also be fun …
 

On the significance of significance tests – Andrew Gelman vs. Deborah Mayo

8 June, 2013 at 10:12 | Posted in Statistics & Econometrics | Leave a comment

Mayo says: 
June 4, 2013 at 9:46 pm 
Andrew: You seem to have undergone a gestalt switch from the Gelman of a short time ago–the one who embraced significance tests. 
http://www.rmm-journal.de/downloads/Article_Gelman.pdf

Andrew says: 
June 4, 2013 at 10:08 pm 
Mayo:
I believed, and still believe, in checking the fit of a model by comparing data to hypothetical replications. This is not the same as significance testing in which a p-value is used to decide whether to reject a model or whether to believe that a finding is true.

Mayo says: 
June 4, 2013 at 10:38 pm 
Gelman: I don’t know that significance tests are used to decide that a finding is true, and I’m surprised to see you endorsing/spreading the hackneyed and much lampooned view of significance tests, p-values, etc. despite so many of us trying to correct the record. And statistical hypothesis testing denies uncertainty? Where in the world do you get this? (I know it’s not because they don’t use posterior probabilities…)
But never mind, let me ask: when you check the fit of a model using p-value assessments, are you not inferring the adequacy/inadequacy of the model? Tell me what you are doing if not. I don’t particularly like calling it a decision, neither do many people, and I like viewing the output as “whether to believe” even less. But I don’t know what your output is supposed to be.

Andrew says: 
June 4, 2013 at 10:53 pm 
Mayo: 
1. I don’t think hypothesis testing inherently denies uncertainty. But I do think that it is used by many researchers as a way of avoiding uncertainty: it’s all too common for “significant” to be interpreted as “true” and “non-significant” to be interpreted as “zero.” Consider, for example, all the trash science we’ve been discussing on this blog recently, studies that may have some scientific content but which get ruined by their authors’ deterministic interpretations.
2. When I check the fit of a model, I’m assessing its adequacy for some purpose. This is not the same as looking for p<.05 or p<.01 in order to go around saying that some theory is now true.

Mayo says: 
June 4, 2013 at 11:04 pm 
Andrew: I fail to see how a deterministic interpretation could go hand in hand with error probabilities; and I never hear even the worst test abusers declare a theory is not true, give me A break…
So when you assess adequacy for a purpose, what does this mean? Adequate vs inadequate for a purpose is pretty dichotomous. Do you assess how adequate? I’m unclear as to where the uncertainty enters for you, because as I understand it is not in terms of a posterior probability.

Andrew says: 
June 4, 2013 at 11:18 pm 
Mayo:
Here’s a quote from a researcher, I posted it on the blog a few days ago: “Our results demonstrate that physically weak males are more reluctant than physically strong males to assert their self-interest…”
Here’s another quote: “Ovulation led single women to become more liberal, less religious, and more likely to vote for Barack Obama. In contrast, ovulation led married women to become more conservative, more religious, and more likely to vote for Mitt Romney.”
These are deterministic statements based on nothing more than p-values that happen to be statistically significant. Researchers make these sorts of statements all the time. It’s not your fault, I’m not saying you would do this, but it’s a serious problem.
Along similar lines, we’ll see claims that a treatment has an effect on men and not on women, when really what is happening is that p<.05 for the men in the study and p>.05 for the women.
In addition to brushing away uncertainty, people also seem to want to brush away variation, thus talking about “the effect” as if it is a constant across all groups and all people. A recent example featured on this blog was a study primarily of male college students which was referred to repeatedly (by its authors, not just by reporters and public relations people) as a study of “men” with no qualifications.
P.S. Bayesians do this too, indeed there’s a whole industry (which I hate) of Bayesian methods for getting the posterior probability that a null hypothesis is true. Bayesians use different methods but often have the misguided goal of other statisticians, to deny uncertainty and variation.

Mayo says: 
June 5, 2013 at 6:47 pm 
These moves from observed associations, and even correlations, to causal claims are poorly warranted, but these are classic fallacies that go beyond tests to reading all manner of “explanations” into the data. I find it very odd to view this as a denial of uncertainty by significance tests. Even if they got their statistics right, the link from stat to substantive causal claim would exist. I just find it odd to regard the statistical vs substantive and correlation vs cause fallacies, which every child knows, some kind of shortcoming with significance tests. Any method or no method can commit these fallacies, especially from observational studies. But when you berate the tests as somehow responsible, you misleadingly suggest that other methods are better, rather than worse. At least error statistical methods can identify the flaws at 3 levels (data, statistical inference, stat-> substantive causal claim) in a systematic way. We can spot the flaws a mile off…
I still don’t know where you want the uncertainty to show up; I’ve indicated how I do.

Andrew says: 
June 5, 2013 at 8:34 pm 
Mayo:
You write, “I still don’t know where you want the uncertainty to show up;” I want the uncertainty to show up in a posterior distribution for continuous parameters, as described in my books.

Mayo says: 
June 6, 2013 at 9:59 am 
Andrew (couldn’t post under your comment). You write, “I want the uncertainty to show up in a posterior distribution for continuous parameters”. Let’s see if I have this right. You would report the posterior probabilities that a model was adequate for a goal. Yes? Now you have also said you are a falsificationist. So is your falsification rule to move from a low enough posterior probability in the adequacy of a model, to the falsity of a claim that the model is adequate (for the goal). And would high enough posterior in the adequacy of a model translate into something like, not being able to falsify its adequacy or perhaps, accepting it as adequate (the latter would not be falsificationist, but might be more sensible than the former). Or are you no longer falsificationist-leaning.

Andrew says: 
June 6, 2013 at 10:56 am 
Mayo: 
No, I would not “report the posterior probabilities that a model was adequate for a goal.” That makes no sense to me. I would report the posterior distribution of parameters and make probabilistic predictions within a model.

Mayo says: 
June 6, 2013 at 5:14 pm 
Andrew: Well if you’re going to falsify as a result, you need a rule from these posteriors to infer the predictions are met satisfactorily or not. Else there is no warrant for rejecting/improving the model. That’s the kind of thing significance tests can do. But specifically, with respect to the misleading interpretations of data that you were just listing, it isn’t obvious how they are avoided by you. The data may fit these hypotheses swimmingly. 
Anyhow, this is not the place to discuss this further. In signing off, I just want to record my objection to (mis)portraying statistical tests and other error statistical methods as flawed because of some blatant, age-old misuses or misleading language, like “demonstrate” (flaws that are at least detectable and self-correctable by these same methods, whereas they might remain hidden by other methods now in use). [Those examples should not even be regarded as seeking evidence but at best colorful and often pseudoscientific interpretations.] When the Higgs particle physicists found their 2 and 3 standard deviation effects were disappearing with new data—just to mention a recent example from my blog—they did not say the flaw was with the p-values! They tightened up their analyses and made them more demanding. They didn’t report posterior distributions for the properties of the Higgs, but they were able to make inferences about their values, and identify gaps for further analysis.
http://errorstatistics.com/2013/03/17/update-on-higgs-data-analysis-statistical-flukes-1/

Statistical Modeling, Causal Inference, and Social Science

For my own take on significance tests see here, here, here, and here.
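Gelman's remark in the exchange above, that an effect can be "significant" for men and "non-significant" for women without the two groups differing in any demonstrable way, is easy to check with a little arithmetic. The numbers below are hypothetical (they come from no actual study), and the normal z-test and the helper name `two_sided_p` are my own simplifying assumptions.

```python
import math

def two_sided_p(estimate, se):
    """Two-sided p-value for a normal z-test of 'estimate = 0'."""
    return math.erfc(abs(estimate / se) / math.sqrt(2))

# Hypothetical estimates: (effect, standard error) for each group.
men = (0.25, 0.10)
women = (0.10, 0.10)

diff = men[0] - women[0]
diff_se = math.hypot(men[1], women[1])  # SE of a difference of independent estimates

p_men = two_sided_p(*men)
p_women = two_sided_p(*women)
p_diff = two_sided_p(diff, diff_se)

print(f"men:        p = {p_men:.3f}")
print(f"women:      p = {p_women:.3f}")
print(f"difference: p = {p_diff:.3f}")
```

With these numbers the men's estimate clears the 5% threshold and the women's does not, yet the difference between the two estimates is itself nowhere near significant.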

A quick refresher on Probability Density Functions (student stuff)

8 June, 2013 at 09:31 | Posted in Statistics & Econometrics | Leave a comment

 

Truth as replicability (wonkish)

6 June, 2013 at 09:30 | Posted in Statistics & Econometrics | Leave a comment

Much of statistical practice is an effort to reduce or deny variation and uncertainty. The reduction is done through standardization, replication, and other practices of experimental design, with the idea being to isolate and stabilize the quantity being estimated and then average over many cases. Even so, however, uncertainty persists, and statistical hypothesis testing is in many ways an endeavor to deny this, by reporting binary accept/reject decisions.

Classical statistical methods produce binary statements, but there is no reason to assume that the world works that way. Expressions such as Type 1 error, Type 2 error, false positive, and so on, are based on a model in which the world is divided into real and non-real effects. To put it another way, I understand the general scientific distinction of real vs. non-real effects but I do not think this maps well into the mathematical distinction of θ=0 vs. θ≠0. Yes, there are some unambiguously true effects and some that are arguably zero, but I would guess that the challenge in most current research in psychology is not that effects are zero but that they vary from person to person and in different contexts.

But if we do not want to characterize science as the search for true positives, how should we statistically model the process of scientific publication and discovery? An empirical approach is to identify scientific truth with replicability; hence, the goal of an experimental or observational scientist is to discover effects that replicate in future studies.

The replicability standard seems to be reasonable. Unfortunately … researchers in psychology (and, presumably, in other fields as well) seem to have no problem replicating and getting statistical significance, over and over again, even in the absence of any real effects of the size claimed by the researchers …

As a student many years ago, I heard about opportunistic stopping rules, the file drawer problem, and other reasons why nominal p-values do not actually represent the true probability that observed data are more extreme than what would be expected by chance. My impression was that these problems represented a minor adjustment and not a major reappraisal of the scientific process. After all, given what we know about scientists’ desire to communicate their efforts, it was hard to imagine that there were file drawers bulging with unpublished results.

More recently, though, there has been a growing sense that psychology, biomedicine, and other fields are being overwhelmed with errors (consider, for example, the generally positive reaction to the paper of Ioannidis, 2005). In two recent series of papers, Gregory Francis and Uri Simonsohn and collaborators have demonstrated too-good-to-be-true patterns of p-values in published papers, indicating that these results should not be taken at face value.

Andrew Gelman
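The opportunistic stopping rules mentioned in the quote are easy to simulate. The sketch below is my own illustration, not Gelman's: it assumes normally distributed data with known unit variance and a true effect of exactly zero, and a researcher who runs a z-test after every batch of observations and stops as soon as it comes out "significant". Batch size, sample size and function names are hypothetical.

```python
import math
import random

def rejects_with_peeking(rng, batch=10, max_n=100, z_crit=1.96):
    """Simulate one null study; test after every batch, stop at the first 'significant' z."""
    total, n = 0.0, 0
    while n < max_n:
        for _ in range(batch):
            total += rng.gauss(0, 1)  # true effect is exactly zero
        n += batch
        if abs(total / math.sqrt(n)) > z_crit:
            return True
    return False

def false_positive_rate(batch, studies=5000, seed=3):
    """Share of simulated null studies that end with a 'significant' result."""
    rng = random.Random(seed)
    hits = sum(rejects_with_peeking(rng, batch=batch) for _ in range(studies))
    return hits / studies

print("one test at n = 100:          ", false_positive_rate(batch=100))
print("test after every 10 new cases:", false_positive_rate(batch=10))
```

A single test at the planned sample size keeps the error rate near the nominal 5%. Peeking ten times along the way pushes it well above that, although every individual test still reports an honest-looking p-value.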

Econometric testing with the net down

2 June, 2013 at 09:42 | Posted in Statistics & Econometrics | 2 Comments

Suppose you test a highly confirmed hypothesis, for example, that the price elasticity of demand is negative. What would you do if the computer were to spew out a positive coefficient? Surely you would not claim to have overthrown the law of demand … Instead, you would rerun many variants of your regression until the recalcitrant computer finally acknowledged the sovereignty of your theory …

Only the naive are shocked by such soft and gentle testing … Easy it is. But also wrong, when the purpose of the exercise is not to use a hypothesis, but to determine its validity …

Econometric tests are far from useless. They are worth doing, and their results do tell something … But many economists insist that economics can deliver more, much more, than merely, more or less, plausible knowledge, that it can reach its results with compelling demonstrations. By such a standard how should one describe our usual way of testing hypotheses? One possibility is to interpret it as Blaug [The Methodology of Economics, 1980, p. 256] does, as ‘playing tennis with the net down’ …

Perhaps my charge that econometric testing lacks seriousness of purpose is wrong … But regardless of the cause, it should be clear that most econometric testing is not rigorous. Combining such tests with formalized theoretical analysis or elaborate techniques is another instance of the principle of the strongest link. The car is sleek and elegant; too bad the wheels keep falling off.

A quick refresher on the Central Limit Theorem (student stuff)

1 June, 2013 at 09:04 | Posted in Statistics & Econometrics | 1 Comment

 

Useless stochastic models

31 May, 2013 at 08:11 | Posted in Economics, Statistics & Econometrics | 6 Comments

To understand real world “non-routine” decisions and unforeseeable changes in behaviour, ergodic probability distributions are of no avail. In a world full of genuine uncertainty – where real historical time rules the roost – the probabilities that ruled the past are not necessarily those that will rule the future.

When we cannot accept that the observations, along the time-series available to us, are independent … we have, in strict logic, no more than one observation, all of the separate items having to be taken together. For the analysis of that the probability calculus is useless; it does not apply … I am bold enough to conclude, from these considerations that the usefulness of ‘statistical’ or ‘stochastic’ methods in economics is a good deal less than is now conventionally supposed … We should always ask ourselves, before we apply them, whether they are appropriate to the problem in hand. Very often they are not … The probability calculus is no excuse for forgetfulness.

John Hicks, Causality in Economics, 1979:121

To simply assume that economic processes are ergodic – and a fortiori in any relevant sense timeless – is not a sensible way of dealing with the kind of genuine uncertainty that permeates open systems such as economies.
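What a failure of ergodicity means can be seen in a standard toy example, which is my own addition and not taken from Hicks: a repeated multiplicative gamble where wealth grows by 50% on heads and shrinks by 40% on tails. The payoffs, the number of rounds and the function name `final_wealth` are hypothetical. The average over many parallel gamblers (the ensemble average) and the growth experienced along a single path over time tell opposite stories.

```python
import math
import random
import statistics

UP, DOWN = 1.5, 0.6  # hypothetical gamble: +50% on heads, -40% on tails

def final_wealth(rounds, rng):
    """Wealth after repeatedly staking everything, starting from 1."""
    wealth = 1.0
    for _ in range(rounds):
        wealth *= UP if rng.random() < 0.5 else DOWN
    return wealth

rng = random.Random(4)
paths = [final_wealth(100, rng) for _ in range(10_000)]

expected_per_round = 0.5 * UP + 0.5 * DOWN  # ensemble-average growth factor
typical_per_round = math.sqrt(UP * DOWN)    # time-average growth factor along one path
median_wealth = statistics.median(paths)
share_losing = sum(w < 1 for w in paths) / len(paths)

print("ensemble-average factor per round:", round(expected_per_round, 4))
print("time-average factor per round:    ", round(typical_per_round, 4))
print("median wealth after 100 rounds:   ", median_wealth)
print("share of paths ending below 1:    ", share_losing)
```

The expected value grows by about 5% per round, yet the typical individual path shrinks by about 5% per round, and most gamblers end up with less than they started with. What the ensemble average says is of little help to someone who has to live through one history in real time.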

On the impossibility of predicting the future

28 May, 2013 at 20:02 | Posted in Statistics & Econometrics | 1 Comment

 

What is randomness?

28 May, 2013 at 08:38 | Posted in Statistics & Econometrics | 1 Comment

Modern probabilistic econometrics relies on the notion of probability. To be at all amenable to econometric analysis, economic observations allegedly have to be conceived as random events.

But is it really necessary to model the economic system as a system where randomness can only be analyzed and understood when based on an a priori notion of probability?

In probabilistic econometrics, events and observations are as a rule interpreted as random variables as if generated by an underlying probability density function, and a fortiori – since probability density functions are only definable in a probability context – consistent with a probability. As Haavelmo (1944:iii) has it:

For no tool developed in the theory of statistics has any meaning – except, perhaps for descriptive purposes – without being referred to some stochastic scheme.

When attempting to convince us of the necessity of founding empirical economic analysis on probability models, Haavelmo – building largely on the earlier Fisherian paradigm – actually forces econometrics to (implicitly) interpret events as random variables generated by an underlying probability density function.

This is at odds with reality. Randomness obviously is a fact of the real world. Probability, on the other hand, attaches to the world via intellectually constructed models, and a fortiori is only a fact of a probability-generating machine or a well-constructed experimental arrangement or “chance set-up”.

Just as there is no such thing as a “free lunch,” there is no such thing as a “free probability.” To be able at all to talk about probabilities, you have to specify a model. If there is no chance set-up or model that generates the probabilistic outcomes or events – in statistics one refers to any process where you observe or measure as an experiment (rolling a die) and the results obtained as the outcomes or events (number of points rolled with the die, being e.g. 3 or 5) of the experiment – there is, strictly speaking, no event at all.

Probability is a relational element. It always must come with a specification of the model from which it is calculated. And then to be of any empirical scientific value it has to be shown to coincide with (or at least converge to) real data generating processes or structures – something seldom or never done!

And this is the basic problem with economic data. If you have a fair roulette-wheel, you can arguably specify probabilities and probability density distributions. But how do you conceive of the analogous nomological machines for prices, gross domestic product, income distribution etc? Only by a leap of faith. And that does not suffice. You have to come up with some really good arguments if you want to persuade people into believing in the existence of socio-economic structures that generate data with characteristics conceivable as stochastic events portrayed by probabilistic density distributions!

From a realistic point of view we really have to admit that the socio-economic states of nature that we talk of in most social sciences – and certainly in econometrics – are not amenable to analysis as probabilities, simply because in the real-world open systems that social sciences – including econometrics – analyze, there are no probabilities to be had!

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot really be maintained – as in the Haavelmo paradigm of probabilistic econometrics – that it even should be mandatory to treat observations and data – whether cross-section, time series or panel data – as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette-wheels. Data generating processes – at least outside of nomological machines like dice and roulette-wheels – are not self-evidently best modeled with probability measures.

If we agree on this, we also have to admit that probabilistic econometrics lacks a sound justification. I would even go further and argue that there really is no justifiable rationale at all for this belief that all economically relevant data can be adequately captured by a probability measure. In most real world contexts one has to argue one’s case. And that is obviously something seldom or never done by practitioners of probabilistic econometrics.

Econometrics and probability are intermingled with randomness. But what is randomness?

In probabilistic econometrics it is often defined with the help of independent trials – two events are said to be independent if the occurrence or nonoccurrence of either one has no effect on the probability of the occurrence of the other – as drawing cards from a deck, picking balls from an urn, spinning a roulette wheel or tossing coins – trials which are only definable if somehow set in a probabilistic context.

But if we pick a sequence of prices – say 2, 4, 3, 8, 5, 6, 6 – that we want to use in an econometric regression analysis, how do we know that the sequence of prices is random and, a fortiori, that it can be treated as generated by an underlying probability density function? How can we argue that the sequence is a sequence of probabilistically independent random prices? And are they really random in the sense that is most often applied in probabilistic econometrics – where X is called a random variable only if there is a sample space S with a probability measure and X is a real-valued function over the elements of S?

Bypassing the scientific challenge of going from describable randomness to calculable probability by just assuming it, is of course not an acceptable procedure. Since a probability density function is a “Gedanken” object that does not exist in a natural sense, it has to come with an export license to our real target system if it is to be considered usable.

Among those who at least honestly try to face the problem, the usual procedure is to refer to some artificial mechanism operating in some “games of chance” of the kind mentioned above and which generates the sequence. But then we still have to show that the real sequence somehow coincides with the ideal sequence that defines independence and randomness within our – to speak with philosopher of science Nancy Cartwright (1999) – “nomological machine”, our chance set-up, our probabilistic model.

As the originator of the Kalman filter, Rudolf Kalman (1994:143), notes:

Not being able to test a sequence for ‘independent randomness’ (without being told how it was generated) is the same thing as accepting that reasoning about an “independent random sequence” is not operationally useful.
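Kalman's point can be illustrated with a sequence whose generating mechanism we do know to be entirely non-random. The sketch below is my own illustration and not Kalman's: it produces a 0/1 sequence from the deterministic logistic map and feeds it to a standard Wald-Wolfowitz runs test. The starting value, the sequence length and the function names are hypothetical choices.

```python
import math

def logistic_bits(n, x0=0.2):
    """Deterministic 0/1 sequence from the logistic map x -> 4x(1 - x); nothing random here."""
    bits, x = [], x0
    for _ in range(n):
        x = 4 * x * (1 - x)
        bits.append(1 if x > 0.5 else 0)
    return bits

def runs_test_z(bits):
    """Wald-Wolfowitz runs test: z-score of the number of runs under 'independent order'."""
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n0 + n1
    runs = 1 + sum(a != b for a, b in zip(bits, bits[1:]))
    mean = 2 * n1 * n0 / n + 1
    var = 2 * n1 * n0 * (2 * n1 * n0 - n) / (n**2 * (n - 1))
    return (runs - mean) / math.sqrt(var)

print("logistic map bits:   z =", round(runs_test_z(logistic_bits(2000)), 2))
print("alternating 0101...: z =", round(runs_test_z([0, 1] * 50), 2))
```

The test readily flags the crude alternating pattern. A chaotic but fully deterministic sequence of this kind, on the other hand, would be expected to look unremarkable to it; the printed z-score shows whether that holds for this particular starting value. Either way, the verdict of the test says nothing about how the numbers were actually produced.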

So why should we define randomness with probability? If we do, we have to accept that to speak of randomness we also have to presuppose the existence of nomological probability machines, since probabilities cannot be spoken of – and actually, to be strict, do not at all exist – without specifying such system-contexts (how many sides do the dice have, are the cards unmarked, etc.).

If we do adhere to the Fisher-Haavelmo paradigm of probabilistic econometrics we also have to assume that all noise in our data is probabilistic and that errors are well-behaved, something that is hard to justify as a real phenomenon, and not just as an operationally and pragmatically tractable assumption.

Maybe Kalman’s (1994:147) verdict that

Haavelmo’s error that randomness = (conventional) probability is just another example of scientific prejudice

is, seen from this perspective, not far-fetched.

Accepting Haavelmo’s domain of probability theory and sample space of infinite populations – just as Fisher’s (1922:311) “hypothetical infinite population, of which the actual data are regarded as constituting a random sample”, von Mises’ “collective” or Gibbs’ “ensemble” – also implies that judgments are made on the basis of observations that are actually never made!

Infinitely repeated trials or samplings never take place in the real world. So that cannot be a sound inductive basis for a science with aspirations of explaining real-world socio-economic processes, structures or events. It’s not tenable.

As David Salsburg (2001:146) notes on probability theory:

[W]e assume there is an abstract space of elementary things called ‘events’ … If a measure on the abstract space of events fulfills certain axioms, then it is a probability. To use probability in real life, we have to identify this space of events and do so with sufficient specificity to allow us to actually calculate probability measurements on that space … Unless we can identify [this] abstract space, the probability statements that emerge from statistical analyses will have many different and sometimes contrary meanings.

Just like, e.g., Keynes (1921) and Georgescu-Roegen (1971), Salsburg (2001:301f) is very critical of the way social scientists – including economists and econometricians – uncritically and without arguments have come to simply assume that one can apply probability distributions from statistical theory to their own area of research:

Probability is a measure of sets in an abstract space of events. All the mathematical properties of probability can be derived from this definition. When we wish to apply probability to real life, we need to identify that abstract space of events for the particular problem at hand … It is not well established when statistical methods are used for observational studies … If we cannot identify the space of events that generate the probabilities being calculated, then one model is no more valid than another … As statistical models are used more and more for observational studies to assist in social decisions by government and advocacy groups, this fundamental failure to be able to derive probabilities without ambiguity will cast doubt on the usefulness of these methods.

Some wise words that ought to be taken seriously by probabilistic econometricians are also offered by mathematical statistician Gunnar Blom (2004:389):

If the demands for randomness are not at all fulfilled, you only bring damage to your analysis using statistical methods. The analysis gets an air of science around it, that it does not at all deserve.

Richard von Mises (1957:103) noted that

Probabilities exist only in collectives … This idea, which is a deliberate restriction of the calculus of probabilities to the investigation of relations between distributions, has not been clearly carried through in any of the former theories of probability.

And obviously not in Haavelmo’s paradigm of probabilistic econometrics either. It would have been better if one had heeded von Mises’s warning (1957:172) that

the field of application of the theory of errors should not be extended too far.

Importantly, this also means that if you cannot show that the data satisfy all the conditions of the probabilistic nomological machine – including randomness – then the statistical inferences used lack sound foundations!

References

Blom, Gunnar et al. (2004), Sannolikhetsteori och statistikteori med tillämpningar. Lund: Studentlitteratur.

Cartwright, Nancy (1999), The Dappled World. Cambridge: Cambridge University Press.

Fisher, Ronald (1922), On the mathematical foundations of theoretical statistics. Philosophical Transactions of The Royal Society A, 222.

Georgescu-Roegen, Nicholas (1971), The Entropy Law and the Economic Process. Harvard University Press.

Haavelmo, Trygve (1944), The probability approach in econometrics. Supplement to Econometrica 12:1-115.

Kalman, Rudolf (1994), Randomness Reexamined. Modeling, Identification and Control 3:141-151.

Keynes, John Maynard (1973 (1921)), A Treatise on Probability. Volume VIII of The Collected Writings of John Maynard Keynes, London: Macmillan.

Pålsson Syll, Lars (2007), John Maynard Keynes. Stockholm: SNS Förlag.

Salsburg, David (2001), The Lady Tasting Tea. Henry Holt.

von Mises, Richard (1957), Probability, Statistics and Truth. New York: Dover Publications.
