## On gender and alcohol

23 Jan, 2023 at 18:15 | Posted in Statistics & Econometrics | 2 Comments

Econometric identification sure is difficult …

## Causal inferences — what Big Data cannot give us

20 Jan, 2023 at 13:25 | Posted in Statistics & Econometrics | Leave a comment

The central problem with the present ‘Machine Learning’ and ‘Big Data’ hype is that so many — falsely — think that they can get away with analyzing real-world phenomena without any (commitment to) theory. But data never speak for themselves. Data by themselves are useless. Without a prior statistical set-up, there actually are no data at all to process.

Clever data-mining tricks are not enough to answer important scientific questions. Theory matters.

If we wanted highly probable claims, scientists would stick to low-level observables and not seek generalizations, much less theories with high explanatory content. In this day of fascination with Big data’s ability to predict what book I’ll buy next, a healthy Popperian reminder is due: humans also want to understand and to explain. We want bold ‘improbable’ theories. I’m a little puzzled when I hear leading machine learners praise Popper, a realist, while proclaiming themselves fervid instrumentalists. That is, they hold the view that theories, rather than aiming at truth, are just instruments for organizing and predicting observable facts. It follows from the success of machine learning, Vladimir Cherkassky avers, that “realism is not possible.” This is very quick philosophy!

Quick indeed!

## The 25 Best Econometrics Blogs

13 Jan, 2023 at 09:58 | Posted in Statistics & Econometrics | 1 Comment

Yours truly, of course, feels truly honored to find himself on the list of the world’s 25 best econometrics blogs.

7. Eran Raviv Blog Statistics and Econometrics

9. How the (Econometric) Sausage is Made

14. Lars P Syll

Pålsson Syll received a Ph.D. in economic history in 1991 and a Ph.D. in economics in 1997, both at Lund University. He became an associate professor in economic history in 1995 and has since 2004 been a professor of social science at Malmö University. His primary research areas have been in the philosophy, history, and methodology of economics.

## Econometric testing

8 Jan, 2023 at 14:09 | Posted in Statistics & Econometrics | 1 Comment

Debating econometrics and its shortcomings, yours truly often gets the response from econometricians that “ok, maybe econometrics isn’t perfect, but you have to admit that it is a great technique for empirical testing of economic hypotheses.”

But is econometrics — really — such a great testing instrument?

Econometrics is supposed to be able to test economic theories. But to serve as a testing device you have to make many assumptions, many of which themselves cannot be tested or verified. To make things worse, there are also only rarely strong and reliable ways of telling us which set of assumptions is to be preferred. Trying to test and infer causality from (non-experimental) data you have to rely on assumptions such as disturbance terms being ‘independent and identically distributed’; functions being additive, linear, and with constant coefficients; parameters being ‘invariant under intervention’; variables being ‘exogenous’, ‘identifiable’, ‘structural’, and so on. Unfortunately, we are seldom or never informed of where that kind of ‘knowledge’ comes from, beyond referring to the economic theory that one is supposed to test. Performing technical tests is of course needed, but perhaps even more important is to know — as David Colander put it — “how to deal with situations where the assumptions of the tests do not fit the data.”

That leaves us in the awkward position of having to admit that if the assumptions made do not hold, the inferences, conclusions, and testing outcomes econometricians come up with simply do not follow from the data and statistics they use.

The central question is “how do we learn from empirical data?” Testing statistical/econometric models is one way, but we have to remember that the value of testing hinges on our ability to validate the — often unarticulated technical — basic assumptions on which the testing models build. If the model is wrong, the test apparatus simply gives us fictional values. There is always a strong risk that one turns a blind eye to some of those non-fulfilled technical assumptions that actually make the testing results — and the inferences we build on them — unwarranted.

Haavelmo’s probabilistic revolution gave econometricians their basic framework for testing economic hypotheses. It still builds on the assumption that the hypotheses can be treated as hypotheses about (joint) probability distributions and that economic variables can be treated as if pulled out of an urn as a random sample. But as far as I can see economic variables are nothing of that kind.

I still do not find any hard evidence that econometric testing uniquely has been able to “exclude a theory”. As Renzo Orsi put it: “If one judges the success of the discipline on the basis of its capability of eliminating invalid theories, econometrics has not been very successful.”

Most econometricians today … believe that the main objective of applied econometrics is the confrontation of economic theories with observable phenomena. This involves theory testing, for example testing monetarism or rational consumer behaviour. The econometrician’s task would be to find out whether a particular economic theory is true or not, using economic data and statistical tools. Nobody would say that this is easy. But is it possible? This question is discussed in Keuzenkamp and Magnus (1995). At the end of our paper we invited the readers to name a published paper that contains a test which, in their opinion, significantly changed the way economists think about some economic proposition … What happened? One Dutch colleague called me up and asked whether he could participate without having to accept the prize. I replied that he could, but he did not participate. Nobody else responded. Such is the state of current econometrics.

## The econometric illusion

7 Jan, 2023 at 17:51 | Posted in Statistics & Econometrics | 11 Comments

What has always bothered me about the “experimentalist” school is the false sense of certainty it conveys. The basic idea is that if we have a “really good instrument” we can come up with “convincing” estimates of “causal effects” that are not “too sensitive to assumptions.” Elsewhere I have written an extensive critique of this experimentalist perspective, arguing it presents a false panacea, and that all statistical inference relies on some untestable assumptions …

Consider Angrist and Lavy (1999), who estimate the effect of class size on student performance by exploiting variation induced by legal limits. It works like this: Let’s say a law prevents class size from exceeding 30. Let’s further assume a particular school has student cohorts that average about 90, but that cohort size fluctuates between, say, 84 and 96. So, if cohort size is 91–96 we end up with four classrooms of size 22 to 24, while if cohort size is 85–90 we end up with three classrooms of size 28 to 30. By comparing test outcomes between students who are randomly assigned to the small vs. large classes (based on their exogenous birth timing), we obtain a credible estimate of the effect of class size on academic performance. Their answer is that a ten-student reduction raises scores by about 0.2 to 0.3 standard deviations.
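The cap-and-cohort arithmetic in the example can be sketched in a few lines of Python (a legal maximum of 30 students per class is assumed here, since that is what the cohort figures in the example imply):

```python
import math

def class_size(cohort, cap=30):
    """Average class size when a cohort is split into the fewest
    classes that respect the legal cap."""
    n_classes = math.ceil(cohort / cap)
    return cohort / n_classes

# A cohort of 90 fits in three classes of 30; one extra student
# forces a fourth class, and average class size drops sharply.
for cohort in (85, 90, 91, 96):
    print(cohort, round(class_size(cohort), 1))
```

The sharp drop at the threshold (from 30 at a cohort of 90 to about 23 at 91) is exactly the ‘exogenous’ variation the study exploits.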

This example shares a common characteristic of natural experiment studies, which I think accounts for much of their popularity: At first blush, the results do seem incredibly persuasive. But if you think for a while, you start to see they rest on a host of assumptions. For example, what if schools that perform well attract more students? In this case, incoming cohort sizes are not random, and the whole logic breaks down. What if parents who care most about education respond to large class sizes by sending their kids to a different school? What if teachers assigned to the extra classes offered in high enrollment years are not a random sample of all teachers?

Keane’s critique of econometric ‘experimentalists’ gives a fair picture of some of the unfounded and exaggerated claims put forward in many econometric natural experiment studies. But — much of the critique really applies to econometrics in general, including the kind of ‘structural’ econometrics Keane himself favours!

The processes that generate socio-economic data in the real world cannot just be assumed to always be adequately captured by a probability measure. And, so, it cannot be maintained that it even should be mandatory to treat observations and data — whether cross-section, time series or panel data — as events generated by some probability model. The important activities of most economic agents do not usually include throwing dice or spinning roulette wheels. Data-generating processes — at least outside of nomological machines like dice and roulette wheels — are not self-evidently best modelled with probability measures.

When economists and econometricians — often uncritically and without arguments — simply assume that one can apply probability distributions from statistical theory to their own area of research, they are skating on thin ice. If you cannot show that data satisfies *all* the conditions of the probabilistic ‘nomological machine,’ then the statistical inferences made in mainstream economics lack sound foundations.

Statistical — and econometric — patterns should never be seen as anything other than possible clues to follow. Behind observable data, there are real structures and mechanisms operating, things that are — if we really want to understand, explain and (possibly) predict things in the real world — more important to get hold of than to simply correlate and regress observable variables.

Statistics cannot establish the truth value of a fact. Never has. Never will.

## Combinatorics (VI)

29 Dec, 2022 at 15:21 | Posted in Statistics & Econometrics | 4 Comments

In my library, there are n philosophy books and six economics books. If yours truly can choose two books of each type in 150 ways, how many philosophy books are there in my library?
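For readers who want to verify their answer: since C(6, 2) = 15, we need C(n, 2) = 150/15 = 10, which gives n = 5. A brute-force check in Python:

```python
from math import comb

# choosing 2 of n philosophy books and 2 of 6 economics books
# gives comb(n, 2) * comb(6, 2) combinations in total
n = next(n for n in range(2, 100) if comb(n, 2) * comb(6, 2) == 150)
print(n)  # 5
```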

## Do complex numbers exist? (wonkish)

22 Dec, 2022 at 12:12 | Posted in Statistics & Econometrics | Comments Off on Do complex numbers exist? (wonkish).

## Christmas combinatorics

21 Dec, 2022 at 10:49 | Posted in Statistics & Econometrics | 13 Comments

Yours truly has invited five of his colleagues to share a Christmas lunch with him around the circular dinner table in his kitchen. Unfortunately, two of the colleagues are not on speaking terms with each other, so they cannot be seated together. In how many ways can we be seated around the table?
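A brute-force check in Python, fixing the host’s seat so that each circular arrangement is counted once per rotation (the two quarrelling colleagues are labelled A and B here):

```python
from itertools import permutations

people = ["host", "A", "B", "C", "D", "E"]  # A and B must not be adjacent

def ok(seating):
    """True if A and B are not seated next to each other at a round table."""
    n = len(seating)
    return all({seating[i], seating[(i + 1) % n]} != {"A", "B"}
               for i in range(n))

# Fixing the host's seat removes rotational duplicates:
# 5! = 120 arrangements in total, minus 2 * 4! = 48 with A next to B.
count = sum(ok(("host",) + p) for p in permutations(people[1:]))
print(count)  # 72
```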

## Weekend combinatorics (V)

17 Dec, 2022 at 09:58 | Posted in Statistics & Econometrics | 8 Comments

An easy one this week: Yours truly wants to distribute 30 one-dollar coins among his kids Linnea, David, and Tora. In how many ways can this be done if he wants all of them to get at least one coin?
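By stars and bars, the number of positive integer solutions of a + b + c = 30 is C(29, 2) = 406. A Python sketch confirming the formula by enumeration:

```python
from math import comb

# stars and bars: positive integer solutions of a + b + c = 30
formula = comb(30 - 1, 3 - 1)

# brute-force check over all feasible (a, b) pairs, with c = 30 - a - b
brute = sum(1 for a in range(1, 31) for b in range(1, 31)
            if 30 - a - b >= 1)
print(formula, brute)  # both 406
```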

## Monte Carlo simulations — no substitute for thinking

15 Dec, 2022 at 14:26 | Posted in Statistics & Econometrics | Comments Off on Monte Carlo simulations — no substitute for thinking

In some fields—physics, geophysics, climate science, sensitivity analysis, and uncertainty quantification in particular—there is a popular impression that probabilities can be estimated in a ‘neutral’ or ‘automatic’ way by doing Monte Carlo simulations: just let the computer reveal the distribution …

Setting aside other issues in numerical modeling, Monte Carlo simulation is a way to substitute computing for hand calculation. It is not a way to discover the probability distribution of anything; it is a way to estimate the numerical values that result from an assumed distribution. It is a substitute for doing an integral, not a way to uncover laws of Nature.

Monte Carlo doesn’t tell you anything that wasn’t already baked into the simulation. The distribution of the output comes from assumptions in the input (modulo bugs): a probability model for the parameters in the simulation. It comes from what you program the computer to do. Monte Carlo reveals the consequences of your assumptions about randomness. The rabbit goes into the hat when you build the probability model and write the software. The rabbit does not come out of the hat without having gone into the hat first.
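Stark’s point is easy to demonstrate: the same Monte Carlo machinery returns different ‘probabilities’ depending entirely on the distribution you assume for the inputs. A minimal Python sketch (the uniform and normal input laws below are arbitrary illustrations, not part of the quoted argument):

```python
import random

random.seed(1)

def simulate(draw, n=100_000):
    """Monte Carlo estimate of P(X + Y > 1.5) under an assumed input law."""
    return sum(draw() + draw() > 1.5 for _ in range(n)) / n

# Two different 'rabbits' put into the hat:
p_uniform = simulate(lambda: random.uniform(0, 1))     # true value 0.125
p_normal = simulate(lambda: random.gauss(0.5, 0.5))    # true value ~0.24
print(p_uniform, p_normal)
```

The computer did nothing but integrate the distribution that was programmed in; the answer changed because the assumption changed.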

Stark’s article is an absolute must-read! One of the best statistics critiques yours truly has read for years.

## Freedman’s Rabbit Theorem

13 Dec, 2022 at 11:10 | Posted in Statistics & Econometrics | 3 Comments

In econometrics one often gets the feeling that many of its practitioners think of it as a kind of automatic inferential machine: input data and out comes causal knowledge. This is like pulling a rabbit from a hat. Great, but as renowned statistician David Freedman had it, first you must put the rabbit in the hat. And this is where assumptions come into the picture.

The assumption of imaginary ‘superpopulations’ is one of the many dubious assumptions used in modern econometrics, and as Clint Ballinger has highlighted, this is a particularly questionable rabbit-pulling assumption:

Inferential statistics are based on taking a random sample from a larger population … and attempting to draw conclusions about a) the larger population from that data and b) the probability that the relations between measured variables are consistent or are artifacts of the sampling procedure.

However, in political science, economics, development studies and related fields the data often represents as complete an amount of data as can be measured from the real world (an ‘apparent population’). It is not the result of a random sampling from a larger population. Nevertheless, social scientists treat such data as the result of random sampling.

Because there is no source of further cases a fiction is propagated—the data is treated as if it were from a larger population, a ‘superpopulation’ where repeated realizations of the data are imagined. Imagine there could be more worlds with more cases and the problem is fixed …

What ‘draw’ from this imaginary superpopulation does the real-world set of cases we have in hand represent? This is simply an unanswerable question. The current set of cases could be representative of the superpopulation, and it could be an extremely unrepresentative sample, a one in a million chance selection from it …

The problem is not one of statistics that need to be fixed. Rather, it is a problem of the misapplication of inferential statistics to non-inferential situations.

## Weekend combinatorics work problem (IV)

10 Dec, 2022 at 15:46 | Posted in Statistics & Econometrics | 10 Comments

When my daughter (who studies mathematics) and yours truly solve a combinatorics problem together it takes 12 minutes. If my daughter tries to solve the problem herself it takes her 10 minutes more than it takes when I solve it alone. How long does it take me to solve the problem?
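The problem reduces to a work-rate equation: if I take x minutes, then 1/x + 1/(x + 10) = 1/12, which rearranges to x^2 - 14x - 120 = 0. A quick Python check:

```python
import math

# 1/x + 1/(x + 10) = 1/12  =>  x**2 - 14*x - 120 = 0
a, b, c = 1, -14, -120
x = (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # positive root
print(x)  # 20.0: I take 20 minutes, my daughter 30

# verify the combined rate
assert abs(1 / x + 1 / (x + 10) - 1 / 12) < 1e-12
```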

## Weekend combinatorics (III)

2 Dec, 2022 at 13:11 | Posted in Statistics & Econometrics | 3 Comments

An easy one this week: At a small economics conference, a photographer wants to line up nine participants for a photo. Two of them — Robert and Milton — insist on standing next to each other. How many different arrangements (lineups) are possible?
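For checking an answer: treating Robert and Milton as a single block gives 8! orderings of the blocks, times 2 for the two ways they can stand within theirs:

```python
from math import factorial

# glue Robert and Milton into one block: 8 units in a row,
# with 2 internal orders for the block
arrangements = 2 * factorial(8)
print(arrangements)  # 80640
```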

## Chebyshev’s and Markov’s Inequality Theorems

28 Nov, 2022 at 14:45 | Posted in Statistics & Econometrics | Comments Off on Chebyshev’s and Markov’s Inequality Theorems

Chebyshev’s Inequality Theorem — named after Russian mathematician Pafnuty Chebyshev (1821-1894) — states that for a population (or sample) at most 1/k² of the distribution’s values can be more than k standard deviations away from the mean. The beauty of the theorem is that although we may not know the exact distribution of the data — e.g. whether it is normally distributed — we may still say with certitude (since the theorem holds universally) that there are bounds on probabilities!
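That universality is easy to see in simulation: even for a heavily skewed distribution, the observed tail mass never exceeds the 1/k² bound. A Python sketch (an exponential distribution is chosen here only as an example of decidedly non-normal data):

```python
import random
import statistics

random.seed(0)
data = [random.expovariate(1.0) for _ in range(100_000)]  # skewed, non-normal
mu, sd = statistics.fmean(data), statistics.pstdev(data)

# Chebyshev: fraction of values beyond k standard deviations is at most 1/k**2
for k in (2, 3, 4):
    tail = sum(abs(x - mu) > k * sd for x in data) / len(data)
    print(f"k={k}: observed tail {tail:.4f} <= bound {1 / k**2:.4f}")
```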

Another beautiful result of probability theory is Markov’s inequality (after the Russian mathematician Andrei Markov (1856-1922)):

If X is a non-negative stochastic variable (X ≥ 0) with a finite expectation value E(X), then for every a > 0

P{X ≥ a} ≤ E(X)/a

If the production of cars in a factory during a week is assumed to be a stochastic variable with an expectation value (mean) of 50 units, we can — based on nothing but the inequality — conclude that the probability that the production in a given week exceeds 100 units cannot be greater than 50% [P(X ≥ 100) ≤ 50/100 = 0.5 = 50%].
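A simulation illustrates the bound: any non-negative distribution with mean 50 obeys it, and an exponential is assumed below purely for illustration.

```python
import random

random.seed(0)
# weekly car production: a non-negative variable with mean 50
# (the exponential law is an arbitrary choice; Markov needs only the mean)
weeks = [random.expovariate(1 / 50) for _ in range(200_000)]

mean = sum(weeks) / len(weeks)
frac_over_100 = sum(x >= 100 for x in weeks) / len(weeks)
print(round(mean, 1), frac_over_100, "<=", mean / 100)  # bound holds
```

For this particular input law the true tail probability is about 0.135, well under the 0.5 the inequality guarantees; the bound is loose but requires no distributional knowledge at all.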

I still feel humble awe at this immensely powerful result. Without knowing anything else but an expected value (mean) of a probability distribution we can deduce upper limits for probabilities. The result hits me as equally surprising today as forty-five years ago when I first ran into it as a student of mathematical statistics.

## How to prove things

16 Nov, 2022 at 14:27 | Posted in Statistics & Econometrics | Comments Off on How to prove things.

Great lecture series.

Yours truly got Solow’s book when he was studying mathematics back in the 80s.

Now in its 6th edition, it’s better than ever.
