2023-09-11
I want to give an overview of some of my research interests and hope to get you interested in them too.
Three themes:
Education: Get all of us on the same page about the method of moments.
Identification: Can we uniquely learn an unknown parameter?
Reproducibility: Trust, but verify.
You have been using the method of moments without knowing it:
reg y x1 x2
So why should you know more about the method?
Because it is a powerful black box.
One of the four words in the sentence “I SEE THE MOUSE” will be selected at random by a person. That person will tell you how many \(E\)’s there are in the selected word.
Call \(Y\) the number of letters in the selected word and \(X_1\) the number of \(E\)’s in the selected word.
Your task is to predict the number of letters in the selected word (ex ante).
You would be “punished” according to the square of the difference between the actual \(Y\) and your prediction.
Because you do not know for sure which word will be chosen, you have to allow for contingencies.
Therefore, losses depend on which word was chosen.
What would be your prediction rule in order to make your expected loss as small as possible?
One way to answer this question is to propose a prediction rule and hope for the best.
For example, we can say that the rule should have the form \(\beta_0+\beta_1 X_1\).
The task is now to find the unique solution to the following optimization problem: \[\min_{\beta_{0},\beta_{1}}\mathbb{E}\left[\left(Y-\beta_{0}-\beta_{1}X_{1}\right)^{2}\right].\]
An optimal solution \(\left(\beta_{0}^{*},\beta_{1}^{*}\right)\) solves the following first-order conditions: \[\begin{eqnarray*} \mathbb{E}\left(Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1}\right) &=& 0, \\ \mathbb{E}\left(X_{1}\left(Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1}\right)\right) &=& 0. \end{eqnarray*}\]
As a result, we have \[\begin{equation}\beta_{0}^{*} = \mathbb{E}\left(Y\right)-\beta_{1}^{*}\mathbb{E}\left(X_{1}\right),\qquad\beta_{1}^{*}=\dfrac{\mathsf{Cov}\left(X_{1},Y\right)}{\mathsf{Var}\left(X_{1}\right)}.\label{blp-coef}\end{equation}\]
If you do the calculations, the best linear predictor is \(\beta_0^*+\beta_1^* X_1=2+X_1\). You might find this result strange, but this is what you get!
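Where do these numbers come from? Each word is selected with probability \(1/4\), so \[\mathbb{E}\left(Y\right)=\frac{1+3+3+5}{4}=3,\qquad \mathbb{E}\left(X_1\right)=\frac{0+2+1+1}{4}=1,\] \[\mathsf{Cov}\left(X_1,Y\right)=\frac{0\cdot 1+2\cdot 3+1\cdot 3+1\cdot 5}{4}-1\cdot 3=\frac{1}{2},\qquad \mathsf{Var}\left(X_1\right)=\frac{0+4+1+1}{4}-1^2=\frac{1}{2},\] so that \(\beta_1^*=1\) and \(\beta_0^*=3-1\cdot 1=2\).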
I can show you through a simulation. Try 10 IID draws from the joint distribution of \(\left(X_1,Y\right)\):
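A minimal sketch of such a simulation in R (the seed and the number of draws in the second call are my own choices):

```r
set.seed(12345)                                  # arbitrary seed
words  <- c("I", "SEE", "THE", "MOUSE")
Y_all  <- c(1, 3, 3, 5)                          # number of letters per word
X1_all <- c(0, 2, 1, 1)                          # number of E's per word
pick <- sample(1:4, size = 10, replace = TRUE)   # 10 IID uniform draws
data.frame(word = words[pick], X1 = X1_all[pick], Y = Y_all[pick])
# With many draws, the least-squares coefficients approach (2, 1):
pick <- sample(1:4, size = 1e5, replace = TRUE)
coef(lm(Y_all[pick] ~ X1_all[pick]))
```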
Some of you might have felt weird about the prediction. If you thought of predicting, say, the average number of letters among the words with the reported number of \(E\)'s:
Then you have unwittingly travelled down a nonparametric path and are questioning the specification of the linear prediction rule, perhaps without realizing it!
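Concretely, the conditional means are \(\mathbb{E}\left(Y\mid X_1=0\right)=1\), \(\mathbb{E}\left(Y\mid X_1=1\right)=4\), and \(\mathbb{E}\left(Y\mid X_1=2\right)=3\), while the best linear predictor \(2+X_1\) returns \(2\), \(3\), and \(4\): the two disagree at every value of \(X_1\).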
What if you are in a situation where \(Y=\beta_0+\beta_1X_1+u\) is not a linear regression?
What this means: \(\beta_0+\beta_1 X_1\) is no longer the best linear predictor of \(Y\) given \(X_1\); in particular, the error \(u\) may be correlated with \(X_1\), so the formulas above no longer recover \(\left(\beta_0,\beta_1\right)\).
But \(\beta_1\) here is still the causal effect of \(X_1\) on \(Y\).
The previous identification argument requires an instrument \(Z\) satisfying \(\mathsf{Cov}\left(Z,u\right)=0\) (exogeneity) and \(\mathsf{Cov}\left(X_1,Z\right)\neq 0\) (relevance).
A violation of either of these two conditions already signals a failure of instrumental variables to identify \(\beta_1\).
Observe that \[\beta_1=\dfrac{\mathsf{Cov}\left(Y, Z\right)}{\mathsf{Cov}\left(X_1, Z\right)}=\dfrac{\mathsf{Cov}\left(Y, Z\right)}{\mathsf{Var}\left(Z\right)}\left(\dfrac{\mathsf{Cov}\left(X_1, Z\right)}{\mathsf{Var}\left(Z\right)}\right)^{-1}\]
Therefore, \(\beta_1\) may be interpreted as a ratio of two regression slopes!
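Here is a small simulation sketch in R of this ratio-of-slopes logic (the coefficients, the degree of endogeneity, and the seed are my own choices):

```r
set.seed(12345)                       # arbitrary seed
n  <- 1e5
z  <- rnorm(n)                        # instrument: Cov(z, u) = 0 by construction
v  <- rnorm(n)
u  <- 0.8 * v + rnorm(n)              # error correlated with x1 through v
x1 <- 0.5 * z + v                     # relevance: Cov(x1, z) = 0.5
y  <- 1 + 2 * x1 + u                  # true beta_1 = 2
coef(lm(y ~ x1))[2]                   # least squares is off because Cov(x1, u) != 0
cov(y, z) / cov(x1, z)                # ratio of covariances is close to 2
```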
Setup: a linear AR(1) panel data model \(Y_{t}=\beta_1^* Y_{t-1}+c+u_{t}\), where \(c\) is an unobserved individual effect and you observe \(\left(Y_1,Y_2,Y_3\right)\). First differencing removes \(c\), leaving \(\Delta Y_3=\beta_1^*\Delta Y_2+\Delta u_3\), with \(Y_1\) as a candidate instrument for \(\Delta Y_2\).
Need \(\mathsf{Cov}\left(\Delta Y_{2},Y_1\right)\neq 0\) and \(\mathsf{Cov}\left(\Delta u_{3},Y_1\right)=0\).
Note that \(Y_t\) is nothing but an accumulation of past \(u_t\)'s (and of the individual effect \(c\)) plus an initial starting point \(Y_1\).
But there are ways to guarantee \(\mathsf{Cov}\left(\Delta u_{3},Y_1\right)=0\): for example, assume the shocks \(u_2\) and \(u_3\) are uncorrelated with the initial condition \(Y_1\).
Guaranteeing \(\mathsf{Cov}\left(\Delta Y_{2},Y_1\right)\neq 0\) is slightly tricky.
We can also write down the moment condition used to identify \(\beta_1^*\): \[ \mathbb{E}\left(\left(\Delta Y_3-\beta_1^* \Delta Y_2\right) \cdot Y_1\right) =0\]
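Solving this single moment condition for \(\beta_1^*\) (provided \(\mathbb{E}\left(\Delta Y_2\cdot Y_1\right)\neq 0\)) gives \[\beta_1^*=\dfrac{\mathbb{E}\left(\Delta Y_3\cdot Y_1\right)}{\mathbb{E}\left(\Delta Y_2\cdot Y_1\right)},\] which is the same ratio logic as in the instrumental variables case, with \(Y_1\) playing the role of the instrument \(Z\).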
What happens if we observe \(Y_4\)? Then you will have extra moment conditions (of course, there is a price!) because you can take other differences.
Again, you observe \(\left(Y_1,Y_2,Y_3,Y_4\right)\).
You have something like: \[ \begin{eqnarray*} \mathbb{E}\left(\left(\Delta Y_3-\beta_1^* \Delta Y_2\right) \cdot Y_1\right) &=& 0, \\ \mathbb{E}\left(\left(\Delta Y_4-\beta_1^* \Delta Y_3\right) \cdot Y_1\right) &=& 0, \\ \mathbb{E}\left(\left(\Delta Y_4-\beta_1^* \Delta Y_3\right) \cdot Y_2\right) &=& 0.\end{eqnarray*}\]
In the end, you will have more moment conditions than the dimension of the parameter \(\beta_1^*\).
How do you combine them? This is where the generalized method of moments (GMM) comes in.
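As a sketch of how the combination works (assuming an \(n\times 4\) data matrix `Y` with columns \(Y_1,\dots,Y_4\); the identity weighting matrix is just the simplest choice, not necessarily the efficient one):

```r
# Stacked sample moments g(b) for the three conditions above,
# given an n x 4 matrix Y with columns Y1, Y2, Y3, Y4
g <- function(b, Y) {
  d2 <- Y[, 2] - Y[, 1]; d3 <- Y[, 3] - Y[, 2]; d4 <- Y[, 4] - Y[, 3]
  c(mean((d3 - b * d2) * Y[, 1]),
    mean((d4 - b * d3) * Y[, 1]),
    mean((d4 - b * d3) * Y[, 2]))
}
# One-step GMM: minimize g(b)' W g(b) with W = identity
gmm_objective <- function(b, Y) sum(g(b, Y)^2)
# b_hat <- optimize(gmm_objective, interval = c(-2, 2), Y = Y)$minimum
```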
But a particularly important moment condition requires four observed time periods to work.
There is another important moment condition, which requires an extra assumption, mean stationarity, to be valid.
I will illustrate using Monte Carlo simulation.
The model you have seen so far is not enough to allow you to generate artificial data.
I need to specify more details in order to generate data from a linear AR(1) panel data model.
\(n=50\), \(T=4\), \(\beta_1^*=1\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=500\), \(T=4\), \(\beta_1^*=1\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=50\), \(T=4\), \(\color{blue}{\beta_1^*=0.8}\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=500\), \(T=4\), \(\beta_1^*=0.8\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=50\), \(T=4\), \(\beta_1^*=0.8\), \(\color{blue}{\sigma^2_2=10}\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=50\), \(T=4\), \(\beta_1^*=0.8\), \(\color{blue}{\sigma^2_2=1}\), \(\color{blue}{\sigma^2_3=3}\), \(\sigma^2_4=1\), \(\color{blue}{\sigma^2_c=1}\), \(\color{blue}{\mathsf{Cov}\left(c,Y_1\right)=2.6}\), \(\color{blue}{\mathsf{Var}\left(Y_1\right)=10}\)
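To make this concrete, here is a rough sketch in R of generating data along the lines of the first design above (the joint normality of \(\left(c,Y_1\right)\), the zero means, and the seed are my own simplifying assumptions):

```r
set.seed(12345)                                    # arbitrary seed
n <- 50; T_obs <- 4; beta1 <- 1                    # DGP1 settings
sig2_u  <- c(NA, 1.5625, 1, 1)                     # Var(u_t) for t = 2, 3, 4
sig2_c  <- 0.29; cov_cy1 <- 0.2; var_y1 <- 1
# Draw (c, Y1) jointly normal with the stated second moments and zero means
S   <- matrix(c(sig2_c, cov_cy1, cov_cy1, var_y1), 2, 2)
cy1 <- matrix(rnorm(n * 2), n, 2) %*% chol(S)
c_i <- cy1[, 1]
Y   <- matrix(NA, n, T_obs)
Y[, 1] <- cy1[, 2]
for (t in 2:T_obs) {                               # Y_t = beta1 * Y_{t-1} + c + u_t
  Y[, t] <- beta1 * Y[, t - 1] + c_i + rnorm(n, sd = sqrt(sig2_u[t]))
}
head(Y)
```

Feeding such a matrix into the GMM sketch above gives one Monte Carlo replication; repeating it many times traces out the sampling distribution of the estimators.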
Take \(n=50\):
| | DGP1 | | DGP2 | | DGP3 | | DGP4 | |
|---|---|---|---|---|---|---|---|---|
| \(n\) | AB | NL | AB | NL | AB | NL | AB | NL |
| 50 | 0.75 | 0.68 | 0.13 | 0.49 | 0.58 | 0.31 | 0.78 | |
| | 0.56 | 0.16 | 0.90 | 0.18 | 0.43 | 0.80 | 0.17 | |
| | 1.23 | 1.15 | 0.79 | | | | | |
| | 0.18 | 0.23 | 0.09 | | | | | |
Now, \(n=800\):
| | DGP1 | | DGP2 | | DGP3 | | DGP4 | |
|---|---|---|---|---|---|---|---|---|
| \(n\) | AB | NL | AB | NL | AB | NL | AB | NL |
| 800 | 1.00 | 0.80 | 0.58 | 0.64 | 0.79 | 0.78 | 0.80 | |
| | 0.13 | 0.07 | 0.57 | 0.07 | 0.09 | 0.16 | 0.04 | |
| | 1.08 | 0.96 | 0.80 | | | | | |
| | 0.07 | 0.08 | 0.02 | | | | | |
\[\begin{eqnarray*}\mathrm{budget\ share}_{iht} &=& \alpha_i + \beta_i \log \mathrm{expenditure}_{iht} \\ && +\gamma_i\mathrm{budget\ share}_{ih,t-1} + \sum_k \delta_{ik}\mathrm{controls}_{kht} \\ && +\underbrace{\rho_{ih}+\varepsilon_{iht}}_{u_{iht}}\end{eqnarray*}\]
Many decisions go into an even bigger black box.
\[\begin{eqnarray*} &\mathbb{E}\left(\Delta\varepsilon_{iht} \log \mathrm{expenditure}_{h,t-k}\right) = 0, \ \ \forall k\geq 1 \\ &\mathbb{E}\left(\Delta\varepsilon_{iht} \log \mathrm{income}_{h,t-k}\right) = 0, \ \ \forall k \\ &\mathbb{E}\left(u_{iht} \Delta\log \mathrm{expenditure}_{h,t-k}\right) = 0, \ \ \forall k\geq 1 \\ &\mathbb{E}\left(u_{iht} \Delta\log \mathrm{income}_{h,t-k}\right) = 0, \ \ \forall k \end{eqnarray*}\]
Bespoke application of linear dynamic panel data methods
Spanish panel has a rotating panel structure
Uses microdata as a sanity check for macro models
But we were unable to reproduce the results, as there is uncertainty about the instrument set.
Linear hypothesis test
Hypothesis:
AGEQ = 0
Model 1: restricted model
Model 2: LWKLYWGE ~ EDUC + AGEQ | AGEQ + AGEQSQ
Res.Df Df Chisq Pr(>Chisq)
1 247197
2 247196 1 3.9508 0.04685 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Calculate the bounds:
\[\psi^\prime \widehat{\theta} - \frac{1}{2}\left(\cos 2\alpha \pm 1\right)\left(\mathsf{Var}\left(\psi^\prime \widehat{\theta}\right)-\mathsf{Var}\left(\psi^\prime\widehat{\theta}_R\right)\right)^{1/2} \\ \times \left(\mathrm{Wald \ stat}\right)^{1/2}\]
where \(\cos 2\alpha\) is given by
\[\frac{\psi^\prime\widehat{\theta}-\psi^\prime\widehat{\theta}_R}{\left(\mathsf{Var}\left(\psi^\prime \widehat{\theta}\right)-\mathsf{Var}\left(\psi^\prime\widehat{\theta}_R\right)\right)^{1/2}\left(\mathrm{Wald \ stat}\right)^{1/2}}\]
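A small R helper that mirrors this formula (a sketch; the function name and argument names are mine, and the inputs correspond to the hard-coded numbers used below):

```r
# Bounds for psi' theta: unrestricted estimate, restricted estimate,
# difference in their variances, and the Wald statistic from the test above
extreme_bounds <- function(est, est_r, var_diff, wald) {
  scale <- sqrt(var_diff) * sqrt(wald)
  cos2a <- (est - est_r) / scale
  c(lower = est - 0.5 * (cos2a + 1) * scale,
    upper = est - 0.5 * (cos2a - 1) * scale)
}
# e.g. extreme_bounds(0.503, 0.503 - 0.445, 0.051, 3.951)
```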
[1] 0.385483
# Extreme bounds
c(0.503-0.5*(0.445/(sqrt(0.051)*sqrt(3.951))+1)*(sqrt(0.051)*sqrt(3.951)), 0.503-0.5*(0.445/(sqrt(0.051)*sqrt(3.951))-1)*(sqrt(0.051)*sqrt(3.951)))
[1] 0.05605569 0.50494431
[1] 0.9963456
# Extreme bounds
c(0.692-0.5*(0.634/(sqrt(0.198)*sqrt(2.045))+1)*(sqrt(0.198)*sqrt(2.045)), 0.692-0.5*(0.634/(sqrt(0.198)*sqrt(2.045))-1)*(sqrt(0.198)*sqrt(2.045)))
[1] 0.05683731 0.69316269
Linear hypothesis test
Hypothesis:
AGEQ = 0
Model 1: restricted model
Model 2: LWKLYWGE ~ EDUC + AGEQ | AGEQ + MARRIED
Res.Df Df Chisq Pr(>Chisq)
1 247197
2 247196 1 252.42 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 1.624881