2023-09-11
I want to give an overview of some of my research interests and hope to get you interested in them too.
Three themes:
Education: Get all of us on the same page about the method of moments.
Identification: Can we uniquely learn an unknown parameter?
Reproducibility: Trust, but verify.
You have been using the method of moments without knowing it:
reg y x1 x2
So why should you know more about the method?
Because it is a powerful black box.
One of the four words in the sentence “I SEE THE MOUSE” will be selected at random by a person. That person will tell you how many \(E\)’s there are in the selected word.
Call \(Y\) the number of letters in the selected word and \(X_1\) the number of \(E\)’s in the selected word.
Your task is to predict the number of letters in the selected word (ex ante).
You would be “punished” according to the square of the difference between the actual \(Y\) and your prediction.
Because you do not know for sure which word will be chosen, you have to allow for contingencies.
Therefore, losses depend on which word was chosen.
What would be your prediction rule in order to make your expected loss as small as possible?
One way to answer this question is to propose a prediction rule and hope for the best.
For example, we can say that the rule should have the form \(\beta_0+\beta_1 X_1\).
The task is now to find the unique solution to the following optimization problem: \[\min_{\beta_{0},\beta_{1}}\mathbb{E}\left[\left(Y-\beta_{0}-\beta_{1}X_{1}\right)^{2}\right].\]
An optimal solution \(\left(\beta_{0}^{*},\beta_{1}^{*}\right)\) solves the following first-order conditions: \[\begin{eqnarray*} \mathbb{E}\left(Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1}\right) &=& 0, \\ \mathbb{E}\left(X_{1}\left(Y-\beta_{0}^{*}-\beta_{1}^{*}X_{1}\right)\right) &=& 0. \end{eqnarray*}\]
As a result, we have \[\begin{equation}\beta_{0}^{*} = \mathbb{E}\left(Y\right)-\beta_{1}^{*}\mathbb{E}\left(X_{1}\right),\qquad\beta_{1}^{*}=\dfrac{\mathsf{Cov}\left(X_{1},Y\right)}{\mathsf{Var}\left(X_{1}\right)}.\label{blp-coef}\end{equation}\]
If you do the calculations, the best linear predictor is \(\beta_0^*+\beta_1^* X_1=2+X_1\). You might find this result strange, but this is what you get!
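Where do these numbers come from? Each word is selected with probability \(1/4\), so \[\mathbb{E}\left(Y\right)=\frac{1+3+3+5}{4}=3,\qquad \mathbb{E}\left(X_1\right)=\frac{0+2+1+1}{4}=1,\] \[\mathsf{Cov}\left(X_1,Y\right)=\frac{0\cdot 1+2\cdot 3+1\cdot 3+1\cdot 5}{4}-1\cdot 3=\frac{1}{2},\qquad \mathsf{Var}\left(X_1\right)=\frac{0+4+1+1}{4}-1^2=\frac{1}{2},\] so that \(\beta_1^*=1\) and \(\beta_0^*=3-1\cdot 1=2\).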
I can show you through a simulation. Try 10 IID draws from the joint distribution of \(\left(X_1,Y\right)\):
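A minimal sketch of such a simulation in R (the seed and the number of draws in the second call are my own choices):

```r
set.seed(12345)                                  # arbitrary seed
words  <- c("I", "SEE", "THE", "MOUSE")
Y_all  <- c(1, 3, 3, 5)                          # number of letters per word
X1_all <- c(0, 2, 1, 1)                          # number of E's per word
pick <- sample(1:4, size = 10, replace = TRUE)   # 10 IID uniform draws
data.frame(word = words[pick], X1 = X1_all[pick], Y = Y_all[pick])
# With many draws, the least-squares coefficients approach (2, 1):
pick <- sample(1:4, size = 1e5, replace = TRUE)
coef(lm(Y_all[pick] ~ X1_all[pick]))
```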
Some of you might have felt weird about the prediction. If you thought of predicting, say, the average number of letters among the words with the reported number of \(E\)'s:
Then you have unwittingly travelled down a nonparametric path and are questioning the specification of the linear prediction rule, perhaps without realizing it!
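Concretely, the conditional means are \(\mathbb{E}\left(Y\mid X_1=0\right)=1\), \(\mathbb{E}\left(Y\mid X_1=1\right)=4\), and \(\mathbb{E}\left(Y\mid X_1=2\right)=3\), while the best linear predictor \(2+X_1\) returns \(2\), \(3\), and \(4\): the two disagree at every value of \(X_1\).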
What if you are in a situation where \(Y=\beta_0+\beta_1X_1+u\) is not a linear regression?
What this means: \(\beta_0+\beta_1 X_1\) is no longer the best linear predictor of \(Y\) given \(X_1\); in particular, the error \(u\) may be correlated with \(X_1\), so the formulas above no longer recover \(\left(\beta_0,\beta_1\right)\).
But \(\beta_1\) here is still the causal effect of \(X_1\) on \(Y\).
The previous identification argument requires an instrument \(Z\) satisfying \(\mathsf{Cov}\left(Z,u\right)=0\) (exogeneity) and \(\mathsf{Cov}\left(X_1,Z\right)\neq 0\) (relevance).
A violation of either of these two conditions already signals a failure of instrumental variables to identify \(\beta_1\).
Observe that \[\beta_1=\dfrac{\mathsf{Cov}\left(Y, Z\right)}{\mathsf{Cov}\left(X_1, Z\right)}=\dfrac{\mathsf{Cov}\left(Y, Z\right)}{\mathsf{Var}\left(Z\right)}\left(\dfrac{\mathsf{Cov}\left(X_1, Z\right)}{\mathsf{Var}\left(Z\right)}\right)^{-1}\]
Therefore, \(\beta_1\) may be interpreted as a ratio of two regression slopes!
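Here is a small simulation sketch in R of this ratio-of-slopes logic (the coefficients, the degree of endogeneity, and the seed are my own choices):

```r
set.seed(12345)                       # arbitrary seed
n  <- 1e5
z  <- rnorm(n)                        # instrument: Cov(z, u) = 0 by construction
v  <- rnorm(n)
u  <- 0.8 * v + rnorm(n)              # error correlated with x1 through v
x1 <- 0.5 * z + v                     # relevance: Cov(x1, z) = 0.5
y  <- 1 + 2 * x1 + u                  # true beta_1 = 2
coef(lm(y ~ x1))[2]                   # least squares is off because Cov(x1, u) != 0
cov(y, z) / cov(x1, z)                # ratio of covariances is close to 2
```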
Setup: a linear AR(1) panel data model \(Y_{t}=\beta_1^* Y_{t-1}+c+u_{t}\), where \(c\) is an unobserved individual effect and you observe \(\left(Y_1,Y_2,Y_3\right)\). First differencing removes \(c\), leaving \(\Delta Y_3=\beta_1^*\Delta Y_2+\Delta u_3\), with \(Y_1\) as a candidate instrument for \(\Delta Y_2\).
Need \(\mathsf{Cov}\left(\Delta Y_{2},Y_1\right)\neq 0\) and \(\mathsf{Cov}\left(\Delta u_{3},Y_1\right)=0\).
Note that \(Y_t\) is nothing but an accumulation of past \(u_t\)'s (and of the individual effect \(c\)) plus an initial starting point \(Y_1\).
But there are ways to guarantee \(\mathsf{Cov}\left(\Delta u_{3},Y_1\right)=0\): for example, assume the shocks \(u_2\) and \(u_3\) are uncorrelated with the initial condition \(Y_1\).
Guaranteeing \(\mathsf{Cov}\left(\Delta Y_{2},Y_1\right)\neq 0\) is slightly tricky.
We can also write down the moment condition used to identify \(\beta_1^*\): \[ \mathbb{E}\left(\left(\Delta Y_3-\beta_1^* \Delta Y_2\right) \cdot Y_1\right) =0\]
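Solving this single moment condition for \(\beta_1^*\) (provided \(\mathbb{E}\left(\Delta Y_2\cdot Y_1\right)\neq 0\)) gives \[\beta_1^*=\dfrac{\mathbb{E}\left(\Delta Y_3\cdot Y_1\right)}{\mathbb{E}\left(\Delta Y_2\cdot Y_1\right)},\] which is the same ratio logic as in the instrumental variables case, with \(Y_1\) playing the role of the instrument \(Z\).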
What happens if we observe \(Y_4\)? Then you will have extra moment conditions (of course, there is a price!) because you can take other differences.
Again, you observe \(\left(Y_1,Y_2,Y_3,Y_4\right)\).
You have something like: \[ \begin{eqnarray*} \mathbb{E}\left(\left(\Delta Y_3-\beta_1^* \Delta Y_2\right) \cdot Y_1\right) &=& 0, \\ \mathbb{E}\left(\left(\Delta Y_4-\beta_1^* \Delta Y_3\right) \cdot Y_1\right) &=& 0, \\ \mathbb{E}\left(\left(\Delta Y_4-\beta_1^* \Delta Y_3\right) \cdot Y_2\right) &=& 0.\end{eqnarray*}\]
In the end, you will have more moment conditions than the dimension of the parameter \(\beta_1^*\).
How do you combine them? This is where the generalized method of moments (GMM) comes in.
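As a sketch of how the combination works (assuming an \(n\times 4\) data matrix `Y` with columns \(Y_1,\dots,Y_4\); the identity weighting matrix is just the simplest choice, not necessarily the efficient one):

```r
# Stacked sample moments g(b) for the three conditions above,
# given an n x 4 matrix Y with columns Y1, Y2, Y3, Y4
g <- function(b, Y) {
  d2 <- Y[, 2] - Y[, 1]; d3 <- Y[, 3] - Y[, 2]; d4 <- Y[, 4] - Y[, 3]
  c(mean((d3 - b * d2) * Y[, 1]),
    mean((d4 - b * d3) * Y[, 1]),
    mean((d4 - b * d3) * Y[, 2]))
}
# One-step GMM: minimize g(b)' W g(b) with W = identity
gmm_objective <- function(b, Y) sum(g(b, Y)^2)
# b_hat <- optimize(gmm_objective, interval = c(-2, 2), Y = Y)$minimum
```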
But a particularly important moment condition requires four observed time periods to work.
There is another important moment condition, which requires an extra assumption, mean stationarity, to be valid.
I will illustrate using Monte Carlo simulation.
The model you have seen so far is not enough to allow you to generate artificial data.
I need to specify more details in order to generate data from a linear AR(1) panel data model.
\(n=50\), \(T=4\), \(\beta_1^*=1\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=500\), \(T=4\), \(\beta_1^*=1\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=50\), \(T=4\), \(\color{blue}{\beta_1^*=0.8}\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=500\), \(T=4\), \(\beta_1^*=0.8\), \(\sigma^2_2=1.5625\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=50\), \(T=4\), \(\beta_1^*=0.8\), \(\color{blue}{\sigma^2_2=10}\), \(\sigma^2_3=1\), \(\sigma^2_4=1\), \(\sigma^2_c=0.29\), \(\mathsf{Cov}\left(c,Y_1\right)=0.2\), \(\mathsf{Var}\left(Y_1\right)=1\)
\(n=50\), \(T=4\), \(\beta_1^*=0.8\), \(\color{blue}{\sigma^2_2=1}\), \(\color{blue}{\sigma^2_3=3}\), \(\sigma^2_4=1\), \(\color{blue}{\sigma^2_c=1}\), \(\color{blue}{\mathsf{Cov}\left(c,Y_1\right)=2.6}\), \(\color{blue}{\mathsf{Var}\left(Y_1\right)=10}\)
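To make this concrete, here is a rough sketch in R of generating data along the lines of the first design above (the joint normality of \(\left(c,Y_1\right)\), the zero means, and the seed are my own simplifying assumptions):

```r
set.seed(12345)                                    # arbitrary seed
n <- 50; T_obs <- 4; beta1 <- 1                    # DGP1 settings
sig2_u  <- c(NA, 1.5625, 1, 1)                     # Var(u_t) for t = 2, 3, 4
sig2_c  <- 0.29; cov_cy1 <- 0.2; var_y1 <- 1
# Draw (c, Y1) jointly normal with the stated second moments and zero means
S   <- matrix(c(sig2_c, cov_cy1, cov_cy1, var_y1), 2, 2)
cy1 <- matrix(rnorm(n * 2), n, 2) %*% chol(S)
c_i <- cy1[, 1]
Y   <- matrix(NA, n, T_obs)
Y[, 1] <- cy1[, 2]
for (t in 2:T_obs) {                               # Y_t = beta1 * Y_{t-1} + c + u_t
  Y[, t] <- beta1 * Y[, t - 1] + c_i + rnorm(n, sd = sqrt(sig2_u[t]))
}
head(Y)
```

Feeding such a matrix into the GMM sketch above gives one Monte Carlo replication; repeating it many times traces out the sampling distribution of the estimators.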
Take \(n=50\):
| | DGP1 | | DGP2 | | DGP3 | | DGP4 | |
|---|---|---|---|---|---|---|---|---|
| \(n\) | AB | NL | AB | NL | AB | NL | AB | NL |
| 50 | 0.75 | 0.68 | 0.13 | 0.49 | 0.58 | 0.31 | 0.78 | |
| | 0.56 | 0.16 | 0.90 | 0.18 | 0.43 | 0.80 | 0.17 | |
| | 1.23 | 1.15 | 0.79 | | | | | |
| | 0.18 | 0.23 | 0.09 | | | | | |
Now, \(n=800\):
| | DGP1 | | DGP2 | | DGP3 | | DGP4 | |
|---|---|---|---|---|---|---|---|---|
| \(n\) | AB | NL | AB | NL | AB | NL | AB | NL |
| 800 | 1.00 | 0.80 | 0.58 | 0.64 | 0.79 | 0.78 | 0.80 | |
| | 0.13 | 0.07 | 0.57 | 0.07 | 0.09 | 0.16 | 0.04 | |
| | 1.08 | 0.96 | 0.80 | | | | | |
| | 0.07 | 0.08 | 0.02 | | | | | |
\[\begin{eqnarray*}\mathrm{budget\ share}_{iht} &=& \alpha_i + \beta_i \log \mathrm{expenditure}_{iht} \\ && +\gamma_i\mathrm{budget\ share}_{ih,t-1} + \sum_k \delta_{ik}\mathrm{controls}_{kht} \\ && +\underbrace{\rho_{ih}+\varepsilon_{iht}}_{u_{iht}}\end{eqnarray*}\]
Many decisions go into an even bigger black box.
\[\begin{eqnarray*} &\mathbb{E}\left(\Delta\varepsilon_{iht} \log \mathrm{expenditure}_{h,t-k}\right) = 0, \ \ \forall k\geq 1 \\ &\mathbb{E}\left(\Delta\varepsilon_{iht} \log \mathrm{income}_{h,t-k}\right) = 0, \ \ \forall k \\ &\mathbb{E}\left(u_{iht} \Delta\log \mathrm{expenditure}_{h,t-k}\right) = 0, \ \ \forall k\geq 1 \\ &\mathbb{E}\left(u_{iht} \Delta\log \mathrm{income}_{h,t-k}\right) = 0, \ \ \forall k \end{eqnarray*}\]
Bespoke application of linear dynamic panel data methods
Spanish panel has a rotating panel structure
Uses microdata as a sanity check for macro models
But we were unable to reproduce the results, as there is uncertainty about the instrument set.
Linear hypothesis test
Hypothesis:
AGEQ = 0
Model 1: restricted model
Model 2: LWKLYWGE ~ EDUC + AGEQ | AGEQ + AGEQSQ
Res.Df Df Chisq Pr(>Chisq)
1 247197
2 247196 1 3.9508 0.04685 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Calculate the bounds:
\[\psi^\prime \widehat{\theta} - \frac{1}{2}\left(\cos 2\alpha \pm 1\right)\left(\mathsf{Var}\left(\psi^\prime \widehat{\theta}\right)-\mathsf{Var}\left(\psi^\prime\widehat{\theta}_R\right)\right)^{1/2} \\ \times \left(\mathrm{Wald \ stat}\right)^{1/2}\]
where \(\cos 2\alpha\) is given by
\[\frac{\psi^\prime\widehat{\theta}-\psi^\prime\widehat{\theta}_R}{\left(\mathsf{Var}\left(\psi^\prime \widehat{\theta}\right)-\mathsf{Var}\left(\psi^\prime\widehat{\theta}_R\right)\right)^{1/2}\left(\mathrm{Wald \ stat}\right)^{1/2}}\]
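A small R helper that mirrors this formula (a sketch; the function name and argument names are mine, and the inputs correspond to the hard-coded numbers used below):

```r
# Bounds for psi' theta: unrestricted estimate, restricted estimate,
# difference in their variances, and the Wald statistic from the test above
extreme_bounds <- function(est, est_r, var_diff, wald) {
  scale <- sqrt(var_diff) * sqrt(wald)
  cos2a <- (est - est_r) / scale
  c(lower = est - 0.5 * (cos2a + 1) * scale,
    upper = est - 0.5 * (cos2a - 1) * scale)
}
# e.g. extreme_bounds(0.503, 0.503 - 0.445, 0.051, 3.951)
```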
[1] 0.385483
# Extreme bounds
c(0.503-0.5*(0.445/(sqrt(0.051)*sqrt(3.951))+1)*(sqrt(0.051)*sqrt(3.951)), 0.503-0.5*(0.445/(sqrt(0.051)*sqrt(3.951))-1)*(sqrt(0.051)*sqrt(3.951)))
[1] 0.05605569 0.50494431
[1] 0.9963456
# Extreme bounds
c(0.692-0.5*(0.634/(sqrt(0.198)*sqrt(2.045))+1)*(sqrt(0.198)*sqrt(2.045)), 0.692-0.5*(0.634/(sqrt(0.198)*sqrt(2.045))-1)*(sqrt(0.198)*sqrt(2.045)))
[1] 0.05683731 0.69316269
Linear hypothesis test
Hypothesis:
AGEQ = 0
Model 1: restricted model
Model 2: LWKLYWGE ~ EDUC + AGEQ | AGEQ + MARRIED
Res.Df Df Chisq Pr(>Chisq)
1 247197
2 247196 1 252.42 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
[1] 1.624881