\documentclass[letterpaper,oneside,fleqn,12pt]{article} \usepackage[letterpaper,centering,vscale=0.76]{geometry} \usepackage{setspace,natbib, graphicx, amsmath, amsthm, pa, endnotes} \usepackage[tablesfirst, nolists]{endfloat} \begin{document} \newtheorem{thm}{Theorem} \begin{center} {\large On the Fixed-Effects Vector Decomposition} \\ \bigskip \bigskip Trevor Breusch\footnote[1]{The authors are respectively Professor, Associate Professor, Ph.D. candidate, and Professor at the Crawford School of Economics and Government of The Australian National University.}\footnote[2]{Corresponding author: Trevor.Breusch@anu.edu.au}\\ Michael B. Ward\\ Hoa Nguyen\\ Tom Kompas\\ \bigskip \bigskip Crawford School of Economics and Government\\ The Australian National University\\ Canberra, ACT 0200\\ Australia \end{center} \thispagestyle{empty} \setcounter{page}{0} \newpage \begin{center} {\large On the Fixed-Effects Vector Decomposition} \end{center} \medskip \textsc{Abstract:} This paper analyses the properties of the fixed-effects vector decomposition estimator, an emerging and popular technique for estimating time-invariant variables in panel data models with group effects. This estimator was initially motivated on heuristic grounds, and advocated on the strength of favorable Monte Carlo results, but with no formal analysis. We show that the three-stage procedure of this decomposition is equivalent to a standard instrumental variables approach, for a specific set of instruments. The instrumental variables representation facilitates the present formal analysis which finds: (1) The estimator reproduces exactly classical fixed-effects estimates for time-varying variables. (2) The standard errors recommended for this estimator are too small for both time-varying and time-invariant variables. (3) The estimator is inconsistent when the time-invariant variables are endogenous. 
(4) The reported sampling properties in the original Monte Carlo evidence do not account for presence of a group effect. (5) The decomposition estimator has higher risk than existing shrinkage approaches, unless the endogeneity problem is known to be small or no relevant instruments exist. \medskip \doublespace %\onehalfspacing
\section{Introduction} We analyse the properties of a recently introduced methodology for panel data, known as fixed-effects vector decomposition (\textsc{fevd}), which \citet{plumper} % Pl\"umper and Troeger (2007a)
developed to produce improved estimates in cases where traditional panel data techniques have difficulty. Researchers in many fields seek to exploit the advantages of such panel data. Having repeated observations across time for each group in a panel allows one, under suitable assumptions, to control for unobserved heterogeneity across the groups which might otherwise bias the estimates. \citet{mundlak1978pooling} % Mundlak (1978)
demonstrated that a generalized least squares approach to unobserved group effects, which treats them as random and potentially correlated with the regressor, gives rise to the traditional fixed-effects (\textsc{fe}) estimator. However, \textsc{fe} is a blunt instrument for controlling for correlation between observed and unobserved characteristics because it ignores any systematic average differences between groups. Thus any potential explanatory factors that are constant longitudinally (time-invariant) will be ignored by the \textsc{fe} estimator. Likewise, any explanatory variables that have little within variation (that is, slowly-changing over the longitudinal dimension) will have little explanatory power, and will result in imprecise coefficient estimates that have large standard errors. 
\citet{hausman1981panel} % Hausman and Taylor (1981)
had previously shown that a better estimator than \textsc{fe} is available if some of the explanatory variables are known to be uncorrelated with the unobserved group effect, thus described as \emph{exogenous} explanatory variables. The Hausman-Taylor (\textsc{ht}) estimator is an instrumental variables (\textsc{iv}) procedure that combines aspects of both fixed-effects and random-effects estimation. Given a sufficient number of exogenous regressors, the \textsc{ht} procedure allows time-invariant variables to be kept in the model. It also provides more efficient estimates than \textsc{fe} for the coefficients of the exogenous time-varying variables. The downside of the \textsc{ht} estimator resides in specifying the exogeneity status for each of the time-varying and time-invariant variables in the model. In many practical applications such detailed specification is onerous. Pl\"umper and Troeger introduced \textsc{fevd} as an alternative that seemed to be superior to \textsc{ht} because it requires fewer explicit assumptions yet seemed to always have more desirable sampling properties. Like the \textsc{fe} estimator, and unlike \textsc{ht}, \textsc{fevd} does not require specifying the exogeneity status of the explanatory variables. Like the \textsc{ht} estimator, and unlike \textsc{fe}, the \textsc{fevd} procedure gives coefficient estimates for time-invariant (and slowly-changing) variables as well as the time-varying variables. Pl\"umper and Troeger motivated the \textsc{fevd} procedure on heuristic grounds, and advocated it on the strength of favorable results in a Monte Carlo simulation study. In particular, the simulation indicated that \textsc{fevd} has superior sampling properties for time-invariant explanatory variables. 
Although the \textsc{fevd} procedure comes out of the empirical political science literature, it is rapidly finding application in many other areas including social research and economics. At last count there were well over 150 references in Google Scholar to this emerging estimation methodology. Several empirical studies report standard errors for \textsc{fevd}-based estimates that are strikingly smaller than estimates based on traditional methods. There is, however, little formal analysis of the \textsc{fevd} procedure in this literature. The present paper is a remedy to the lack of formal analysis. We demonstrate that the \textsc{fevd} coefficient estimator can be equivalently written as an \textsc{iv} estimator, which serves to demystify the nature of the three-stage \textsc{fevd} procedure and its relationship with other estimators. As one immediate benefit, the \textsc{iv} representation allows us to draw on a standard toolkit of results. First, using the \textsc{iv} variance formula, we show that the \textsc{fevd} standard errors for coefficients of both the time-varying and time-invariant variables are uniformly \emph{too small.} In the case of the latter variables, the discrepancy in the \textsc{fevd} standard errors is \emph{unbounded}, and grows with the length of the panel and with the variance of the group effects. Second, using the moment-condition representation, we prove that the coefficients of the time-varying variables in \textsc{fevd} are exactly the same as in \textsc{fe}. This result is apparent in many of the practical studies which list \textsc{fe} estimates beside \textsc{fevd} estimates, but it is hardly mentioned in the existing analytical material. An immediate implication is that \textsc{fevd} estimates, like \textsc{fe}, are \emph{inefficient} if any of the time-varying variables are exogenous. Third, \textsc{fevd} usually produces lower variance estimates of time-varying coefficients than \textsc{ht} in small samples. 
However, it does so by including invalid instruments that produce \emph{inconsistent} estimates. So, even with massive quantities of data those \textsc{fevd} estimates will deviate from the truth. Further developments can also be made to the estimator, to exploit the ideas in \textsc{fevd} while avoiding the problems of that procedure. The advantage of \textsc{fevd} will be found in smaller samples where the large sample concept of consistency does not dominate. The Monte Carlo simulation studies by \citet{plumper} % Pl\"umper and Troeger (2007a)
and \citet{mitze} % Mitze (2009)
show a trade-off between bias and efficiency in which \textsc{fevd} often appears to be better than either \textsc{fe} or \textsc{ht} under quadratic loss. We present Monte Carlo evidence that a standard shrinkage approach combines the desirable small sample properties of \textsc{fevd} with the desirable large sample properties of the \textsc{ht} estimator, so that it has superior risk to both \textsc{fevd} and \textsc{ht} over a wide region of the parameter space. In the next section we introduce the notation to be used and describe the three-stage \textsc{fevd} estimator. We summarize the connections between these stages in a theorem, which we prove by comparing the various moment conditions. This approach demonstrates naturally the description of the \textsc{fevd} estimator as \textsc{iv}. Section 3 compares the correct \textsc{iv} variance formula with the formula implicit in the standard errors of the three-stage \textsc{fevd} approach. The main results are summarized in two further theorems. We also provide an empirical example to illustrate these results and some from the previous section. In Section 4 we examine the relationships between estimators in more detail, allowing the possibility of a trade-off between bias and variance to produce an estimator with lower mean-squared error. 
Section 5 reports some Monte Carlo evidence in the spirit of Pl\"umper and Troeger that demonstrates the superiority of a standard shrinkage estimator. Section 6 has some overall conclusions. \section{The Model} The data are ordered so that there are $N$ groups each of $T$ observations. The model for a single scalar observation is \begin{equation} \label{model} y_{it} = X_{it} \beta + Z_i \gamma + u_i + e_{it}\, \textrm{ for } i=1,\dots,N \textrm{ and } t=1,\dots,T. \end{equation} Here, $X_{it}$ is a $k \times 1$ vector of time-varying explanatory variables, and $Z_{i}$ is a $p \times 1$ vector of time-invariant explanatory variables.\endnote{The setup here describes a balanced panel with observations on every $t$ for each $i$, but the ideas extend to unbalanced panels with more complicated notation. A constant can be represented in this model by including a vector of ones as part of the time-invariant elements, $Z$.} The parameters $\beta, \gamma$, the group effect $u_i$, and the error term $e_{it}$ are all unobserved. Some elements of $X_{it}$ or $Z_{i}$ are correlated with the group effect $u_i$, in which case we call those variables \emph{endogenous}. Otherwise we call those variables \emph{exogenous}. With endogenous explanatory variables standard linear regression techniques may produce estimates of the unknown parameters which are inconsistent in the sense that they do not converge to the true parameter values as the sample size grows large. One standard approach to this endogeneity problem is to use the instrumental variables technique developed by Hausman and Taylor. \subsection*{Notation} The presentation is considerably simplified by introducing some projection matrix notation. Let \begin{equation} D = I_N \otimes \iota_T ,\end{equation} where $I_N$ is an $N \times N$ identity matrix and $\iota_T$ is a $T \times 1$ vector of ones. That is, $D$ is a matrix of dummy variables indicating group membership. 
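As a small illustration (the dimensions $N=2$ and $T=3$ are chosen here purely for concreteness and are not taken from any application in the paper), the group-membership matrix is
\begin{equation*}
D = I_2 \otimes \iota_3 =
\begin{pmatrix} 1 & 0 \\ 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{pmatrix},
\qquad D'D = 3\, I_2 ,
\end{equation*}
so the first column flags the three observations on group 1 and the second column those on group 2. In general $D$ is $NT \times N$ and $D'D = T I_N$.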
For any matrix $M$, we use $P_M = M (M'M)^{-1} M'$ to indicate the projection matrix for $M$, and we use $Q_M = I - P_M$ to indicate the projection matrix for the nullspace of $M$. For example, \begin{equation} P_D = D (D'D)^{-1} D' = \frac{1}{T}(I_N \otimes \iota_T \iota_T') \end{equation} is the matrix which projects a vector onto $D$. This particular projection produces a vector of group means. That is, $P_D y = \{\bar{y_i}\} \otimes \iota_T,$ where $\bar{y_i} = \frac{1}{T} \sum_{t=1}^T y_{it}.$ Also, \begin{equation} Q_D = I_{NT} - P_D \end{equation} is the matrix which produces the within-group variation. That is, $Q_D y = \{y_{it} - \bar{y_i}\}$ is the $NT \times 1$ vector of within-group differences. \subsection*{The \textsc{FEVD} Estimator} The \textsc{fevd} proceeds in three stages, which we detail below. To sharpen the analysis, we assume that the elements of $Z$ are exactly time-invariant (not just slowly-changing), so that $P_D Z = Z$. An explicit analysis of the slowly-changing case yields qualitatively similar insights. \subsubsection*{Stage 1} Perform a fixed effects regression of $y$ on the time-varying $X$. The moment condition corresponding to a fixed effects regression is \begin{equation} \label{m1} (y-X b)'Q_D X = 0.\end{equation} The unexplained component after this first step is $y-X b$. The group-average of the unexplained component is $P_D (y-X b)$. \subsubsection*{Stage 2} Regress the group-average of the unexplained component from the first step on the time-invariant $Z$. The moment condition is $\bigl( P_D (y-X b) - Z g\bigr)' Z = 0$. Using the fact that $P_D Z = Z$, this moment condition can be equivalently written as \begin{equation} \label{m2} (y-X b - Z g)' Z = 0. \end{equation} The group-average residuals from this regression are \begin{equation} \label{hdef} h = P_D\, (y - X b - Z g). \end{equation} \subsubsection*{Stage 3} Regress $y$ on $X$, $Z$, and $h$. The coefficients from this step are the final \textsc{fevd} estimates. 
The moment conditions are \begin{equation} \label{m3} (y - X \beta - Z \gamma - h \delta)'[X,Z,h] = 0.\end{equation} \begin{thm} \label{solution} The solution for $\beta$ is $b$ from Stage 1; the solution for $\gamma$ is $g$ from Stage 2; and the solution for $\delta$ is one. \end{thm} \begin{proof} We need to verify that the moment conditions (\ref{m3}) are satisfied at $\beta=b$, $\gamma=g$, and $\delta=1.$ This requires that \begin{equation} (y - X b - Z g - h)'[X,Z,h] = 0.\end{equation} Substituting in the definition of $h$ from (\ref{hdef}) and gathering terms, this simplifies to \begin{equation} ( y - X b - Z g)' Q_D [X,Z,h] = 0.\end{equation} Using the fact that $Q_D Z = 0$, this further simplifies to \begin{equation} \label{eq11} (y - X b)' Q_D [X,Z,h] = 0.\end{equation} The first set of equalities in (\ref{eq11}) must be satisfied, since it is identical to the moment condition (\ref{m1}) that defines $b$. The second set of equalities must be satisfied since $Q_D Z = 0$. Similarly, the third set of equalities must be satisfied since $Q_D h = 0$, which follows from the definition of $h$ in (\ref{hdef}) and the fact that $Q_D P_D = 0.$ \end{proof} \subsection*{Instrumental Variables Representation} Using Theorem \ref{solution} we can show that the \textsc{fevd} estimator can also be expressed as an \textsc{iv} estimator for a particular set of instruments. The major benefit of using the \textsc{iv} representation is that one can draw on a standard toolkit of results. Theorem \ref{solution} shows that the \textsc{fevd} estimates of $\beta$ are identical to the standard fixed effects estimator $b$ from Stage~1. This estimator is defined by the moment condition (\ref{m1}). Theorem \ref{solution} also shows that the \textsc{fevd} estimates of $\gamma$ are equivalent to the estimator of $g$ from Stage~2. This estimator is defined by the moment condition (\ref{m2}). 
Combining both moment conditions, and using the fact that $Q_D Z = 0$, the full moment conditions for the \textsc{fevd} estimator are \begin{equation} \label{m4} (y-X \beta - Z \gamma)' [Q_D X, Z] = 0. \end{equation} In other words, the \textsc{fevd} estimator is equivalent to an \textsc{iv} estimator using the instruments $Q_D X$ and $Z$. \section{Variance Formulae} Using standard results for \textsc{iv} estimators, the asymptotically correct sampling variance of the \textsc{fevd} procedure is \begin{equation} \label{coviv} V_{\textsc{iv}}(\beta,\gamma) = (H'W)^{-1} H' \Omega H (W'H)^{-1} \textrm{ for } H = [Q_D X, Z] \textrm{ and } W = [X,Z]. \end{equation} Here, $H$ is the matrix of instruments and $W$ is the matrix of explanatory variables. $\Omega$ is the covariance of the residual, $u_i + e_{it},$ which can be expressed as \begin{equation} \Omega = {\sigma^2_{e}} I_{NT} + {\sigma^2_{u}} I_N \otimes \iota_T \iota_T' = {\sigma^2_{e}} Q_D + ({\sigma^2_{e}} + T {\sigma^2_{u}}) P_D. \end{equation} Using straightforward algebraic manipulation of (\ref{coviv}), we will later separately expand out the variances of $\beta$ and of $\gamma$ for more detailed inspection. We now compare the correct \textsc{iv} variance formula with the \textsc{fevd} variance formula. Pl\"umper and Troeger state that the sampling variance of the \textsc{fevd} estimator can be obtained by applying the standard \textsc{ols} formula to the Stage 3 regression. Therefore, \begin{equation} \label{fevd} V_{\textsc{fevd}}(\beta,\gamma,\delta) = s^2 \bigl([X, Z, h]'[X, Z, h]\bigr)^{-1} = s^2 \begin{pmatrix} X'X & X'Z & X'h \\ Z'X & Z'Z & Z'h \\ h'X & h'Z & h'h \end{pmatrix}^{-1}. \end{equation} \ Here, $s^2 = \lVert y - X\beta - Z\gamma -h \rVert^2 / \textrm{\emph{dof}},$ where $\textrm{\emph{dof}}$ is the degrees of freedom. 
By application of (\ref{hdef}), the expression for $s^2$ can be simplified to \begin{equation} \label{s2} s^2 = \lVert Q_D (y - X \beta) \rVert^2/\textrm{\emph{dof}}, \end{equation} which we note is the standard textbook \textsc{fe} estimator for ${\sigma^2_{e}}$ when $\textrm{\emph{dof}} = N T- N - k$ \citep[see \emph{e.g.}][p. 271]{wooltext}.\endnote{ % (see \emph{e.g.} Wooldridge 2002, p. 271).
The usual \textsc{ols} formula for the standard errors from the Stage 3 regression would calculate the scale term using $\textrm{\emph{dof}} = NT - k - p -1$, where $p$ is the number of $Z$ variables including the constant and the final minus one allows for the additional regressor $h$. This divisor would clearly produce an inconsistent estimator of ${\sigma^2_{e}}$ for large $N$ and small $T$. \citet[p. 129]{plumper} % Pl\"umper and Troeger (2007a, p. 129)
mention briefly an adjustment to the degrees of freedom and, although they do not give an explicit formula, their software employs the divisor $\textrm{\emph{dof}} = N T - N - k - p + 1$ \citep{software}. % (Pl\"umper and Troeger, 2007b)
This adjustment would yield a consistent estimate of ${\sigma^2_{e}}$, but it is nonstandard and slightly biased. To sharpen the subsequent analysis, we use the standard unbiased estimator of ${\sigma^2_{e}}$, in which $\textrm{\emph{dof}} = NT - N - k$.} Now consider the variance of $\beta$. The \textsc{fevd} variance formula for $\beta$ is the top-left block of the overall \textsc{fevd} variance formula in (\ref{fevd}); using the partitioned-inverse formula this submatrix can be written as \begin{equation} \label{fevdb} V_{\textsc{fevd}}(\beta) = s^2(X' Q_{[Z,h]} X)^{-1}. \end{equation} By expanding out (\ref{coviv}), the correct variance for $\beta$ can be written as \begin{equation} \label{covbeta} V_{\textsc{iv}}(\beta) = {\sigma^2_{e}} (X'Q_DX)^{-1}. \end{equation} Note that this is exactly the textbook fixed effects variance formula. 
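The expansion of (\ref{coviv}) that leads to (\ref{covbeta}) can be sketched as follows, where the shorthand $A = X'Q_D X$, $B = Z'X$, and $C = Z'Z$ is introduced here only for this illustration. Because $Q_D Z = 0$ and $P_D Z = Z$, the expression for $\Omega$ gives $\Omega Q_D X = {\sigma^2_{e}} Q_D X$ and $\Omega Z = ({\sigma^2_{e}} + T {\sigma^2_{u}}) Z$, so that
\begin{align*}
H'W &= \begin{pmatrix} A & 0 \\ B & C \end{pmatrix}, &
(H'W)^{-1} &= \begin{pmatrix} A^{-1} & 0 \\ -C^{-1} B A^{-1} & C^{-1} \end{pmatrix}, \\
H'\Omega H &= \begin{pmatrix} {\sigma^2_{e}} A & 0 \\ 0 & ({\sigma^2_{e}} + T {\sigma^2_{u}})\, C \end{pmatrix}. &&
\end{align*}
Multiplying out $(H'W)^{-1} H'\Omega H (W'H)^{-1}$, with $(W'H)^{-1}$ the transpose of $(H'W)^{-1}$, the top-left block is $A^{-1} ({\sigma^2_{e}} A) A^{-1} = {\sigma^2_{e}} (X'Q_D X)^{-1}$, which is (\ref{covbeta}).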
Now we note from (\ref{s2}) that $s^2$ is a consistent estimator of ${\sigma^2_{e}}$. However, the matrices in the \textsc{fevd} formula (\ref{fevdb}) and the correct formula (\ref{covbeta}) differ. The \textsc{fevd} variance formula for $\beta$ must therefore be incorrect, and we can show the direction of the error. \begin{thm} \label{smallb} The \textsc{fevd} variance formula for coefficients on time-varying variables is too small. \end{thm} \begin{proof} Now $P_D [Z,h] = [Z,h]$, so that $P_D P_{[Z,h]} = P_{[Z,h]}$. Such a relationship between projection matrices implies that $P_D - P_{[Z,h]}$ is positive semi-definite (in matrix shorthand, $P_D \ge P_{[Z,h]}$). So, $Q_D \le Q_{[Z,h]}$. That $(X' Q_{[Z,h]} X)^{-1} \le (X'Q_D X)^{-1}$ follows immediately. This inequality will almost always be strict because the $p+1$ variables $[Z,h]$ cannot span the whole of the $N$-dimensional space of group operator $D$, and the $X$'s have arbitrary within-group variation. \end{proof} The \textsc{fevd} formula for the variance of $\beta$ is \emph{biased} in that it systematically \emph{understates} the true sampling variance of the estimator. The essential inequality does not disappear as $N$ gets larger, so the formula is also \emph{inconsistent}. The usual reported standard errors will be \emph{too small}. Now, consider the variance of $\gamma$. The \textsc{fevd} variance formula for $\gamma$ is the middle block of the overall \textsc{fevd} variance formula in (\ref{fevd}). Using an alternative representation of the partitioned inverse, this submatrix can be written as \begin{equation} \label{fevdg0} V_{\textsc{fevd}}(\gamma) = s^2 (Z'Z)^{-1}\Bigl(I + Z'[X,h]\bigl([X,h]' Q_Z [X,h]\bigr)^{-1}[X,h]' Z (Z'Z)^{-1}\Bigr). \end{equation} Note that $Z'h = 0$, so that in the partitioned central matrix of the second term only the submatrix corresponding to $X$ will be selected. 
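The orthogonality $Z'h = 0$ can be verified directly from the definitions already given. From (\ref{hdef}), the symmetry of $P_D$, and $P_D Z = Z$,
\begin{equation*}
Z'h = Z'P_D\,(y - X b - Z g) = (P_D Z)'(y - X b - Z g) = Z'(y - X b - Z g) = 0,
\end{equation*}
where the final equality is the Stage 2 moment condition (\ref{m2}). The same orthogonality gives $P_Z h = 0$ and hence $Q_Z h = h$.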
Then, because $Z'h = 0$ implies $Q_Z h = h$ and $Q_{[Z,h]} = Q_Z - P_h$, the block of that inverse corresponding to $X$ is $\bigl(X'Q_Z X - X'h (h'h)^{-1} h'X\bigr)^{-1} = (X'Q_{[Z,h]} X)^{-1}$, and we have the simplification of (\ref{fevdg0}), \begin{equation} \label{fevdg} V_{\textsc{fevd}}(\gamma) = s^2 (Z'Z)^{-1} + s^2 (Z'Z)^{-1} Z'X\bigl(X' Q_{[Z,h]} X\bigr)^{-1} X'Z (Z'Z)^{-1}. \end{equation} In contrast, by expanding out (\ref{coviv}), the correct variance for $\gamma$ can be written as \begin{equation} \label{covgamma} V_{\textsc{iv}}(\gamma) = {\sigma^2_{e}} (Z'Z)^{-1} + T {\sigma^2_{u}} (Z'Z)^{-1} + {\sigma^2_{e}} (Z'Z)^{-1} Z'X (X'Q_D X)^{-1} X'Z (Z'Z)^{-1}. \end{equation} Again, $s^2$ is a consistent estimator of ${\sigma^2_{e}}$, so the first term in (\ref{fevdg}) and in (\ref{covgamma}) is essentially the same. However, the expressions are otherwise different, so the \textsc{fevd} variance formula for $\gamma$ must also be incorrect. Again, we can show the direction of the error. \begin{thm} \label{smallg} The \textsc{fevd} variance formula for time-invariant variables is too small. \end{thm} \begin{proof} As shown in the proof of Theorem \ref{smallb}, $(X'Q_D X)^{-1} \ge (X'Q_{[Z,h]} X)^{-1}$ with almost certain strict inequality, so the last term in the \textsc{fevd} variance formula (\ref{fevdg}) understates the corresponding term in the correct variance expression (\ref{covgamma}). The only exception would be the unlikely event that $X$ and $Z$ are exactly orthogonal, causing those terms to vanish. But even then, the \textsc{fevd} variance formula will be an understatement because it omits the term $T {\sigma^2_{u}} (Z'Z)^{-1}$, which must be positive definite whenever there are random group effects. \end{proof} In general the \textsc{fevd} variance formula for $\gamma$ is systematically \emph{biased} and \emph{inconsistent}. The usual reported standard errors will be \emph{too small}. The extent of the downward bias is unbounded. The correct variance expression includes a term that is directly proportional to the number of observations per group $T$ and to the variance of the group effects ${\sigma^2_{u}}$. 
In contrast, the \textsc{fevd} variance formula, and hence the standard errors, are unaffected by these parameters. By increasing either or both of these parameters, with everything else held constant, the extent of the downward bias in the \textsc{fevd} variance formula becomes arbitrarily large. \subsection*{Empirical Example} Reported results from the applied empirical literature align with these theoretical results. Table 1 presents our replication of Table 1 in \citet{belke}, %Belke and Spies (2008),
and shows results for pooled \textsc{ols} (\textsc{pols}), \textsc{fe}, \textsc{fevd}, and \textsc{ht}. We add a column for the results from Stage 2 of \textsc{fevd} and a row for the coefficient $\delta$ that arises in Stage 3 to further illustrate our theoretical results.\endnote{ We are grateful to those authors for supplying their data. We found some occasional small differences in reported standard errors, probably due to use of \textquotedblleft robust\textquotedblright\ standard errors in the published results. The full table of our replication results is provided in the appendix.} The first six variables only are shown for brevity. They include logged nominal GDP of the importing country \textit{lngdpim}, logged nominal GDP of the exporting country \textit{lngdpex} and logged bilateral real exchange rate \textit{lrer}, as time-varying variables. The time-invariant variables shown are logged great circle distance in km \textit{ldist}, border length in km \textit{border}, and dummy for one or both countries being landlocked \textit{ll}. Results are estimated from a panel sample of $N = 420$ trading pairs for $T = 14$ years giving $5262$ observations. The coefficients for the first three (time-varying) variables are the same for \textsc{fe} and \textsc{fevd}, as shown by Theorem 1. 
To illustrate the second aspect of Theorem 1, the coefficients for the next three (time-invariant) variables are exactly equal in Stage 2 and FEVD, and the solution for $\delta $ is one. Theorem 2 is illustrated by the way the first three \textsc{fevd} reported standard errors are systematically smaller than the \textsc{fe} ones, in an order of 0.01, 0.01 and 0.01, against 0.11, 0.07 and 0.06, even though the coefficients themselves are identical and the standard error formula for \textsc{fe} is well established as being correct under the assumptions of the model. \begin{table}[ht!] \caption{Partial replication of Belke and Spies (2008).} \begin{tabular*}{\textwidth}{cccccc} \hline & (1) & (2) & (3) & (4) & (5) \\ & POLS & FE & Stage 2 & FEVD & HT \\ \hline \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} \\ \multicolumn{1}{l}{lngdpim} & \multicolumn{1}{l}{0.88***} & \multicolumn{1}{l}{0.68***} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{ 0.68***} & \multicolumn{1}{l}{0.68***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{(0.04)} & \multicolumn{1}{l}{(0.11) } & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{(0.01)} & \multicolumn{1}{l}{ (0.11)} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} \\ \multicolumn{1}{l}{lngdpex} & \multicolumn{1}{l}{0.89***} & \multicolumn{1}{l}{0.71***} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{ 0.71***} & \multicolumn{1}{l}{0.71***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{(0.03)} & \multicolumn{1}{l}{(0.07) } & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{(0.01)} & \multicolumn{1}{l}{ (0.07)} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} \\ \multicolumn{1}{l}{lrer} & \multicolumn{1}{l}{-0.01} & \multicolumn{1}{l}{ 0.13**} & \multicolumn{1}{l}{.} & 
\multicolumn{1}{l}{0.13***} & \multicolumn{1}{l}{0.13**} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{(0.01)} & \multicolumn{1}{l}{(0.06) } & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{(0.01)} & \multicolumn{1}{l}{ (0.06)} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} \\ \multicolumn{1}{l}{ldist} & \multicolumn{1}{l}{-1.27***} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{-1.41***} & \multicolumn{1}{l}{ -1.41***} & \multicolumn{1}{l}{-1.75***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{(0.11)} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{(0.04)} & \multicolumn{1}{l}{(0.00)} & \multicolumn{1}{l}{ (0.16)} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} \\ \multicolumn{1}{l}{border} & \multicolumn{1}{l}{-0.00} & \multicolumn{1}{l}{. } & \multicolumn{1}{l}{0.00**} & \multicolumn{1}{l}{0.00***} & \multicolumn{1}{l}{-0.00} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{(0.00)} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{(0.00)} & \multicolumn{1}{l}{(0.00)} & \multicolumn{1}{l}{ (0.00)} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} & \multicolumn{1}{l}{} \\ \multicolumn{1}{l}{ll} & \multicolumn{1}{l}{-0.16*} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{-0.23***} & \multicolumn{1}{l}{-0.23***} & \multicolumn{1}{l}{-0.16} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{(0.10)} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{(0.04)} & \multicolumn{1}{l}{(0.00)} & \multicolumn{1}{l}{ (0.14)} \\ & & & & & \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{...} & \multicolumn{1}{l}{...} & \multicolumn{1}{l}{...} & \multicolumn{1}{l}{...} & \multicolumn{1}{l}{...} \\ & & & & & \\ \multicolumn{1}{l}{$\delta $} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{1.00***} & \multicolumn{1}{l}{.} \\ 
\multicolumn{1}{l}{} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{.} & \multicolumn{1}{l}{(0.00)} & \multicolumn{1}{l}{.}\\\hline \end{tabular*} {\footnotesize Notes: One, two, and three asterisks reflect significance at the 0.10, 0.05, and 0.01 levels, respectively. Robust standard errors are in parentheses.} \end{table} It is a little harder to illustrate Theorem 3, which says that the \textsc{fevd} standard errors on the time-invariant variables are similarly understated. However, the \textsc{ht} estimator is just-identified in this case, which is the reason the \textsc{ht} coefficients and standard errors for time-varying variables are exactly the same as \textsc{fe}. It is no surprise, then, that the coefficient estimates of the three time-invariant variables (which are all exogenous) are generally similar for \textsc{pols}, \textsc{fevd}, and \textsc{ht}. As expected, the \textsc{ht} standard errors are slightly larger than but very close to those for \textsc{pols}: 0.16, 0.00 and 0.14 against 0.11, 0.00 and 0.10. However, the \textsc{fevd} standard errors are very small, at 0.00 in every case for the precision that is shown. This is most implausible, because one would not expect \textsc{pols} to be generally less efficient, given the structure of this example. \citet{belke} % Belke and Spies (2008)
is the only paper to our knowledge that reports results for all methods including \textsc{pols}, \textsc{fe}, \textsc{fevd}, and \textsc{ht}. Several other applications report both \textsc{fe} and \textsc{fevd} results \citep[\emph{e.g.}][]{caporale, mitze, krogstrup}. % (\emph{e.g.} Caporale et al., 2009; Mitze, 2009; Krogstrup and W{\"a}lti, 2008).
In the studies we examined, the \textsc{fe} t-statistics were consistently smaller, and often much smaller, than those reported for \textsc{fevd} time-varying variables, except for a few cases affected by robust standard error formulae. 
Again, this is despite the fact that the coefficient estimators were actually identical by construction. \section{Comparison to Alternative Estimators} The \textsc{fevd} estimator was introduced as an alternative to the \textsc{ht} instrumental variable estimator. By also expressing \textsc{fevd} in its instrumental variable representation we are able to develop insights into their comparative properties. Hausman and Taylor showed that the standard fixed effects estimator is equivalent to an \textsc{iv} estimator with instrument set $Q_D X$. To that, they add any exogenous elements of $X$ or of $Z$ as further instruments.\endnote{Hausman and Taylor describe $P_D X$ as the additional instrument, but this interpretation follows \citet{Trevor}. %Breusch et al. (1989).
} To see the relationship more clearly, decompose $X$ and $Z$ into exogenous and potentially endogenous sets: $X = [X_1, X_2]$ and $Z = [Z_1, Z_2]$, where the subscript $1$ indicates exogenous variables and the subscript $2$ indicates endogenous variables. The \textsc{ht} procedure is then an \textsc{iv} estimator which uses the instrument set $[Q_D X, X_1, Z_1].$ In contrast, the \textsc{fevd} procedure is an \textsc{iv} estimator which uses the instrument set $[Q_D X, Z_1, Z_2]$. The first essential difference between these estimators is that the \textsc{fevd} instrument set \emph{excludes} the exogenous time-varying variables $X_1$. Of course, $X_1$ may have no members. In that case, the \textsc{ht} estimator for endogenous $Z$ is not identified, so no useful comparisons can be made.\endnote{Ideally, one would have theoretical grounds for identifying which elements of $X$ are exogenous. As a practical matter, one could also use an over-identification test to confirm this assumption, since the fixed effects estimator of $\beta$ is consistent.} However, if $X_1$ has known members, then a more efficient estimator than \textsc{fevd} could be created by augmenting the instrument set with $X_1$. 
The second essential difference is that the \textsc{fevd} instrument set \emph{includes} the potentially endogenous time-invariant variables $Z_2$. If these variables are in fact correlated with the group effect, then the \textsc{fevd} estimator is inconsistent. The \textsc{fevd} and \textsc{ht} estimators coincide exactly when there are no exogenous elements of $X$ and no endogenous elements of $Z$.\endnote{More precisely, the two estimators are identical when all elements of $X$ are treated as if endogenous and all elements of $Z$ are treated as if exogenous, regardless of the actual endogeneity status.} The \textsc{fevd} procedure is thus primarily of interest when some $Z$ may in fact be endogenous. The essential question raised by Pl\"umper and Troeger is then whether it is better to use a biased and inconsistent but lower-variance estimator, or a consistent but higher-variance estimator. The question of whether a weak-instruments cure is worse than the disease is a sound one, which has been considered in other contexts by a variety of authors; see for example \citet{bound1995problems}. %Bound et al. (1995).
Under a mean-squared error (\textsc{mse}) loss function, neither the \textsc{fevd} procedure nor the \textsc{ht} procedure will uniformly dominate the other. \textsc{mse} can be expressed as variance plus bias squared. Because the variance of both estimators shrinks as the sample grows while the bias of an inconsistent estimator does not, a consistent estimator such as \textsc{ht} will be preferable to \textsc{fevd} for a sufficiently large sample size.\endnote{Of course, consistency does require that valid instruments correlated with $Z_2$ exist.} In contrast, for a small sample with a small endogeneity problem, it might be preferable to include the time-invariant endogenous variables $Z_2$ as instruments, as \textsc{fevd} does. A more efficient estimator of this type than \textsc{fevd} would be the \textsc{iv} estimator which augments the set of all valid instruments with $Z_2$, forming the instrument set $[Q_D X, X_1, Z]$.
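To make the trade-off explicit, the decomposition can be written out for a generic estimator $\hat{\gamma}$ of a coefficient $\gamma$. The symbols $\hat{\gamma}$, $v_{HT}$, $v_{FEVD}$, and $b$ are introduced here purely for illustration, and the comparison is only a heuristic sketch that assumes the relevant moments exist:
\[
\mathrm{MSE}(\hat{\gamma}) = E\left[(\hat{\gamma}-\gamma)^2\right] = \mathrm{Var}(\hat{\gamma}) + \left[E(\hat{\gamma})-\gamma\right]^2 .
\]
If the variances are roughly $v_{HT}/N$ and $v_{FEVD}/N$ with $v_{FEVD} < v_{HT}$, and the bias of the inconsistent estimator settles at a nonzero constant $b$, then the comparison is approximately
\[
\frac{v_{HT}}{N} \quad \text{against} \quad \frac{v_{FEVD}}{N} + b^2 ,
\]
so the consistent estimator has the smaller \textsc{mse} once $N$ exceeds about $(v_{HT}-v_{FEVD})/b^2$. When $b$ is small this threshold can be large, which is the small-sample case described above.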
One conventional approach to finding a balance would be to select between the competing estimators based on a specification test \citep{baltagi2003fixed}. % (Baltagi et al., 2003).
If the test rejects the null hypothesis of no difference between estimators, then \textsc{ht} would be selected. Otherwise, the efficient estimator would be selected because the evidence of endogeneity is too weak. Selection of a final estimator based on the results of a preliminary test is known as a \emph{pretest} procedure. Inference based on the standard errors of the final selected estimator alone may be misleading; however, bootstrap techniques which include the model selection step can circumvent this problem \citep{wong1997effects}. %(Wong, 1997).
Since the work of \citet{jamesstein}, % James and Stein (1961),
statisticians have understood that shrinking (biasing) an estimator toward a low-variance target can lower the \textsc{mse}. An extensive literature suggests shrinkage approaches based on using a weighted average of two estimators when one estimator is efficient and the other is consistent; see for example \citet{sawa1973}, \citet{feldstein1974errors}, \citet{mundlak1978pooling}, \citet{green1991james}, \citet{judgeshrink}, or \citet{mittelhammer2005combining}. %Sawa (1973), Feldstein (1974), Mundlak (1978), Green and Strawderman (1991), Judge
%and Mittelhammer (2004), or Mittelhammer and Judge (2005).
We consider a shrinkage estimator which combines the consistent but inefficient \textsc{ht} estimator and the efficient but possibly inconsistent \textsc{iv} estimator. For purposes of illustration, we choose a particularly simple shrinkage approach, but the literature contains many variations on the basic theme, which will have different strengths and weaknesses. If the bias, variance, and covariance of two estimators are known, it is algebraically straightforward to find the weight which minimizes the \textsc{mse} of a combined estimator.
In particular, suppose one estimator $\phi$ is unbiased. The other estimator $\chi$ is biased, but has lower variance. The shrinkage estimator then has the form $\chi + w (\phi - \chi)$, where $w$ is the weight placed on the consistent estimator. Straightforward calculus shows that the optimal weight which minimizes \textsc{mse} is
\begin{equation} \label{w}
w = \frac{\mu_{\chi}^2 + \sigma^2_{\chi} - \sigma_{\chi \phi}}{\mu_{\chi}^2 + \sigma^2_{\chi} + \sigma^2_{\phi} - 2 \sigma_{\chi \phi}},
\end{equation}
where $\mu_{\chi}$ is the bias of $\chi$, $\sigma^2_{\chi}$ and $\sigma^2_{\phi}$ are the variances of the two estimators, and $\sigma_{\chi \phi}$ is their covariance. Of course, the exact bias and variances will usually not be known; however, practical estimates of these terms are readily available for \textsc{iv} estimators.
\citet{mittelhammer2005combining} % Mittelhammer and Judge (2005)
show that plugging in such empirical estimates produces a practical weighted-average estimator. They choose a single $w$ to minimize the sum of \textsc{mse} over all coefficients. Since we are primarily interested in the \textsc{mse} of a single coefficient in this analysis, we apply the solution for $w$ as presented in (\ref{w}), which is the single-covariate case of equation 3.5 of Mittelhammer and Judge. We use standard empirical estimates of the variance and covariance terms from application of the basic \textsc{iv} formula (\ref{coviv}). The difference between the two estimators provides our estimate of the bias of the efficient estimator, since \textsc{ht} is asymptotically unbiased. Mittelhammer and Judge provide detailed discussion on calculating bootstrap percentiles and standard errors, through application of Efron's bias-corrected and accelerated bootstrap \citep{efronbca}. %(Efron, 1987).
The only change needed for the present context is to account for the panel structure, which is most simply done by resampling at the group level rather than resampling single observations independently.
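As a check on (\ref{w}), the minimization can be sketched as follows, under the assumption already stated that $\phi$ is unbiased. Here $\theta$ denotes the true parameter value and $M(w)$ the \textsc{mse} of the combined estimator; both symbols are introduced only for this illustration. Writing the combined estimator as $(1-w)\chi + w\phi$,
\[
M(w) = E\left\{\left[(1-w)(\chi-\theta) + w(\phi-\theta)\right]^2\right\}
= (1-w)^2\left(\mu_{\chi}^2 + \sigma^2_{\chi}\right) + w^2 \sigma^2_{\phi} + 2w(1-w)\,\sigma_{\chi\phi},
\]
where the cross term contains only the covariance because $\phi$ has zero bias. Setting the derivative to zero gives
\[
\tfrac{1}{2} M'(w) = -(1-w)\left(\mu_{\chi}^2 + \sigma^2_{\chi}\right) + w\,\sigma^2_{\phi} + (1-2w)\,\sigma_{\chi\phi} = 0,
\]
and collecting the terms in $w$ yields (\ref{w}). The second derivative is $2\left[\mu_{\chi}^2 + \mathrm{Var}(\phi-\chi)\right] \geq 0$, so the stationary point is a minimum. Two limiting cases may help intuition. As $\mu_{\chi}^2$ grows, $w$ tends to one and all weight goes to the unbiased estimator. If instead $\mu_{\chi} = 0$ and $\sigma_{\chi\phi} = \sigma^2_{\chi}$, as would be expected when $\chi$ is efficient relative to $\phi$, the numerator vanishes and $w = 0$.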

\section{Monte Carlo Evidence}
In this section we compare the practical performance, under a range of conditions, of various estimators for an endogenous time-invariant $Z$. In addition to the \textsc{fevd} and \textsc{ht} estimators, we consider a pretest estimator and a shrinkage estimator. The pretest estimator selects between \textsc{ht} and the \textsc{iv} estimator based on the instrument set $[Q_D X, X_1, Z]$, which treats all $Z$ as exogenous (as \textsc{fevd} does) in addition to using the \textsc{ht} instruments. The pretest selection is based on the 95\% critical value of the Durbin-Wu-Hausman specification test for exogeneity of $Z$ \citep[see \emph{e.g.}][p. 237]{davidson}. % (see \emph{e.g.} Davidson and MacKinnon, 1993, p. 237).
The shrinkage estimator assigns weights for the two estimators according to a first-stage empirical estimate of formula (\ref{w}). Pl\"umper and Troeger argue for the superiority of the \textsc{fevd} procedure over the \textsc{ht} approach based on Monte Carlo evidence. While our simulation design stays close to the original design where appropriate, our design differs from theirs in two fundamental respects.\endnote{The authors graciously provided the original simulation code upon request.} The first difference is that in the Pl\"umper and Troeger Monte Carlo study, the \textsc{ht} estimator was not actually consistent. This is because their data generating process had no correlation between $X$ and $Z$. The fact that the available instruments had, by construction, zero explanatory power for the endogenous variable contrasts sharply with their characterization of the Monte Carlo results (p. 130): ``the advantages of the \textsc{fevd} estimator over the Hausman-Taylor cannot be explained by the poor quality of the instruments.'' Pl\"umper and Troeger note (in footnote 11) that the advantage of \textsc{fevd} persists in their experiments regardless of sample size.
However, the asymptotic bias of an \textsc{iv} estimator is the same as the bias of \textsc{ols} when the instruments are uncorrelated with the endogenous variable, and thus irrelevant \citep{han2001asymptotic}. % (Han and Schmidt, 2001).
In contrast, with a valid and relevant instrument, the bias of the \textsc{iv} estimator will approach zero asymptotically. We therefore consider scenarios in our simulation where the \textsc{ht} estimator is consistent, that is, where at least one instrument for the endogenous $Z$ is valid and relevant. The second difference is that our simulations account for random variation in the group effect, while the Pl\"umper and Troeger code holds the effect ($u$) fixed across all replications.
\citet{mundlak1978pooling} % Mundlak (1978)
shows there is no loss of generality in assuming the effect is random, because the fixed-effects estimator and its related procedures can be described as inference conditional on the realizations of the effect in the sample. Further, the effect needs to be at least potentially random if the relationship between the effect and the regressors is to be described as \emph{correlation}. As Mundlak shows, if the random effect is correlated with the group-averages of regressors in unknown ways, then the optimal linear estimator in the random-effects \emph{model} is in fact the fixed-effects \emph{estimator}. The code used by Pl\"umper and Troeger does not simply fix the replicated effects at some sample realization; rather, it uses the Stata command `corr2data' to fix the sample moments of the variables and the group effects exactly in every replication. The vector of effects is thereby `fixed' by making it exactly orthogonal to the exogenous variables, effectively excluding any practical influence of the group effect in the simulated data. That process does not simulate a fixed-effects model, but rather one in which there is no group effect at all!
By contrast, our random-effects simulation represents the situation where the analyst is uncertain of the magnitudes of the group effects. We run a series of experiments which vary the degree of endogeneity and strength of instrument. The data generating process for our simulation is \begin{equation} y_{it} = 1 + 0.5 x_1 + 2 x_2 -1.5 x_3 - 2.5 z_1 + 1.8 z_2 + 3 z_3 + u_i + e_{it}. \end{equation} Here, $[x_1, x_2, x_3]$ is a time-varying mean-zero orthonormal design matrix, fixed across all experiments. $[z_1, z_2]$ is a time-invariant mean-zero orthonormal design matrix, fixed across all experiments. $z_3$ is fixed for all replications in each experiment. $z_3$ has sample mean zero and variance 1, and is orthogonal to all other variables except $x_1$. The sample covariance of the group mean of $x_1$ with $z_3$ is set exactly to an experiment-specific level, which allows us to vary the strength of the instrument across experiments.\endnote{Conditional on a non-zero sample correlation of the endogenous variable and the instrument, the moments of the \textsc{iv} estimator exist, so the Monte Carlo \textsc{mse} is well-defined.} The idiosyncratic error term $e$ is standard normal. The random effect $u$ is drawn from a normal distribution in each replication. The expectation of $u$ conditional on $z_3$ is $\rho z_3$, where $\rho$ works out to be the value of cov($z_3,u$) set in the experimental design. All other variables are uncorrelated with $u$, and the variance of $u$ conditional on all variables is 1.\endnote{The specified pattern of covariance is implemented through a Choleski decomposition approach.} The level of endogeneity is varied across experiments by changing the value of cov$(z_3,u)$. Each experiment has 1000 replications, which vary the random components $u$ and $e$. There are 30 groups ($N$) and 20 periods ($T$), as reported in \citet{plumper}. % Pl\"umper and Troeger (2007a). 
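One way to write the stated conditional structure of the group effect is sketched below. The symbol $v_i$ is introduced here only for illustration, and the second moment of $z_3$ is taken as normalized by $N$; the actual simulation imposes the covariance pattern through the Choleski approach mentioned in the endnote.
\[
u_i = \rho\, z_{3i} + v_i, \qquad v_i \sim N(0,1) \ \text{independently of all regressors.}
\]
Since $z_3$ is held fixed with sample mean zero and sample variance one,
\[
\frac{1}{N}\sum_{i=1}^{N} z_{3i}\, E(u_i \mid z_3) = \rho\, \frac{1}{N}\sum_{i=1}^{N} z_{3i}^2 = \rho ,
\]
which is why $\rho$ coincides with the designed value of cov$(z_3,u)$. Under this sketch the conditional variance of $u_i$ is one, while its variation across groups is $\rho^2 + 1$, so raising the endogeneity level also raises the overall variability of the group effect.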
In implementing the estimators, $[x_1, x_2, z_1, z_2]$ are treated as known exogenous, while $[x_3, z_3]$ are treated as potentially endogenous.
\newcommand\fn{\fontsize{8}{12}\selectfont}
\begin{figure}[ht!]
\centering
\caption{Performance of the four estimators for varying instrument strengths}
\smallskip
\includegraphics[width=\textwidth]{fig1.eps}
\end{figure}
Figure 1 illustrates the simulation results for varying instrument strengths and endogeneity levels. The vertical axis in each panel is the square root of the \textsc{mse} of the various estimators for the endogenous time-invariant variable $z_{3}$. The horizontal axis of each panel is the covariance between the random effect $u$ and $z_{3}$. Each panel illustrates a different instrument strength, where a stronger instrument means a higher correlation between the group-means of $x_{1}$ and the endogenous variable $z_{3}$. The four panels display the experiments for corr$(\bar{x}_1,z_3) = 0.15, 0.30, 0.45,$ and $0.60$ respectively.\endnote{Because variances of $\bar{x}_1$ and $z_3$ are both 1, the covariance of these variables equals their correlation.} Note that, within each panel, the \textsc{ht} results are unchanging as a consequence of the experimental design. Also, across panels, the \textsc{fevd} results are unchanging by design. The most notable feature of Figure 1 is that neither \textsc{ht} nor \textsc{fevd} uniformly dominates the other. If reasonably strong instruments are available to implement the \textsc{ht} procedure, and endogeneity is an issue, \textsc{ht} can greatly outperform \textsc{fevd}, as shown in Panel 4, because the higher variance of \textsc{ht} is compensated by lower bias.\endnote{ The discussion here focuses on the small sample properties. When $N$ is very large, \textsc{ht} will always outperform \textsc{fevd} if there is endogeneity and valid and relevant instruments exist.
For a modest example of relative estimator performance as $N$ grows, see the Appendix, where the case of $N = 300$ and $T=2$ is illustrated. } In all cases where endogeneity is absent (or is mild), \textsc{fevd} will be the most efficient estimator, as shown at the far left of all panels, because \textsc{fevd} exploits the true (or approximately true) restriction that $z_3$ is uncorrelated with $u$. If the investigator has strong prior reason to believe that endogeneity is not an issue, it makes sense to use that information. Indeed, with informative priors over endogeneity, using a Bayesian procedure which minimizes risk against that prior would be the ideal approach. Usually, however, the investigator will be using \textsc{fe}, or \textsc{ht}, or \textsc{fevd} precisely because of concern that endogeneity might be a significant problem. Rather than relying solely on prior information about the degree of endogeneity, the investigator can rely on evidence from within the dataset. Both the shrinkage and the pretest estimators are in this spirit. The shrinkage estimator in particular exhibits remarkably good risk characteristics across all ranges of all four panels, and it clearly dominates the pretest approach under \textsc{mse} loss. Indeed the shrinkage estimator often has an \textsc{mse} lower than \emph{both} the \textsc{ht} and the \textsc{fevd}, and is never much worse than the better of the two. The Monte Carlo evidence suggests that a shrinkage estimator would almost certainly be the best choice in the absence of prior information that the endogeneity problem is quite small.\endnote{While our focus is on estimator performance, it is worth noting that the Monte Carlo results do confirm that the asymptotic variance formula in (\ref{coviv}) provides unbiased estimates of the \textsc{ht} and \textsc{fevd} sampling variance, when ${\sigma^2_{e}}$ and ${\sigma^2_{u}}$ are calculated with appropriate degrees-of-freedom corrections for small samples.
Further, the bootstrap quantiles for the shrinkage estimator are reasonably accurate, confirming the results of \citet{mittelhammer2005combining}.} %Mittelhammer and Judge (2005).} More generally, if incomplete or uncertain prior information is available, alternatives which explicitly model that information, such as traditional Bayesian techniques or recent variants such as Bayesian model averaging \citep{hoeting1999bayesian}, % (Hoeting et al. 1999), will likely be the best approach. \section{Conclusions} The \textsc{fevd} estimator of \citet{plumper} % Pl\"umper and Troeger (2007a) offers the analyst of panel data a way to include time-invariant (and slowly-changing) variables in the presence of group effects that are possibly correlated with the explanatory variables. Thus it appears superior to the existing leading approaches of fixed-effects (which omits the time-invariant variables) and Hausman-Taylor (which requires specifying the exogeneity status of each explanatory variable). Pl\"umper and Troeger's motivation for the procedure was mostly heuristic and their evidence came from Monte Carlo experiments showing that \textsc{fevd} often displays better mean-squared error properties than both \textsc{fe} and \textsc{ht}. The procedure can be implemented in three easy stages, or even more conveniently in the Stata package provided by \citet{software}. %Pl\"umper and Troeger (2007b). This procedure has proved popular with panel data analysts. Our analytical results and revised Monte Carlo experiments challenge the value of \textsc{fevd}. Is it still a useful tool? We find that the coefficients of all the time-varying variables after the three stages of \textsc{fevd} are exactly the same as \textsc{fe} in the first stage. This fact is sometimes seen in the empirical applications but rarely commented upon with any clarity. Obviously, there is no gain in using \textsc{fevd} over the simpler \textsc{fe} if these coefficients are the objects of interest. 
Further, if something is known about the exogeneity of explanatory variables then these estimates are inefficient because they ignore the extra information. What is worse, unlike the simple first-stage \textsc{fe}, the standard errors from \textsc{fevd} are too small --- sometimes very much too small, judging from our empirical example and other published applications. In this case \textsc{fevd} is a definite step backwards. The main attraction of \textsc{fevd} is its ability to estimate coefficients of time-invariant explanatory variables. But, again the third stage is questionable. The same coefficient estimates are given in the second stage, which is a simple regression of the group-averaged residuals from \textsc{fe} on the time-invariant variables. The purported value of the third stage is to correct the standard errors, but this reasoning is now known to be false. Indeed there will be cases where the second-stage standard errors --- even though they are known to be wrong --- will be more accurate than those from the third stage. The example we have provided in Section 3 shows this possibility. So if \textsc{fevd} is the label to describe the three-stage procedure, it cannot be recommended for making inferences about any of the coefficients. The coefficient estimator, however, also represents a particular choice of instruments in standard \textsc{iv}. Dropping the three-stage methodology and reverting to an explicit \textsc{iv} approach would allow correct standard errors to be obtained in the cases where the estimator is consistent. However, since all of the time-invariant variables are used as instruments, the \textsc{fevd} estimator will be inconsistent if any of these are endogenous. The value of this estimator relative to others then depends on the trade-off between inconsistency and inefficiency. When the objective is reduced mean-squared error, the literature is replete with other methods such as shrinkage estimators known to have good properties. 
We have provided one such estimator that clearly dominates the \textsc{fevd} estimator over much of the parameter space and also limits the risk in regions where the \textsc{fevd} risk is unbounded. We demonstrate the feasibility of such an estimator with standard errors found empirically by bootstrapping. In undertaking these investigations we have also uncovered an explanation for the misleading evidence favouring \textsc{fevd} that was suggested in the previous Monte Carlo studies. \newpage \singlespace \theendnotes %\bibliographystyle{chicago} %\bibliography{fevd2} \newpage \begin{thebibliography}{} \bibitem[\protect\citeauthoryear{Baltagi, Bresson, and Pirotte}{Baltagi et~al.}{2003}]{baltagi2003fixed} Baltagi, Badi~H., Georges Bresson, and Alain Pirotte. 2003. \newblock {Fixed effects, random effects or Hausman--Taylor? A pretest estimator}. \newblock {\em Economics Letters\/}~{\em 79\/}(3), 361--369. \bibitem[\protect\citeauthoryear{Belke and Spies}{Belke and Spies}{2008}]{belke} Belke, Ansgar and Julia Spies. 2008. \newblock Enlarging the {EMU} to the east: What effects on trade? \newblock {\em Empirica\/}~{\em 35\/}(4), 369--89. \bibitem[\protect\citeauthoryear{Bound, Jaeger, and Baker}{Bound et~al.}{1995}]{bound1995problems} Bound, John., David~A. Jaeger, and Regina~M. Baker. 1995. \newblock Problems with instrumental variables estimation when the correlation between the instruments and the endogenous explanatory variable is weak. \newblock {\em Journal of the American Statistical Association\/}~{\em 90\/}(430), 443--50. \bibitem[\protect\citeauthoryear{Breusch, Mizon, and Schmidt}{Breusch et~al.}{1989}]{Trevor} Breusch, Trevor~S., Grayham~E. Mizon, and Peter Schmidt. 1989. \newblock Efficient estimation using panel data. \newblock {\em Econometrica\/}~{\em 57\/}(3), 695--700. \bibitem[\protect\citeauthoryear{Caporale, Rault, Sova, and Sova}{Caporale et~al.}{2009}]{caporale} Caporale, Guglielmo~M., Christophe Rault, Robert Sova, and Anamaria Sova. 2009. 
\newblock On the bilateral trade effects of free trade agreements between the {EU}-15 and the {CEEC}-4 countries. \newblock {\em Review of World Economics\/}~{\em 145\/}(2), 189--206. \bibitem[\protect\citeauthoryear{Davidson and Mac{K}innon}{Davidson and Mac{K}innon}{1993}]{davidson} Davidson, Russell and James~G. Mac{K}innon. 1993. \newblock {\em Estimation and Inference in Econometrics}. \newblock Oxford University Press. \bibitem[\protect\citeauthoryear{Efron}{Efron}{1987}]{efronbca} Efron, Bradley. 1987. \newblock Better bootstrap confidence intervals. \newblock {\em Journal of the American Statistical Association\/}~{\em 82\/}(397), 171--85. \bibitem[\protect\citeauthoryear{Feldstein}{Feldstein}{1974}]{feldstein1974errors} Feldstein, Martin. 1974. \newblock Errors in variables: A consistent estimator with smaller {MSE} in finite samples. \newblock {\em Journal of the American Statistical Association\/}~{\em 69\/}(348), 990--96. \bibitem[\protect\citeauthoryear{Green and Strawderman}{Green and Strawderman}{1991}]{green1991james} Green, Edwin~J. and William~E. Strawderman. 1991. \newblock {A James-Stein type estimator for combining unbiased and possibly biased estimators}. \newblock {\em Journal of the American Statistical Association\/}~{\em 86\/}(416), 1001--06. \bibitem[\protect\citeauthoryear{Han and Schmidt}{Han and Schmidt}{2001}]{han2001asymptotic} Han, Chirok and Peter Schmidt. 2001. \newblock {The asymptotic distribution of the instrumental variable estimators when the instruments are not correlated with the regressors}. \newblock {\em Economics Letters\/}~{\em 74\/}(1), 61--66. \bibitem[\protect\citeauthoryear{Hausman and Taylor}{Hausman and Taylor}{1981}]{hausman1981panel} Hausman, Jerry~A. and William~E. Taylor. 1981. \newblock {Panel data and unobservable individual effects}. \newblock {\em Econometrica\/}~{\em 49\/}(6), 1377--98. 
\bibitem[\protect\citeauthoryear{Hoeting, Madigan, Raftery, and Volinsky}{Hoeting et~al.}{1999}]{hoeting1999bayesian} Hoeting, Jennifer~A., David Madigan, Adrian~E. Raftery, and Chris~T. Volinsky. 1999. \newblock {Bayesian model averaging: A tutorial}. \newblock {\em Statistical Science\/}~{\em 14\/}(4), 382--401. \bibitem[\protect\citeauthoryear{James and Stein}{James and Stein}{1961}]{jamesstein} James, W. and Charles Stein. 1961. \newblock Estimation with quadratic loss. \newblock In J.~Neyman (Ed.), {\em Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability}, Volume~1, pp. 361--79. University of California Press. \bibitem[\protect\citeauthoryear{Judge and Mittelhammer}{Judge and Mittelhammer}{2004}]{judgeshrink} Judge, George~G. and Ron~C. Mittelhammer. 2004. \newblock A semiparametric basis for combining estimation problems under quadratic loss. \newblock {\em Journal of the American Statistical Association\/}~{\em 99\/}(466), 479--87. \bibitem[\protect\citeauthoryear{Krogstrup and W{\"a}lti}{Krogstrup and W{\"a}lti}{2008}]{krogstrup} Krogstrup, Signe and S\'ebastien W{\"a}lti. 2008. \newblock {Do fiscal rules cause budgetary outcomes?} \newblock {\em Public Choice\/}~{\em 136\/}(1), 123--138. \bibitem[\protect\citeauthoryear{Mittelhammer and Judge}{Mittelhammer and Judge}{2005}]{mittelhammer2005combining} Mittelhammer, Ron~C. and George~G. Judge. 2005. \newblock Combining estimators to improve structural model estimation and inference under quadratic loss. \newblock {\em Journal of Econometrics\/}~{\em 128\/}(1), 1--29. \bibitem[\protect\citeauthoryear{Mitze}{Mitze}{2009}]{mitze} Mitze, Timo. 2009. \newblock Endogeneity in panel data models with time-varying and time-fixed regressors: to {IV} or not {IV}? \newblock Ruhr Economic Paper No. 83. \bibitem[\protect\citeauthoryear{Mundlak}{Mundlak}{1978}]{mundlak1978pooling} Mundlak, Yair. 1978. \newblock {On the pooling of time series and cross section data}. 
\newblock {\em Econometrica\/}~{\em 46\/}(1), 69--85. \bibitem[\protect\citeauthoryear{Pl\"umper and Troeger}{Pl\"umper and Troeger}{2007a}]{plumper} Pl\"umper, Thomas and Vera~E. Troeger. 2007a. \newblock Efficient estimation of time-invariant and rarely changing variables in finite sample panel analyses with unit fixed effects. \newblock {\em Political Analysis\/}~{\em 15\/}(2), 124--39. \bibitem[\protect\citeauthoryear{Pl\"umper and Troeger}{Pl\"umper and Troeger}{2007b}]{software} Pl\"umper, Thomas and Vera~E. Troeger. 2007b. \newblock xtfevd.ado version 2.00 beta. \newblock Accessed from http://www.polsci.org/pluemper/xtfevd.ado. \bibitem[\protect\citeauthoryear{Sawa}{Sawa}{1973}]{sawa1973} Sawa, Takamitsu. 1973. \newblock The mean square error of a combined estimator and numerical comparison with the {TSLS} estimator. \newblock {\em Journal of Econometrics\/}~{\em 1\/}(2), 115--32. \bibitem[\protect\citeauthoryear{Wong}{Wong}{1997}]{wong1997effects} Wong, Ka-fu. 1997. \newblock Effects on inference of pretesting the exogeneity of a regressor. \newblock {\em Economics Letters\/}~{\em 56\/}(3), 267--71. \bibitem[\protect\citeauthoryear{Wooldridge}{Wooldridge}{2002}]{wooltext} Wooldridge, Jeffrey~M. 2002. \newblock {\em Econometric Analysis of Cross Section and Panel Data}. \newblock The MIT Press. \end{thebibliography} \newpage \section*{Appendix} \subsection*{Monte Carlo results for large $N$ and small $T$.} \begin{figure}[ht!] \centering \caption{Relative estimator performance when $N=300$ and $T=2$.} \smallskip \includegraphics[width=\textwidth]{fig2.eps} \end{figure} In applications such as labor market studies the number of groups can be quite large, often in the tens of thousands, since there may be a distinct group for each individual in the study. Figure 2 presents a modest example of the relative behavior of the four estimators as the number of groups grows larger. 
Each panel in Figure 2 illustrates the same parameter settings as the corresponding panel in Figure 1. The simulation code for the figures is identical, except for the $N$ and $T$ settings. While the overall number of observations is the same in the two figures, the larger number of groups provides more information about the time-invariant variables. Panel 4 illustrates that the relative performance of $\textsc{fevd}$ can be quite poor for reasonable parameter settings and a modest number of observations. \subsection*{Full results for the empirical example} \begin{table}[ht!] \caption{Full replication of Belke and Spies (2008).} {\small \begin{tabular*}{\textwidth}{cccccc} \hline & { (1)} & { (2)} & { (3)} & { (4)} & { (5)} \\ & { POLS} & { FE} & { Stage 2} & { FEVD} & { HT% } \\ \hline \multicolumn{1}{l}{ lngdpim} & \multicolumn{1}{l}{ 0.88***} & \multicolumn{1}{l}{ 0.68***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.68***} & \multicolumn{1}{l}{ 0.68***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.04)} & \multicolumn{1}{l}% { (0.11)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ % (0.01)} & \multicolumn{1}{l}{ (0.11)} \\ \multicolumn{1}{l}{ lngdpex} & \multicolumn{1}{l}{ 0.89***} & \multicolumn{1}{l}{ 0.71***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.71***} & \multicolumn{1}{l}{ 0.71***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.03)} & \multicolumn{1}{l}% { (0.07)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ % (0.01)} & \multicolumn{1}{l}{ (0.07)} \\ \multicolumn{1}{l}{ lrer} & \multicolumn{1}{l}{ -0.01} & \multicolumn{1}{l}{ 0.13**} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.13***} & \multicolumn{1}{l}{ 0.13**} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.01)} & \multicolumn{1}{l}% { (0.06)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ % (0.01)} & \multicolumn{1}{l}{ (0.06)} \\ \multicolumn{1}{l}{ ldist} & \multicolumn{1}{l}{ -1.27***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ -1.41***} & \multicolumn{1}{l}{ 
-1.41***} & \multicolumn{1}{l}{ -1.75***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.11)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.04)} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ (0.16)} \\ \multicolumn{1}{l}{ border} & \multicolumn{1}{l}{ -0.00} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.00**} & \multicolumn{1}{l}{ 0.00***} & \multicolumn{1}{l}{ -0.00} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ (0.00)} \\ \multicolumn{1}{l}{ ll} & \multicolumn{1}{l}{ -0.16*} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ -0.23***} & \multicolumn{1}{l}{ -0.23***} & \multicolumn{1}{l}{ -0.16} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.10)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.04)} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ (0.14)} \\ \multicolumn{1}{l}{ cl} & \multicolumn{1}{l}{ 0.23*} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.13***} & \multicolumn{1}{l}{ 0.13***} & \multicolumn{1}{l}{ 0.02} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.12)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.05)} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ (0.16)} \\ \multicolumn{1}{l}{ eu} & \multicolumn{1}{l}{ 0.08} & \multicolumn{1}{l}{ 0.03} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.03***} & \multicolumn{1}{l}{ 0.03} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.09)} & \multicolumn{1}{l}% { (0.05)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ % (0.01)} & \multicolumn{1}{l}{ (0.05)} \\ \multicolumn{1}{l}{ ea} & \multicolumn{1}{l}{ 0.16*} & \multicolumn{1}{l}{ 0.22***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.22***} & \multicolumn{1}{l}{ 0.22***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.10)} & \multicolumn{1}{l}% { (0.06)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ % (0.01)} & \multicolumn{1}{l}{ (0.06)} \\ \multicolumn{1}{l}{ ewu} & \multicolumn{1}{l}{ 0.13**} 
& \multicolumn{1}{l}{ 0.07**} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.07**} & \multicolumn{1}{l}{ 0.07**} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.05)} & \multicolumn{1}{l}% { (0.03)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ (0.03)} \\ \multicolumn{1}{l}{ lavrer3} & \multicolumn{1}{l}{ 1.22***} & \multicolumn{1}{l}{ 0.45**} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.45***} & \multicolumn{1}{l}{ 0.45***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.41)} & \multicolumn{1}{l}% { (0.22)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ % (0.03)} & \multicolumn{1}{l}{ (0.22)} \\ \multicolumn{1}{l}{ lavdist} & \multicolumn{1}{l}{ 0.55***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.93***} & \multicolumn{1}{l}{ 0.93***} & \multicolumn{1}{l}{ 1.45***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.15)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.04)} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ 0.24)} \\ \multicolumn{1}{l}{ avborder} & \multicolumn{1}{l}{ 0.00***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.01***} & \multicolumn{1}{l}{ 0.01***} & \multicolumn{1}{l}{ 0.01***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ (0.00)} \\ \multicolumn{1}{l}{ avll} & \multicolumn{1}{l}{ -0.10***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ -0.14***} & \multicolumn{1}{l}{ -0.14***} & \multicolumn{1}{l}{ -0.18***} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.03)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.01)} & \multicolumn{1}{l}{ % (0.00)} & \multicolumn{1}{l}{ (0.06)} \\ \multicolumn{1}{l}{ avcl} & \multicolumn{1}{l}{ -0.02} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ -0.40***} & \multicolumn{1}{l}{ -0.40***} & \multicolumn{1}{l}{ -0.44} \\ \multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.26)} & \multicolumn{1}{l}% { .} & \multicolumn{1}{l}{ (0.10)} & 
\multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}{ (0.39)} \\
\multicolumn{1}{l}{ aveu} & \multicolumn{1}{l}{ -0.74***} & \multicolumn{1}{l}{ -0.22*} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ -0.22***} & \multicolumn{1}{l}{ -0.22*} \\
\multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.21)} & \multicolumn{1}{l}{ (0.12)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ (0.01)} & \multicolumn{1}{l}{ (0.12)} \\
\multicolumn{1}{l}{ avea} & \multicolumn{1}{l}{ 0.34} & \multicolumn{1}{l}{ -0.07} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ -0.07***} & \multicolumn{1}{l}{ -0.07} \\
\multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.23)} & \multicolumn{1}{l}{ (0.10)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ (0.01)} & \multicolumn{1}{l}{ (0.10)} \\
\multicolumn{1}{l}{ avewu} & \multicolumn{1}{l}{ 0.22*} & \multicolumn{1}{l}{ 0.69***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.69***} & \multicolumn{1}{l}{ 0.69***} \\
\multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.12)} & \multicolumn{1}{l}{ (0.08)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ (0.01)} & \multicolumn{1}{l}{ (0.08)} \\
\multicolumn{1}{l}{ hc1} & \multicolumn{1}{l}{ 0.09***} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 0.10***} & \multicolumn{1}{l}{ 0.10***} & \multicolumn{1}{l}{ 0.09*} \\
\multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.03)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ (0.01)} & \multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}{ (0.05)} \\
\multicolumn{1}{l}{ hc3} & \multicolumn{1}{l}{ -0.18***} & \multicolumn{1}{l}{ -0.03} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ -0.03***} & \multicolumn{1}{l}{ -0.03} \\
\multicolumn{1}{l}{} & \multicolumn{1}{l}{ (0.04)} & \multicolumn{1}{l}{ (0.03)} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}{ (0.03)} \\
\multicolumn{1}{l}{${\delta }$} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ 1.00***} & \multicolumn{1}{l}{ .} \\
\multicolumn{1}{l}{} &
\multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ .} & \multicolumn{1}{l}{ (0.00)} & \multicolumn{1}{l}{ .} \\
\hline
\end{tabular*}
}
{\footnotesize Notes: One, two, and three asterisks denote significance at the 0.10, 0.05, and 0.01 levels, respectively. Robust standard errors are in parentheses.}
\end{table}

\subsection*{Variable descriptions for Table 2}

\begin{tabular}{ll}
lnimports & logged imports in 1000 US\$ (Source: OECD ICTS) \\
lngdprep & logged nominal GDP of the importing country (Source: UN NAMAD) \\
lngdppar & logged nominal GDP of the exporting country (Source: UN NAMAD) \\
lrer & logged bilateral real exchange rate \\
ldist & logged great-circle distance in km, calculated by the haversine formula \\
border & border length in km \\
ll & dummy=1 for one country and =2 for both countries of the trading \\
 & pair being landlocked \\
cl & dummy=1 if trading partners share an official language \\
eu & dummy=1 if EU member state \\
ea & dummy=1 if Europe agreement \\
ewu & dummy=1 if EWU member state (incl. 1998) \\
lavrer3 & logged multilateral exchange rate \\
lavdist & logged multilateral distance \\
avborder & multilateral border \\
avll & multilateral landlocked \\
avcl & multilateral language \\
aveu & multilateral EU \\
avea & multilateral EA \\
avewu & multilateral EWU \\
hc1 & number of years of a trading pair in the sample \\
hc3 & dummy=1 if the trading pair is present in the sample in $t-1$
\end{tabular}

\newpage
\pagestyle{empty}
\end{document}