Financial Analysis

Panel Data Analysis for Business and Economics

February 5, 2025 · 14 min read

Master fixed effects, random effects, and GMM estimators for longitudinal company data. Includes Stata and R code examples.

What is Panel Data?

Panel data (also called longitudinal data) contains observations on multiple entities, such as firms, individuals, or countries, each observed at multiple points in time. For example, annual financial data for 200 companies over 10 years forms a panel of 2,000 firm-year observations, provided every firm is observed in every year (a balanced panel). Panel data is powerful because it lets you control for unobserved individual heterogeneity (factors specific to each entity that do not change over time), which cross-sectional data alone cannot address.
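A minimal sketch in base R of the long format most panel tools expect: one row per entity and time period. It builds a hypothetical skeleton of firm IDs and years (no real data) and shows how the row count and the balance check relate to the example above.

```r
# Hypothetical panel skeleton: 200 firms observed annually from 2015 to 2024
firms <- 1:200
years <- 2015:2024
panel <- expand.grid(year = years, firm_id = firms)  # long format: one row per firm-year
panel <- panel[, c("firm_id", "year")]               # entity identifier first, then time

nrow(panel)                                  # 2000 (200 firms x 10 years)
head(panel, 3)

# Balanced: every firm appears in every year
all(table(panel$firm_id) == length(years))   # TRUE

# Dropping firm 1's first three years makes the panel unbalanced
unbalanced <- panel[!(panel$firm_id == 1 & panel$year < 2018), ]
nrow(unbalanced)                                  # 1997
all(table(unbalanced$firm_id) == length(years))   # FALSE
```

Real datasets are often unbalanced (firms enter, exit, or have missing years); the estimators discussed below generally accept unbalanced panels, but the number of observations is then no longer entities times periods.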

Fixed Effects vs Random Effects

The two main panel data approaches are fixed effects (FE) and random effects (RE). Fixed effects control for all time-invariant characteristics of each entity by giving every entity its own intercept; equivalently, the within transformation subtracts each entity's mean from every variable. Because a variable that never changes within an entity is wiped out by that transformation, FE cannot estimate the effect of time-invariant variables (such as a firm's country of incorporation), but it eliminates omitted variable bias from unobserved time-invariant factors. Random effects instead treat the entity-specific effects as random draws from a distribution. This allows time-invariant variables to be estimated, but it relies on the assumption that the entity effects are uncorrelated with the regressors.
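The equivalence between entity-specific intercepts and the within transformation can be checked numerically. This is a sketch on simulated data: the firm effect, the true slope of 2, and the `us_incorporated` dummy are all hypothetical choices made for illustration, with the firm effect deliberately correlated with R&D spending.

```r
# Hypothetical simulated panel: 50 firms x 8 years with an unobserved firm effect
# that is correlated with R&D spending (the situation FE is designed for)
set.seed(1)
n_firms <- 50
n_years <- 8
df <- expand.grid(year = 1:n_years, firm_id = 1:n_firms)
firm_effect <- rnorm(n_firms)                           # unobserved, time-invariant
df$us_incorporated <- as.numeric(df$firm_id %% 2 == 0)  # hypothetical time-invariant dummy
df$rd_spending <- rnorm(nrow(df)) + 0.8 * firm_effect[df$firm_id]
df$revenue <- 2 * df$rd_spending + firm_effect[df$firm_id] + rnorm(nrow(df))

# (1) FE as entity-specific intercepts (least-squares dummy variables)
lsdv <- lm(revenue ~ rd_spending + factor(firm_id), data = df)

# (2) FE via the within transformation: subtract each firm's mean
df$revenue_dm <- df$revenue - ave(df$revenue, df$firm_id)
df$rd_dm <- df$rd_spending - ave(df$rd_spending, df$firm_id)
within_fit <- lm(revenue_dm ~ rd_dm - 1, data = df)
# Note: lm()'s standard errors here ignore the 50 absorbed means; use plm for inference

# Both routes give the same slope (Frisch-Waugh-Lovell theorem)
c(lsdv = coef(lsdv)[["rd_spending"]], within = coef(within_fit)[["rd_dm"]])

# Pooled OLS ignores the firm effect, so its slope is biased in this setup
coef(lm(revenue ~ rd_spending, data = df))[["rd_spending"]]

# A time-invariant regressor becomes all zeros after demeaning, so FE cannot estimate it
us_dm <- df$us_incorporated - ave(df$us_incorporated, df$firm_id)
all(abs(us_dm) < 1e-12)  # TRUE
```

In this setup the pooled OLS slope should typically land well above the true value of 2, while both FE routes recover it; that gap is the omitted variable bias that FE removes.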

The Hausman Test

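The Hausman test is the standard way to choose between FE and RE. Under the null hypothesis the entity effects are uncorrelated with the regressors, so both estimators are consistent and RE is more efficient; under the alternative only FE remains consistent. The test therefore asks whether the FE and RE coefficients differ systematically. If the test is significant (for example, p < 0.05), prefer fixed effects, because the random effects assumption is rejected. If it is not significant, random effects is usually preferred because it is more efficient. The classic version assumes RE is fully efficient under the null (no heteroskedasticity or serial correlation); when that is doubtful, a cluster-robust version of the test is more appropriate. In R, run the test with phtest(fe_model, re_model) from the plm package. In Stata, store each fit first (estimates store fixed after the FE regression, estimates store random after the RE regression) and then run hausman fixed random.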
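For reference, the classic Hausman statistic compares the two coefficient vectors, scaled by the difference of their estimated covariance matrices. Here $k$ is the number of coefficients compared (the time-varying regressors that both models estimate), and the chi-squared distribution holds asymptotically under the null:

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}\left(\hat{\beta}_{FE}\right)
          - \widehat{\operatorname{Var}}\left(\hat{\beta}_{RE}\right)\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\overset{a}{\sim}\; \chi^{2}_{k}
```

A large $H$ means the FE and RE estimates are further apart than sampling noise alone would explain, which is evidence against the RE assumption.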

Implementation in R and Stata

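In R, the plm package is the standard tool. Install it once with install.packages("plm"), then declare the panel structure and fit both models: library(plm); pdata <- pdata.frame(df, index = c("firm_id", "year")); fe <- plm(revenue ~ rd_spending + employees, data = pdata, model = "within"); re <- plm(revenue ~ rd_spending + employees, data = pdata, model = "random"). Here model = "within" is the fixed effects estimator. In Stata, declare the panel with xtset firm_id year, then run xtreg revenue rd_spending employees, fe for fixed effects and xtreg revenue rd_spending employees, re for random effects, entering each command on its own line.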
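Putting the R steps together, a runnable sketch follows. Because no dataset accompanies the article, it simulates a hypothetical `df` with the same column names (`firm_id`, `year`, `revenue`, `rd_spending`, `employees`); the coefficients and the correlation between the firm effect and R&D spending are invented for illustration.

```r
library(plm)

# Hypothetical simulated data standing in for a real firm-year dataset `df`
set.seed(123)
n_firms <- 100
df <- expand.grid(year = 2015:2024, firm_id = 1:n_firms)
firm_effect <- rnorm(n_firms)                                      # unobserved firm heterogeneity
df$rd_spending <- rnorm(nrow(df)) + 0.7 * firm_effect[df$firm_id]  # correlated with the firm effect
df$employees <- rnorm(nrow(df))
df$revenue <- 1.5 * df$rd_spending + 0.5 * df$employees +
  firm_effect[df$firm_id] + rnorm(nrow(df))

# Declare the panel structure, then fit fixed and random effects
pdata <- pdata.frame(df, index = c("firm_id", "year"))
fe <- plm(revenue ~ rd_spending + employees, data = pdata, model = "within")
re <- plm(revenue ~ rd_spending + employees, data = pdata, model = "random")

# Side-by-side slopes (RE also reports an intercept, so match on FE's coefficient names)
cbind(FE = coef(fe), RE = coef(re)[names(coef(fe))])

# Hausman test: a small p-value favours FE
hausman <- phtest(fe, re)
hausman
```

Because the simulated firm effect is correlated with R&D spending, the Hausman test should usually reject here; with real data the outcome depends entirely on your sample.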

GMM Estimation

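When explanatory variables are endogenous (correlated with the error term), the Generalised Method of Moments (GMM) offers one solution, especially in dynamic panels where the lagged dependent variable appears as a regressor. The Arellano-Bond (difference GMM) estimator first-differences the model to remove the entity effects and then uses lagged levels of the dependent variable, typically from two periods back or earlier, as instruments for the differenced equation; lags of other endogenous regressors can be used in the same way. In R, use pgmm() from the plm package. In Stata, use the user-written xtabond2 command (install it with ssc install xtabond2) or the built-in xtabond. GMM is particularly common in corporate finance research, where causality between financial decisions and performance is difficult to establish.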
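A sketch of difference GMM with pgmm(), using the EmplUK employment panel that ships with plm rather than company financials. The specification mirrors the Arellano-Bond example in plm's documentation: log employment on its own first two lags, wages, capital, and output, with lags two and deeper of log employment as GMM instruments. Treat it as a template to adapt, not a recommended model for your data.

```r
library(plm)

# Employment panel of UK firms bundled with plm (indexed by firm and year)
data("EmplUK", package = "plm")

# Difference GMM (Arellano-Bond): the model is first-differenced, and lagged levels of
# log(emp), from two periods back onward, instrument the differenced equation
ab <- pgmm(log(emp) ~ lag(log(emp), 1:2) + lag(log(wage), 0:1)
           + log(capital) + lag(log(output), 0:1) | lag(log(emp), 2:99),
           data = EmplUK, effect = "twoways", model = "twosteps")
summary(ab, robust = TRUE)

# Post-estimation checks on instrument validity
overid <- sargan(ab)          # Sargan/Hansen test of overidentifying restrictions
ar2 <- mtest(ab, order = 2)   # no second-order serial correlation in the differenced errors
overid
ar2
```

The two post-estimation tests matter because difference GMM is only valid if the instruments are exogenous and the differenced errors show no second-order serial correlation.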

Best Practices for Panel Data Research

Always check for: non-stationarity (panel unit root tests, mainly a concern when the time dimension is long), serial correlation in the errors (Wooldridge test), heteroskedasticity (Breusch-Pagan test), and cross-sectional dependence (Pesaran CD test). Use standard errors clustered at the entity level so that inference is robust to within-entity correlation and heteroskedasticity. Report results from both the FE and RE specifications for transparency. When publishing, describe your panel structure clearly (number of entities, number of time periods, and whether the panel is balanced or unbalanced) and justify your model choice with diagnostic test results.
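Several of these checks map onto functions in plm and lmtest. The sketch below runs them on a hypothetical simulated panel; pwartest() is one of plm's Wooldridge-type serial correlation tests, and bptest() uses its default studentized (Koenker) form. Panel unit root tests (plm's purtest()) are left out because their lag and deterministic-term settings depend heavily on the data.

```r
library(plm)
library(lmtest)    # coeftest(), bptest()
library(sandwich)  # vcovHC() generic; plm supplies the method for plm models

# Hypothetical simulated firm-year panel (replace with your own data)
set.seed(7)
n_firms <- 100
df <- expand.grid(year = 2015:2024, firm_id = 1:n_firms)
firm_effect <- rnorm(n_firms)
df$rd_spending <- rnorm(nrow(df)) + 0.5 * firm_effect[df$firm_id]
df$employees <- rnorm(nrow(df))
df$revenue <- 1.5 * df$rd_spending + 0.5 * df$employees +
  firm_effect[df$firm_id] + rnorm(nrow(df))

pdata <- pdata.frame(df, index = c("firm_id", "year"))
fe <- plm(revenue ~ rd_spending + employees, data = pdata, model = "within")

# Serial correlation: Wooldridge's test for AR(1) errors in fixed effects models
serial_test <- pwartest(fe)
# Cross-sectional dependence: Pesaran CD test on the FE residuals
cd_test <- pcdtest(fe, test = "cd")
# Heteroskedasticity: Breusch-Pagan test on the dummy-variable form of the FE model
bp_test <- bptest(revenue ~ rd_spending + employees + factor(firm_id), data = df)

serial_test
cd_test
bp_test

# Coefficient table with standard errors clustered by firm
clustered <- coeftest(fe, vcov = vcovHC(fe, method = "arellano",
                                        type = "HC1", cluster = "group"))
clustered
```

Clustering by "group" corresponds to clustering at the entity (firm) level recommended above; the point estimates are unchanged, only the standard errors and test statistics move.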
