Uncovering Sparsity and Heterogeneity in Firm-Level Return Predictability Using Machine Learning

Published online by Cambridge University Press: 13 September 2022

Theodoros Evgeniou
Affiliation: INSEAD Decision Sciences
Ahmed Guecioueur*
Affiliation: INSEAD Finance
Rodolfo Prieto
Affiliation: INSEAD Finance
* Corresponding author

Abstract

We develop an approach that combines the estimation of monthly firm-level expected returns with an assignment of firms to (possibly) latent groups, both based on observable characteristics, using machine learning principles with linear models. The best-performing methods are flexible two-stage sparse models that capture group-membership predictive relationships. Portfolios formed to exploit such group-varying predictions based on a parsimonious set of characteristics deliver economically meaningful returns with low turnover. We propose statistical tests based on nonparametric bootstrapping for our results, and detail how different characteristics may matter for different groups of firms, making comparisons to the existing literature.
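
To make the two-stage idea concrete, here is a minimal sketch under stated assumptions. The abstract does not say which grouping rule or sparse estimator the article uses, so k-means (stage 1) and a lasso fitted by coordinate descent (stage 2) are stand-ins chosen for illustration only. The function names, the penalty `lam`, and the toy data are all hypothetical, and the sketch omits intercepts, the time dimension, and out-of-sample evaluation.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_groups(points, k, iters=20):
    """Stage 1 (stand-in): k-means with deterministic farthest-point seeding."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Seed the next center at the point farthest from all current centers.
        centers.append(list(max(points, key=lambda p: min(sq_dist(p, c) for c in centers))))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda g: sq_dist(p, centers[g])) for p in points]
        for g in range(k):
            members = [p for p, lab in zip(points, labels) if lab == g]
            if members:  # an empty group keeps its previous center
                centers[g] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def soft_threshold(x, lam):
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso(X, y, lam, sweeps=100):
    """Stage 2 (stand-in): minimize (1/2n)||y - Xb||^2 + lam * ||b||_1, no intercept."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y - Xb, starting from b = 0
    for _ in range(sweeps):
        for j in range(p):
            z = sum(row[j] ** 2 for row in X) / n
            if z == 0.0:
                continue
            # Correlation of column j with the residual that excludes column j.
            rho = sum(row[j] * (r + row[j] * beta[j]) for row, r in zip(X, resid)) / n
            new = soft_threshold(rho, lam) / z
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - row[j] * delta for row, r in zip(X, resid)]
                beta[j] = new
    return beta

def fit_two_stage(X, y, k, lam):
    """Group firms on characteristics, then fit one sparse linear model per group."""
    labels = assign_groups(X, k)
    models = {}
    for g in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == g]
        models[g] = lasso([X[i] for i in idx], [y[i] for i in idx], lam)
    return labels, models

# Toy cross-section: characteristic 0 separates two groups, and a different
# characteristic drives returns in each group. All numbers are made up.
design = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)] * 5
X = [[-3.0, a, b] for a, b in design] + [[3.0, a, b] for a, b in design]
y = [2.0 * a for a, _ in design] + [-1.5 * b for _, b in design]

labels, models = fit_two_stage(X, y, k=2, lam=0.1)
for g, beta in models.items():
    print(g, [round(b, 3) for b in beta])
```

In this toy example each group's model keeps a single, different characteristic, with its coefficient shrunk toward zero by the penalty. That is the kind of group-varying sparsity the abstract describes, though the article's actual estimators may differ.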

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of the Michael G. Foster School of Business, University of Washington

Footnotes

We thank Jennifer Conrad (the editor) and Alberto Martín-Utrera (the referee) for their constructive comments. We are grateful to Panos Mavrokonstantis for excellent research assistance while he was a Senior Research Scientist at INSEAD. We also thank participants at the 13th Annual SoFiE Conference, the 3rd Future of Financial Information Conference, the inaugural Miami Herbert Winter Research Conference on ML and Business, the 2021 AFA PhD Poster Session, the 2020 European Winter Meetings of the Econometric Society, the 22nd INFER Annual Conference, the 9th Wharton-INSEAD Doctoral Consortium, and the INSEAD Accounting and Finance PhD seminar series, as well as Alex Chinco (discussant), Victor DeMiguel, Scott Murray (discussant), Joël Peress, Marcel Rindisbacher, Raman Uppal, Jinyuan Zhang, and Guofu Zhou for their helpful comments. A previous version of this article was circulated under the title “Modeling Heterogeneity in Firm-Level Return Predictability with Machine Learning.”

Supplementary material: PDF

Evgeniou et al. supplementary material