Uncovering Sparsity and Heterogeneity in Firm-Level Return Predictability Using Machine Learning

Theodoros Evgeniou; Ahmed Guecioueur; Rodolfo Prieto

doi:10.1017/S0022109022001028

Published online by Cambridge University Press: 13 September 2022

Theodoros Evgeniou, INSEAD Decision Sciences
Ahmed Guecioueur (corresponding author), INSEAD Finance
Rodolfo Prieto, INSEAD Finance

Abstract

We develop an approach that combines the estimation of monthly firm-level expected returns with an assignment of firms to (possibly) latent groups, both based on observable characteristics, using machine learning principles with linear models. The best-performing methods are flexible two-stage sparse models that capture group-membership predictive relationships. Portfolios formed to exploit such group-varying predictions based on a parsimonious set of characteristics deliver economically meaningful returns with low turnover. We propose statistical tests based on nonparametric bootstrapping for our results, and detail how different characteristics may matter for different groups of firms, making comparisons to the existing literature.
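As a rough illustration of the kind of nonparametric bootstrap the abstract mentions, the sketch below computes a percentile bootstrap confidence interval for the mean of a series of monthly portfolio returns. It is not the authors' test: the function name `bootstrap_mean_ci`, the sample numbers, and the i.i.d. resampling scheme (which ignores any serial dependence in returns) are all assumptions made for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(returns, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `returns`.

    Hypothetical helper: resamples observations i.i.d. with replacement,
    which is an assumption and not the procedure used in the article.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(returns)
    # Draw n_boot resamples of size n and record the mean of each one.
    boot_means = sorted(
        statistics.fmean(rng.choices(returns, k=n)) for _ in range(n_boot)
    )
    # Read off the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = int((1 - alpha / 2) * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


# Made-up monthly long-short returns, for illustration only.
sample_returns = [0.012, -0.004, 0.021, 0.007, -0.010, 0.015,
                  0.003, 0.009, -0.002, 0.011, 0.006, 0.018]
sample_mean = statistics.fmean(sample_returns)
ci_low, ci_high = bootstrap_mean_ci(sample_returns)
print(f"mean={sample_mean:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

If the interval excludes zero, the mean return would be judged different from zero at the chosen level under these simplifying assumptions. With serially dependent returns, a block bootstrap would be a more careful choice.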

Type: Research Article
Information: Journal of Financial and Quantitative Analysis, Volume 58, Issue 8, December 2023, pp. 3384-3419

DOI: https://doi.org/10.1017/S0022109022001028
Copyright: © The Author(s), 2022. Published by Cambridge University Press on behalf of the Michael G. Foster School of Business, University of Washington

Footnotes

We thank Jennifer Conrad (the editor) and Alberto Martín-Utrera (the referee) for their constructive comments. We are grateful to Panos Mavrokonstantis for excellent research assistance while he was a Senior Research Scientist at INSEAD. We also thank participants at the 13th Annual SoFiE Conference, the 3rd Future of Financial Information Conference, the inaugural Miami Herbert Winter Research Conference on ML and Business, the 2021 AFA PhD Poster Session, the 2020 European Winter Meetings of the Econometric Society, the 22nd INFER Annual Conference, the 9th Wharton-INSEAD Doctoral Consortium, and the INSEAD Accounting and Finance PhD seminar series, as well as Alex Chinco (discussant), Victor DeMiguel, Scott Murray (discussant), Joël Peress, Marcel Rindisbacher, Raman Uppal, Jinyuan Zhang, and Guofu Zhou for their helpful comments. A previous version of this article was circulated under the title “Modeling Heterogeneity in Firm-Level Return Predictability with Machine Learning.”
