Understanding Deep Learning with Statistical Relevance

Tim Räz

doi:10.1017/psa.2021.12

Published online by Cambridge University Press: 31 January 2022

Affiliation: University of Bern, Institute of Philosophy, Bern, Switzerland

Abstract

This paper argues that a notion of statistical explanation, based on Salmon’s statistical relevance model, can help us better understand deep neural networks. It is proved that homogeneous partitions, the core notion of Salmon’s model, are equivalent to minimal sufficient statistics, an important notion from statistical inference. This establishes a link to deep neural networks via the so-called Information Bottleneck method, an information-theoretic framework, according to which deep neural networks implicitly solve an optimization problem that generalizes minimal sufficient statistics. The resulting notion of statistical explanation is general, mathematical, and subcausal.
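To make the link between minimal sufficient statistics and the Information Bottleneck more concrete, here is a minimal Python sketch under assumptions that are ours, not the paper's. X and Y are finite discrete variables with a known joint distribution. The minimal sufficient statistic of X for Y is taken to be the map sending each x to its conditional distribution P(Y | x), which groups x-values into cells. The paper proves that Salmon's homogeneous partitions are equivalent to minimal sufficient statistics; the sketch does not reproduce that proof. All function names and the toy distribution are hypothetical. The sketch checks that the induced statistic T keeps I(T;Y) = I(X;Y) while compressing X, and it evaluates the Information Bottleneck objective I(X;T) - β I(T;Y) of Tishby, Pereira, and Bialek (1999) for a deterministic encoder.

```python
import math
from collections import defaultdict


def mutual_information(joint):
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    pa, pb = defaultdict(float), defaultdict(float)
    for (a, b), p in joint.items():
        pa[a] += p
        pb[b] += p
    return sum(p * math.log2(p / (pa[a] * pb[b]))
               for (a, b), p in joint.items() if p > 0)


def minimal_sufficient_partition(joint):
    """Map each x to a cell label; x-values with the same P(Y | x) share a cell.

    Assumes every x has positive marginal probability."""
    px = defaultdict(float)
    ys = sorted({y for (_, y) in joint})
    for (x, _), p in joint.items():
        px[x] += p
    # The cell label is the conditional distribution itself, rounded so that
    # floating-point noise does not split values that are equal in exact arithmetic.
    return {x: tuple(round(joint.get((x, y), 0.0) / px[x], 12) for y in ys)
            for x in px}


def push_forward(joint, cells):
    """Joint distribution of (T, Y), where T = cells[X]."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[(cells[x], y)] += p
    return dict(out)


def ib_lagrangian(joint, cells, beta):
    """IB objective I(X;T) - beta * I(T;Y) for the deterministic encoder T = cells[X]."""
    px = defaultdict(float)
    for (x, _), p in joint.items():
        px[x] += p
    joint_xt = {(x, cells[x]): p for x, p in px.items()}
    return (mutual_information(joint_xt)
            - beta * mutual_information(push_forward(joint, cells)))


# Hypothetical toy distribution: X uniform on {0, 1, 2, 3}; P(Y=1 | x) is 0.8
# for x in {0, 1} and 0.2 for x in {2, 3}, so only the cell of x is relevant to Y.
joint = {}
for x in range(4):
    q = 0.8 if x < 2 else 0.2
    joint[(x, 1)] = 0.25 * q
    joint[(x, 0)] = 0.25 * (1 - q)

cells = minimal_sufficient_partition(joint)
identity = {x: x for x in range(4)}    # keeps every x-value distinct
trivial = {x: "all" for x in range(4)}  # lumps every x-value together

print(len(set(cells.values())))  # 2
print("I(X;Y) =", mutual_information(joint))
print("I(T;Y) =", mutual_information(push_forward(joint, cells)))
print("I(trivial;Y) =", mutual_information(push_forward(joint, trivial)))
print(ib_lagrangian(joint, cells, 1.0) < ib_lagrangian(joint, identity, 1.0))  # True
```

In this toy case the two-cell statistic retains all of I(X;Y) (about 0.278 bits) while costing only 1 bit of I(X;T), against 2 bits for the identity encoder, so it attains the lower IB value at β = 1. Lumping everything into one cell compresses further but discards the relevant information entirely.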

Type: Article
Information: Philosophy of Science, Volume 89, Issue 1, January 2022, pp. 20–41

DOI: https://doi.org/10.1017/psa.2021.12
Copyright: © The Author(s), 2022. Published by Cambridge University Press on behalf of the Philosophy of Science Association

References

Achille, Alessandro, and Soatto, Stefano. 2018. “Information dropout: Learning optimal representations through noisy computation.” IEEE Transactions on Pattern Analysis and Machine Intelligence.

Alemi, Alexander A., Fischer, Ian, Dillon, Joshua V., and Murphy, Kevin. 2017. “Deep variational information bottleneck.” arXiv:1612.00410v5.

Baumberger, Christoph, Beisbart, Claus, and Brun, Georg. 2017. “What is understanding? An overview of recent debates in epistemology and philosophy of science.” In Explaining Understanding: New Perspectives from Epistemology and Philosophy of Science, edited by Stephen Grimm, Christoph Baumberger, and Sabine Ammon, 1–34. New York: Routledge.

Casella, George, and Berger, Roger L. 2002. Statistical Inference. 2nd ed. Duxbury.

Cover, Thomas M., and Thomas, Joy A. 2006. Elements of Information Theory. 2nd ed. Hoboken, NJ: Wiley.

Goodfellow, Ian, Bengio, Yoshua, and Courville, Aaron. 2016. Deep Learning. Cambridge, MA: MIT Press.

Greeno, James G. 1970. “Evaluation of statistical hypotheses using information transmitted.” Philosophy of Science 37 (2):279–94.

Hastie, Trevor, Tibshirani, Roberto, and Friedman, Jerome. 2009. The Elements of Statistical Learning. 2nd ed. Springer Series in Statistics. Springer.

Kitcher, Philip. 1989. “Explanatory unification and the causal structure of the world.” In Scientific Explanation, Volume XIII of Minnesota Studies in the Philosophy of Science, edited by Philip Kitcher and Wesley C. Salmon, 410–505. Minneapolis: University of Minnesota Press.

Kitcher, Philip, and Salmon, Wesley C., eds. 1989. Scientific Explanation, Volume XIII of Minnesota Studies in the Philosophy of Science. Minneapolis: University of Minnesota Press.

Krishnan, Maya. 2016. “Against interpretability: a critical examination of the interpretability problem in machine learning.” Philosophy & Technology 33:487–502.

Lange, Marc. 2016. Because Without Cause: Non-Causal Explanations in Science and Mathematics. Oxford: Oxford University Press.

LeCun, Yann, Bengio, Yoshua, and Hinton, Geoffrey. 2015. “Deep learning.” Nature 521:436–44.

Lehmann, E. L., and Casella, George. 1998. Theory of Point Estimation. 2nd ed. Springer Texts in Statistics. New York, Berlin, Heidelberg: Springer.

Lipton, Zachary C. 2016. “The mythos of model interpretability.” arXiv:1606.03490.

Mancosu, Paolo. 2018. “Explanation in mathematics.” In The Stanford Encyclopedia of Philosophy, edited by E. N. Zalta. Metaphysics Research Lab, Stanford University.

Nielsen, Michael A. 2015. Neural Networks and Deep Learning. Determination Press.

Pedregosa, Fabian, Varoquaux, Gaël, Gramfort, Alexandre, Michel, Vincent, Thirion, Bertrand, Grisel, Olivier, Blondel, Mathieu, et al. 2011. “Scikit-learn: Machine learning in Python.” Journal of Machine Learning Research 12:2825–30.

Pincock, Christopher. 2015. “Abstract explanations in science.” British Journal for the Philosophy of Science 66 (4):857–82.

Räz, Tim. 2017. “The Volterra principle generalized.” Philosophy of Science 84 (4):737–60.

Räz, Tim. 2018. “Euler’s Königsberg: the explanatory power of mathematics.” European Journal for Philosophy of Science 8:331–46.

Reutlinger, Alexander, and Saatsi, Juha, eds. 2018. Explanation Beyond Causation: Philosophical Perspectives on Non-Causal Explanations. Oxford: Oxford University Press.

Salmon, Wesley C. 1971a. “Statistical explanation.” In Statistical Explanation and Statistical Relevance, edited by Wesley C. Salmon, 29–87. Pittsburgh: Pittsburgh University Press.

Salmon, Wesley C., ed. 1971b. Statistical Explanation and Statistical Relevance. Pittsburgh: Pittsburgh University Press.

Salmon, Wesley C. 1984. Scientific Explanation and the Causal Structure of the World. Princeton: Princeton University Press.

Saxe, Andrew M., Bansal, Yamini, Dapello, Joel, Advani, Madhu, Kolchinsky, Artemy, Tracey, Brendan D., and Cox, David D. 2018. “On the information bottleneck theory of deep learning.” ICLR.

Shamir, Ohad, Sabato, Sivan, and Tishby, Naftali. 2011. “Learning and generalization with the information bottleneck.” Theoretical Computer Science 411:2696–2711.

Shwartz-Ziv, Ravid, and Tishby, Naftali. 2017. “Opening the black box of deep neural networks via information.” arXiv:1703.00810.

Spirtes, Peter, Glymour, Clark, and Scheines, Richard. 2000. Causation, Prediction and Search. Cambridge, MA: MIT Press.

Tishby, Naftali, Pereira, Fernando C., and Bialek, William. 1999. “The information bottleneck method.” In Proc. of the 37th Allerton Conference on Communication, Control and Computing, Allerton House, Monticello, Illinois, September 22–24, 1999.

Vidal, René, Bruna, Joan, Giryes, Raja, and Soatto, Stefano. 2017. “Mathematics of deep learning.” arXiv:1712.04741.

Woodward, James. 1987. “On an information-theoretic model of explanation.” Philosophy of Science 54 (1):21–44.

Woodward, James. 2019. “Scientific explanation.” In The Stanford Encyclopedia of Philosophy, edited by E. N. Zalta. Metaphysics Research Lab, Stanford University.

Zhang, Chiyuan, Bengio, Samy, Hardt, Moritz, Recht, Benjamin, and Vinyals, Oriol. 2017. “Understanding deep learning requires rethinking generalization.” arXiv:1611.03530.