Neural embeddings: accurate and readable inferences based on semantic kernels

Danilo Croce; Daniele Rossini; Roberto Basili

doi:10.1017/S1351324919000238

Published online by Cambridge University Press: 31 July 2019

Affiliations:
Danilo Croce*: Department of Enterprise Engineering, University of Roma, Tor Vergata, Rome, Italy
Daniele Rossini: Department of Enterprise Engineering, University of Roma, Tor Vergata, Rome, Italy
Roberto Basili: Department of Enterprise Engineering, University of Roma, Tor Vergata, Rome, Italy
*Corresponding author. Email: [email protected]

Abstract

Sentence embeddings are suitable input vectors for the neural learning of a number of inferences about content and meaning. Similarity estimation, classification and emotional characterization of sentences, as well as pragmatic tasks such as question answering or dialogue, have largely demonstrated the effectiveness of vector embeddings in modeling semantics. Unfortunately, most of these decisions are epistemologically opaque, because the neural models acquired from the underlying embeddings offer limited interpretability. We think that any effective approach to meaning representation should be at least epistemologically coherent. In this paper, we concentrate on the readability of neural models as a core property of any embedding technique that is consistent and effective in representing sentence meaning. From this perspective, the paper discusses a novel embedding technique (the Nyström methodology) that corresponds to the reconstruction of a sentence in a kernel space, inspired by rich semantic similarity metrics (a semantic kernel) rather than by a language model. In addition to being based on a kernel that captures grammatical and lexical semantic information, the proposed embedding can be used as the input vector of an effective neural learning architecture, called Kernel-based deep architectures (KDA). Finally, it also characterizes by design the explanatory capability of the KDA, as the proposed embedding is derived from examples that are both human readable and labeled. This property is obtained by integrating KDAs with an explanation methodology, called layer-wise relevance propagation (LRP), already proposed in image processing. Here the Nyström embeddings support the automatic compilation of arguments in favor of or against a KDA inference, in the form of an explanation: each decision can be linked through LRP back to real examples, that is, the landmarks linguistically related to the input instance. The KDA network output is explained via analogy with the activated landmarks. Quantitative evaluation of the explanations shows that richer explanations based on semantic and syntagmatic structures make for convincing arguments, as they effectively help the user assess whether or not to trust the machine decisions in different tasks, for example, Question Classification or Semantic Role Labeling. This confirms the epistemological benefit that Nyström embeddings may bring, as linguistically rich and meaningful representations for a variety of inference tasks.
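
To make the idea of reconstructing an input in a kernel space more concrete, the following is a minimal sketch of a Nyström projection over a set of landmarks. It is not the authors' implementation: the abstract does not give the formulation, so the sketch uses the standard Nyström construction (eigendecomposition of the landmark Gram matrix), and a Gaussian RBF kernel over numeric vectors stands in for the semantic kernel over sentences. The function names `rbf_kernel`, `fit_nystrom` and `embed`, the `gamma` and `eps` parameters, and the random toy data are all illustrative assumptions.

```python
import numpy as np


def rbf_kernel(a, b, gamma=0.5):
    """Stand-in kernel (assumption): Gaussian RBF between rows of a and rows of b."""
    sq_dist = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def fit_nystrom(landmarks, kernel, eps=1e-10):
    """Build the projection matrix U S^{-1/2} from the landmark Gram matrix."""
    k_ll = kernel(landmarks, landmarks)          # Gram matrix among landmarks
    eigvals, eigvecs = np.linalg.eigh(k_ll)      # symmetric eigendecomposition
    keep = eigvals > eps                         # drop numerically null directions
    return eigvecs[:, keep] / np.sqrt(eigvals[keep])


def embed(x, landmarks, projection, kernel):
    """Embed rows of x: kernel evaluations against the landmarks, then projection."""
    return kernel(x, landmarks) @ projection


# Toy data (assumption): 5 landmarks in a 3-dimensional input space.
rng = np.random.default_rng(0)
landmarks = rng.normal(size=(5, 3))
projection = fit_nystrom(landmarks, rbf_kernel)

# Embedding the landmarks themselves: dot products of the embeddings
# should reproduce the landmark Gram matrix.
z = embed(landmarks, landmarks, projection, rbf_kernel)
gram_from_embeddings = z @ z.T
```

In this construction every embedding dimension is a linear combination of kernel evaluations against the landmarks. That is the property the abstract relies on when it says a decision can be traced back, through relevance propagation, to real and labeled examples.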

Keywords

readable inference; semantic kernels; neural embeddings of sentences

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 4, July 2019, pp. 519-541

DOI: https://doi.org/10.1017/S1351324919000238
Copyright: © Cambridge University Press 2019
