Effectiveness of data-driven induction of semantic spaces and traditional classifiers for sarcasm detection

Mattia Antonino Di Gangi; Giosué Lo Bosco; Giovanni Pilato

doi:10.1017/S1351324919000019

Published online by Cambridge University Press: 01 April 2019

Mattia Antonino Di Gangi: Fondazione Bruno Kessler and Università degli Studi di Trento, Trento, Italy
Giosué Lo Bosco*: Dipartimento di Matematica e Informatica, Università degli Studi di Palermo, Palermo, Italy
Giovanni Pilato: Italian National Research Council, ICAR-CNR, Istituto di Calcolo e Reti ad alte prestazioni, Palermo, Italy
*Corresponding author. Email: [email protected]

Abstract

Irony and sarcasm are two complex linguistic phenomena that are widely used in everyday language, especially on social media, but they pose serious problems for automated text understanding. Many labeled corpora have been extracted from several sources to address this task, and sarcasm appears to be conveyed differently in different domains. Nonetheless, very little work has compared different methods across the available corpora. Furthermore, authors usually collect and use their own datasets to evaluate their own methods. In this paper, we show that sarcasm detection can be tackled by applying classical machine-learning algorithms to input texts sub-symbolically represented in a Latent Semantic space. As a result, our study establishes both reference datasets and baselines for the sarcasm detection problem that the scientific community can use to test newly proposed methods.
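As a rough illustration of the general idea stated in the abstract (texts mapped into a latent semantic space, then passed to a classical classifier), the following minimal sketch may help. It is not the authors' pipeline, which is not described on this page: the TF-IDF weighting, the truncated SVD via NumPy, the nearest-centroid classifier, the function names, and the four toy sentences are all assumptions made here for illustration.

```python
import numpy as np

def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real systems use more careful preprocessing).
    return text.lower().split()

def count_matrix(docs, vocab):
    # Rows are documents, columns are vocabulary terms; unknown tokens are ignored.
    counts = np.zeros((len(docs), len(vocab)))
    for row, doc in enumerate(docs):
        for tok in tokenize(doc):
            col = vocab.get(tok)
            if col is not None:
                counts[row, col] += 1.0
    return counts

def fit_lsa(train_docs, k):
    # Build the vocabulary and smoothed IDF weights from the training texts only.
    terms = sorted({tok for doc in train_docs for tok in tokenize(doc)})
    vocab = {tok: i for i, tok in enumerate(terms)}
    counts = count_matrix(train_docs, vocab)
    doc_freq = (counts > 0).sum(axis=0)
    idf = np.log((1.0 + len(train_docs)) / (1.0 + doc_freq)) + 1.0
    # Truncated SVD: keep the k leading right singular vectors as latent axes.
    _, _, vt = np.linalg.svd(counts * idf, full_matrices=False)
    return vocab, idf, vt[:k]

def project(docs, vocab, idf, components):
    # Fold documents into the k-dimensional latent space.
    return (count_matrix(docs, vocab) * idf) @ components.T

def fit_centroids(Z, labels):
    # Nearest-centroid classifier: one mean vector per class.
    labels = np.asarray(labels)
    classes = np.unique(labels)
    centroids = np.stack([Z[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(Z, classes, centroids):
    # Assign each document to the class with the closest centroid (Euclidean distance).
    dists = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Invented toy data: 1 = sarcastic, 0 = literal.
train_docs = [
    "oh great another monday",
    "oh great more rain today",
    "the meeting starts at noon",
    "the train leaves at noon",
]
train_labels = [1, 1, 0, 0]

vocab, idf, components = fit_lsa(train_docs, k=2)
Z_train = project(train_docs, vocab, idf, components)
classes, centroids = fit_centroids(Z_train, train_labels)
preds = predict(project(["oh great more traffic"], vocab, idf, components), classes, centroids)
```

The nearest-centroid step is only a stand-in: any classical classifier that accepts dense feature vectors could be trained on `Z_train` instead, and the number of latent dimensions `k` would normally be tuned on held-out data.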

Keywords

natural language processing; semantic spaces; machine learning; sarcasm detection; irony detection

Type: Article
Information: Natural Language Engineering, Volume 25, Issue 2, March 2019, pp. 257–285

DOI: https://doi.org/10.1017/S1351324919000019
Copyright: © Cambridge University Press 2019


