Sentiment analysis in Turkish: Supervised, semi-supervised, and unsupervised techniques

Cem Rıfkı Aydın; Tunga Güngör

doi:10.1017/S1351324920000200

Published online by Cambridge University Press: 17 April 2020

Cem Rıfkı Aydın*: Department of Computer Engineering, Boğaziçi University, Istanbul 34342, Turkey
Tunga Güngör: Department of Computer Engineering, Boğaziçi University, Istanbul 34342, Turkey
*Corresponding author.

Abstract

Although many studies on sentiment analysis have been carried out for widely spoken languages, research on this topic is still immature for Turkish. Most of the work in this language focuses on supervised models, which require comprehensive annotated corpora. The few unsupervised methods that exist rely on sentiment lexicons that are either translated from English lexicons or built from corpora, which results in improper word polarities because the characteristics of the language and the domain are ignored. In this paper, we develop unsupervised (domain-independent) and semi-supervised (domain-specific) methods for Turkish that are based on a set of antonym word pairs used as seeds. We make a comprehensive analysis of supervised methods under several feature weighting schemes. We then form an ensemble of supervised classifiers and also combine the unsupervised and supervised methods. Since Turkish is an agglutinative language, we perform morphological analysis and use different word forms. The methods were tested on two Turkish datasets with different styles, and also on English datasets to show that the approaches are portable across languages. We observed that the combination of the unsupervised and supervised approaches outperforms the other methods, and we obtained a significant improvement over the state-of-the-art results for both Turkish and English.
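The abstract states only that the unsupervised and semi-supervised methods are based on a set of antonym word pairs used as seeds; it does not give the scoring formula. As a hypothetical illustration of the general idea, the sketch below scores a word by how much closer its vector lies to the positive member of each seed pair than to the negative member, and labels a document by the sign of the summed word scores. This is not the authors' method: the toy vectors, the seed pairs, the cosine-difference scoring, and the names `word_polarity` and `classify` are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional word vectors (hypothetical values, illustration only).
# In a real system these would come from embeddings trained on a Turkish corpus.
VECTORS = {
    "iyi":    [0.9, 0.1, 0.0],    # "good"
    "kötü":   [-0.8, 0.2, 0.1],   # "bad"
    "güzel":  [0.8, 0.3, 0.1],    # "beautiful"
    "çirkin": [-0.7, 0.3, 0.0],   # "ugly"
    "harika": [0.85, 0.2, 0.05],  # "great"
    "berbat": [-0.9, 0.1, 0.1],   # "terrible"
    "film":   [0.0, 0.9, 0.4],    # "film" (roughly neutral)
}

# Hypothetical antonym seed pairs: (positive word, negative word).
SEED_PAIRS = [("iyi", "kötü"), ("güzel", "çirkin")]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def word_polarity(word):
    """Average over seed pairs of sim(word, positive) - sim(word, negative).

    Words without a vector (out of vocabulary) get a neutral score of 0.0.
    """
    vec = VECTORS.get(word)
    if vec is None:
        return 0.0
    diffs = [cosine(vec, VECTORS[pos]) - cosine(vec, VECTORS[neg])
             for pos, neg in SEED_PAIRS]
    return sum(diffs) / len(diffs)


def classify(tokens):
    """Label a tokenized document by the sign of its summed word polarities.

    A total score of exactly zero is labeled "positive" in this sketch.
    """
    score = sum(word_polarity(t) for t in tokens)
    return "positive" if score >= 0 else "negative"


if __name__ == "__main__":
    print(classify(["harika", "film"]))  # positive
    print(classify(["berbat", "film"]))  # negative
```

For an agglutinative language such as Turkish, tokens would typically be reduced to roots or other word forms by a morphological analyzer before lookup; the sketch skips that step and assumes the tokens already match the vector vocabulary.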

Keywords

Sentiment analysis; Opinion mining; Machine learning; Text classification; Morphological analysis

Type: Article
Information: Natural Language Engineering, Volume 27, Issue 4, July 2021, pp. 455–483

DOI: https://doi.org/10.1017/S1351324920000200
Copyright: © Cambridge University Press 2020
