
Modeling the impact of orthographic coding on Czech–Polish and Bulgarian–Russian reading intercomprehension

Published online by Cambridge University Press:  05 October 2017

Irina Stenger
Affiliation:
Collaborative Research Center (SFB) 1102: Information Density and Linguistic Encoding, Project C4: INCOMSLAV Mutual Intelligibility and Surprisal in Slavic Intercomprehension, Saarland University, Postfach 151150, 66041 Saarbrücken, Germany.
Klára Jágrová
Affiliation:
Collaborative Research Center (SFB) 1102: Information Density and Linguistic Encoding, Project C4: INCOMSLAV Mutual Intelligibility and Surprisal in Slavic Intercomprehension, Saarland University, Postfach 151150, 66041 Saarbrücken, Germany.
Andrea Fischer
Affiliation:
Collaborative Research Center (SFB) 1102: Information Density and Linguistic Encoding, Project C4: INCOMSLAV Mutual Intelligibility and Surprisal in Slavic Intercomprehension, Saarland University, Postfach 151150, 66041 Saarbrücken, Germany.
Tania Avgustinova
Affiliation:
Collaborative Research Center (SFB) 1102: Information Density and Linguistic Encoding, Project C4: INCOMSLAV Mutual Intelligibility and Surprisal in Slavic Intercomprehension, Saarland University, Postfach 151150, 66041 Saarbrücken, Germany.
Dietrich Klakow
Affiliation:
Collaborative Research Center (SFB) 1102: Information Density and Linguistic Encoding, Project C4: INCOMSLAV Mutual Intelligibility and Surprisal in Slavic Intercomprehension, Saarland University, Postfach 151150, 66041 Saarbrücken, Germany.
Roland Marti
Affiliation:
Collaborative Research Center (SFB) 1102: Information Density and Linguistic Encoding, Project C4: INCOMSLAV Mutual Intelligibility and Surprisal in Slavic Intercomprehension, Saarland University, Postfach 151150, 66041 Saarbrücken, Germany.

Abstract

Orthography is a primary linguistic interface in every reading activity. The central research question we address here is how orthographic intelligibility between closely related languages can be measured and predicted. This paper presents methods and findings of modeling orthographic intelligibility in a reading intercomprehension scenario from an information-theoretic perspective. The study focuses on two Slavic language pairs: Czech–Polish (West Slavic, using the Latin script) and Bulgarian–Russian (South Slavic and East Slavic, respectively, using the Cyrillic script). We present computational methods for measuring orthographic distance and orthographic asymmetry by means of the Levenshtein algorithm, conditional entropy, and the adaptation surprisal method, all of which are expected to predict the influence of orthography on mutual intelligibility in reading.
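
The three measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, normalizing the Levenshtein distance by the length of the longer word is an assumption, and the aligned character pairs are synthetic placeholders rather than data from the study. Conditional entropy is computed in the standard way as H(Y|X) over aligned (x, y) character pairs, and surprisal as -log2 P(y|x).

```python
from collections import Counter
from math import log2


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def conditional_entropy(pairs):
    """H(Y|X) in bits, estimated from aligned (x, y) character pairs."""
    joint = Counter(pairs)
    marginal = Counter(x for x, _ in pairs)
    total = len(pairs)
    h = 0.0
    for (x, y), n in joint.items():
        h -= (n / total) * log2(n / marginal[x])
    return h


def surprisal(x: str, y: str, pairs) -> float:
    """-log2 P(y|x) for one correspondence; assumes the pair (x, y) was observed."""
    joint = Counter(pairs)
    marginal = Counter(px for px, _ in pairs)
    return -log2(joint[(x, y)] / marginal[x])


# Synthetic alignment: "a" maps to two different targets, "b" always maps to "z".
toy_pairs = [("a", "x"), ("a", "y"), ("b", "z"), ("b", "z")]
reversed_pairs = [(y, x) for x, y in toy_pairs]

print(normalized_distance("voda", "woda"))   # Czech vs. Polish spelling of 'water'
print(conditional_entropy(toy_pairs))        # uncertainty when reading in one direction
print(conditional_entropy(reversed_pairs))   # uncertainty in the opposite direction
```

Comparing the entropy of `toy_pairs` with that of `reversed_pairs` shows how such a measure can capture asymmetry: the mapping is ambiguous in one direction and fully predictable in the other.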

Type
Research Article
Copyright
Copyright © Nordic Association of Linguistics 2017 

