Corpus-based learning of Cantonese for Mandarin speakers

Tak-Sum Wong; John S. Y. Lee

doi:10.1017/S0958344015000257

Published online by Cambridge University Press: 17 March 2016


Tak-Sum Wong: City University of Hong Kong, Hong Kong
John S. Y. Lee: City University of Hong Kong, Hong Kong


Abstract

This article presents the first study on using a parallel corpus to teach Cantonese, the variety of Chinese spoken in Hong Kong. We evaluated this approach with Mandarin-speaking undergraduate students at the beginner level. Exploiting their knowledge of Mandarin, a closely related language, the students studied Cantonese with authentic material from a Cantonese-Mandarin parallel corpus transcribed from television programs. They were given a list of Mandarin words that have a range of possible Cantonese translations, depending on the linguistic context. Leveraging the sentence and word alignments in the parallel corpus, the students independently searched for example sentences to discover these translation equivalents. Experimental results showed that, in both the short and the long term, this data-driven learning approach helped students improve their knowledge of Cantonese vocabulary. These results suggest that parallel corpora could be applied even at the beginner level for other L1-L2 pairs of closely related languages.
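The search step described above can be pictured with a minimal sketch. It assumes a toy, hypothetical corpus format: word-segmented Mandarin-Cantonese sentence pairs whose word alignments are stored as (Mandarin index, Cantonese index) pairs. It is not the authors' corpus interface; the class `SentencePair` and the function `translation_equivalents` are illustrative names only. The sketch shows how alignments let a learner query one Mandarin word and collect its Cantonese equivalents together with example sentences, using Mandarin 在, which corresponds to locative 喺 in one context and to the progressive marker 緊 in another.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class SentencePair:
    """One aligned sentence pair (hypothetical format, not the study's corpus)."""
    mandarin: list[str]            # word-segmented Mandarin sentence
    cantonese: list[str]           # word-segmented Cantonese sentence
    links: list[tuple[int, int]]   # word alignments: (Mandarin index, Cantonese index)


def translation_equivalents(corpus, query):
    """Map each Cantonese word aligned to `query` to the sentence pairs that show it."""
    found = defaultdict(list)
    for pair in corpus:
        for m_idx, c_idx in pair.links:
            if pair.mandarin[m_idx] == query:
                found[pair.cantonese[c_idx]].append(pair)
    return dict(found)


# Toy data (illustrative only): Mandarin 在 as locative 喺 vs. progressive 緊.
corpus = [
    SentencePair(["他", "在", "家"], ["佢", "喺", "屋企"], [(0, 0), (1, 1), (2, 2)]),
    SentencePair(["他", "在", "吃飯"], ["佢", "食", "緊", "飯"],
                 [(0, 0), (1, 2), (2, 1), (2, 3)]),
]

# Print each Cantonese equivalent with the aligned example sentence pair.
for word, examples in translation_equivalents(corpus, "在").items():
    for ex in examples:
        print(word, "|", "".join(ex.mandarin), "->", "".join(ex.cantonese))
```

Grouping the results by the aligned Cantonese word, rather than listing raw concordance lines, mirrors the task in the abstract: the learner sees at a glance how many distinct equivalents a Mandarin word has and which contexts trigger each one.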

Keywords

parallel corpus; parallel concordance; language learning; Cantonese; Mandarin

Type: Regular papers
Information: ReCALL, Volume 28, Issue 2, May 2016, pp. 187–206

DOI: https://doi.org/10.1017/S0958344015000257
Copyright: Copyright © European Association for Computer Assisted Language Learning 2016


