Corpora as Agency in the Intellectualisation of African Languages

doi:10.1017/9781108671088.014

12 - Corpora as Agency in the Intellectualisation of African Languages

from Part III - Digitalisation and Democratisation of Knowledge

Published online by Cambridge University Press: 18 September 2020

Langa Khumalo

Edited by Russell H. Kaschula and H. Ekkehard Wolff

Russell H. Kaschula, Rhodes University, South Africa
H. Ekkehard Wolff, Universität Leipzig

Summary

The chapter critically examines the development of corpora at the University of KwaZulu-Natal as key agents of language intellectualisation and evaluates the architecture of two types of corpora. The first is the isiZulu National Corpus (INC), an organic corpus of 30 million tokens. It is designed as a monitor corpus and is an important precursor to the development of isiZulu human language technologies. The chapter shows that the INC, which was used to train the checker, was crucial to the development of the isiZulu spellchecker. The second is the English-IsiZulu Parallel Corpus (EIPC), with a modest size of fifty e-files in each language. A parallel corpus is a collection of the same texts in two natural languages, processed and stored in machine-readable format. The EIPC is crucial to the development of automated machine translation between English and isiZulu: building a machine translation tool by computational means requires a parallel corpus such as the EIPC as an agent, following the tenets of the Data-Driven Machine Translation (DDMT) approach. The chapter outlines the imperative to develop both the INC and the EIPC and shows that the two corpora are key components in the intellectualisation of isiZulu as a digital, scientific, natural language.
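
To make the notion of a machine-readable parallel corpus concrete, the following is a minimal sketch in Python. It is not the EIPC's actual format or tooling, which this summary does not describe. The names `AlignedPair`, `tokenize` and `token_counts`, the sentence-level alignment, the naive whitespace tokenisation and the three sample pairs are all assumptions introduced purely for illustration.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One aligned unit of a parallel corpus (hypothetical structure)."""
    english: str
    isizulu: str


_PUNCT = ".,;:!?\"'()"


def tokenize(text: str) -> list[str]:
    # Naive whitespace tokenisation: strip surrounding punctuation and
    # lowercase. Real isiZulu processing would need morphological analysis.
    tokens = (word.strip(_PUNCT).lower() for word in text.split())
    return [t for t in tokens if t]


def token_counts(pairs: list[AlignedPair]) -> tuple[Counter, Counter]:
    # Count tokens separately for each side of the corpus.
    english, isizulu = Counter(), Counter()
    for pair in pairs:
        english.update(tokenize(pair.english))
        isizulu.update(tokenize(pair.isizulu))
    return english, isizulu


# Illustrative sample only; not drawn from the EIPC.
SAMPLE_PAIRS = [
    AlignedPair("Hello", "Sawubona"),
    AlignedPair("Thank you", "Ngiyabonga"),
    AlignedPair("Thank you very much", "Ngiyabonga kakhulu"),
]

english_counts, isizulu_counts = token_counts(SAMPLE_PAIRS)
print(sum(english_counts.values()), sum(isizulu_counts.values()))
```

Even in this toy sample the English side yields more tokens than the isiZulu side for the same content, because isiZulu packs into one word what English spreads over several. This is one reason token counts for the two sides of a parallel corpus are not directly comparable.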

Keywords

Intellectualisation; corpora; INC; machine translation; computational tools

Type: Chapter
Information: The Transformative Power of Language: From Postcolonial to Knowledge Societies in Africa, pp. 247–258

DOI: https://doi.org/10.1017/9781108671088.014

Publisher: Cambridge University Press

Print publication year: 2020
