12 - Corpora as Agency in the Intellectualisation of African Languages

from Part III - Digitalisation and Democratisation of Knowledge

Published online by Cambridge University Press: 18 September 2020

Russell H. Kaschula, Rhodes University, South Africa
H. Ekkehard Wolff, Universität Leipzig

Summary

The chapter critically examines the development of corpora at the University of KwaZulu-Natal as one of the key agents of language intellectualisation, and evaluates the architecture of two types of corpora. The first is the isiZulu National Corpus (INC), an organic corpus of 30 million tokens. It is designed as a monitor corpus and is an important precursor to the development of isiZulu human language technologies. The chapter shows that the INC was crucial to the development of the isiZulu spellchecker, which was trained on it. The second is the English-isiZulu Parallel Corpus (EIPC), with a modest size of fifty e-files in each language. A parallel corpus is a collection of the same texts in two natural languages, processed and stored in machine-readable format. The EIPC is crucial to the development of automated machine translation between English and isiZulu. Developing a machine translation tool by computational means requires a parallel corpus such as the EIPC as an agent, and follows the tenets of the Data-Driven Machine Translation (DDMT) approach. The chapter outlines the imperative to develop both the INC and the EIPC, and shows that the two corpora are key components in the intellectualisation of isiZulu as a digital, scientific, natural language.

Type: Chapter
Information: The Transformative Power of Language: From Postcolonial to Knowledge Societies in Africa, pp. 247–258
Publisher: Cambridge University Press
Print publication year: 2020
