Book contents
- Frontmatter
- Contents
- Notes on Contributors
- 1 Introducing Arabic Corpus Linguistics
- 2 Under the Hood of arabiCorpus
- 3 Tunisian Arabic Corpus: Creating a Written Corpus of an ‘Unwritten’ Language
- 4 Accessible Corpus Annotation for Arabic
- 5 The Leeds Arabic Discourse Treebank: Guidelines for Annotating Discourse Connectives and Relations
- 6 Using the Web to Model Modern and Qurʾanic Arabic
- 7 Semantic Prosody as a Tool for Translating Prepositions in the Holy Qurʾan: A Corpus-Based Analysis
- 8 A Relational Approach to Modern Literary Arabic Conditional Clauses
- 9 Quantitative Approaches to Analysing come Constructions in Modern Standard Arabic
- 10 Approaching Text Typology through Cluster Analysis in Arabic
- Appendix: Arabic Transliteration Systems Used in This Book
- Index
6 - Using the Web to Model Modern and Qurʾanic Arabic
Published online by Cambridge University Press: 11 November 2020
Summary
Introduction
This chapter is not about a specific Arabic corpus, nor about the use of a corpus in an Arabic linguistics research project. I work in the School of Computing within the Faculty of Engineering at Leeds University, and engineers build things for others to use. Our contribution to Arabic corpus linguistics has therefore been to develop a range of Arabic-language resources (corpora and software tools) for as wide a range of users as possible. These users include not just linguists but also computing and artificial intelligence researchers, religious scholars, and the general public.
In the School of Computing at Leeds University, we are not Arabic linguists, but we enjoy working with Arabic linguists. To explain our motivation for contributing to Arabic corpus linguistics, I will outline some examples of artificial intelligence and corpus linguistics research in which we have worked with end users in interesting and challenging domains. We may have little or no expertise in a domain, but we can nevertheless apply machine learning to textual data from that domain to produce useful results.
Next, I explain what I mean by the phrases ‘using the Web’, ‘to model’, ‘modern (Arabic)’, and ‘Qurʾanic Arabic’. This leads into a summary of the web-based software and corpus datasets developed by researchers at Leeds University, covering both Modern Standard Arabic and the Classical Arabic of the Qurʾan.
I conclude with some ideas for further development of Arabic corpus linguistics resources. Most Arabic linguistic research focuses on Modern Standard Arabic and modern Arabic dialects. However, modern Arabic linguists, lexicographers, and language teachers need to recognise and handle the religious terms and Qurʾanic quotations that can appear in modern Arabic texts. Furthermore, while Qurʾanic Arabic corpus research may be a minority interest in linguistics, it has huge potential for impact on society and the general public, including Muslims worldwide who want to study and understand the Qurʾan.
- Type: Chapter
- Information: Arabic Corpus Linguistics, pp. 100-119. Publisher: Edinburgh University Press. Print publication year: 2018.