
Automatic question generation based on sentence structure analysis using machine learning approach

Published online by Cambridge University Press:  17 June 2021

Miroslav Blšták*
Affiliation:
Kempelen Institute of Intelligent Technologies, Mlynske nivy 5, Bratislava, Slovakia
Viera Rozinajová
Affiliation:
Kempelen Institute of Intelligent Technologies, Mlynske nivy 5, Bratislava, Slovakia
*Corresponding author. E-mail: [email protected]

Abstract

Automatic question generation is one of the most challenging tasks in Natural Language Processing. It requires “bidirectional” language processing: the system first has to understand the input text (Natural Language Understanding) and then has to generate questions, also in the form of text (Natural Language Generation). In this article, we introduce our framework for generating factual questions from unstructured English text. It combines traditional linguistic approaches based on sentence patterns with several machine learning methods. We first obtain lexical, syntactic and semantic information from an input text, and we then construct a hierarchical set of patterns for each sentence. A set of features is extracted from the patterns and is then used for automated learning of new transformation rules. Our learning process is entirely data-driven because the transformation rules are obtained from a set of initial sentence–question pairs. The advantages of this approach are that new transformation rules are simple to add, which allows us to generate various types of questions, and that the system improves continuously through reinforcement learning. The framework also includes a question evaluation module which estimates the quality of generated questions. It serves as a filter for selecting the best questions and eliminating incorrect ones or duplicates. We have performed several experiments to evaluate the correctness of generated questions, and we have also compared our system with several state-of-the-art systems. Our results indicate that the generated questions are of higher quality than those of the state-of-the-art systems and are comparable to questions created by humans. We have also created and published an interface with all created data sets and evaluated questions, so it is possible to follow up on our work.
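The idea of obtaining transformation rules from sentence–question pairs can be illustrated with a toy sketch. This is not the authors' implementation: the `Rule` class, the `learn_rule` and `apply_rule` helpers, the flat tag sequence used as a "pattern" and the hand-assigned tags are all assumptions made for illustration, and the hierarchical patterns, feature extraction, reinforcement learning and evaluation module described in the abstract are left out.

```python
from dataclasses import dataclass

# Hypothetical token representation: (word, tag). The tags are assigned by
# hand here; a real system would obtain them from an NLP toolkit.
Token = tuple[str, str]


@dataclass(frozen=True)
class Rule:
    pattern: tuple   # sequence of tags a sentence must match
    template: tuple  # items are int (index of a sentence word) or str (literal)


def learn_rule(sentence: list[Token], question: list[str]) -> Rule:
    """Derive a rule from one sentence-question pair.

    Every question word that also occurs in the sentence becomes a slot
    (the index of that word); every other word stays a literal.
    """
    words = [word for word, _ in sentence]
    template = tuple(words.index(q) if q in words else q for q in question)
    pattern = tuple(tag for _, tag in sentence)
    return Rule(pattern, template)


def apply_rule(rule: Rule, sentence: list[Token]):
    """Return a question if the sentence matches the rule's pattern, else None."""
    if tuple(tag for _, tag in sentence) != rule.pattern:
        return None
    words = [word for word, _ in sentence]
    parts = [words[item] if isinstance(item, int) else item
             for item in rule.template]
    return " ".join(parts) + "?"


# One seed sentence-question pair (illustrative data, not from the article).
seed_sentence = [("Bratislava", "LOC"), ("is", "VBZ"), ("the", "DT"),
                 ("capital", "NN"), ("of", "IN"), ("Slovakia", "LOC")]
seed_question = ["What", "is", "the", "capital", "of", "Slovakia"]
rule = learn_rule(seed_sentence, seed_question)

# A new sentence with the same tag pattern.
new_sentence = [("Paris", "LOC"), ("is", "VBZ"), ("the", "DT"),
                ("capital", "NN"), ("of", "IN"), ("France", "LOC")]
print(apply_rule(rule, new_sentence))  # What is the capital of France?
```

In this sketch a new question type is added simply by supplying another seed pair, which loosely mirrors the data-driven expansion of rules described above. A sentence whose tag sequence does not match any learned pattern yields no question.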

Type
Article
Copyright
© The Author(s), 2021. Published by Cambridge University Press

