Context identification of sentences in research articles: Towards developing intelligent tools for the research community

M. A. ANGROSH; STEPHEN CRANEFIELD; NIGEL STANGER

doi:10.1017/S1351324912000277

Published online by Cambridge University Press: 10 October 2012

Affiliation (all authors): Department of Information Science, University of Otago, Dunedin, New Zealand

Abstract

Scientific literature is an important medium for disseminating scientific knowledge. However, in recent times, a dramatic increase in research output has resulted in challenges for the research community. An increasing need is felt for tools that exploit the full content of an article and provide insightful services with value beyond quantitative measures such as impact factors and citation counts. However, the intricacies of language and thought, and the unstructured format of research articles, present challenges in providing such services. Identifying sentence contexts, which encode the role of specific sentences in advancing an article's scientific argument, can facilitate the development of intelligent tools for the research community. This paper describes our research work in this direction. First, we investigate the possibility of identifying contexts associated with sentences and propose a scheme of thirteen context type definitions for sentences, based on the generic rhetorical pattern found in scientific articles. We then present the results of our experiments using sequential classifiers (conditional random fields) to achieve automatic context identification. We also describe our Semantic Web application, developed to provide citation-context-based information services for the research community. Finally, we compare and analyse our results against similar studies and explain the distinct features of our application.
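
As an illustration only, the idea of labelling sentences with a sequential classifier can be sketched as Viterbi decoding over a linear-chain model, the decoding step commonly used with conditional random fields. Everything specific here is an assumption made for the example: the three labels are hypothetical and are not the thirteen context types proposed in the article, the cue words and weights are set by hand rather than learned from annotated data, and the function names are invented. It is not the authors' implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical label set for illustration; NOT the article's thirteen context types.
LABELS: List[str] = ["BACKGROUND", "CITED_WORK", "SHORTCOMING"]

# Hand-set cue-word scores (assumption). A real CRF would learn feature weights.
EMISSION: Dict[str, Dict[str, float]] = {
    "BACKGROUND": {"important": 1.0, "widely": 1.0},
    "CITED_WORK": {"proposed": 1.5, "et": 1.0},
    "SHORTCOMING": {"however": 2.0, "fails": 1.5},
}

# Hand-set scores for moving from one sentence label to the next (assumption).
TRANSITION: Dict[Tuple[str, str], float] = {
    ("BACKGROUND", "CITED_WORK"): 0.5,
    ("CITED_WORK", "SHORTCOMING"): 1.0,
    ("SHORTCOMING", "BACKGROUND"): -1.0,
}


def emission_score(label: str, sentence: str) -> float:
    """Sum the cue-word scores that the sentence triggers for this label."""
    words = sentence.lower().replace(",", " ").replace(".", " ").split()
    return sum(EMISSION[label].get(word, 0.0) for word in words)


def viterbi(sentences: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for a list of sentences."""
    if not sentences:
        return []
    # best[t][label] = score of the best path ending in `label` at sentence t
    best = [{lab: emission_score(lab, sentences[0]) for lab in LABELS}]
    back: List[Dict[str, str]] = []
    for sentence in sentences[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for lab in LABELS:
            prev = max(
                LABELS,
                key=lambda p: best[-1][p] + TRANSITION.get((p, lab), 0.0),
            )
            scores[lab] = (
                best[-1][prev]
                + TRANSITION.get((prev, lab), 0.0)
                + emission_score(lab, sentence)
            )
            pointers[lab] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    example = [
        "Citation indexing is widely used.",
        "Smith et al. proposed a classifier.",
        "It was evaluated on one corpus only.",
    ]
    for sentence, label in zip(example, viterbi(example)):
        print(f"{label:12s} {sentence}")
```

In this toy input the third sentence contains none of the cue words, so its label comes entirely from the transition score out of the preceding sentence's label. That dependence on neighbouring sentences is what distinguishes a sequential classifier from one that labels each sentence in isolation.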

Type: Articles
Information: Natural Language Engineering, Volume 19, Issue 4, October 2013, pp. 481–515

DOI: https://doi.org/10.1017/S1351324912000277
Copyright: Copyright © Cambridge University Press 2012
