A statistical method of evaluating the pronunciation proficiency/intelligibility of English presentations by Japanese speakers

Hiroshi Kibishi; Kuniaki Hirabayashi; Seiichi Nakagawa

doi:10.1017/S0958344014000251

Published online by Cambridge University Press: 23 May 2014

Hiroshi Kibishi: Toyohashi University of Technology, Computer Science and Engineering, Japan
Kuniaki Hirabayashi: Toyohashi University of Technology, Computer Science and Engineering, Japan
Seiichi Nakagawa: Toyohashi University of Technology, Computer Science and Engineering, Japan

Abstract

In this paper, we propose a statistical method for evaluating the pronunciation proficiency and intelligibility of presentations given in English by native speakers of Japanese. We statistically analyzed actual utterances to find combinations of acoustic and linguistic features that yield a high correlation between the scores estimated by the system and those assigned by native English teachers. Our results showed that the best combination of acoustic features produced correlation coefficients of 0.929 and 0.753 for pronunciation and intelligibility scores, respectively, on speaker-open data at the 10-sentence level. In an offline test, we evaluated pairs of phonemes that are easily confused and often mispronounced by Japanese speakers of English. In addition, we developed an online system that applies the offline techniques to estimate pronunciation and intelligibility scores for Japanese learners of English in real time, with almost the same ability as English teachers. Finally, we show that both objective and subjective evaluations improved after learners practiced with our system.
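
The abstract does not state which estimation model is used. As a rough illustration of choosing a feature combination by its correlation with teacher scores, the sketch below assumes a linear least-squares regression from per-speaker feature vectors to teacher scores, plus a greedy forward search that keeps adding the feature that most raises the Pearson correlation on development data. The linear model, the greedy search, the tiny ridge term, and all function names (`pearson`, `fit_linear`, `forward_select`, and so on) are illustrative assumptions, not the authors' implementation.

```python
import math


def pearson(x, y):
    """Pearson correlation between system scores x and teacher scores y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant input; treated here as no correlation
    return sxy / math.sqrt(sxx * syy)


def solve(A, b):
    """Solve A w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        if abs(M[col][col]) < 1e-12:
            raise ValueError("singular system")
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(X, y, ridge=1e-8):
    """Least-squares weights [intercept, w1, ..., wk] via the normal equations.

    The tiny ridge term on the slopes (an assumption) keeps the system well conditioned.
    """
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) + (ridge if i == j and i > 0 else 0.0)
            for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)


def predict(w, x):
    """Estimated score for one feature vector x."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def select_columns(X, cols):
    """Keep only the feature columns listed in cols."""
    return [[x[c] for c in cols] for x in X]


def forward_select(X_train, y_train, X_dev, y_dev, max_features):
    """Greedily add the feature that most increases correlation on development data.

    Stops early when no remaining feature improves the correlation.
    """
    chosen, best_r = [], -1.0
    while len(chosen) < max_features:
        best_feature = None
        for f in range(len(X_train[0])):
            if f in chosen:
                continue
            cols = chosen + [f]
            w = fit_linear(select_columns(X_train, cols), y_train)
            preds = [predict(w, x) for x in select_columns(X_dev, cols)]
            r = pearson(preds, y_dev)
            if r > best_r:
                best_r, best_feature = r, f
        if best_feature is None:
            break
        chosen.append(best_feature)
    return chosen, best_r
```

In a setting like the one described, each row of `X_train` or `X_dev` would hold one speaker's acoustic and linguistic features and `y_train` or `y_dev` the teachers' scores. Because features are chosen on the development set, a separate speaker-open test set would still be needed to report an unbiased correlation.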

Keywords

English learning; pronunciation evaluation; intelligibility evaluation; offline/online execution; Japanese speakers

Type: Research Article
Information: ReCALL, Volume 27, Issue 1, January 2015, pp. 58–83

Copyright © European Association for Computer Assisted Language Learning 2015
