Review of recent empirical research (2011–2018) on language assessment in China

Shangchao Min; Lianzhen He; Jie Zhang

doi:10.1017/S0261444820000051

Published online by Cambridge University Press: 15 May 2020

Shangchao Min: Institute of Applied Linguistics, Zhejiang University, Hangzhou, China
Lianzhen He*: Institute of Applied Linguistics, Zhejiang University, Hangzhou, China
Jie Zhang: School of Foreign Language Studies, Shanghai University of Finance and Economics, China
*Corresponding author. E-mail: [email protected]

Abstract

This article reviews a selected sample of 70 empirical studies on language assessment in China, published as journal articles or doctoral dissertations between 2011 and 2018. Following a brief introduction to the history and current state of language assessment in China, the article presents a critical review of research on the six themes that have attracted the greatest interest from researchers in the country: (1) test reliability and validity; (2) factors affecting test performance; (3) rating and rating scales; (4) technology and language testing; (5) test washback; and (6) classroom-based assessment. In addition to situating its commentary on these studies within the social, cultural and historical contexts of China, the article outlines their scholarly contributions to the wider international field of language learning, teaching and assessment. It concludes with recommendations on areas in need of further development over the coming decades.

Type: A Country in Focus
Information: Language Teaching, Volume 53, Issue 3, July 2020, pp. 316–340

DOI: https://doi.org/10.1017/S0261444820000051
Copyright: Copyright © The Author(s), 2020. Published by Cambridge University Press
