
Latent Theme Dictionary Model for Finding Co-occurrent Patterns in Process Data

Published online by Cambridge University Press:  01 January 2025

Guanhua Fang*
Affiliation:
Columbia University
Zhiliang Ying
Affiliation:
Columbia University
* Correspondence should be made to Guanhua Fang, Columbia University, New York, USA. Email: [email protected]

Abstract

Process data, which are temporally ordered sequences of categorical observations, are of recent interest because of their increasing abundance and the desire to extract useful information from them. A process is a collection of time-stamped events of different types, recording how an individual behaves in a given time period. Process data are too large and too irregular for classical psychometric models to be directly applicable; consequently, new approaches to modeling and analysis are needed. We introduce a latent theme dictionary model for processes that identifies co-occurrent event patterns and individuals with similar behavioral patterns. Theoretical properties of likelihood-based estimation and inference are established under certain regularity conditions. A nonparametric Bayes algorithm using the Markov chain Monte Carlo method is proposed for computation. Simulation studies show that the proposed approach performs well in a range of situations. The proposed method is applied to an item from the 2012 Programme for International Student Assessment, yielding interpretable findings.
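
To make the notion of process data concrete, the following is a minimal sketch of how such data might be represented and how raw co-occurrence of event types could be tallied. It is not the latent theme dictionary model or its estimation algorithm; the event names, individual identifiers, and the helper `cooccurrence_counts` are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each process is a time-ordered list of
# (timestamp, event_type) pairs for one individual.
processes = {
    "student_a": [(0.0, "start"), (1.2, "click_map"),
                  (2.5, "select_route"), (4.0, "submit")],
    "student_b": [(0.0, "start"), (0.8, "click_map"), (3.1, "reset"),
                  (5.0, "click_map"), (6.2, "select_route")],
    "student_c": [(0.0, "start"), (2.0, "submit")],
}


def cooccurrence_counts(procs):
    """For each unordered pair of event types, count the processes containing both."""
    counts = Counter()
    for events in procs.values():
        # Deduplicate event types within a process, then sort so that
        # each pair has a single canonical (alphabetical) ordering.
        types = sorted({event_type for _, event_type in events})
        counts.update(combinations(types, 2))
    return counts


counts = cooccurrence_counts(processes)
for pair, n in counts.most_common(3):
    print(pair, n)
```

Counts of this kind are only a descriptive starting point. A model-based approach such as the one summarized in the abstract is intended to go further, identifying co-occurrent patterns and grouping individuals with similar behavior.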

Type
Application Reviews and Case Studies
Copyright
Copyright © 2020 The Psychometric Society

Footnotes

Electronic supplementary material The online version of this article (https://doi.org/10.1007/s11336-020-09725-2) contains supplementary material, which is available to authorized users.

Supplementary material: File

Fang and Ying supplementary material
File 794.1 KB