Multimodal composing, sometimes referred to synonymously as multimodal composition or multimodal writing, is the use of different semiotic resources (e.g., audio, visual, gestural, and/or spatial resources), in addition to linguistic text, to make meaning. Notably, multimodal composing is neither a new type of writing nor a new area of research; studies of it date back to the early 2000s. In second language (L2) research, Tardy's (2005*) study of multimodal composition in academia was one of the earliest to draw attention to the nonlinguistic features of L2 written output. Yet in the few years following this pioneering study, only a handful of studies further explored aspects of L2 learners' multimodal compositions. Over the past decade, however, applied linguistics and second language acquisition (SLA) have witnessed an explosion of interest in multimodal composing, both as an object of research and as a classroom practice, and the use of multiple modes has become an indispensable part of teachers' pedagogical toolkits (e.g., Kessler, 2022; Li, 2021; Zhang et al., 2021).