This study investigated the encoding strategies employed by Chinese and English language users when recalling sequences of pictured objects. The working memory performance of native English-speaking participants (n = 14) and Chinese speakers of English as a second language (Chinese ESL; n = 14) was compared using serial recall of visually presented pictures of familiar objects under three conditions: (i) phonologically and visually distinct, (ii) phonologically similar and visually distinct, and (iii) phonologically distinct and visually similar. Digit span, visual pattern span and articulation rate were also measured. Results indicated that whilst English participants were affected by the phonological but not the visual similarity of items, the performance of Chinese ESL participants was comparable across all three conditions. No significant differences in digit span, visual pattern span or articulation rate were found between groups. These results are discussed in the light of our understanding of the use of cognitive resources in short-term memory in users of diverse orthographies.