Social interaction is inescapably multimodal, composed of talk (e.g., lexical items, syntax, prosody), nonlexical conduct (e.g., breathing, laughter, sighing, response cries), and solely visible (or embodied) conduct (e.g., body posture and movement, hand gestures, object manipulation). Although this chapter concerns the transcription of social interaction, its primary goal is not to explain transcription conventions or to instruct readers in their use (these topics are dealt with secondarily). Rather, it is to demonstrate the analytic necessity and usefulness of systematic and detailed transcription practices for both vocal and visual conduct (e.g., the systems developed by Gail Jefferson and Lorenza Mondada, respectively). We achieve this goal by applying a wide range of transcription practices to a single video clip of mundane, dinner-time conversation in English, illustrating how transcription both is, and contributes to, an analytic process. We also discuss practical difficulties associated with transcription, especially the transcription of visual conduct. Ultimately, we show that transcription is essential to understanding topics such as turn-taking, sequentiality, (dis)affiliation, emotion, stance, and social action itself.