Historically, the principal function of vision has been to provide the information needed to support action. Visually mediated actions rely on three systems: the gaze system, which locates and fixates task-relevant objects; the motor system of the limbs, which carries out the task; and the visual system, which supplies information to the other two. All three are under the control of a fourth system, the schema system, which specifies the current task and plans the overall sequence of actions. These four systems have separate but interconnected cortical representations. How they interact in time and space is discussed here in relation to two studies of the gaze changes and manipulations made during ordinary food preparation tasks. The main conclusion is that complex action sequences consist of a succession of individual object-related actions, each of which typically involves a turn toward the object (if needed), then fixation, and finally manipulation monitored by vision. Gaze often moves on to the next object just before the current manipulation is complete. Task-irrelevant objects are hardly ever fixated, implying that fixation is controlled principally by top-down instructions from the schema system rather than by bottom-up salience. Single fixations have identifiable functions (locating, directing, guiding, and checking) related to the action to be taken. Several variants of the basic object-related action scheme are discussed, including single-action events in ball sports that involve only one anticipatory gaze shift, continuous production loops in text and music reading, and storage–action alternation in copying tasks such as portrait sketching.