Ludmila Reimer and Maria Spychalska
Affiliation: Ruhr-University Bochum, Max Planck Institute for Psycholinguistics
In everyday conversations, we extensively use co-speech gestures. This doubling of information streams, gestural and linguistic, could have multiple benefits, such as providing redundancy to compensate for noisy environments, or conveying information that is only efficiently expressed through one of the two streams. Previously, we conducted two EEG experiments to determine whether iconic co-speech gestures play a role in online language processing, focusing on interactions with semantic comprehension (self-reference). We found that gestures do interact with semantic processing in the second experiment, where the task depended on the gestural information; in the first experiment, where the task was solvable without the gestural information, the effects did not reach significance, showing only a trend in the same direction as in the second experiment (self-reference). Since the EEG data were recorded before the task, this raises the question of why costly online processing took place in the second experiment if the gestural information only needed to be recalled during the task. Here, we analyze the behavioral results, focusing on the task performed by the participants, and ask whether online processing of gestural information benefits a listener later in the timeline and how it affects task performance. Prior studies found faster and more accurate responses in the presence of congruent gestures matching linguistic or visual input compared to incongruent mismatches, even in the absence of a task or without the requirement to pay attention to the gesture (Kandana Arachchige et al., 2021). However, the absence of this advantageous effect has also been reported in one study (Wu & Coulson, 2007). Unlike other studies (e.g., Hintz et al., 2023; Özyürek et al., 2007), we constructed the linguistic material in such a way that it did not vary with regard to modulating expectations about upcoming words.
This lets us attribute any change in processing to the change in the gesture alone (see Table 1). For both experiments, we used naturalistic videos of a person uttering a sentence with a general action verb, featuring (i) no gesture or (ii) an iconic co-speech gesture representing a more specific action. The target sentence was presented on screen; in Experiment 1, it contained a verb indicating the use of an instrument represented in the shown gesture. In Experiment 2, the target sentence contained an instrument noun preceded by the same verb as in Experiment 1. The verb and the noun either matched or mismatched the previously seen gesture, creating three conditions in both experiments: Neutral (no gesture), Match, and Mismatch (see Table 1). We used sentences such as “The child is baking cookies”, with “baking” as the general activity verb, combined with either a vertical, truncated movement resembling the use of a cookie cutter (Match) or a horizontal movement resembling glazing cookies with a brush (Mismatch). After reading the target sentence, participants in Experiment 1 were prompted in a quarter of the trials to answer whether a given probe word had been displayed or uttered in the previous sequence. In Experiment 2, they were prompted after every trial to rate whether the target sentence was a sensible continuation of the content of the context video (see Figure 1). The task was prompted 3 seconds after the offset of the last target word, a delay originally chosen to prevent the upcoming task from interfering with the processing of the linguistic input. This time gap now allows us to determine whether the benefits of gesture in task performance persist. Both experiments had low error rates: in Experiment 1, participants gave the correct answer in 97.4% of trials with matching gestures, 97.6% with mismatches, and 97.5% in Neutral trials. These differences were not significant.
In Experiment 2, we observed the benefit of the matching gesture reported in the literature: trials with matching gestures were answered correctly in 91.0% of cases. Mismatches showed the worst performance, with a correctness rate of 88.7%, and Neutral trials were answered correctly in 93.0% of cases. The differences between Match and Neutral and between Mismatch and Neutral were significant, but the difference between Match and Mismatch was not. This trend was also reflected in reaction times: in Experiment 1, we found no significant differences (see Tables 1 and 2). In Experiment 2, tasks following mismatches were resolved the fastest, and tasks without any gestural information took the longest (see Tables 1 and 2). Usually, incongruencies such as our mismatch condition yield the slowest reaction times. Given the time gap between the offset of the target word and the task, we can assume that detecting a mismatch speeds up the response, however at the cost of accuracy. The null result in Experiment 1 seems to contradict previous research reporting faster response times even when the gesture was not task-relevant. This could be explained by their different setup, where the incongruency was between the gesture and the simultaneously uttered sentence, not between the gesture and the following discourse. However, we also found that all reaction times in Experiment 2 were faster than in Experiment 1. Since the two experiments differed in participants and trial numbers per task, we did not statistically compare reaction times between them. Still, the differences were relatively large (Match: 59.42 ms; Mismatch: 104.90 ms), suggesting that gestural information only confers a benefit when it is relevant to the overall context, here resolving the task, and can otherwise be ignored to free resources for tasks that do not need this information. It might even be the case that the gestural information was initially processed but abandoned before task onset, again to free working memory capacity.
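The pairwise accuracy comparisons above are, at their core, comparisons of proportions. Purely as an illustrative sketch, not the analysis used in the study, the following implements a standard two-proportion z-test from scratch; the trial count of 1000 per condition is an invented placeholder, not the actual number of trials in Experiment 2.

```python
import math

def two_proportion_ztest(k1, n1, k2, n2):
    """Two-sided two-proportion z-test using the pooled standard error.

    k1, k2: numbers of correct responses; n1, n2: numbers of trials.
    Returns the z statistic and the two-sided p-value.
    """
    p1, p2 = k1 / n1, k2 / n2
    pooled = (k1 + k2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF via the error function.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p

# Hypothetical counts (1000 trials per condition is a placeholder):
# Match accuracy 91.0% vs. Neutral accuracy 93.0%.
z, p = two_proportion_ztest(910, 1000, 930, 1000)
```

With these invented counts the difference would not reach significance; the actual outcome depends on the real trial numbers and on the (typically mixed-model) analysis reported in the study.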
This interpretation is in line with findings that older participants rely less on gestural information than younger participants, which has been interpreted as older participants possibly freeing processing resources to prioritize language comprehension and compensate for age-related degradation of working memory (Cocks et al., 2011). In conclusion, this indicates that the processing of gestural information is flexible and can be suppressed in order to accommodate additional task-specific processing costs.