PhD projects

Fifteen early-stage researchers (ESRs) will be recruited as PhD students in the framework of COBRA.

The call for applications is open. Deadline for submission: 31 March 2020. Each selected ESR will be offered a three-year contract starting from September/October 2020.

Further information about the 15 proposed projects is available below:

ESR1: Categorization of speech sounds as a collective decision process

PhD enrolment at: AMU

Main supervisor’s institution: AMU  |  Co-supervisor(s)’ institution(s): DAVI, UEDIN

Objectives: For two people to understand each other, words must refer to the same entities for both of them. How the relations between speech sounds and meanings come to be collectively established within a linguistic community is an issue at the heart of speech and language. ESR1’s project will explore this issue in the laboratory, using, to our knowledge for the first time, a joint-perception experimental paradigm. A model of perceptual convergence as a collective decision process will be developed. Predictions of this model will be tested using an artificial agent that performs the task together with human participants.

Secondments: (i) DAVI: setting up the artificial agent employed in the joint-perception experiments (6 months); (ii) UEDIN: guidance on integrating perceptual convergence processes into a cognitive model of conversation (5 months).


ESR2: Brain markers of between-speaker convergence in conversational speech

PhD enrolment at: University of Ferrara

Main supervisor’s institution: IIT  |  Co-supervisor(s)’ institution(s): AMU, HU-ZAS

Objectives: When people are engaged in meaningful social interaction, they automatically and implicitly adjust their speech, vocal patterns and gestures to accommodate to others. Although these processes have been extensively explored at the behavioral level, very little is known about their neural underpinnings. Using dual-EEG recordings, the project will investigate whether behavioral speech alignment translates into identifiable brain oscillatory markers. Key objectives are (i) to develop and validate metrics that quantify phonetic accommodation during natural speech interactions and (ii) to identify electrophysiological markers of between-speaker convergence.

Secondments: (i) AMU: preparation of linguistic material in both Italian and French, and contribution to the design of the experimental set-up and to the phonetic analyses (6 months); (ii) HU-ZAS: assessment of a repertoire of metrics for measuring phonetic convergence in conversational speech (5 months).



ESR3: Does prediction drive neural alignment in conversation?

Main supervisor’s institution: AMU  |  Co-supervisor(s)’ institution(s): FUB, Furhat

Objectives: Recent studies on neural alignment in language have shown that successful communication relies on the synchronization of the same brain regions in both speakers. However, more explicit links between neural alignment and specific linguistic functions remain to be established. ESR3’s project rests on the hypothesis that the degree of neural synchronization depends on the degree of predictive processing. This will be tested for specific linguistic functions in dual-EEG experiments with pairs of interlocutors engaged in conversation. Identifying component-specific brain markers of predictability and alignment will make it possible to establish which linguistic factors can enhance predictability, and thus alignment, in human-machine interactions.

Secondments: (i) FUB: explore how predictive alignment interacts with different verbal acts and pragmatic contexts (5 months); (ii) Furhat: apply neurophysiological insights into predictability and neural alignment to test and improve the effectiveness of human-machine interactions (6 months).


ESR4: Brain indexes of semantic and pragmatic prediction

Main supervisor’s institution: FUB  |  Co-supervisor(s)’ institution(s): AMU, IIT

Objectives: Although the role of prediction in language and communication processing is widely acknowledged, few brain indexes directly map the neurophysiological correlates of prediction as they emerge in real time. ESR4 will use the readiness potential to directly map the emergence of prediction correlates in the course of communication between two people. Participants will engage in different interactive language games while predictive brain responses are measured with EEG or fMRI and, eventually, MEG. TMS will be applied to find out whether local stimulation can specifically interfere with communicative predictions. We expect ESR4 to identify new brain indexes of semantic and pragmatic prediction, which will be used to directly assess current theories of alignment between communication partners in different types of conversations.

Secondments: (i) AMU: cooperation on MEG work addressing speech production and comprehension in communication (5 months); (ii) IIT: cooperation on the design and implementation of the TMS experiment (6 months).


ESR5: Communicative alignment at the physiological level

Main supervisor’s institution: HU-ZAS  |  Co-supervisor(s)’ institution(s): IIT

Objectives: Theoretical models of interactive alignment in dialogue have mostly focused on different levels of linguistic representation. ESR5’s project will consider alignment at the physiological level, in particular joint motion and alignment in respiration and articulation, since language and communication are grounded in sensorimotor processes. Physiological data will be recorded from speakers engaged in different communicative settings using the newest technology in this area (e.g., motion capture, dual electromagnetic articulography, dual inductance plethysmography, dual wireless EEG). If communicative alignment is shown to take place at the physiological level, and if specific neuro-behavioral biomarkers of successful alignment can be identified, this will have an impact on theoretical models as well as on therapeutic interventions for patients with interaction deficits.

Secondments: (i) IIT: training and support in EEG data acquisition and analyses to investigate neurophysiological indexes of alignment to the kinematics of speaking and breathing (8 months).


ESR6: Lexical alignment in human-machine spoken interaction

Main supervisor’s institution: UEDIN  |  Co-supervisor(s)’ institution(s): DAVI, Furhat

Objectives: In the context of a dialogue game in which interlocutors describe objects to each other as part of a task, ESR6 will investigate the extent to which people align with a natural-language generation system, and how this affects task success. Starting from a simple referential communication game, in which objects may be referred to by different terms, ESR6 will move on to more naturalistic tasks such as determining a route through a complex environment. We expect a direct relationship between the alignment of the generation system and the alignment of participants.

Secondments: (i) DAVI: training on using real-time dialogue systems in experimental settings (5 months, M14-M18); (ii) Furhat Robotics: to test the system with a physical agent/robot (6 months, M28-M33).


ESR7: Contribution of discourse markers to alignment in conversation

Main supervisor’s institution: UCL  |  Co-supervisor(s)’ institution(s): MPG, Orange SA

Objectives: ESR7’s project asks whether the use of discourse markers (DMs) in spontaneous conversation is motivated by the speaker’s needs, or by (the speaker’s prediction of) the listener’s needs. ESR7 will combine corpus analysis with experimental work measuring the impact on comprehension of spoken discourse with and without DMs, to better understand to what extent DMs contribute to speaker and hearer processes of alignment and prediction in spontaneous conversation.

Secondments: (i) MPG: training on experimental psycholinguistic methods (5 months); (ii) Orange SA: identification of typical markers of spoken interaction to improve dialogue agents (6 months).


ESR8: Discourse units and discourse alignment

Main supervisor’s institution: UCL  |  Co-supervisor(s)’ institution(s): AMU, Orange SA

Objectives: ESR8’s project focuses on the interaction between (i) discourse units, (ii) discourse alignment, and (iii) discourse processing in conversation. It aims to establish whether alignment applies to the building blocks of conversational turns, beyond the level of words and phrases. Syntactic, prosodic and pragmatic components are known to be involved in the Turn-Constructional Unit (TCU), but the interaction between these three components has not been investigated in a systematic way. Recent empirical work at UCL has aimed at clarifying the syntax-prosody interaction through the concept of the Basic Discourse Unit (BDU). ESR8 will endeavor to better understand the role of BDUs in discourse alignment and discourse processing.

Secondments: (i) AMU: modelling the overall sequence organization of activities within conversation, and development of an annotation protocol (5 months); (ii) Orange SA: to carry out experiments with conversational systems; to determine the usability of the results for modelling the architecture of a conversational system (6 months).


ESR9: Acoustic-phonetic alignment in synthetic speech

Main supervisor’s institution: UEDIN  |  Co-supervisor(s)’ institution(s): ReadSpeaker, Helsinki

Objectives: To discover whether there is a benefit from employing acoustic-phonetic alignment in synthetic speech. Here, alignment refers to changes in pronunciation (e.g., vowel space) or prosody (e.g., pitch range, speaking rate). It is already possible to control these and related factors such as vocal tract length in Hidden Markov Model (HMM) speech synthesis, using machine learning methods previously developed by UEDIN and its collaborators. Possible applications are human-machine dialogue and human-human interaction where one of the humans is using a voice-output communication aid. Changing vocal tract length and pitch range gives control over perceived gender.

Secondments: (i) ReadSpeaker: to apply the developed methods to commercial-quality synthetic speech (6 months); (ii) Helsinki: application to user-adaptive systems for the improvement of second-language pronunciation, in which the teacher (i.e., machine) aligns with particular aspects of the student’s speech (5 months).


ESR10: Phonetic alignment in a non-native language

Main supervisor’s institution: IIT  |  Co-supervisor(s)’ institution(s): IISAS, DAVI

Objectives: For adults, mastering the segmental and supra-segmental aspects of a second language (L2) is particularly challenging. Although we know that this capability is partially maintained in adulthood, we do not yet know how to facilitate effective and long-lasting L2 learning. The project is based on the hypothesis that when people engage in meaningful social interactions, they automatically and implicitly align at multiple linguistic levels, including the phonetic level. ESR10 will tackle the fundamental scientific question of phonetic alignment in L2 and whether it drives long-lasting improvements in L2 skills.

Secondments: (i) IISAS: training on methods for analysing acoustic-prosodic alignment in dialogues (5 months); (ii) DAVI: training on how to design effective game-like tools to explore alignment in L2 (6 months).


ESR11: Conversation coordination and mind-reading

Main supervisor’s institution: IISAS  |  Co-supervisor(s)’ institution(s): AMU, Furhat

Objectives: ESR11 will investigate to what extent people need to attribute intention, belief and knowledge to their interlocutor (i.e., theory of mind, also known as mind-reading) to effectively coordinate and build common ground during a conversation. ESR11 will design scenarios in which humans interact with (a) individuals with a theory-of-mind impairment, such as schizophrenia, and (b) an avatar. The ESR will test the use of overt markers of belief and knowledge attribution (e.g., feedback responses such as backchannels, prosody) in the development of conversational coordination and alignment. Combining clinical work with computational modeling offers a broad range of skills for ESR11 and potential for complementary findings regarding the role of intention, belief and knowledge in conversational coordination.

Secondments: (i) AMU: training in clinical assessment, development of psycholinguistic experiments (5 months); (ii) Furhat Robotics: training in the implementation of the avatar interaction platform focusing on alignment and behavior projection (6 months).


ESR12: The influence of alignment

Main supervisor’s institution: IISAS  |  Co-supervisor(s)’ institution(s): Orange SA, HKPU

Objectives: Alignment in both human-human and human-machine interactions has been shown to be beneficial along several social dimensions, such as likeability, task success, learning gains, and trust. ESR12 will test the central hypothesis that people (and machines) who adapt to their interlocutor(s) influence their interlocutors’ behavior and emotional state more than people/machines who do not adapt.

Secondments: (i) Orange SA: strategies for dealing with displeased users of automatic conversational systems (5 months); (ii) HKPU: ontological and linguistic models for representation and verification of alignment (6 months).


ESR13: Parametric dialogue synthesis: from separate speakers to conversational interaction

Main supervisor’s institution: ReadSpeaker  |  Co-supervisor(s)’ institution(s): IISAS, Helsinki

Objectives: The goal of ESR13’s project is to produce a natural-sounding synthetic voice with appropriately rendered turn-takes, filled pauses, and backchannels in a conversational, expressive speaking style. ESR13 will collect acoustic data from human-human dialogues, extract cues relevant to turn-taking, and train an adaptive model to move from standard news-reading-style TTS to conversational dialogue TTS. The ultimate objective will be to develop a working platform for synthesizing dialogues (using parametric synthesis technology) that can serve as a testing ground for evaluating hypotheses and models of interaction within COBRA. ESR13 will enroll for a PhD at the University of Helsinki.

Secondments: (i) University of Helsinki: to analyse prosodic cues and learn the continuous wavelet transform methodology for analysis and representation of prosody (6 months); (ii) IISAS: training on turn-taking behaviour (5 months).


ESR14: Gender and vocal alignment in speakers and robots

Main supervisor’s institution: Furhat  |  Co-supervisor(s)’ institution(s): UCL, IISAS

Objectives: ESR14 will focus on acoustic-prosodic alignment at the discourse level. In addition to playing a crucial role in speakers’ joint understanding of what they are talking about, vocal alignment may also have an important social dimension. For example, previous work reveals that vocal alignment tends to occur to a greater extent in women than in men. Recent studies on human-robot interaction, however, suggest a tendency for both women and men to disalign with respect to the robot’s voice, regardless of the robot’s voice gender. The goal of ESR14’s project will be to provide a fuller characterization of acoustic-prosodic alignment in humans and robots, with a focus on gender and the communicative situation.

Secondments: (i) UCL: training in designing naturalistic gender-based and situation-specific speech elicitation techniques (6 months); (ii) IISAS: training in fine-grained acoustic-prosodic alignment informed by dialogue progression (5 months).


ESR15: Endowing robots with high-level conversational skills

Main supervisor’s institution: Furhat  |  Co-supervisor(s)’ institution(s): HU-ZAS, MPG

Objectives: For social robots and conversational agents to take part in real-life interactions with users, they need to possess social and conversational skills that enable them to interact naturally with a human partner, using language, gesture, and other cognitive and social skills that reflect aspects of the user’s behaviour. These include models of engagement, joint attention, personality style and collaborative state. ESR15 will contribute to developing computational models of high-level conversational skills in a multiparty human-robot interaction set-up (two humans, one robot). ESR15 will enroll for a PhD at KTH, Stockholm.

Secondments: (i) HU-ZAS: training in recording and analysis of head motion in multi-party conversations (5 months); (ii) MPG: experimental training on testing the efficiency of the interaction set-up (6 months).

