ESR13: Parametric dialogue synthesis: from separate speakers to conversational interaction

PhD Fellow: Adaeze Adigwe

My name is Adaeze Adigwe, and I am originally from Lagos, Nigeria. I am currently a PhD fellow at ReadSpeaker B.V. and the University of Helsinki. I graduated with a Bachelor's degree in Electrical Engineering from Northeastern University in 2019. During my undergraduate studies, I was a visiting research assistant at the ISIA Lab at the University of Mons, Belgium, working on voice conversion, which sparked my interest in the field of speech processing.

I was also an undergraduate research assistant at the Augmented Cognition Lab at Northeastern University, working on speech database and computer vision projects. Upon completing my undergraduate degree, I enrolled in a Master's degree program in Computer Science at Columbia University, specializing in Natural Language Processing. During my time there, I was a graduate research assistant at the Speech Lab, working on prosodic assignment in text for TTS systems.

The goal of the ESR13 project is to add to this research effort by extending single-speaker TTS systems to conversational dialogue systems. This involves modelling conversational cues often found in human-human interaction and incorporating them into our systems. Another aspect is prosody control in TTS systems: addressing the challenge of how to effectively control prosodic features such as pitch and accents without compromising the quality of the synthesizer. This will hopefully enable us to synthesize a range of voices that convey different speaking styles and emotions. My project is advised by Dr Esther Judd (ReadSpeaker B.V.), Prof Juraj Šimko (University of Helsinki), and Prof Štefan Beňuš (IISAS, Bratislava).


The goal of ESR13’s project is to produce a natural-sounding synthetic voice with appropriately rendered turn-takes, filled pauses, and backchannels in a conversational, expressive speaking style. ESR13 will collect acoustic data from human-human dialogues, extract cues relevant to turn-taking, and train an adaptive model to move from standard news-reading-style TTS to conversational dialogue TTS. The ultimate objective is to develop a working platform for synthesizing dialogues (using parametric synthesis technology) that can serve as a testing ground for evaluating hypotheses and models of interaction within COBRA.

Expected results:

  • Fully operational high quality synthesis platform for scripted dialogues;
  • Understanding of signal parameters in terms of dialogue rather than individual participants.

Based in Huis ter Heide, the Netherlands

Full-time three-year contract, starting September 2020

PhD enrolment at: University of Helsinki, Finland

Main supervisors’ institutions: ReadSpeaker, Huis ter Heide, the Netherlands, and the University of Helsinki, Finland

Main supervisors: Dr Esther Klabbers and Prof Juraj Šimko


Planned secondments:

  • University of Helsinki: to analyse prosodic cues and learn the continuous wavelet transform methodology for the analysis and representation of prosody (5.5 months);
  • IISAS, Bratislava: training on turn-taking behaviour (5 months).
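To illustrate the continuous wavelet transform methodology mentioned above, the sketch below decomposes a synthetic F0 contour across several scales, which is how multi-scale prosody representations are typically obtained. It is a minimal NumPy implementation using a Ricker (Mexican-hat) wavelet; the contour, scale values, and wavelet choice are illustrative assumptions, not the project's actual pipeline.

```python
import numpy as np

def ricker(points, a):
    # Ricker (Mexican-hat) wavelet of width a, sampled at `points` positions
    t = np.arange(points) - (points - 1) / 2
    amp = 2 / (np.sqrt(3 * a) * np.pi ** 0.25)
    return amp * (1 - (t / a) ** 2) * np.exp(-0.5 * (t / a) ** 2)

def cwt(signal, widths):
    # Continuous wavelet transform: convolve the signal with a wavelet at each scale
    out = np.empty((len(widths), len(signal)))
    for i, w in enumerate(widths):
        wavelet = ricker(min(10 * int(w), len(signal)), w)
        out[i] = np.convolve(signal, wavelet, mode="same")
    return out

# Synthetic F0 contour (Hz): slow declination plus one pitch-accent bump at t = 0.5
t = np.linspace(0, 1, 200)
f0 = 120 - 20 * t + 15 * np.exp(-((t - 0.5) ** 2) / 0.002)

scales = np.array([2, 4, 8, 16, 32])      # coarse-to-fine prosodic scales
coeffs = cwt(f0 - f0.mean(), scales)      # remove the mean before analysis
print(coeffs.shape)                        # (5, 200): one coefficient row per scale
```

Because the Ricker wavelet has zero mean, the slow declination contributes little to the coefficients, while the pitch-accent bump produces a strong response at the scales matching its width, which is the property that makes wavelet representations attractive for separating phrase-level and accent-level prosody.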

Co-supervisor’s institution:

  • IISAS, Bratislava, Slovakia
