ESR9: Acoustic-phonetic alignment in synthetic speech

PhD fellowship

Objectives:

To determine whether acoustic-phonetic alignment in synthetic speech is beneficial. Here, alignment refers to changes in pronunciation (e.g., vowel space) or prosody (e.g., pitch range, speaking rate). It is already possible to control these and related factors, such as vocal tract length, in Hidden Markov Model (HMM) speech synthesis, using machine learning methods previously developed by UEDIN and its collaborators. For example, changing vocal tract length and pitch range gives control over perceived gender. Possible applications are human-machine dialogue and human-human interaction in which one of the humans is using a voice-output communication aid.


Expected results:

  • Extension of control techniques from HMM to the latest Deep Neural Network synthesis, providing controllable high-quality synthetic speech for use in experiments;
  • control of this system using vocal tract length, vowel space and prosody measured (using already-available signal processing methods) from the human interlocutor’s speech;
  • initial results regarding the effects on listeners of controlling the above factors, in a non-interactive situation;
  • final results regarding the effectiveness of acoustic-phonetic alignment on interlocutor behaviour and overall task success / user satisfaction in both simulated human-machine and real human-human interaction scenarios.

Based in Edinburgh, UK

Full-time three-year contract, starting September 2020

PhD enrolment at: University of Edinburgh

Main supervisor’s institution: University of Edinburgh

Main supervisor: Prof Simon King

Secondments:

  • ReadSpeaker, Uppsala: to apply the developed methods to commercial-quality synthetic speech (5.5 months);
  • University of Helsinki: to apply the methods to user-adaptive systems for improving second-language pronunciation, in which the teacher (i.e., the machine) aligns with particular aspects of the student’s speech: first matching gender, then prosody, then more subtle effects (5 months).

Co-supervisors’ institutions:

  • ReadSpeaker, Uppsala, Sweden
  • University of Helsinki, Finland

