To discover whether there is a benefit to employing acoustic-phonetic alignment in synthetic speech. Here, alignment refers to changes in pronunciation (e.g., vowel space) or prosody (e.g., pitch range, speaking rate). It is already possible to control these and related factors, such as vocal tract length, in Hidden Markov Model (HMM) speech synthesis, using machine learning methods previously developed by UEDIN and its collaborators. Possible applications include human-machine dialogue and human-human interaction in which one of the humans uses a voice-output communication aid. Changing vocal tract length and pitch range gives control over perceived gender.
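As a rough illustration of the kind of control mentioned above: perceived vocal tract length can be manipulated by warping the frequency axis of a speech spectrum. The sketch below uses a simple linear warp; this is an illustrative stand-in, not the specific UEDIN method, and the bilinear warps typically used in vocal tract length normalisation are more refined.

```python
import numpy as np

def warp_spectrum(mag, alpha):
    """Apply a linear frequency warp (factor alpha) to a magnitude spectrum.

    alpha > 1 moves spectral peaks (formants) downward, mimicking a longer
    vocal tract; alpha < 1 moves them upward. A crude stand-in for the
    bilinear warping functions commonly used in VTLN.
    """
    n = len(mag)
    bins = np.arange(n)
    # Resample the original spectrum at warped frequency positions.
    return np.interp(bins * alpha, bins, mag, right=mag[-1])

# Toy spectrum with a single "formant" peak at bin 100.
spec = np.zeros(257)
spec[100] = 1.0
warped = warp_spectrum(spec, 1.25)
# The peak moves down to bin 80 (100 / 1.25), i.e. a "longer" vocal tract.
```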
- Extension of control techniques from HMM to the latest Deep Neural Network (DNN) synthesis, providing controllable, high-quality synthetic speech for use in experiments;
- control of this system using vocal tract length, vowel space and prosody measured (using already-available signal processing methods) from the human interlocutor’s speech;
- initial results regarding the effects on listeners of controlling the above factors, in a non-interactive situation;
- final results regarding the effectiveness of acoustic-phonetic alignment on interlocutor behaviour and overall task success / user satisfaction in both simulated human-machine and real human-human interaction scenarios.
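One prosodic quantity to be measured from the interlocutor's speech is pitch. A minimal sketch of one "already-available" signal processing method, autocorrelation-based fundamental frequency estimation on a single voiced frame, is shown below; real systems would add voicing detection, framing and smoothing.

```python
import numpy as np

def estimate_f0_autocorr(x, sr, fmin=60.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    Finds the autocorrelation peak within the plausible pitch-period range
    [sr/fmax, sr/fmin] samples. A minimal sketch, not a production tracker.
    """
    x = x - x.mean()
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]
    lo = int(sr / fmax)  # shortest plausible pitch period, in samples
    hi = int(sr / fmin)  # longest plausible pitch period, in samples
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

# Synthetic 50 ms voiced frame: 150 Hz fundamental plus two harmonics.
sr = 16000
t = np.arange(int(0.05 * sr)) / sr
frame = (np.sin(2 * np.pi * 150 * t)
         + 0.5 * np.sin(2 * np.pi * 300 * t)
         + 0.25 * np.sin(2 * np.pi * 450 * t))
f0 = estimate_f0_autocorr(frame, sr)  # close to 150 Hz
```

Tracking such estimates over an utterance yields the pitch range (and, with a syllable or energy-peak detector, the speaking rate) that the synthesiser would align to.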
Based in Edinburgh, UK
Full-time three-year contract, starting September 2020
PhD enrolment at: University of Edinburgh
Main supervisor’s institution: University of Edinburgh
Main supervisor: Prof Simon King
- ReadSpeaker, Uppsala: to apply the developed methods to commercial-quality synthetic speech (5.5 months);
- University of Helsinki: application to user-adaptive systems for improving second-language pronunciation, in which the teacher (i.e., the machine) aligns with particular aspects of the student’s speech: first matching gender, then prosody, then more subtle effects (5 months).
- ReadSpeaker, Uppsala, Sweden
- University of Helsinki, Finland