Vocal Imitation and Learning - ESRC
Our ESRC-funded project aims to investigate learning and expertise in the context of vocal imitation. Using a combination of sophisticated brain imaging techniques and behavioural measures, we will chart the emergence of refined as well as expert-level vocal imitation performance.
In our first study, we aim to examine production of native and non-native vowels. Crucially, we will train monolingual participants to produce one non-native vowel in the lab, where a well-defined aspect of articulation distinguishes this vowel from its native partner. This will help us to understand which facets of speech motor control are malleable through training. By combining these behavioural training methods with functional brain imaging, we will also gain insight into how the neural circuits involved in speech change as people learn. In addition, real-time imaging of the vocal tract will allow us to track the movements of the articulators (e.g., lips, tongue, larynx) as people speak; this can tell us more about how learning non-native vowels affects the way we co-ordinate and execute speech movements.
Project Update - February 2017 - September 2018:
The last 20 months have been extremely busy and productive on the ESRC Vocal Learning project! The project officially wrapped up at the end of June 2018, and since then Carolyn has been busy preparing the data - including our Royal Holloway Vocal Tract MRI Database - for sharing. Here's a rundown of the highlights:
Feb 2017 - Feb 2018
- Dr Sheena Waters joined the lab from UCL, with experience in fMRI, MEG and studies of neuromotor plasticity. She took on the awesome task of testing almost 60 people, including 27 highly trained singers, on Experiment 3 of our Vocal Learning project. In this experiment, participants imitated altered versions of their own voices with different combinations of shifted pitch and formants. Using acoustic and vocal tract measures, we tested how well people could alter the pitch of their voice and the length of their vocal tract (i.e. by moving the larynx, or voice box, up and down in the throat) to imitate the target sounds. We also recorded vocal tract MRI of our participants reading out simple English syllables and sentences; these recordings contributed to a vocal tract database that we hope will be useful to other researchers and educators with an interest in speech motor behaviour!
- In the meantime, Dan Carey enjoyed publication success with papers accepted in Cerebral Cortex and NeuroImage describing Experiments 1 and 2 of the project. The details of these papers can be found on the Publications page of our website.
- So what did we find out in Experiment 3? Carolyn shared some interim results at the 2017 meeting of the Society for the Neurobiology of Language in Baltimore. We found that singers were better able to imitate the acoustics of the imitation targets than our non-singing controls, and that this was reflected in differences in brain activation as measured with fMRI. Check out the poster below!
- From later in 2017 right through to early 2018, the lab worked intensively on analysing the vocal tract MRI data. Here, we wanted to see how well participants could move the larynx vertically in the throat to make the vocal tract shorter or longer when imitating "smaller" and "larger" voices. We're hugely grateful to our research assistant Elise Kanber, who not only extracted these larynx height data for us but also extracted each and every pitch track and formant frequency from thousands of speech recordings. This allowed us to estimate the pitch and vocal tract length shifts on each trial of the experiment, for each participant, during both the behavioural training phase and the scanning session. Dr Nadine Lavan also gave us a helping hand with some innovative use of Representational Similarity Analysis (RSA) to correlate participants' behavioural performance and brain activation during imitation with "ideal models" of changes in voice pitch and vocal tract length. Pulling it all together, we found that singers are indeed more talented at making the appropriate changes in vocal tract length to imitate the target voices, and at combining these with appropriate shifts in voice pitch; this was reflected in a stronger representation of vocal tract length in sensorimotor cortex for singers. However, the pitch-only adjustments of singers and controls were equally good fits to the "ideal" model.
- What happens next? We're preparing a manuscript describing Experiment 3, and hope to have this submitted by the end of 2018.
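The per-trial pitch and vocal tract length shift estimates mentioned above can be illustrated with a minimal Python sketch. It is not the project's own pipeline: it assumes that pitch shifts are expressed in semitones relative to a participant's baseline voice, and that vocal tract length is estimated from formant frequencies with a simple uniform-tube model (closed at the glottis, open at the lips) in which F_i = (2i - 1)c / 4L. The function names and the speed-of-sound value are illustrative assumptions.

```python
import numpy as np

# Assumed speed of sound in the warm, humid vocal tract (cm/s); an illustrative value.
SPEED_OF_SOUND_CM_S = 35000.0


def semitone_shift(f0_hz, baseline_f0_hz):
    """Pitch shift of an imitation relative to the baseline voice, in semitones."""
    return 12.0 * np.log2(np.asarray(f0_hz, dtype=float) / baseline_f0_hz)


def estimate_vtl_cm(formants_hz, c=SPEED_OF_SOUND_CM_S):
    """Estimate vocal tract length (cm) from formants F1..Fn.

    Uniform tube closed at the glottis and open at the lips:
        F_i = (2i - 1) * c / (4 * L) = ((2i - 1) / 2) * dF, where dF = c / (2 * L).
    The formant spacing dF is fitted by least squares through the origin,
    and L is then recovered as c / (2 * dF).
    """
    formants = np.asarray(formants_hz, dtype=float)
    x = (2 * np.arange(1, formants.size + 1) - 1) / 2.0
    formant_spacing = np.sum(x * formants) / np.sum(x * x)
    return c / (2.0 * formant_spacing)


def vtl_shift_percent(formants_hz, baseline_formants_hz):
    """Change in estimated vocal tract length relative to baseline, in percent."""
    return 100.0 * (estimate_vtl_cm(formants_hz) / estimate_vtl_cm(baseline_formants_hz) - 1.0)


# Illustrative values only: a uniform 17.5 cm tube has formants at 500, 1500, 2500, 3500 Hz.
baseline = [500.0, 1500.0, 2500.0, 3500.0]
print(estimate_vtl_cm(baseline))      # 17.5
print(semitone_shift(440.0, 220.0))  # 12.0 (one octave up)
```

Under this model, lowering every formant by a factor of 1.1 corresponds to a tract about 10% longer, which is why formant shifts can stand in for changes in larynx height. Real speech only approximates a uniform tube, so the fitted length is an estimate rather than a measurement.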
Project Update - September 2016 - February 2017:
In September 2016, Dan moved on to a new opportunity at Trinity College Dublin to work on the TILDA longitudinal study of ageing. So we decided to pause and focus on writing up the first and second experiments while we recruited Dan's replacement.
Project Update - August 2016:
It's conference season, and we at VoCoLab are very excited to present our findings at the Society for the Neurobiology of Language (SNL) meeting, to be held at London's Institute of Education!
We will be presenting two posters covering the results of our first two experiments. The first poster presents an overview of the fMRI findings from our second experiment. Here, we show that activation across a host of brain regions just prior to producing non-native (compared to native) speech is modulated by how successfully participants imitated the non-native vowel in isolation during pre-scan training. This is an exciting finding: it tells us more about the process of transforming a sensory target into a motor output during speech, and how this process relates to the familiarity and challenges involved in producing non-native speech.
Our second poster presents a combined overview of acoustic and rtMRI data from both of our experiments. Here, we show generalisation of acoustic learning from isolated non-native vowels to novel non-words, along with evidence that participants acquired novel articulatory dynamics (lip protrusion), which helped them to produce the vowels. We also present some new and exciting analyses of tongue tissue measures, which have never before been shown for trained and untrained non-native speech.
Our posters are included below, and will be presented in the session on Friday the 19th of August. We look forward to seeing you there for a chat about the results!
Elsewhere, VoCo PhD student (soon to be Dr.) Nadine Lavan, and our fearless leader, Dr. Carolyn McGettigan, will be presenting as part of the SNL Science Show-off: Brains and Language Special! It promises to be a night packed with entertainment and will offer a look at the weird and wonderful world of speech in the brain. Tickets for the night (in the Bloomsbury Theatre) have now sold out, so this is officially the hottest event in town. You heard it here first.
Project Update - June 2016:
We have recently completed the second experiment within our ESRC project, and our analyses are well underway!
Our second experiment extended our first and examined the neural and articulatory consequences of speech training. Our participants produced trained native and non-native vowels in isolation, or within novel mono- or trisyllabic utterances. Our results broadly replicated the findings of our first experiment: they suggested that participants could learn to imitate the acoustics of non-native vowels (although with notable individual differences in learning success), and that syllabic complexity affected imitation accuracy. Using both real-time MRI and fMRI, we were again able to demonstrate articulatory and neural outcomes of learning and generalisation. By imaging participants' vocal tracts during speech, we could show that lip protrusion did occur for a non-native vowel (/y/), both when it was produced in isolation and in syllabic contexts. Importantly, our fMRI results point towards regions, including superior temporal cortex and cerebellum, that appear to be modulated by syllabic complexity both during the listening stage just before imitation and during imitation itself.
The findings of our second study will be presented at the Society for the Neurobiology of Language annual meeting in London this August.
ESRC Vocal Imitation - first publication accepted! (May 2016)
The first peer-reviewed publication arising from our ESRC-funded project has been accepted! For the Neuropsychologia special issue on mechanisms of speech learning and language plasticity, we submitted a review of current knowledge on the neural bases of vocal imitation, together with a proposal for how rtMRI can be used to address questions about vocal learning. Our review also outlines a proposed multivariate framework, based on representational similarity analysis (RSA), that integrates vocal tract imaging data (i.e., rtMRI) with neural data (i.e., fMRI) to probe the representational bases of speech plasticity.
Carey, D., & McGettigan, C. (in press). Magnetic resonance imaging of the brain and vocal tract: applications to the study of speech production and language learning. Neuropsychologia.
ESRC Vocal Imitation project featured in the Mirror! (Dec. 2015)
Following the recent success of our RHUL Christmas video, featuring our very own Nadine Lavan lip-syncing to Adele's 'Hello' in the scanner, the Mirror have run a piece about our ESRC-funded Vocal Imitation project! Follow the link below to read the full story:
The Christmas video and link to the real-time MRI video are available under the Media tab above. Enjoy!
Project Update - October 2015:
Following a whistle-stop tour of the Society for the Neurobiology of Language and Society for Neuroscience conferences, we have now returned from Chicago. Both conferences were very successful for VoCoLab, and our ESRC posters were received very positively! See below for the poster.
The results from our initial analyses suggest that we can indeed train participants to round their lips, helping them to approximate non-native vowels such as /y/ and /oe/ more closely. We measured this acoustically, as the 2D Euclidean distance between the first and second formants (F1 and F2) of each imitation and those of the target. We could also measure the position of participants' lips during real-time imaging of the vocal tract, and determine how successfully they protruded them. This allowed us to compare the extent of lip protrusion with learning as indexed acoustically, and to our delight, we found a significant positive relationship between the two! In other words, participants who protruded their lips more during scanning tended to show greater learning during the preceding training.
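The acoustic measure and its link to lip protrusion can be sketched as follows. This is a minimal illustration, not the project's analysis code: it assumes formants are given in Hz as (F1, F2) pairs, and it defines a hypothetical learning index as the drop in mean F1-F2 distance from early to late training. All numbers in the example are made up.

```python
import numpy as np


def f1f2_distance(imitations_hz, target_hz):
    """Euclidean distance (Hz) between each imitation and the target in (F1, F2) space.

    imitations_hz: array of shape (n_trials, 2); target_hz: (F1, F2) of the target vowel.
    """
    diffs = np.asarray(imitations_hz, dtype=float) - np.asarray(target_hz, dtype=float)
    return np.linalg.norm(diffs, axis=-1)


def learning_index(early_distances, late_distances):
    """Hypothetical index: reduction in mean distance from early to late training (Hz)."""
    return float(np.mean(early_distances) - np.mean(late_distances))


def pearson_r(x, y):
    """Pearson correlation across participants (e.g. lip protrusion vs. learning index)."""
    return float(np.corrcoef(x, y)[0, 1])


# Made-up example: one target vowel and two imitations,
# giving distances of 50 Hz and 0 Hz respectively.
target = (300.0, 1800.0)
distances = f1f2_distance([(330.0, 1840.0), (300.0, 1800.0)], target)
```

A positive correlation between participants' protrusion measures and their learning index corresponds to the relationship described above. For a significance test, `scipy.stats.pearsonr` returns both the correlation coefficient and a p-value.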
Our fMRI data have revealed some exciting differences in the brain regions that are active during native and non-native articulation, as well as during the planning stage just prior to speaking. To our knowledge, we have also completed the first ever RSA searchlight analysis of a speech production fMRI dataset, using real-time images of vocal tract movements to construct the searchlight model. This approach is data-driven: the model structure that we use to search our fMRI dataset is derived from the participants' own vocal tract gestures, as produced in the scanner. Our results so far show that patterns of activation within somatomotor and auditory cortical regions correlate with the dissimilarity pattern that emerges from the real-time image-based model.
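A searchlight RSA of the kind described above can be sketched in NumPy. This is a generic illustration under stated assumptions, not the project's pipeline: `data` is a hypothetical array of condition-wise activation estimates with shape (conditions, x, y, z), `model_rdm` stands in for the dissimilarity matrix derived from the real-time vocal tract images, neural dissimilarity is taken as 1 minus the Pearson correlation, and neural and model RDMs are compared with a Spearman rank correlation over their upper triangles.

```python
import numpy as np


def correlation_rdm(patterns):
    """RDM of 1 - Pearson r between rows of `patterns` (conditions x features)."""
    return 1.0 - np.corrcoef(patterns)


def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries of a square matrix, as a vector."""
    return matrix[np.triu_indices(matrix.shape[0], k=1)]


def spearman_rho(a, b):
    """Spearman correlation via ranks; assumes no tied values (continuous data)."""
    rank_a = np.argsort(np.argsort(a))
    rank_b = np.argsort(np.argsort(b))
    return float(np.corrcoef(rank_a, rank_b)[0, 1])


def searchlight_rsa(data, model_rdm, radius=1):
    """Spearman rho between local neural RDMs and a model RDM at every voxel.

    data: (n_conditions, nx, ny, nz) activation estimates (hypothetical layout).
    Voxels whose sphere contains fewer than 2 voxels are left as NaN.
    """
    _, nx, ny, nz = data.shape
    model_vec = upper_triangle(model_rdm)
    span = range(-radius, radius + 1)
    # integer offsets that fall within a sphere of the given radius (in voxels)
    offsets = np.array([(dx, dy, dz) for dx in span for dy in span for dz in span
                        if dx * dx + dy * dy + dz * dz <= radius * radius])
    shape = np.array([nx, ny, nz])
    rho_map = np.full((nx, ny, nz), np.nan)
    for centre in np.ndindex(nx, ny, nz):
        sphere = offsets + np.array(centre)
        # keep only sphere voxels that fall inside the volume
        sphere = sphere[np.all((sphere >= 0) & (sphere < shape), axis=1)]
        if len(sphere) < 2:
            continue
        patterns = data[:, sphere[:, 0], sphere[:, 1], sphere[:, 2]]  # conditions x voxels
        rho_map[centre] = spearman_rho(upper_triangle(correlation_rdm(patterns)), model_vec)
    return rho_map


# Toy usage with random data: 4 conditions in a 5x5x5 volume.
rng = np.random.default_rng(0)
data = rng.normal(size=(4, 5, 5, 5))
model = correlation_rdm(rng.normal(size=(4, 10)))  # stand-in for an rtMRI-derived RDM
rho_map = searchlight_rsa(data, model)
```

A real analysis would also restrict the search to a brain mask, use a larger radius, and test the resulting maps across participants with correction for multiple comparisons; dedicated RSA toolboxes handle these steps.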
The next exciting stages of our project will involve further detailed analyses, as we start to prepare our results for publication. We also hope to present these findings at invited talks in the new year. Watch this space!
Project Update - July 2015:
Our data collection is well underway! We have recently begun a busy period of testing and scanning, with 6 full datasets collected so far.
Each dataset comprises real-time images of participants articulating trained and untrained vowels, along with complementary fMRI data collected as participants listen to, or listen to and repeat, these same vowels. Together, these datasets can tell us quite a lot about the brain basis of articulation, as well as how participants articulated the trained and untrained speech sounds we are interested in. In addition, we record audio as participants imitate speech during training, fMRI and rtMRI - phew!
Our recruitment will continue over the summer - if you are a monolingual female native British English speaker and might like to be involved in the study, please feel free to get in touch! (Daniel.Carey@rhul.ac.uk)
Data collection will continue until the end of the summer, and we have just learned that we will be presenting our findings this autumn at both the Neurobiology of Language Conference and the Society for Neuroscience Annual Meeting. We look forward to seeing you in Chicago!
Below are some of our initial fMRI pilot data, along with representational dissimilarity matrices (RDMs) derived from our rtMRI data. The RDMs give us better insight into the structure of participants' imitations of each of the vowels. In particular, we can start to identify boundaries between categories, reflecting the more similar (blue-green) and less similar (yellow-red) patterns of articulation apparent in the real-time images as participants change the position of the different articulators across vowels.
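An articulatory RDM like the ones described above could be built roughly as follows. This is a hypothetical sketch rather than the lab's method: it assumes that, for each vowel, we have several equally sized rtMRI frames (one per repetition, e.g. taken at the vowel midpoint), averages them per vowel, and uses 1 minus the Pearson correlation between the mean images as the dissimilarity. The helper that reports each vowel's nearest neighbour is just one simple, illustrative way to look for category structure.

```python
import numpy as np


def articulatory_rdm(frames_by_vowel):
    """Build an RDM from rtMRI frames.

    frames_by_vowel: dict mapping a vowel label to an array of shape
    (n_repetitions, height, width). Frames are averaged across repetitions,
    flattened, and compared with 1 - Pearson r. Returns (labels, rdm).
    """
    labels = sorted(frames_by_vowel)
    mean_images = np.stack([
        np.asarray(frames_by_vowel[v], dtype=float).mean(axis=0).ravel() for v in labels
    ])
    return labels, 1.0 - np.corrcoef(mean_images)


def nearest_vowel(labels, rdm):
    """For each vowel, the other vowel with the most similar mean articulation image."""
    dist = rdm.copy()
    np.fill_diagonal(dist, np.inf)  # ignore each vowel's similarity to itself
    return {labels[i]: labels[int(np.argmin(dist[i]))] for i in range(len(labels))}


# Made-up toy data: /i/ and /y/ share one underlying image, /u/ has a different one.
rng = np.random.default_rng(1)
shape_a, shape_b = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
frames = {
    "i": shape_a + 0.1 * rng.normal(size=(5, 8, 8)),
    "y": shape_a + 0.1 * rng.normal(size=(5, 8, 8)),
    "u": shape_b + 0.1 * rng.normal(size=(5, 8, 8)),
}
labels, rdm = articulatory_rdm(frames)
neighbours = nearest_vowel(labels, rdm)
```

Averaging across repetitions before computing correlations reduces frame-to-frame noise; plotting `rdm` with a colour map would give a figure of the blue-green/yellow-red kind described above.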
Project Update - April 2015:
So far, we have collected pilot data from around 30 participants. Our initial results have helped to delimit the sets of vowels that participants find easier and harder to imitate. Based on this, we have selected a subset of vowels for further pilot work (both behavioural and real-time MRI).
We have also begun to develop an analysis framework for measuring movements of the articulators from real-time MRI sequences. We have demonstrated that movements of the lips and larynx, as well as changes in tongue position, can be measured across sequences of vowel and whole-word repetitions. Using this framework, we can extract measures over time, such as the aperture distance along the vocal tract and changes in the location co-ordinates of the lips and larynx. We will continue to refine these methods over the coming months, and may explore image-averaging techniques to improve data quality.
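The kinds of time-varying measures described above can be illustrated with a small sketch that operates on landmark coordinates already tracked in each rtMRI frame. It is a simplification under stated assumptions: each landmark is a hypothetical (n_frames, 2) array of (x, y) positions in mm, y is taken to increase upwards (with image row coordinates, which increase downwards, the sign of vertical changes flips), and aperture is reduced to the distance between two opposing landmarks such as the upper and lower lip, rather than a full set of measurements along the vocal tract.

```python
import numpy as np


def displacement_from_reference(track, ref_frame=0):
    """Euclidean distance of a landmark from its position in a reference frame, per frame."""
    track = np.asarray(track, dtype=float)
    return np.linalg.norm(track - track[ref_frame], axis=1)


def vertical_change(track, ref_frame=0):
    """Vertical position change relative to the reference frame (positive = upwards, assumed)."""
    track = np.asarray(track, dtype=float)
    return track[:, 1] - track[ref_frame, 1]


def aperture(upper_track, lower_track):
    """Frame-by-frame distance between two opposing landmarks (e.g. upper and lower lip)."""
    diffs = np.asarray(upper_track, dtype=float) - np.asarray(lower_track, dtype=float)
    return np.linalg.norm(diffs, axis=1)


# Made-up larynx track over three frames (mm): it moves up and forward,
# then drops below its starting height.
larynx = [(0.0, 0.0), (3.0, 4.0), (0.0, -2.0)]
print(displacement_from_reference(larynx))  # displacements of 0, 5 and 2 mm
print(vertical_change(larynx))              # vertical changes of 0, +4 and -2 mm
```

Choosing a reference frame (for example, a rest position before each utterance) makes larynx height changes comparable across trials and participants.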