Special sessions will introduce conference attendees to relevant 'hot' topics which may not be covered in other sessions, and topics which could generate instructive debate on speech technology and its applications. These sessions are developed from papers submitted during the online paper submission process.
Friday, 9 September | 11:00 – 13:00 | Grand Ballroom BC
Friday, 9 September | 14:30 – 16:30 | Pacific Concourse Poster A
Jeesun Kim, firstname.lastname@example.org
The topic ‘Auditory-visual expressive speech and gesture in humans and machines’ encompasses many research fields and is relevant to researchers who investigate the role of the talker’s face and head movements in human face-to-face communication; who are interested in the relationship between speech and gesture; and who are working to develop platforms for human-machine communication (e.g., a key topic for sociable humanoid robots).
This session aims to bring together these researchers to create the focus and critical mass for effective interaction, and more specifically to enable the sharing of techniques, investigative methods, and research findings. It will provide a forum for researchers to explore how studies of human communication may be relevant to enabling social machines. Conversely, it will give researchers working with machines an opportunity to showcase developments in their field. The feedback between the two communities promises to be stimulating and rewarding.
Friday, 9 September | 14:30 – 16:30 | Grand Ballroom BC
The RedDots project was initiated as a multi-site collaboration, following up on a special session held during INTERSPEECH 2014. It set out to collect speech data through mobile crowd-sourcing, with the benefit of reaching a potentially wider population with greater diversity. The project was rolled out on 29 January 2015. At the time of writing, it has recruited 89 speakers (72 male, 17 female) from 21 countries, with a total of 875 completed sessions.
The purpose of this special session is to gather research efforts towards the common goal of exploring new directions in, and developing a better understanding of, speaker-channel-phonetic variability modelling for text-dependent and text-prompted speaker verification over short utterances. Papers submitted to the special session are encouraged to report results on the RedDots database, to enable comparison of algorithms and methods. To obtain the database, please contact the organizers or visit http://goo.gl/forms/Dpk3OiJkWV.
Friday, 9 September | 14:30 – 16:30 | Pacific Concourse Poster B
Ricard Marxer, email@example.com
Existing models of intelligibility can successfully estimate word identification in broadly stated noise conditions. These predictions may be characterized as 'macroscopic' in that they represent averages: averages over many listeners and over many speech tokens. This Special Session asks whether we can go beyond macroscopic predictions. We invite work that might contribute to a new breed of 'microscopic' models, evaluated according to their ability to make precise predictions of what a specific listener might hear in response to a specific noisy speech token. Developing such models will deepen our understanding of speech perception and enable a wealth of new intelligibility modeling applications. To focus the session, we will provide contributors with access to two large corpora that record 'slips of the ear' made by listeners hearing words in complex noise backgrounds. Participants are encouraged to use these data where they can support the aims of their work.
Friday, 9 September | 17:00 – 19:00 | Grand Ballroom BC
The Speakers in the Wild (SITW) speaker recognition challenge focuses on applying current speaker recognition technology to the unconstrained conditions of real-world data. The challenge is based on a newly collected database of speech samples from open-source media, consisting of single- and multi-speaker audio acquired across unconstrained or 'wild' conditions. Multiple speech samples from nearly three hundred individuals are represented in the database, with all noise, reverberation, compression and other artifacts being natural characteristics of the original audio. Challenge participants will be provided with the SITW database, on which they can benchmark current technologies and explore new high-risk techniques for speaker recognition under the conditions exhibited in the data. The special session will be dedicated to discussion of the applied technology, its performance, and any issues highlighted as a result of the challenge.
Saturday, 10 September | 10:00 – 12:00 | Grand Ballroom BC
Saturday, 10 September | 13:30 – 15:30 | Pacific Concourse Poster D
Nicholas Cummins, firstname.lastname@example.org
This session will focus on the latest developments in speech-based neurological and psychiatric assessment. Topics will include (but are not limited to) the automatic detection of depression, PTSD, schizophrenia, traumatic brain injury, dementia, Parkinson’s disease and autism. Participants are encouraged to target the following themes: (i) novel clinically- or neuroscience-motivated analysis methods and vocal features: features designed to capture speech effects specific to one or more conditions; (ii) nuisance variability compensation: removing the effects of comorbid conditions or unwanted acoustic variability; (iii) clinical utility and quantifying uncertainty: considerations of clinical utility, or systems that self-determine a level of uncertainty associated with detection; and (iv) cross-corpus studies: analysing the similarities and differences in speech patterns between different conditions or different recording paradigms.
Saturday, 10 September | 10:00 – 12:00 | Bayview A
Christophe d’Alessandro, email@example.com
The special session “Singing Synthesis Challenge: Fill-In the Gap” aims to bring together singing synthesis research teams from all over the world by proposing a common challenge: to fill in the gap in a well-known song (e.g. “Autumn Leaves”), i.e. to synthesize a new, specially written couplet. The new couplet includes new lyrics, and possibly a new melody, to be inserted into the song. The chosen song is a top hit, so a large number of interpretations are available on the net and can be used for reference, acoustic analysis, machine learning, comparison, etc. All aspects of singing synthesis and all methodologies are welcome, including both off-line (studio) singing synthesis systems, with no limit on the time taken to produce the result, and performative (real-time) singing instruments. While contributors are encouraged to produce synthesized singing, other aspects such as evaluation methodologies are also welcome and will be considered valid contributions. Interspeech 2016 attendees will be given the opportunity to vote for their preferred synthetic song.
The winner of the Singing Synthesis Challenge at Interspeech 2016 was:
Jordi Bonada, Martí Umbert, Merlijn Blaauw
"Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016"
Saturday, 10 September | 13:30 – 15:30 | Grand Ballroom BC
Eric Fosler-Lussier, firstname.lastname@example.org
Speech processing systems have become increasingly complex and difficult to share across sites. Considerable time is spent reimplementing published methods; even when software is shared, the lack of common environments across sites means that reproducing results can require significant effort. Open software repositories, virtual machines, and tools for automatically building container environments in the cloud are beginning to facilitate cross-site collaboration.
We seek contributions from creators of resources (such as software, VMs, web services, frameworks, datasets, etc.) describing their experience in production and usage of these resources in recognition, synthesis, or other speech/language processing technologies. In addition, we welcome essay contributions addressing the use of resources in both research and education settings. Featured virtual machines will be shared with session attendees in order to foster community discussion and encourage use after the session.
Saturday, 10 September | 13:30 – 15:30 | Bayview A
Tomoki Toda, email@example.com
The focus of our Special Session is on better understanding and comparing the current performance of various voice conversion techniques on identical speech corpora. The session incorporates a standard challenge, the Voice Conversion Challenge 2016. Authors submitting papers to the Special Session are encouraged to submit results assessing converted voices using the new, standard database provided in the Challenge. The Challenge task focuses on converting five source speakers' voices into five different target speakers' voices; in total, 25 conversion cases will be evaluated in terms of perceived naturalness and similarity via listening tests. This will help the different research groups working on voice conversion to converge on common tasks under common conditions, and it will enable us to share views on the unsolved problems and challenges behind the technologies.
Sunday, 11 September | 10:00 – 12:00 | Grand Ballroom BC
Sunday, 11 September | 13:30 – 15:30 | Grand Ballroom BC
Björn Schuller, firstname.lastname@example.org
The Interspeech 2016 Computational Paralinguistics ChallengE (ComParE) is an open Challenge dealing with states of speakers as manifested in their speech characteristics. There have so far been seven consecutive Challenges at INTERSPEECH since 2009, but a multiplicity of highly relevant paralinguistic phenomena remains uncovered.
Thus, we introduce two new tasks: the Deception Sub-Challenge and the Sincerity Sub-Challenge. All data, including features that may be used, are provided by the organisers. In the Deception Sub-Challenge, it has to be automatically determined from speech analysis whether speech is deceptive or not; in the Sincerity Sub-Challenge, the degree of perceived sincerity has to be determined.
Results of the Challenge will be presented at Interspeech 2016, and prizes will be awarded to the Sub-Challenge winners.
Sunday, 11 September | 10:00 – 12:00 | Bayview A
Sunday, 11 September | 13:30 – 15:30 | Pacific Concourse Poster B
Naomi Harte, email@example.com
The ability to analyse sounds from animals and birds has important implications for understanding the biodiversity of different regions of the world, finding and tracking populations of rare species, and understanding communication in species other than humans. Knowledge in the speech processing community can inform and transform the analysis, classification and understanding of these vocalisations within the wider scientific community. Numerous collaborations have already developed between researchers in speech, audio and language and those in the ornithology and zoology communities. This special session aims to bring together researchers from both sides to explore the state of the art, consider major challenges in this domain, and identify potential areas for collaboration. Our target audience is both those already involved in such research and any Interspeech attendee who would like to get involved in this exciting area of research.
Sunday, 11 September | 16:00 – 18:00 | Grand Ballroom BC
Dayana Ribas, firstname.lastname@example.org
One of the challenges currently faced in speech processing is the migration of laboratory results to real applications. Because evaluations in the targeted scenarios of use are difficult to carry out, researchers have resorted to simulating data in controlled scenarios. However, many datasets include levels and types of distortion that never occur in real life. This can result in satisfactory performance on scenarios that will never happen in practice, while performance in real scenarios may be much worse. Furthermore, this can lead to complex and expensive methods that are not actually required.
This session aims to provide a forum for the cross-fertilization of expertise and experimental evidence about “realism” across different areas of robust speech processing. Through the study of the state of the art and the exchange of specialized experiences, we aim to characterize real scenarios by measuring the ranges and combinations of different parameters, and to establish “good practices” regarding which parameter violations are acceptable or not given the task to be solved and the limitations of today’s data collection and simulation tools.
Monday, 12 September | 10:00 – 12:00 | Grand Ballroom BC
Yao Qian, email@example.com
This special session aims to promote research on state-of-the-art speech and language technologies for human-machine conversation-based language learning. Recent advances in deep learning with big data have significantly improved the performance of speech recognition, dialogue management, language understanding and machine translation, bringing machine-supported conversation-based language learning and assessment much closer to reality and commercialization.
We invite researchers and engineers working actively on computer-aided, audio-visual language learning to submit papers on topics including, but not limited to: automatic scoring and assessment, learning error detection and diagnosis, spoken dialogue for tutoring systems, and speech and language technologies for education. This special session will be a forum to present new R&D results that can support interactive language learning applications between humans and machines.
Monday, 12 September | 10:00 – 12:00 | Grand Ballroom BC
Laurent Besacier, firstname.lastname@example.org
This special session aims to gather researchers in speech technology and researchers in linguistics (working on language documentation and the fundamentals of speech science). Such a partnership is particularly important for Sub-Saharan African languages, which tend to remain under-resourced, under-documented and often also unwritten.
Prospective authors are invited to submit original papers in the following areas:
• ASR and TTS for Sub-Saharan African languages and dialects
• Cross-lingual and multi-lingual acoustic and lexical modeling
• Applications of spoken language technologies for the African continent
• Phonetic and linguistic studies in Sub-Saharan African languages
• Zero-resource speech technologies: unsupervised discovery of linguistic units
• Language documentation for endangered languages of Africa
• Machine-assisted annotation of speech and laboratory phonology
• Resource/corpora production in African languages