Special Session Descriptions

 

Special sessions will introduce conference attendees to relevant 'hot' topics that may not be covered in other sessions, as well as topics that could generate instructive debate on speech technology and its applications. These sessions are developed from papers submitted during the online paper submission process.

 

AUDITORY-VISUAL EXPRESSIVE SPEECH AND GESTURE IN HUMANS AND MACHINES

(Area 13: Speech and Spoken-language based Multimodal Processing and Systems)

Friday, 9 September | 11:00 – 13:00 | Grand Ballroom BC

Friday, 9 September | 14:30 – 16:30 | Pacific Concourse Poster A

 

  1. Names and affiliation of organizers:
  • Jeesun Kim, Associate Professor, The MARCS Institute, Western Sydney University, Australia
  • Gérard Bailly, Professor, GIPSA-Lab/Speech & Cognition dpt., CNRS/Grenoble-Alpes University, France

 

  2. Contact person:

Jeesun Kim, j.kim@westernsydney.edu.au

 

  3. Short description of the special session

The topic ‘Auditory-visual expressive speech and gesture in humans and machines’ encompasses many research fields and is relevant to researchers who investigate the role of the talker’s face and head movements in human face-to-face communication; who are interested in the relationship between speech and gesture; and who are working to develop platforms for human-machine communication (e.g., a key topic for sociable humanoid robots).

 

The proposed session aims to bring together these researchers to create the focus and critical mass needed for effective interaction, and more specifically to enable the sharing of techniques, investigative methods, and research findings. It will provide a forum for researchers to explore how studies of human communication may be relevant for enabling social machines. Conversely, it will provide an opportunity for researchers working with machines to showcase developments in their field. The feedback between the two communities promises to be stimulating and rewarding.

 

  4. Link to more detailed description

http://www.westernsydney.edu.au/marcs/news/special_session_av_expressive

 

THE REDDOTS CHALLENGE: TOWARDS CHARACTERIZING SPEAKERS FROM SHORT UTTERANCES

(Area 4: Speaker and Language Identification)

Friday, 9 September | 14:30 – 16:30 | Grand Ballroom BC

 

  1. Names and affiliation of organizers:
  • Kong Aik Lee, Institute for Infocomm Research, A*STAR, Singapore
  • Anthony Larcher, LIUM, Université du Maine, France
  • Hagai Aronowitz, IBM Research Haifa, Israel
  • Guangsen Wang, Institute for Infocomm Research, A*STAR, Singapore
  • Patrick Kenny, CRIM, Canada

 

  2. Contact person:

 

  3. Short description of the special session

The RedDots project was initiated, in collaboration across multiple sites, as a follow-up to a special session at INTERSPEECH 2014. It set out to collect speech data through mobile crowd-sourcing, with the benefit of reaching a potentially wider population and greater diversity. The project was rolled out on January 29, 2015. At the time of writing, it has recruited 89 speakers (72 male, 17 female) from 21 countries, with a total of 875 complete sessions.

 

The purpose of this special session is to gather research efforts towards the common goal of exploring new directions in, and gaining a better understanding of, speaker-channel-phonetic variability modelling for text-dependent and text-prompted speaker verification over short utterances. Papers submitted to the special session are encouraged to provide results based on the RedDots database in order to enable comparison of algorithms and methods. To obtain the database, please contact the organizers or visit http://goo.gl/forms/Dpk3OiJkWV.

 

  4. Link to more detailed description

https://sites.google.com/site/thereddotsproject/reddots-challenge

 

Intelligibility under the microscope

(Area 1: Speech Perception, Production, and Acquisition)

Friday, 9 September | 14:30 – 16:30 | Pacific Concourse Poster B

 

  1. Names and affiliation of organizers:
  • Ricard Marxer, University of Sheffield, UK
  • Martin Cooke, Ikerbasque (Basque Foundation for Science), Spain
  • Jon P. Barker, University of Sheffield, UK

 

  2. Contact person:

Ricard Marxer, r.marxer@sheffield.ac.uk

 

  3. Short description of the special session

Existing models of intelligibility can successfully estimate word identification in broadly stated noise conditions. These predictions may be characterized as 'macroscopic' in that they represent averages: averages over many listeners and over many speech tokens. This Special Session asks whether we can go beyond macroscopic predictions. We invite work that might contribute to a new breed of 'microscopic' models that are evaluated according to their ability to make precise predictions of what a specific listener might hear in response to a specific noisy speech token. Developing such models will deepen our understanding of speech perception and enable a wealth of new intelligibility modeling applications. To focus the session, we will provide contributors with access to two large corpora that record 'slips of the ear' made by listeners hearing words in complex noise backgrounds. Participants will be encouraged to use this data where it can support the aims of their work.

 

  4. Link to more detailed description

http://spandh.dcs.shef.ac.uk/2016_is_microintelligibility

 

THE SPEAKERS IN THE WILD (SITW) SPEAKER RECOGNITION CHALLENGE

(Area 4: Speaker and Language Identification)

Friday, 9 September | 17:00 – 19:00 | Grand Ballroom BC

 

  1. Names and affiliation of organizers:
  • Mitchell McLaren, Speech Technology and Research (STAR) Laboratory at SRI International, Menlo Park, California, USA
  • Aaron Lawson, Speech Technology and Research (STAR) Laboratory at SRI International, Menlo Park, California, USA
  • Luciana Ferrer, Departamento de Computacion, FCEN, Universidad de Buenos Aires and CONICET, Argentina
  • Diego Castán, Speech Technology and Research (STAR) Laboratory at SRI International, Menlo Park, California, USA
     
  2. Contact person:

Mitchell McLaren, mitchell.mclaren@sri.com. Please CC sitw_poc@speech.sri.com to include all organizers.

 

  3. Short description of the special session

The Speakers in the Wild (SITW) speaker recognition challenge will focus on the challenges of applying current speaker recognition technology to the unconstrained conditions of real-world data. The challenge is based on a newly collected database of speech samples from open-source media, consisting of single- and multi-speaker audio acquired under unconstrained or 'wild' conditions. Multiple speech samples from nearly three hundred individuals are represented in the database, with all noise, reverberation, compression, and other artifacts being natural characteristics of the original audio. Challenge participants will be provided with the SITW database, on which they can benchmark current technologies and explore new high-risk techniques for speaker recognition under the conditions exhibited in the data. The special session will be dedicated to discussion of the applied technology, its performance, and any issues highlighted as a result of the challenge.

 

  4. Link to more detailed description

http://www.speech.sri.com/projects/sitw

 

CLINICAL AND NEUROSCIENCE-INSPIRED VOCAL BIOMARKERS OF NEUROLOGICAL AND PSYCHIATRIC DISORDERS

(Area 3: Analysis of Paralinguistics in Speech and Language)

Saturday, 10 September | 10:00 – 12:00 | Grand Ballroom BC

Saturday, 10 September | 13:30 – 15:30 | Pacific Concourse Poster D

 

  1. Names and affiliation of organizers:
  • Nicholas Cummins, Universität Passau, Germany
  • Julien Epps, UNSW Australia, Data61, Australia
  • Emily Mower Provost, University of Michigan, USA
  • Thomas Quatieri, MIT Lincoln Laboratory, USA
  • Stefan Scherer, USC Institute for Creative Technologies, USA

 

  2. Contact person:

Nicholas Cummins, n.p.cummins@unsw.edu.au

 

  3. Short description of the special session

This session will focus on the latest developments in speech-based neurological and psychiatric assessment. Topics will include (but are not limited to) the automatic detection of depression, PTSD, schizophrenia, traumatic brain injury, dementia, Parkinson’s disease, and autism. Participants are encouraged to target the following themes: (i) novel clinically- or neuroscience-motivated analysis methods and vocal features: features designed to capture speech effects specific to one or more conditions; (ii) nuisance variability compensation: removing the effects of comorbid conditions or unwanted acoustic variability; (iii) clinical utility and quantifying uncertainty: considerations of clinical utility, or systems that self-determine a level of uncertainty associated with detection; and (iv) cross-corpus studies: analysing the similarities and differences in speech patterns between different conditions or different recording paradigms.

 

  4. Link to more detailed description

http://cnivb2016.ee.unsw.edu.au/

 

SINGING SYNTHESIS CHALLENGE: FILL-IN THE GAP

(Area 7: Speech Synthesis and Spoken Language Generation)

Saturday, 10 September | 10:00 – 12:00 | Bayview A

 

  1. Names and affiliation of organizers:
  • Christophe d’Alessandro, LIMSI-CNRS, France
  • Axel Roebel, IRCAM-CNRS-UMPC, France
  • Olivier Deroo, ACAPELA Group, Belgium

 

  2. Contact person:

Christophe d’Alessandro, cda@limsi.fr

 

  3. Short description of the special session

The special session “Singing Synthesis Challenge: Fill-In the Gap” aims to bring together research teams working on singing synthesis from all over the world by proposing a common challenge. The challenge is to fill in the gap in a well-known song (e.g. “Autumn Leaves”), i.e. to synthesize a new, specially written couplet. The new couplet includes new lyrics, and possibly a new melody, to be inserted into the song. The chosen song is a top hit, so a large number of interpretations are available online and can be used for reference, acoustic analysis, machine learning, comparison, etc. All aspects of singing synthesis and all methodologies are welcome, including both off-line (studio) singing synthesis systems, with no limit on the time taken to produce the result, and performative (real-time) singing instruments. While contributors are encouraged to submit synthesized singing, contributions on other aspects, such as evaluation methodologies, are also welcome and will be considered valid contributions. Interspeech 2016 attendees will be given the opportunity to vote for their preferred synthetic song.

 

  4. Link to more detailed description

http://chanter.limsi.fr 

https://chanter.limsi.fr/doku.php?id=sidebar#special_session_interspeech_2016_singing_synthesis_challenge_fill-in_the_gap

 

The winner of the Singing Synthesis Challenge at Interspeech 2016 was: 

Jordi Bonada, Martí Umbert, Merlijn Blaauw

"Expressive Singing Synthesis Based on Unit Selection for the Singing Synthesis Challenge 2016"

 

SHARING RESEARCH AND EDUCATION RESOURCES FOR UNDERSTANDING SPEECH PROCESSING

(Area 10: Speech Recognition—Technologies and Systems for New Applications)

Saturday, 10 September | 13:30 – 15:30 | Grand Ballroom BC

 

  1. Names and affiliation of organizers:
  • Eric Fosler-Lussier, The Ohio State University, USA
  • Rebecca Bates, Minnesota State University, Mankato, USA
  • Florian Metze, Carnegie Mellon University, USA

 

  2. Contact person:

Eric Fosler-Lussier, speech-kitchen@lists.andrew.cmu.edu

 

  3. Short description of the special session

Speech processing systems have become increasingly complex and difficult to share across sites. Significant time is spent reimplementing published methods; even when software is shared, the lack of common environments between sites means that reproducing results can require considerable effort. Open software repositories, virtual machines, and tools for automatically building container environments in the cloud are beginning to facilitate cross-site collaboration.

 

We seek contributions from creators of resources (software, VMs, web services, frameworks, datasets, etc.) describing their experience in producing and using these resources for recognition, synthesis, or other speech/language processing technologies. In addition, we welcome essay contributions addressing the use of resources in both research and education settings. Featured virtual machines will be shared with session attendees in order to foster community discussion and encourage use after the session.

 

  4. Link to more detailed description

http://speechkitchen.org/interspeech-2016-special-session/

 

VOICE CONVERSION CHALLENGE 2016

(Area 7: Speech Synthesis and Spoken Language Generation)

Saturday, 10 September | 13:30 – 15:30 | Bayview A

 

  1. Names and affiliation of organizers:
  • Tomoki Toda, Nagoya University, Japan
  • Junichi Yamagishi, National Institute of Informatics, Japan and University of Edinburgh, UK
  • Fernando Villavicencio, National Institute of Informatics, Japan
  • Zhizheng Wu, University of Edinburgh, UK
  • Ling-Hui Chen, University of Science and Technology of China, China
  • Daisuke Saito, University of Tokyo, Japan
  • Mirjam Wester, University of Edinburgh, UK

 

  2. Contact person:

Tomoki Toda, tomoki@icts.nagoya-u.ac.jp

 

  3. Short description of the special session

The focus of our Special Session is on better understanding and comparing the current performance of various voice conversion techniques on identical speech corpora. The session incorporates a standard challenge, the Voice Conversion Challenge 2016. Authors submitting papers to the Special Session are encouraged to report results assessing converted voices using the new, standard database provided in the Challenge. The Challenge task focuses on conversion of five source speakers' voices to five different target speakers' voices. In total, 25 conversion cases will be evaluated in terms of perceived naturalness and similarity via listening tests. This will help the different research groups working on voice conversion to converge on common tasks under common conditions, and it will enable us to share our views about the unsolved problems and challenges behind the technologies.

 

  4. Link to more detailed description

http://vc-challenge.org/index.html

 

Computational Paralinguistics Challenge (ComParE): Deception & Sincerity

(Area 3: Analysis of Paralinguistics in Speech and Language)

Sunday, 11 September | 10:00 – 12:00 | Grand Ballroom BC

Sunday, 11 September | 13:30 – 15:30 | Grand Ballroom BC

 

  1. Names and affiliation of organizers:
  • Björn Schuller, University of Passau, Germany & Imperial College London, UK
  • Stefan Steidl, FAU Erlangen-Nuremberg, Germany
  • Anton Batliner, TUM, Germany
  • Julia Hirschberg, Columbia University, New York, USA
  • Judee K. Burgoon, University of Arizona, Tucson, USA
  • Eduardo Coutinho, University of Liverpool, UK & Imperial College London, UK

 

  2. Contact person:

Björn Schuller, bjoern.schuller@imperial.ac.uk

 

  3. Short description of the special session

The Interspeech 2016 Computational Paralinguistics ChallengE (ComParE) is an open Challenge dealing with states of speakers as manifested in their speech characteristics. There have so far been seven consecutive Challenges at INTERSPEECH since 2009, but a multiplicity of highly relevant paralinguistic phenomena have not yet been covered.

 

Thus, we introduce two new tasks: the Deception Sub-Challenge and the Sincerity Sub-Challenge. All data, including features that may be used, are provided by the organisers. In the Deception Sub-Challenge, it has to be determined automatically from speech analysis whether speech is deceptive or not; in the Sincerity Sub-Challenge, the degree of perceived sincerity has to be determined.

 

Results of the Challenge will be presented at Interspeech 2016, and prizes will be awarded to the Sub-Challenge winners. 

 

  4. Link to more detailed description

http://emotion-research.net/sigs/speech-sig/is16-compare

 

Speech, audio, and language processing techniques applied to bird and animal vocalisations

(Area 5: Analysis of Speech and Audio Signals)

Sunday, 11 September | 10:00 – 12:00 | Bayview A

Sunday, 11 September | 13:30 – 15:30 | Pacific Concourse Poster B

 

  1. Names and affiliation of organizers:
  • Naomi Harte, Trinity College Dublin, Ireland
  • Peter Jancovic, University of Birmingham, UK
  • Karl-L. Schuchmann, Zoological Research Museum Alexander Koenig & University of Bonn, Germany

 

  2. Contact person:

Naomi Harte, nharte@tcd.ie

 

  3. Short description of the special session

The ability to analyse sounds from animals and birds has important implications for understanding the biodiversity of different regions of the world, finding and tracking populations of rare species, and understanding communication in species other than humans. Knowledge in the speech processing community can inform and transform the analysis, classification, and understanding of these vocalisations within the wider scientific community. Numerous collaborations have already developed between researchers in the areas of speech, audio, and language and those in the ornithology and zoology communities. This special session aims to bring together researchers from both sides to explore the state of the art, consider the major challenges in this domain, and identify potential areas for collaboration. Our target audience includes both those already involved in such research and any Interspeech attendee who would like to get involved in this exciting area of research.

 

  4. Link to more detailed description

http://www.mee.tcd.ie/~sigmedia/Research/IS2016

 

REALISM IN ROBUST SPEECH PROCESSING

(Area 10: Speech Recognition—Technologies and Systems for New Applications)

Sunday, 11 September | 16:00 – 18:00 | Grand Ballroom BC

 

  1. Names and affiliation of organizers:
  • Dayana Ribas, CENATAV, Cuba
  • Emmanuel Vincent, Inria, France
  • John H.L. Hansen, Univ. of Texas at Dallas, USA

 

  2. Contact person:

Dayana Ribas, dayanaribasglez@gmail.com

 

  3. Short description of the special session

One of the challenges currently faced in speech processing is the migration of laboratory results to real applications. Because evaluations in the targeted scenarios of use are difficult to carry out, researchers have resorted to simulating data in controlled scenarios. However, many datasets include levels and types of distortion that never occur in real life. This can result in satisfactory performance on scenarios that will never occur in practice, while performance in real scenarios may be much worse. Furthermore, this may lead to complex and expensive methods that are not actually required.

 

This session aims to provide a forum for the cross-fertilization of expertise and experimental evidence about “realism” across different areas of robust speech processing. Through the study of the state of the art and the exchange of specialized experiences, we aim to characterize real scenarios by measuring the ranges and combinations of different parameters, and to establish “good practices” regarding which parameter violations are acceptable or not given the task to be solved and the limitations of today’s data collection and simulation tools.

 

  4. Link to more detailed description

https://team.inria.fr/multispeech/files/2016/03/IS2016-SpecialSession_RealismRobustSpeechProc1.pdf

 

Speech and Language Technologies for Human-Machine Conversation-based Language Education

(Area 10: Speech Recognition—Technologies and Systems for New Applications)

Monday, 12 September | 10:00 – 12:00 | Grand Ballroom BC

 

  1. Names and affiliation of organizers:
  • Yao Qian, Educational Testing Service, USA
  • Helen Meng, The Chinese University of Hong Kong, Hong Kong SAR
  • Frank K. Soong, Microsoft Research, China

 

  2. Contact person:

Yao Qian, yqian@ets.org

 

  3. Short description of the special session

This special session aims to promote research on state-of-the-art speech and language technologies for human-machine conversation-based language learning. Recent advances in deep learning with big data have significantly improved the performance of speech recognition, dialogue management, language understanding, and machine translation, bringing machine-supported conversation-based language learning and assessment much closer to reality and commercialization.

 

We would like to invite researchers and engineers who work actively in computer-aided, audio-visual language learning to submit papers on topics including, but not limited to: automatic scoring and assessment, learning error detection and diagnosis, spoken dialogue for tutoring systems, and speech and language technologies for education. This special session will be a forum for presenting new R&D results that can support interactive language learning applications between humans and machines.

 

  4. Link to more detailed description

http://www.interspeech2016.org/10.12-Special-Session:-Call-for-Papers

 

Sub-Saharan African languages: from speech fundamentals to applications

(Area 10: Speech Recognition—Technologies and Systems for New Applications)

Monday, 12 September | 10:00 – 12:00 | Grand Ballroom BC

 

  1. Names and affiliation of organizers:

 

  2. Contact person:

Laurent Besacier, laurent.besacier@imag.fr

 

  3. Short description of the special session

This special session aims to gather researchers in speech technology and researchers in linguistics (working in language documentation and the fundamentals of speech science). Such a partnership is particularly important for Sub-Saharan African languages, which tend to remain under-resourced, under-documented, and often unwritten.

 

Prospective authors are invited to submit original papers in the following areas:

 

• ASR and TTS for Sub-Saharan African languages and dialects
• Cross-lingual and multi-lingual acoustic and lexical modeling
• Applications of spoken language technologies for the African continent
• Phonetic and linguistic studies in Sub-Saharan African languages
• Zero-resource speech technologies: unsupervised discovery of linguistic units
• Language documentation for endangered languages of Africa
• Machine-assisted annotation of speech and laboratory phonology
• Resource / corpora production in African languages

 

  4. Link to more detailed description

http://alffa.imag.fr/interspeech-2016-special-session-proposal/