Accepted papers

IEEEXplore Track


  2. "Word-Level ASR Quality Estimation for Efficient Corpus Sampling and Post-Editing through Analyzing Attentions of a Reference-Free Metric", Golara Javadi, Kamer Ali Yuksel, Yunsu Kim, Thiago Castro Ferreira, Mohamed Al-Badrashiny - presentation

  4. "How Phonemes Contribute to Deep Speaker Models?", Pengqi Li, Tianhao Wang, Lantian Li, Askar Hamdulla, Dong Wang

  5. "Listening Between the Lines: Synthetic Speech Detection Disregarding Verbal Content", Davide Salvi, Temesgen Semu Balcha, Paolo Bestagini, Stefano Tubaro

  6. "Exploring The Multidimensional Representation of Unidimensional Speech Acoustic Parameters Extracted by Deep Unsupervised Models", Maxime Jacquelin, Maëva Garnier, Laurent Girin, Rémy Vincent, Olivier Perrotin

  7. "Exploring Dominant Paths in CTC-Like ASR Models: Unraveling the Effectiveness of Viterbi Decoding", Zeyu Zhao, Peter Bell, Ondřej Klejch - presentation

  8. "Regarding Topology and Adaptability in Differentiable WFST-Based E2E ASR", Zeyu Zhao, Pinzhen Chen, Peter Bell - presentation

  9. "High-Fidelity Neural Phonetic Posteriorgrams", Cameron Churchwell, Max Morrison, Bryan Pardo

  11. "ISPA: Inter-Species Phonetic Alphabet for Transcribing Animal Sounds", Masato Hagiwara, Marius Miron, Jen-Yu Liu

  13. "Speech Representation Analysis Based on Inter- and Intra-Model Similarities", Yassine El Kheir, Ahmed Ali, Shammur Absar Chowdhury

  15. "Explainable Modeling of Gender-Targeting Practices in Toy Advertising Sound and Music", Luca Marinelli, Charalampos Saitis - presentation

  16. "Leveraging Pre-Trained Autoencoders for Interpretable Prototype Learning of Music Audio", Pablo Alonso-Jiménez, Leonardo Pepino, Roser Batlle-Roca, Pablo Zinemanas, Dmitry Bogdanov, Xavier Serra, Martín Rocamora - presentation

  17. "Focal Modulation Networks for Interpretable Sound Classification", Luca Della Libera, Cem Subakan, Mirco Ravanelli

  18. "Why Does Music Source Separation Benefit from Cacophony?", Chang-Bin Jeon, Gordon Wichern, François G. Germain, Jonathan Le Roux - presentation

  19. "Perceptual Musical Features for Interpretable Audio Tagging", Vassilis Lyberatos, Spyridon Kantarelis, Edmund Dervakos and Giorgos Stamou - presentation


Workshop Track


  2. "Explaining Deep Learning Models for Spoofing and Deepfake Detection With SHapley Additive exPlanations", Wanying Ge, Jose Patino, Massimiliano Todisco and Nicholas Evans - pdf

  4. "Exploratory Self-Attention Visualisation for Explaining Speech Transformers", Erfan A Shams, Julie Carson-Berndsen - pdf

  7. "Explaining Speech Classification Models via Word-Level Audio Segments and Paralinguistic Features", Eliana Pastor, Alkis Koudounas, Giuseppe Attanasio, Dirk Hovy, Elena Baralis - pdf, presentation

  8. "Exploring the inner mechanisms of large generative music models", Charlotte Pouw, Marcel A. Vélez Vásquez, John Ashley Burgoyne, Willem Zuidema - pdf

  9. "Interpreting End-to-End Deep Learning Models for Speech Source Localization Using Layer-wise Relevance Propagation", Luca Comanducci, Fabio Antonacci, Augusto Sarti - pdf

  10. "Understanding and Controlling Generative Music Transformers by Probing Individual Attention Heads", Junghyun Koo, Gordon Wichern, François Germain, Sameer Khurana, Jonathan Le Roux - pdf