Happy to announce that I recently defended my PhD thesis titled “Training Sound Event Classifiers Using Different Types of Supervision”!

Training Sound Event Classifiers Using Different Types of Supervision
Eduardo Fonseca
PhD Thesis, Universitat Pompeu Fabra, 2021
[PDF] [slides] [video]

The thesis was carried out at the Music Technology Group of Universitat Pompeu Fabra in Barcelona, under the supervision of Dr. Xavier Serra and Dr. Frederic Font. The thesis committee was composed of Dr. Emmanouil Benetos (Queen Mary University of London), Dr. Annamaria Mesaros (Tampere University), and Dr. Marius Miron (UPF).

This thesis included two research stays at Google Research and multiple collaborations with researchers from Dublin City University, Université de Lorraine, Inria and Google. The thesis was partially funded by two Google Faculty Research Awards (2017 and 2018).

A peek into the thesis

Here’s the Table of Contents along with some notes (e.g. regarding content extensions with respect to previously published papers).

  1. Introduction. Motivating the thesis and contrasting the commonly-used paradigm of sound event classification with a new perspective based on learning with noisy labels and self-supervision.
  2. Background. Provides relevant background (context of sound event recognition & components of a supervised sound event classification pipeline) as well as a literature review. I hope this can be a great resource for entry-level researchers and newcomers to the field.
  3. The Freesound Dataset 50k (FSD50K). This is largely based on our IEEE/ACM Transactions on Audio, Speech, and Language Processing paper.
  4. Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks. This is a substantial extension of our paper with the same title, including additional experiments, figures, analysis and discussion. Worth a look if you are interested in the topic.
  5. Training Sound Event Classifiers With Noisy Labels. This is based on my three papers on learning with noisy labels [ICASSP19, WASPAA19, IEEESPL20]. However, this chapter contains additional figures and some additional discussion.
  6. Self-Supervised Learning of Sound Event Representations. This is based on my two papers on self-supervised learning [ICASSP21, WASPAA21], but contains additional explanations and discussion (especially for the ICASSP21 paper) as well as some reflections in the final Conclusion section.
  7. Summary and Future Perspectives. Including a summary of the contributions and conclusions, as well as the impact of this thesis.

Then there are several appendices describing the DCASE Challenge Tasks I was involved in, as well as the list of publications from this thesis, other contributions and merits, resources, etc.

Please check the thesis report for more details! I really hope it can be useful to someone. If it’s useful to you, you can cite it, or send me a quick email. This way I’ll know that it was useful to someone other than me - a question that comes to mind while writing a 300ish-page report. :)

By the way, for some unfortunate reason, the Google Scholar entry and bibtex are not accurate (they gave me a surname that I don’t have). The correct bibtex is:

@phdthesis{fonseca2021training,
  title={Training sound event classifiers using different types of supervision},
  author={Fonseca, Eduardo},
  school={Universitat Pompeu Fabra},
  year={2021}
}


I would like to express my enormous gratitude to my supervisors, Xavier Serra and Frederic Font, not only for giving me the opportunity to do this work and for their invaluable advice, but also for giving me the freedom to choose my own path. I would also like to thank my thesis committee members for thoroughly reading my dissertation and for making the Q&A session a nice and interesting discussion. Finally, I’d like to thank all MTG folks and everyone I met in these past few years for their help and support along the way. It was fun! (More detailed acknowledgments in the thesis report - page VII).