We’re glad to announce the release of FSD50K, a new open dataset of human-labeled sound events. FSD50K contains over 51k Freesound audio clips, totalling over 100 hours of audio manually labeled using 200 classes drawn from the AudioSet Ontology. To our knowledge, this is the largest fully-open dataset of human-labeled sound events, and the second largest overall after AudioSet.



FSD50K’s most important characteristics:

  • FSD50K contains 51,197 audio clips from Freesound, totalling 108.3 hours of multi-labeled audio
  • The dataset encompasses 200 sound classes hierarchically organized following a subset of the AudioSet Ontology, allowing the development and evaluation of large-vocabulary machine listening methods
  • The audio content is composed mainly of sound events produced by physical sound sources, including human sounds, sounds of things, animals, natural sounds, musical instruments and more
  • The acoustic material has been manually labeled using the Freesound Annotator platform
  • Clips are of variable length (0.3 to 30 s), and ground truth labels are provided at the clip level (i.e., weak labels)
  • All clips are provided as uncompressed, 16-bit, 44.1 kHz, mono PCM audio files
  • Audio clips are split into a development set (41k clips / 80h, in turn split into train and validation) and an evaluation set (10k clips / 28h); see the loading sketch after this list
  • Beyond the audio clips and ground truth, additional metadata is made available (including raw annotations, sound predominance ratings, Freesound metadata, and more), enabling a variety of sound event research tasks
  • All these resources are licensed under Creative Commons licenses, which allow sharing and reuse
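To give an idea of how the released files fit together, here is a minimal sketch in Python for loading clips and their weak labels. It assumes the dataset has been unpacked locally with the development ground truth in a CSV (columns assumed here: fname, labels, split) and the development audio in a folder of WAV files; adjust the paths and column names to match the actual release layout.

```python
# Minimal sketch: loading FSD50K clips and their weak (clip-level) labels.
# Paths and CSV column names below are assumptions for illustration; check
# them against the files shipped with the dataset.
import csv
from pathlib import Path

import soundfile as sf  # pip install soundfile

DATASET_ROOT = Path("FSD50K")  # adjust to your local path


def load_dev_ground_truth(root: Path):
    """Return a list of (filename, [labels], split) tuples from dev.csv."""
    rows = []
    with open(root / "FSD50K.ground_truth" / "dev.csv", newline="") as f:
        for row in csv.DictReader(f):
            labels = row["labels"].split(",")  # multi-label, comma-separated
            rows.append((row["fname"], labels, row["split"]))
    return rows


clips = load_dev_ground_truth(DATASET_ROOT)
fname, labels, split = clips[0]

# Clips are uncompressed 16-bit 44.1 kHz mono PCM, variable length (0.3-30 s).
audio, sr = sf.read(DATASET_ROOT / "FSD50K.dev_audio" / f"{fname}.wav")
print(f"{fname}.wav ({split}): {len(audio) / sr:.1f} s, labels: {labels}")
```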


Interested? Go ahead and check out all the resources we’ve just released.

Also, we will soon publish a follow-up blog post. Stay up to date about FSD50K by subscribing to the freesound-annotator Google Group. We hope all these resources are useful for the community!

FSD50K has been created at the Music Technology Group of Universitat Pompeu Fabra. This effort was kindly sponsored by two Google Faculty Research Awards (2017 and 2018).