FSD50K: This is the main dataset collected with the Freesound Annotator. FSD50K contains 51,197 audio clips from Freesound, totalling over 100h of audio, manually labeled using 200 classes drawn from the AudioSet Ontology. To our knowledge, this is the largest fully-open dataset of human-labeled sound events ever released. It includes a development set (train and validation) and an evaluation set. Additional metadata is provided, enabling a variety of sound event research tasks. See our journal preprint for all the details!
FSDKaggle2019: Over 100h of audio across 80 classes of everyday sounds. It consists of a curated train set from FSD (4,970 clips, 10.5h), a noisy train set from Flickr (19,815 clips, 80h), and a test set from FSD (4,481 clips, 12.9h). It was collected for the DCASE2019 Challenge Task2: Audio tagging with noisy labels and minimal supervision. The dataset is described in our paper Audio tagging with noisy labels and minimal supervision, and supports the development and evaluation of machine listening methods under label noise, minimal supervision, and real-world acoustic mismatch.
FSDKaggle2018: 11k audio clips and 18h of training data, unevenly distributed across 41 classes of everyday sounds. It was collected for the DCASE2018 Challenge Task2: General-purpose tagging of Freesound audio with AudioSet labels. The dataset is described in our paper General-purpose tagging of Freesound audio with AudioSet labels: task description, dataset, and baseline.
FSDnoisy18k: An audio dataset collected to foster the investigation of label noise in sound event classification. It contains 42.5 hours of audio across 20 sound classes, including a small amount of manually labeled data and a larger quantity of real-world noisy data. Please check its companion site for a detailed dataset description and download, and our ICASSP 2019 paper for an evaluation of noise-robust loss functions using FSDnoisy18k.
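As a quick sanity check on the FSDKaggle2019 figures quoted in its entry, the split sizes can be tallied as in the sketch below. The dictionary layout and the split names are illustrative assumptions, and the numbers are copied from the description rather than read from the dataset files.

```python
# Split sizes for FSDKaggle2019 as quoted in its description:
# (number of clips, duration in hours). Key names are illustrative only.
splits = {
    "curated_train": (4970, 10.5),
    "noisy_train": (19815, 80.0),
    "test": (4481, 12.9),
}

# Sum clips and hours over all splits; round hours to one decimal
# to avoid floating-point noise in the printed total.
total_clips = sum(clips for clips, _ in splits.values())
total_hours = round(sum(hours for _, hours in splits.values()), 1)

print(f"{total_clips} clips, {total_hours} h in total")
```

The summed duration is consistent with the "over 100h" stated for FSDKaggle2019, bearing in mind that the per-split durations are themselves rounded.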