SLUE Tasks

Tasks

Task	Primary metric
Automatic Speech Recognition (ASR)	WER
Named Entity Recognition (NER)	F1
Sentiment Analysis (SA)	F1
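
As a rough illustration of the ASR metric in the table above, word error rate (WER) is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal stand-alone version, not the SLUE evaluation code: the function name `word_error_rate` is hypothetical, and it assumes whitespace tokenization with no text normalization, which the official scoring may handle differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both strings are tokenized by whitespace and compared
    verbatim (no lowercasing or punctuation stripping).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better for WER, while higher is better for the F1 scores used by the other two tasks. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.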

Datasets

SLUE uses the VoxCeleb and VoxPopuli datasets.

We have curated subsets of these datasets for fine-tuning and evaluation on the SLUE tasks. Because we redistribute these subsets, you do not need to download the full (and much larger) source datasets. The redistribution includes the human annotations and transcriptions for the SLUE tasks. Running the script handles everything needed, including downloading and preprocessing.

The table below gives a brief overview of the datasets. For more detail, please refer to our paper.

Corpus	Fine-tune, utts (hours)	Dev, utts (hours)	Test, utts (hours)	Tasks	License
SLUE-VoxPopuli	5,000 (14.5)	1,753 (5.0)	1,842 (4.9)	ASR, NER	CC0 (check complete license here)
SLUE-VoxCeleb	5,777 (12.8)	1,454 (3.2)	3,553 (7.8)	ASR, SA	CC-BY 4.0 (check complete license here)
