# SLUE Tasks

## Tasks
| Task | Primary metric |
|---|---|
| Automatic Speech Recognition (ASR) | WER |
| Named Entity Recognition (NER) | F1 |
| Sentiment Analysis | F1 |
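As a point of reference, these primary metrics can be computed with common open-source libraries. The sketch below is illustrative only and is not part of the SLUE toolkit; it assumes the `jiwer` and `scikit-learn` packages, and the transcripts and sentiment labels are made up. Note that NER is typically scored with an entity-level F1, which needs span matching beyond the simple label F1 shown here.

```python
# Illustrative only: compute the primary metrics with off-the-shelf libraries.
# Assumes `pip install jiwer scikit-learn`; all example data is made up.
from jiwer import wer
from sklearn.metrics import f1_score

# ASR: word error rate between reference and hypothesis transcripts.
references = ["we will discuss the budget today", "thank you for the question"]
hypotheses = ["we will discuss the budget to day", "thank you for the question"]
print("WER:", wer(references, hypotheses))

# Sentiment analysis: macro-averaged F1 over utterance-level labels.
gold = ["positive", "neutral", "negative", "neutral"]
pred = ["positive", "neutral", "neutral", "neutral"]
print("Sentiment F1:", f1_score(gold, pred, average="macro"))
```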
## Datasets
SLUE uses the VoxCeleb and VoxPopuli datasets.
We have curated subsets of these datasets for fine-tuning and evaluation on the SLUE tasks. Because we redistribute these subsets, you do not need to download the full (and large) original datasets. The redistributed data includes human annotations and transcriptions for the SLUE tasks. All you need to do is run the provided script, which handles everything, downloading and preprocessing included.
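Once the script finishes, you can sanity-check the redistributed annotations. The snippet below is a rough sketch only: the TSV path and column layout are assumptions, so adjust them to match the files the script actually produces.

```python
# Hypothetical sketch: inspect the SLUE-VoxCeleb fine-tune annotations after the
# download/preprocessing script has run. The path below is an assumption;
# point it at whatever annotation TSV the script writes out.
import pandas as pd

df = pd.read_csv("data/slue-voxceleb/slue-voxceleb_fine-tune.tsv", sep="\t")
print(len(df))               # expect roughly 5,777 fine-tune utterances
print(df.columns.tolist())   # e.g. utterance id, transcription, sentiment label
print(df.head())
```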
Here is a brief overview of the datasets. For more in-depth information, please refer to our paper.
| Corpus | Fine-tune utts (hours) | Dev utts (hours) | Test utts (hours) | Tasks | License |
|---|---|---|---|---|---|
| SLUE-VoxPopuli | 5,000 (14.5) | 1,753 (5.0) | 1,842 (4.9) | ASR, NER | CC0 (check complete license here) |
| SLUE-VoxCeleb | 5,777 (12.8) | 1,454 (3.2) | 3,553 (7.8) | ASR, SA | CC-BY 4.0 (check complete license here) |