Ended · AI & ML · AR / VR

NADI 2026 Task 2: Spoken Dialect Identification

Organizer: prsull; 7 submissions

About this hackathon

This Spoken Dialect Identification task is part of the Nuanced Arabic Dialect Identification (NADI) 2026 shared task. This year it focuses on out-of-domain dialect identification: the final test set will be a blind set from an unknown domain. Language and dialect ID models can overfit to their training domain, which limits their applicability in real-world scenarios, and this blind-domain evaluation aims to test how well they generalize.

For our baseline, we provide a training script to fine-tune a pretrained ECAPA-TDNN language ID system on a 200-hour subset of the ADI-20 dataset. Training is unrestricted, and participants are free to train on the full ADI-17/20 datasets. Because this is a blind out-of-domain evaluation, we encourage participants to evaluate their models on selected data from other domains, such as radio, read speech, and conversational telephone speech.
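One way to act on the suggestion to evaluate on other domains is to score predictions separately for each held-out domain, so that a drop on any one domain is visible rather than averaged away. The sketch below is a minimal illustration in plain Python, not part of the provided baseline. The function names, the domain names, and the dialect labels in the example are all hypothetical, and accuracy and macro-F1 are assumed as metrics because the official metric is not stated in this description.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over the classes that occur in gold."""
    labels = sorted(set(gold))
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def per_domain_report(records):
    """Group (domain, gold_label, predicted_label) triples by domain and score each group."""
    by_domain = defaultdict(lambda: ([], []))
    for domain, gold, pred in records:
        by_domain[domain][0].append(gold)
        by_domain[domain][1].append(pred)

    report = {}
    for domain, (gold, pred) in by_domain.items():
        correct = sum(1 for g, p in zip(gold, pred) if g == p)
        report[domain] = {
            "n": len(gold),
            "accuracy": correct / len(gold),
            "macro_f1": macro_f1(gold, pred),
        }
    return report


# Hypothetical example: domain names and dialect labels are placeholders,
# not the label set used by ADI-17/20 or by the shared task.
example_records = [
    ("broadcast", "EGY", "EGY"),
    ("broadcast", "MOR", "MOR"),
    ("telephone", "EGY", "MOR"),
    ("telephone", "MOR", "MOR"),
]
example_report = per_domain_report(example_records)
```

Comparing the per-domain rows of such a report against the in-domain validation score gives a rough indication of how much a model depends on its training domain. It cannot predict performance on the blind test domain.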

Tracks

General Track

Organizer: prsull; 7 submissions