NADI 2026 Task 2: Spoken Dialect Identification
Live | AI & ML | AR / VR

Organizer: prsull

About this hackathon

This task is part of the Nuanced Arabic Dialect Identification (NADI) 2026 shared task. It focuses on out-of-domain spoken dialect identification: the final test set will be a blind set drawn from an unknown domain. Language and dialect ID models can be prone to overfitting to their training domain, which limits their applicability in real-world scenarios, and this blind-domain evaluation is designed to test how well they generalize. As a baseline, we provide a training script that fine-tunes a pretrained ECAPA-TDNN language ID system on a 200-hour subset of the ADI-20 dataset. Training is unrestricted, and participants are free to train on the full ADI-17 and ADI-20 datasets. Because the evaluation is blind and out-of-domain, we encourage participants to also evaluate their models on selected data from other domains, such as radio, read speech, and conversational telephone speech.
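One way to act on the suggestion to evaluate on other domains is to track accuracy per domain and compare it with accuracy on the training domain. The following is a minimal sketch, not part of the official baseline: the function names, the domain label "in_domain", the dialect codes, and the toy predictions are all hypothetical, and the task description here does not state the official scoring metric, so plain accuracy is used only for illustration.

```python
from collections import defaultdict


def per_domain_accuracy(records):
    """Return {domain: accuracy} for (domain, true_dialect, predicted_dialect) tuples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for domain, truth, pred in records:
        total[domain] += 1
        if truth == pred:
            correct[domain] += 1
    return {domain: correct[domain] / total[domain] for domain in total}


def generalization_gap(accuracies, in_domain):
    """In-domain accuracy minus the mean accuracy over all other domains."""
    held_out = [acc for domain, acc in accuracies.items() if domain != in_domain]
    if not held_out:
        raise ValueError("need at least one held-out domain")
    return accuracies[in_domain] - sum(held_out) / len(held_out)


# Hypothetical toy predictions; labels and domains are placeholders.
records = [
    ("in_domain", "EGY", "EGY"),
    ("in_domain", "MOR", "MOR"),
    ("radio", "EGY", "EGY"),
    ("radio", "MOR", "EGY"),
    ("telephone", "EGY", "MOR"),
    ("telephone", "MOR", "MOR"),
]

acc = per_domain_accuracy(records)
gap = generalization_gap(acc, in_domain="in_domain")
print(acc)
print(gap)
```

A large positive gap suggests the model has fit the training domain more than the dialects themselves, which is the failure mode the blind test set is meant to expose.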

Tracks

General Track

Prizes

1

Project Prize

$1,000

Schedule

  1. Jun 16, 04:00 PM

Tags

#Codabench #AI #Competition