ArbDialectID at MADAR Shared Task 1: Language Modelling and Ensemble Learning for Fine Grained Arabic Dialect Identification

Study details

Abstract:

In this paper, we present a dialect identification system (ArbDialectID) that competed in Task 1 of the MADAR shared task, MADAR Travel Domain Dialect Identification. We build a coarse-grained and a fine-grained identification model to predict the label (corresponding to a dialect of Arabic) of a given text. We build two language models by extracting features at two levels (words and characters). We first build a coarse identification model that classifies each sentence into one of six dialects, then use this label as a feature for the fine-grained model, which classifies the sentence among 26 dialects from different Arab cities. We then apply an ensemble voting classifier to both sub-systems. Our system ranked 1st, achieving an F-score of 67.32%. Both the models and our feature engineering tools are made available to the research community.
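
The two-stage design described in the abstract (word-level and character-level features, a coarse label fed into the fine-grained model, then voting over the two sub-systems) can be illustrated with a minimal sketch. This is not the authors' implementation: the Naive Bayes classifier, the n-gram orders, the use of gold coarse labels at training time, the soft-voting rule (summing log scores), and every function name are assumptions made for illustration, and the toy strings and labels are placeholders rather than MADAR data.

```python
import math
from collections import Counter, defaultdict


def word_ngrams(text, n_max=2):
    """Word n-grams of order 1..n_max (n_max is an assumed setting)."""
    tokens = text.split()
    return [" ".join(tokens[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(tokens) - n + 1)]


def char_ngrams(text, n_max=3):
    """Character n-grams of order 1..n_max (n_max is an assumed setting)."""
    return [text[i:i + n]
            for n in range(1, n_max + 1)
            for i in range(len(text) - n + 1)]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (assumed classifier)."""

    def fit(self, feature_lists, labels):
        self.counts = defaultdict(Counter)
        self.priors = Counter(labels)
        self.vocab = set()
        for feats, label in zip(feature_lists, labels):
            self.counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def log_scores(self, feats):
        total = sum(self.priors.values())
        scores = {}
        for label, prior in self.priors.items():
            denom = sum(self.counts[label].values()) + len(self.vocab)
            score = math.log(prior / total)
            for f in feats:
                score += math.log((self.counts[label][f] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, feats):
        scores = self.log_scores(feats)
        return max(scores, key=scores.get)


def train_two_stage(texts, coarse_labels, fine_labels, extract):
    """Train a coarse model, then a fine model that sees the coarse label."""
    feats = [extract(t) for t in texts]
    coarse = NaiveBayes().fit(feats, coarse_labels)
    # Assumption: the gold coarse label is used as the extra feature in training.
    fine_feats = [f + ["COARSE=" + c] for f, c in zip(feats, coarse_labels)]
    fine = NaiveBayes().fit(fine_feats, fine_labels)
    return coarse, fine, extract


def fine_log_scores(model, text):
    """Score fine labels, using the predicted coarse label as a feature."""
    coarse, fine, extract = model
    feats = extract(text)
    return fine.log_scores(feats + ["COARSE=" + coarse.predict(feats)])


def ensemble_predict(models, text):
    """Soft voting (assumed rule): sum each label's log score over sub-systems."""
    totals = defaultdict(float)
    for model in models:
        for label, score in fine_log_scores(model, text).items():
            totals[label] += score
    return max(totals, key=totals.get)


# Placeholder toy data: R* stand in for coarse dialects, C* for city dialects.
texts = ["aaa bbb", "aaa ccc", "ddd eee", "ddd fff"]
coarse = ["R1", "R1", "R2", "R2"]
fine = ["C1", "C2", "C3", "C4"]

models = [train_two_stage(texts, coarse, fine, word_ngrams),
          train_two_stage(texts, coarse, fine, char_ngrams)]
print(ensemble_predict(models, "aaa ccc"))
```

Summing log scores is used here because hard majority voting between only two sub-systems would leave ties unresolved; the voting scheme in the actual system may differ.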

Study properties

  • Author

    Abu Kwaik, Kathrein

    Saad, Motaz K

  • Publication year

    2019-08

  • Publisher:

    Association for Computational Linguistics

  • Source:

    Islamic University of Gaza Digital Repository

  • Content type:

    Conference Paper

  • Language:

    English

  • Peer-reviewed:

    Yes

  • Country:

    Palestine

  • Text:

    Full study

  • File type:

    pdf
