Building an Arabic Word Stemmer for Text File Classification
Abstract:
This thesis proposes a new stemming algorithm that addresses the ambiguity, irregular word, and broken plural problems found in current stemming algorithms, which fall into two approaches: root stemming and light stemming. The proposed algorithm introduces new pattern rules that make word identification more efficient. Such an algorithm should improve the efficiency and speed of information retrieval systems and search engines. Using these rules, the stemmer can determine whether a sequence of affixes is part of the real word or not, which resolves the ambiguity problem. A new Arabic IR tool with many options has been developed in the Java programming language (JDK 1.6). It lets the user load any data set, choose among the included stemmers, choose among the eight normalization steps, define sets of constants such as prefixes, suffixes, and stopwords, run text classification, compare stemmers, and extract charts that show these comparisons. The tool was used to test the proposed stemmer. Results derived from the CNN, BBC, and OSAC corpora show that the proposed stemmer raises text classification accuracy to an average of 91.7%, compared with average accuracies of 90.2% for Light 10 and 89.17% for Khoja.
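The idea of checking whether leading letters are a real prefix or part of the word can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the prefix list, the single pattern check `matches_faael` (the shape of the pattern فاعل), and the minimum remainder length are hypothetical and are not the rule set from the thesis. Python is used here for brevity, although the tool described above was written in Java.

```python
# Hypothetical, illustrative data: NOT the rules or affix lists from the thesis.
PREFIXES = ["وال", "ال", "و"]  # candidate prefixes, tried longest first
MIN_STEM_LENGTH = 3            # assumed minimum length of what remains


def matches_faael(word: str) -> bool:
    """Return True if the word has the shape of the pattern فاعل:
    exactly four letters with alef as the second letter."""
    return len(word) == 4 and word[1] == "ا"


def strip_prefix(word: str) -> str:
    """Remove one leading prefix unless a pattern says it belongs to the word."""
    # Pattern guard: if the whole word already fits a known pattern,
    # treat its leading letters as part of the word and leave it alone.
    if matches_faael(word):
        return word
    for prefix in PREFIXES:
        rest = word[len(prefix):]
        # Strip only when enough letters remain to form a plausible stem.
        if word.startswith(prefix) and len(rest) >= MIN_STEM_LENGTH:
            return rest
    return word


if __name__ == "__main__":
    for w in ["والد", "وكتب", "والكتاب", "الكتاب"]:
        print(w, "->", strip_prefix(w))
```

In this sketch, والد ("father") keeps its leading و because the word fits the pattern, while وكتب ("and he wrote") loses it. A real stemmer would need many more patterns plus suffix handling and normalization.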
Study details
- Author: Zaalan, Mohamoud Eleyan Al
- Publication year: 2014
- Publisher: Islamic University of Gaza
- Source: Digital Repository of the Islamic University of Gaza
- Content type: Master's thesis
- Language: English
- Peer-reviewed: Yes
- Country: Palestine
- Text: Full study
- File type: pdf