CS 69191: Masters Seminar
CS 89191: Doctoral Seminar
Spring 2009
Doctoral Student Presentation:
Feature Selection Techniques for Enhancing Text Categorization
Mohammed Al-Refai
This research aims to enhance text categorization by increasing
classification accuracy (correctness), reducing the required
classification time, and reducing dataset size (to save memory). To
achieve these goals, the classification process was divided into two
steps. In the first step, a technique called feature selection is used
to select from the dataset a set of features that reflect its
semantics and meaning, while redundant or meaningless features are
removed. This was done by introducing three feature selection methods,
namely stemming, light-stemming, and word clustering. In the second step,
the text classification process was applied using only the selected
features, with a k-nearest neighbors (KNN) classifier used to
categorize the text documents. The three introduced feature selection
techniques are briefly defined as follows:
1-Stemming: reduces words to their stems.
2-Light-stemming: this introduced approach does not produce the exact
linguistic stem; rather, it removes the most frequent suffixes and
prefixes.
3-Word clustering: groups words based on the semantic relations
between them.
As future work, there are many ideas, such as applying feature
selection methods to web mining and developing a statistical feature
selection method for enhancing text categorization and information
retrieval.
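The light-stemming idea described above (strip only the most frequent prefixes and suffixes instead of computing the full linguistic stem) can be sketched as follows. This is a minimal illustration, not the method used in this research: the affix lists, the minimum stem length `MIN_STEM`, and the function name `light_stem` are hypothetical choices, and a real light stemmer would use affixes selected for the target language.

```python
# Hypothetical affix lists: short English examples chosen for
# illustration, not the affixes used in the research described above.
PREFIXES = ("re", "un", "dis")
SUFFIXES = ("ing", "ed", "es", "s", "ly")  # checked in order, so "es" before "s"
MIN_STEM = 3  # assumed guard: never cut a word below 3 characters


def light_stem(word):
    """Remove at most one frequent prefix and one frequent suffix.

    No linguistic analysis is done, so the result is not guaranteed
    to be the true linguistic stem.
    """
    w = word.lower()
    for p in PREFIXES:
        if w.startswith(p) and len(w) - len(p) >= MIN_STEM:
            w = w[len(p):]
            break
    for s in SUFFIXES:
        if w.endswith(s) and len(w) - len(s) >= MIN_STEM:
            w = w[: -len(s)]
            break
    return w


words = ["building", "builds", "rebuild", "built"]
print(sorted({light_stem(w) for w in words}))  # ['build', 'built']
```

Four surface forms collapse into two features. The irregular form "built" has no strippable affix and stays a separate feature, since light-stemming only removes affixes and does no deeper analysis.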
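The abstract defines word clustering only as grouping words by the semantic relations between them, without stating how those relations are obtained. As one possible sketch, assume the relations arrive as a hypothetical list of related word pairs (for example, taken from a thesaurus). Connected words are merged into clusters with union-find, and every word is replaced by one cluster representative, so related words collapse into a single feature. The pair list, the rule of using the alphabetically smallest word as representative, and the function names `build_clusters` and `cluster_features` are all illustrative assumptions.

```python
from collections import Counter


def build_clusters(related_pairs):
    """Group words linked by (hypothetical) semantic-relation pairs.

    Uses union-find; returns a dict mapping each word to its cluster
    representative, the alphabetically smallest word in the cluster.
    """
    parent = {}

    def find(w):
        parent.setdefault(w, w)
        while parent[w] != w:
            parent[w] = parent[parent[w]]  # path halving
            w = parent[w]
        return w

    for a, b in related_pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            # The smaller root becomes the parent, so every root stays
            # the alphabetically smallest word of its cluster.
            lo, hi = min(ra, rb), max(ra, rb)
            parent[hi] = lo
    return {w: find(w) for w in parent}


def cluster_features(tokens, clusters):
    """Count features after mapping each token to its cluster representative.

    Tokens with no known relation keep their own form.
    """
    return Counter(clusters.get(t, t) for t in tokens)


# Hypothetical relation pairs and document tokens, for illustration only.
pairs = [("car", "automobile"), ("automobile", "vehicle"), ("film", "movie")]
clusters = build_clusters(pairs)
tokens = ["car", "movie", "vehicle", "film", "engine"]
features = cluster_features(tokens, clusters)
print(len(set(tokens)), "->", len(features))  # 5 -> 3
```

Union-find is used here because semantic relations are naturally given pairwise, and merging pairs transitively ("car" with "automobile", "automobile" with "vehicle") yields the clusters without comparing every word against every other.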
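The second step, classifying documents with a k-nearest neighbors classifier over the selected features only, can be sketched as below. The abstract does not state the document representation, the similarity measure, or the value of k; term-frequency vectors, cosine similarity, k = 3, the toy training data, and the function names are assumptions made for illustration.

```python
import math
from collections import Counter


def to_vector(tokens, selected):
    """Term-frequency vector restricted to the selected features."""
    return Counter(t for t in tokens if t in selected)


def cosine(u, v):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(doc_tokens, training, selected, k=3):
    """Majority vote among the k training documents most similar to the query.

    training is a list of (tokens, label) pairs. Vote ties are broken in
    favour of the label seen first in the similarity ranking.
    """
    q = to_vector(doc_tokens, selected)
    scored = sorted(
        ((cosine(q, to_vector(toks, selected)), label) for toks, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical). Words such as "the", "a", "of" are left out of
# the selected set, standing in for meaningless features removed earlier.
training = [
    (["match", "goal", "team", "the"], "sports"),
    (["team", "coach", "win", "a"], "sports"),
    (["market", "stock", "price", "the"], "economy"),
    (["price", "bank", "market", "of"], "economy"),
    (["goal", "score", "match"], "sports"),
]
selected = {"match", "goal", "team", "coach", "win", "score",
            "market", "stock", "price", "bank"}

print(knn_classify(["the", "team", "goal", "score"], training, selected))  # sports
```

Because the vectors only ever contain selected features, shrinking the feature set (through stemming, light-stemming, or word clustering) shrinks both the stored training vectors and the work done per similarity computation, which is how feature selection connects to the time and memory goals stated above.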