CS 68191 Masters Seminar / CS 89191 Doctoral Seminar
Spring 2007
Masters Student Presentation
Data Discretization Techniques
Chibuike Muoh
Data discretization is the process of converting the values of
continuous data attributes into a finite set of intervals with minimal
loss of information. Several discretization methods exist in the
literature, ranging from simple methods such as equal-width and
equal-frequency binning to more complicated methods involving
information-theoretic measures and
statistical measures of data dependency. Most existing
discretization methods can nonetheless be classified into two broad
categories based on their approach: splitting and merging. In this presentation we
introduce discretization as a preprocessing task for data
classification problems and highlight some of the different approaches
to discretization. We also briefly describe a novel discretization
technique that uses dynamic programming with an objective function that
can also be used to evaluate the different discretization methods.
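
The two simple methods named above, equal-width and equal-frequency binning, can be sketched in a few lines of Python. This is only an illustrative sketch, not the technique presented in the talk: the function names (equal_width_cuts, equal_frequency_cuts, assign_bins) and the sample values are assumptions made for the example.

```python
from bisect import bisect_right

def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k equally wide intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency_cuts(values, k):
    """Return k-1 cut points so each interval holds roughly len(values)/k values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def assign_bins(values, cuts):
    """Map each value to the index of its interval; a value equal to a cut
    point goes to the interval on its right."""
    return [bisect_right(cuts, v) for v in values]

# Hypothetical sample attribute with one outlier.
values = [1, 2, 3, 4, 5, 6, 7, 8, 100]

width_bins = assign_bins(values, equal_width_cuts(values, 3))
freq_bins = assign_bins(values, equal_frequency_cuts(values, 3))
print(width_bins)  # [0, 0, 0, 0, 0, 0, 0, 0, 2]
print(freq_bins)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

On this sample the single outlier pushes almost every value into one equal-width interval, while equal-frequency binning spreads the values evenly. Neither method consults the class labels, which is the kind of limitation that the information-theoretic and statistical methods are meant to address.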