CS 68191 Masters Seminar / CS 89191 Doctoral Seminar
Spring 2007



Masters Student Presentation

Data Discretization Techniques

Chibuike Muoh



Data discretization is the process of converting continuous attribute values into a finite set of intervals with minimal loss of information. Several discretization methods exist in the literature, ranging from simple methods such as equal-width and equal-frequency binning to more complicated methods involving information-theoretic measures and statistical measures of data dependency. Most existing discretization methods can be broadly classified into two categories based on their approach: splitting and merging. In this presentation we introduce discretization as a preprocessing task for data classification problems and highlight some of the different approaches to discretization. We also briefly describe a novel discretization technique that uses dynamic programming with an objective function that can also be used to evaluate the different discretization methods.
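As a small illustration of the two simple methods named above, the following Python sketch computes equal-width and equal-frequency cut points for one continuous attribute. The function names, the sample `ages` data, and the choice of three intervals are assumptions made for this example only; they are not taken from the presentation, and the sketch does not cover the dynamic programming technique.

```python
from bisect import bisect_right

def equal_width_cuts(values, k):
    """Return k-1 cut points that split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equal_frequency_cuts(values, k):
    """Return k-1 cut points so that each interval holds roughly len(values)/k points."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def assign_bins(values, cuts):
    """Map each value to the index of its interval; a value equal to a cut goes to the upper interval."""
    return [bisect_right(cuts, v) for v in values]

# Hypothetical sample attribute, skewed toward small values.
ages = [18, 19, 21, 22, 25, 30, 41, 58, 90]

width_cuts = equal_width_cuts(ages, 3)
freq_cuts = equal_frequency_cuts(ages, 3)
width_bins = assign_bins(ages, width_cuts)
freq_bins = assign_bins(ages, freq_cuts)

print("equal-width cuts:", width_cuts, "bins:", width_bins)
print("equal-frequency cuts:", freq_cuts, "bins:", freq_bins)
```

On this skewed sample, equal-width binning places seven of the nine values in the first interval, while equal-frequency binning places three values in each. Neither method consults class labels, which is one motivation for the information-theoretic and statistical methods mentioned above.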