Symbolic Class Description with Interval Data

Mohamed Mehdi Limam, Universite Paris Dauphine, France
Edwin Diday, Universite Paris Dauphine, France
Suzanne Winsberg, IRCAM, Paris, France


Our aim is to describe a class, C, from a given population by partitioning it: each class of the partition is described by a conjunction of characteristic properties, and the class C is described by the disjunction of these conjunctions. We employ a stepwise top-down binary tree method. At each step we choose the best variable and its optimal split so as to optimize simultaneously a discrimination criterion, furnished by a given prior partition of the population, and a homogeneity criterion. The classes we obtain are therefore homogeneous with respect to the variables describing them and, of course, discriminated from each other with respect to these same variables; in addition, they are discriminated from each other with respect to the prior partition. Not only does this approach combine supervised and unsupervised learning, it also deals with a data table in which each cell contains an interval, that is, with symbolic data (see Bock and Diday, 2002). The algorithm may be extended or reduced to deal with other types of data, for example histogram-type data (see Vrac et al., 2003) and classical data. We also introduce a new stopping rule. We illustrate the method on both simulated and real data.
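To make the split-selection idea concrete, here is a minimal sketch in Python. It is not the authors' algorithm: every criterion and name in it is a hypothetical stand-in. The sketch assumes that the prior-partition labels are known for the individuals being split, that each interval is coded by its centre and half-length, that discrimination is measured by Gini impurity and homogeneity by within-group inertia, that the two terms are mixed with a weight `alpha`, and that cuts are placed on interval centres. The `min_size`/`max_depth` stopping rule is a placeholder and does not reproduce the new stopping rule mentioned above. Each leaf's path yields a conjunction of conditions, and the class description is the disjunction of these conjunctions.

```python
# Hypothetical sketch: top-down binary splitting of interval-valued data.
# Each candidate split is scored by a weighted mix of a discrimination term
# (prior partition, Gini impurity) and a homogeneity term (inertia).
Interval = tuple[float, float]  # (lower bound, upper bound)
Row = list[Interval]            # one individual: one interval per variable


def mid(iv: Interval) -> float:
    return (iv[0] + iv[1]) / 2


def gini(labels: list[str]) -> float:
    """Gini impurity of prior-partition labels (assumed discrimination measure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: dict[str, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def inertia(rows: list[Row]) -> float:
    """Sum of squared distances to the mean, each interval coded as (centre, half-length)."""
    if not rows:
        return 0.0
    pts = [[v for a, b in row for v in ((a + b) / 2, (b - a) / 2)] for row in rows]
    mean = [sum(col) / len(pts) for col in zip(*pts)]
    return sum((x - m) ** 2 for p in pts for x, m in zip(p, mean))


def best_split(rows: list[Row], labels: list[str], alpha: float):
    """Return (score, variable index, threshold) minimising the mixed criterion, or None."""
    n, g0, i0 = len(rows), gini(labels), inertia(rows)
    best = None
    for j in range(len(rows[0])):
        centres = sorted({mid(r[j]) for r in rows})
        for lo, hi in zip(centres, centres[1:]):
            t = (lo + hi) / 2  # cut halfway between consecutive interval centres
            left = [k for k in range(n) if mid(rows[k][j]) <= t]
            right = [k for k in range(n) if mid(rows[k][j]) > t]
            # Discrimination: weighted child impurity relative to the parent (0 = pure).
            d = 0.0 if g0 == 0 else sum(
                len(s) * gini([labels[k] for k in s]) for s in (left, right)) / (n * g0)
            # Homogeneity: within-child inertia relative to the parent inertia.
            h = 0.0 if i0 == 0 else sum(
                inertia([rows[k] for k in s]) for s in (left, right)) / i0
            score = alpha * d + (1 - alpha) * h
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def describe(rows, labels, names, alpha=0.5, min_size=2, max_depth=3, path=()):
    """Grow the tree top-down; return one conjunction (list of conditions) per leaf.

    The min_size / max_depth stopping rule is only a placeholder.
    """
    split = None
    if len(rows) >= min_size and len(path) < max_depth:
        split = best_split(rows, labels, alpha)
    if split is None:
        return [list(path)]
    _, j, t = split
    left = [k for k in range(len(rows)) if mid(rows[k][j]) <= t]
    right = [k for k in range(len(rows)) if mid(rows[k][j]) > t]
    out = []
    for idx, cond in ((left, f"mid({names[j]}) <= {t:g}"), (right, f"mid({names[j]}) > {t:g}")):
        out += describe([rows[k] for k in idx], [labels[k] for k in idx], names,
                        alpha, min_size, max_depth, path + (cond,))
    return out


def as_disjunction(leaves) -> str:
    """Class description: OR over leaves of the AND of each leaf's conditions."""
    return " OR ".join("(" + (" AND ".join(c) or "TRUE") + ")" for c in leaves)


if __name__ == "__main__":
    names = ["Y1", "Y2"]
    rows = [[(1, 2), (0, 1)], [(1.5, 2.5), (0, 2)], [(6, 7), (5, 6)], [(6.5, 8), (4, 6)]]
    labels = ["a", "a", "b", "b"]
    print(as_disjunction(describe(rows, labels, names, max_depth=1)))
    # prints: (mid(Y1) <= 4.25) OR (mid(Y1) > 4.25)
```

On this toy table the cut at 4.25 on Y1 separates the two prior classes and also keeps each child compact, so it wins under both terms. With a small `alpha`, homogeneity dominates and deeper trees tend to split down to singletons, which is one reason a real stopping rule matters.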


Bock, H.H. and Diday, E. (Eds.) (2002): "Analysis of Symbolic Data". Springer, Heidelberg.
Vrac, M., Diday, E., Winsberg, S., and Limam, M.M. (2003): Symbolic Class Description. In: "Data Analysis, Classification and Related Methods; Proceedings of the 8th Conference of the IFCS". Springer, Heidelberg.