Talk: Tesselation of Concepts by Spatial Clustering in Symbolic Data Analysis


First, we recall the general framework of Symbolic Data Analysis (see Bock, Diday et al. (2000)) and its relation to concept analysis. Then, we present several kinds of compatibility between a tessellation and a dissimilarity which allow for a spatial representation of a clustering pyramid in which each level is associated with a concept. In Symbolic Data Analysis, the entries of the input data table (Units x Descriptors) are sets of categories or numbers, intervals, or probability distributions. Descriptors of this kind (so-called symbolic variables) are better suited than standard numerical or categorical ones for characterizing a "concept" (such as a town, an insurance company, or a species of animals) by its intent. In the SODAS software, the description of a concept can be "native" (i.e. given by expert knowledge) or induced from its extent.
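As a rough illustration of a description "induced from the extent", the sketch below generalizes a small class of individuals into one row of a symbolic data table. It is not the SODAS procedure: the function `induce_description`, its parameters and the toy data are hypothetical, and the generalization rules (min/max interval, set of observed categories, relative frequencies) are simply the most direct reading of the three kinds of entries mentioned above.

```python
from collections import Counter


def induce_description(individuals, interval_vars=(), set_vars=(), dist_vars=()):
    """Generalize a class of individuals into one symbolic description.

    Hypothetical helper for illustration only; not the SODAS algorithm.
    """
    description = {}
    # Numerical descriptor -> interval spanning the observed values.
    for var in interval_vars:
        values = [ind[var] for ind in individuals]
        description[var] = (min(values), max(values))
    # Categorical descriptor -> set of the categories observed in the class.
    for var in set_vars:
        description[var] = frozenset(ind[var] for ind in individuals)
    # Categorical descriptor -> empirical distribution over its categories.
    for var in dist_vars:
        counts = Counter(ind[var] for ind in individuals)
        total = sum(counts.values())
        description[var] = {cat: n / total for cat, n in counts.items()}
    return description


# Toy extent: the inhabitants of one town (invented data).
inhabitants = [
    {"age": 34, "job": "teacher", "transport": "bus"},
    {"age": 51, "job": "nurse", "transport": "car"},
    {"age": 27, "job": "teacher", "transport": "bus"},
    {"age": 45, "job": "clerk", "transport": "car"},
]

town = induce_description(
    inhabitants,
    interval_vars=["age"],
    set_vars=["job"],
    dist_vars=["transport"],
)
print(town)
```

Here the concept "town" is described by an age interval, a set of jobs and a distribution over means of transport, which is the kind of row a symbolic data table would hold for it.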

The extent of a concept is a class of standard units called "individuals" (such as the inhabitants of a town, a set of insurance companies, a sample of animals of a given species, or the clusters obtained from a clustering). In addition to their standard description, relationships between the standard descriptors of the individuals, such as logical rules and taxonomies, can be added as input. Starting from a symbolic data table which describes first-order concepts, a Galois lattice built on these data yields second-order concepts (representing, for instance, classes of towns or classes of species). Three kinds of compatibility between a tessellation and a dissimilarity are considered: convex, connected and "perfect". We then extend standard hierarchies and pyramids with linear support to hierarchies and pyramids with multidimensional support, based on a tessellation where (a) m edges, defining m equal angles, meet at each node and (b) the smallest cycles contain k edges of equal length. Instead of representing the extent of the concepts on a straight line, as in standard hierarchical or pyramidal conceptual clustering, it is then possible to represent them, with their symbolic interpretation, as a surface or as a volume by using a tessellation of the numerous pixels of a screen.
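To make conditions (a) and (b) concrete, the sketch below enumerates the pairs (m, k) that are possible in the planar case only. It assumes a tessellation of the Euclidean plane by regular k-sided cells, so that m interior angles of (k - 2) * 180 / k degrees must sum to 360 degrees at each node, which gives m * (k - 2) = 2 * k. The function name and the search bounds are hypothetical, and the volume case mentioned above is not covered.

```python
def regular_plane_tessellations(max_m=12, max_k=12):
    """List (m, k) pairs with m edges per node and k edges per smallest cycle.

    Assumes a Euclidean plane tiled by regular k-gons: the m angles meeting
    at a node sum to 360 degrees, i.e. m * (k - 2) == 2 * k (integer form).
    """
    return [
        (m, k)
        for m in range(3, max_m + 1)
        for k in range(3, max_k + 1)
        if m * (k - 2) == 2 * k
    ]


grids = regular_plane_tessellations()
print(grids)  # [(3, 6), (4, 4), (6, 3)]
```

Under this assumption only three supports remain: the hexagonal grid (m = 3, k = 6), the square grid (m = 4, k = 4), which matches the pixel layout of a screen, and the triangular grid (m = 6, k = 3).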