Abstract

Walter Grandinetti

This document was produced with DocBook 5.0

Data mining, defined as the process of extracting knowledge from vast volumes of data, has begun to play an important role in many application domains. It has proved especially useful in medicine and biology, as well as in competitive domains such as commercial environments and sports activities.

The extracted knowledge is represented as a set of relationships among data elements, usually known as patterns. From this set, it is important to identify the patterns that best summarize the relations among the data, called interesting patterns. Data mining faces two major problems: first, establishing every possible relation among the data elements to produce patterns; second, deciding whether a pattern can be classed as potentially interesting.

Besides being interesting, patterns that occur repeatedly may be used to classify new instances. A set of patterns can describe the elements present within each class, so that a new instance is assigned to a class according to its closeness to the patterns describing that class.
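The idea of classifying by closeness to per-class patterns can be sketched as follows. This is only an illustration under stated assumptions, not the method developed in the thesis: the abstract does not define closeness, so the sketch assumes it is the fraction of a class's patterns fully contained in the instance. The names `closeness` and `classify`, and the toy attribute values, are hypothetical.

```python
def closeness(instance, patterns):
    """Assumed measure: fraction of the class's patterns contained in the instance."""
    if not patterns:
        return 0.0
    matched = sum(1 for pattern in patterns if pattern <= instance)
    return matched / len(patterns)


def classify(instance, class_patterns):
    """Return the class whose describing patterns are closest to the instance."""
    return max(class_patterns, key=lambda label: closeness(instance, class_patterns[label]))


# Toy, made-up patterns describing two hypothetical classes.
class_patterns = {
    "class_a": [frozenset({"color=red"}), frozenset({"shape=round", "size=small"})],
    "class_b": [frozenset({"color=blue"}), frozenset({"shape=square"})],
}

new_instance = frozenset({"color=red", "shape=round", "size=small", "weight=light"})
predicted = classify(new_instance, class_patterns)
```

Other closeness measures (for example, weighting each pattern by how often it occurs in its class) would fit the same structure; only the `closeness` function would change.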

Alternatively, new instances can be classified using patterns that capture only the differences between classes. The advantage of this approach is that the number of patterns needed to describe such differences is significantly lower than the number needed to describe each class. The patterns describing differences between classes are known as emerging patterns, and they have proved versatile in other application domains as well, such as the early detection of anomalous situations (for instance, climatic changes or intruder detection).
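To make the notion of a pattern that captures a difference between classes concrete, the sketch below uses one common formalization from the emerging-pattern literature, which this abstract does not state: a pattern is considered emerging when its support grows by at least a chosen factor from one class to another. The threshold `min_growth`, the function names, and the toy transactions are assumptions for illustration only, and the sketch is not the mining method proposed in the thesis.

```python
def support(pattern, dataset):
    """Fraction of transactions in the dataset that contain the pattern."""
    if not dataset:
        return 0.0
    return sum(1 for transaction in dataset if pattern <= transaction) / len(dataset)


def growth_rate(pattern, background, target):
    """Ratio of the pattern's support in target to its support in background."""
    s_background = support(pattern, background)
    s_target = support(pattern, target)
    if s_background == 0.0:
        # Absent from background: infinite growth if present in target.
        return float("inf") if s_target > 0.0 else 0.0
    return s_target / s_background


def is_emerging(pattern, background, target, min_growth=2.0):
    """Assumed criterion: growth rate of at least min_growth."""
    return growth_rate(pattern, background, target) >= min_growth


# Toy, made-up transactions for two hypothetical classes.
class_a = [frozenset("xy"), frozenset("xyz"), frozenset("x"), frozenset("yz")]
class_b = [frozenset("xy"), frozenset("z"), frozenset("zw"), frozenset("w")]

xy = frozenset("xy")   # support 0.5 in class_a, 0.25 in class_b
z = frozenset("z")     # support 0.5 in both classes
w = frozenset("w")     # support 0.0 in class_a, 0.5 in class_b
```

With these toy data, `xy` is twice as frequent in `class_a` as in `class_b`, `z` does not distinguish the classes at all, and `w` occurs only in `class_b`, so its growth rate from `class_a` to `class_b` is infinite.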

The main goal of this thesis is to present a new way of mining emerging patterns, so that the tools that make use of such patterns can be enhanced.

Keywords: Data Mining, Emerging Patterns, Maximal Patterns, Frequent Patterns, Classifiers