Abstract:
Cluster analysis is an important tool for exploring large collections of data, revealing patterns and significant correlations. The fuzzy approach to clustering enhances modeling capability because the results are expressed as soft clusters rather than crisp ones, so a data point may have partial membership in several clusters. In this paper we discuss the most widely used fuzzy cluster analysis techniques and address an important issue: finding the optimal number of clusters. This problem is known as the cluster validity problem and is one of the most challenging aspects of both fuzzy and classical cluster analysis. We describe several methods, then combine and compare them on several synthetic data sets.