Feature Selection based on a new estimator of Intrinsic Dimension

Recent breakthroughs in technology have radically improved our ability to collect and store data. As a consequence, data sets have been growing rapidly in size, which poses great challenges for information and knowledge extraction. Traditional issues, such as the multi-scale variability of data or the presence of extremes, noise and outliers, become harder to handle. New pitfalls must also be addressed. They stem mainly from the empty space phenomenon (data become increasingly sparse as the number of dimensions grows) and from stricter requirements on computational efficiency.

This project falls within the scope of data mining, an interdisciplinary subfield of computer science that deals with the problems mentioned above. Its main objective is to develop new tools, algorithms and methodologies that use the (possibly fractal) intrinsic dimension of data to carry out fundamental tasks, such as spatial autocorrelation detection and quantification, clustering, dimensionality reduction and (supervised) feature selection.
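
To make the notion of intrinsic dimension concrete, here is a minimal sketch of the classical box-counting estimate. It is not the new estimator developed in this project, only a textbook illustration. The function name `box_counting_dimension`, its `n_cells` parameter and the synthetic data are assumptions introduced for this example, and the data are assumed to be already rescaled to the unit hypercube [0, 1).

```python
import numpy as np


def box_counting_dimension(points, n_cells=(2, 4, 8, 16)):
    """Estimate the box-counting dimension of points lying in [0, 1)^d.

    Hypothetical helper: for each grid resolution k (k cells per axis),
    count the occupied cells, then fit the slope of
    log(occupied cells) against log(k).
    """
    points = np.asarray(points, dtype=float)
    counts = []
    for k in n_cells:
        # Index of the grid cell containing each point.
        cells = np.floor(points * k).astype(int)
        # Number of distinct occupied cells at this resolution.
        counts.append(len(np.unique(cells, axis=0)))
    # Slope of the log-log relationship is the dimension estimate.
    slope, _intercept = np.polyfit(np.log(n_cells), np.log(counts), 1)
    return slope


rng = np.random.default_rng(0)

# Three features that are copies of one variable: a line in 3-D space.
t = rng.random(5000)
line = np.column_stack([t, t, t])
d_line = box_counting_dimension(line)

# Two independent features plus one constant feature: a plane in 3-D space.
u = rng.random((20000, 2))
plane = np.column_stack([u, np.zeros(len(u))])
d_plane = box_counting_dimension(plane)

print(f"line:  estimated dimension = {d_line:.2f}")
print(f"plane: estimated dimension = {d_plane:.2f}")
```

In both synthetic cases the data have three features, yet the estimate should stay close to the number of non-redundant ones. This is the intuition behind using intrinsic dimension for feature selection: a feature that does not raise the estimated dimension carries little new information.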

Contributors: Jean Golay and Mikhail Kanevski