On resilient feature selection: Computational foundations of r-C-reducts
Abstract
The task of feature selection is crucial for constructing prediction and classification models, resulting in their higher quality and interpretability. However, it is often neglected that some of the selected features may become temporarily unavailable over a long-term timeframe, which can disable a pre-trained model and severely affect business continuity. One approach is to rely on a collection of diverse feature subsets, with their corresponding prediction models treated as an ensemble. Another approach is to search for feature sets that are guaranteed to provide sufficient predictive power even if some of their elements are dropped. In this paper, we focus on the latter idea, referring to it as resilient feature selection. We discuss it using the example of the rough-set-based notion of an approximate reduct – an irreducible subset of features providing a satisfactory level of information about the considered target variable. We study the NP-hardness of the problem of finding minimal r-C-reducts, i.e., irreducible subsets of features that assure the aforementioned level, expressed by means of an information-preserving criterion function C, even after disallowing arbitrary r features. We discuss opportunities for exhaustive and heuristic search of feature subsets specified in this way. The discussed idea of resilience is more general, and one may consider it an extension of many other, not necessarily rough-set-based, feature selection methods.
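The resilience property described above can be sketched as a brute-force check: a feature subset qualifies if the criterion function still meets the required threshold after removing any r of its elements. The sketch below is only illustrative; the additive `criterion` and the `gain` values are hypothetical stand-ins, not the rough-set-based criterion function C studied in the paper.

```python
from itertools import combinations

def is_resilient(features, criterion, r, threshold):
    """Return True if every subset obtained by dropping any r elements
    of `features` still satisfies criterion(subset) >= threshold."""
    features = list(features)
    for dropped in combinations(features, r):
        remaining = [f for f in features if f not in dropped]
        if criterion(remaining) < threshold:
            return False
    return True

# Hypothetical toy criterion: each feature contributes a fixed
# information gain about the target, capped at 1.0 (an assumption
# for illustration only).
gain = {"a": 0.5, "b": 0.4, "c": 0.4, "d": 0.3}
criterion = lambda subset: min(1.0, sum(gain[f] for f in subset))

# {a, b, c, d} survives the loss of any single feature (r = 1),
# but not the loss of any two features (r = 2).
print(is_resilient(["a", "b", "c", "d"], criterion, r=1, threshold=0.9))
print(is_resilient(["a", "b", "c", "d"], criterion, r=2, threshold=0.9))
```

Note that this exhaustive check enumerates all C(|B|, r) ways of dropping r features, which hints at why the associated minimality problem is computationally hard.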