A NEAR OPTIMAL SOLUTION FOR MAXIMUM RELEVANCE MINIMUM REDUNDANCY FEATURE SELECTION
MetadataShow full item record
In many real-world applications, a high number of features could result in noisy and redundant information, which could degrade the general performance of classification tasks. Feature selection techniques with the purpose of eliminating such features have been actively studied. In several information-theoretic approaches, such features are conventionally obtained by maximizing relevance to the class while the redundancy among the features used is minimized. This is an NP-hard problem and still remains to be a challenge. This research proposes an alternative feature selection strategy on binary text representation data based on the properties of submodular functions, with the purpose of providing a theoretical lower bound for finding a near optimal solution based on the Maximum Relevance-Minimum Redundancy criterion. In doing so, the proposed method can achieve a 2-approximation by a naive greedy search. Empirical experiments validated and benchmarked against different baseline methods show that the proposed technique is a promising approach on binary data in general.