Your conditions: 秦晓军
  • Feature selection method based on combining improved CHI and weighted ECE

    Subjects: Computer Science >> Integration Theory of Computer Science
    Submitted: 2018-07-09
    Cooperative journal: 《计算机应用研究》

    Abstract: This paper analyzes the characteristics and deficiencies of the chi-square statistic (CHI) and expected cross-entropy (ECE) methods for feature selection in text classification. To address the poor classification performance of the traditional CHI and ECE methods on unbalanced data sets, it presents an improved CHI method (pCHI) that introduces adjustment factors and removes the influencing factors due to negative correlation, and a weighted ECE method (ωECE) that compensates for the tendency of ECE to select high-frequency features with weak distinguishing ability. Combining the two improved methods, the paper further proposes a feature selection method based on improved CHI and weighted ECE (pCHIωECE). In comparative experiments, the precision and F1 value of the pCHIωECE method are both superior to those of the CHI, ECE, pCHI and ωECE methods; the method also performs better in terms of dimensionality and stability.
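
For orientation, the two baseline scores that the abstract starts from can be sketched as below. This is a minimal illustration of the textbook chi-square statistic and expected cross-entropy only. It does not implement pCHI, ωECE, or pCHIωECE, whose adjustment factors and weights are not given in the abstract. The function names and the count-based inputs are assumptions made for this example.

```python
import math


def chi_square(a, b, c, d):
    """Textbook chi-square score for one (term, class) pair.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: the term or class never varies
    # The sign of (a*d - b*c) tells positive from negative correlation,
    # but squaring discards it, so both kinds score equally here.
    return n * (a * d - b * c) ** 2 / denom


def expected_cross_entropy(term_counts, class_counts):
    """Textbook ECE score for one term.

    term_counts[k]:  documents of class k that contain the term
    class_counts[k]: all documents of class k
    """
    n = sum(class_counts)
    n_t = sum(term_counts)
    if n == 0 or n_t == 0:
        return 0.0
    p_t = n_t / n  # P(t): the factor that favours frequent terms
    score = 0.0
    for tc, cc in zip(term_counts, class_counts):
        if tc == 0 or cc == 0:
            continue  # a zero-probability term contributes nothing
        p_c_given_t = tc / n_t
        p_c = cc / n
        score += p_c_given_t * math.log(p_c_given_t / p_c)
    return p_t * score
```

In this sketch a term that appears only in one of two equally sized classes gets a high score from both functions, while a term spread evenly across the classes scores zero.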