A new data analysis method based on feature linear combination

Copyright © 2019 Elsevier Inc. All rights reserved.

In biological data, feature relationships are complex and diverse, and they can reflect physiological and pathological changes. Defining simple and efficient classification rules based on feature relationships helps to discriminate between different conditions and to study disease mechanisms. The popular data analysis method k top scoring pairs (k-TSP) explores feature relationships by focusing on differences in the relative levels of two features between groups, and classifies samples on that basis. To define more efficient classification rules, we propose a new data analysis method based on the linear combination of k > 0 top scoring pairs (LC-k-TSP). LC-k-TSP applies a support vector machine (SVM) to define the best linear relationship for each feature pair, scores feature pairs by the discriminative ability of the corresponding linear combinations, and selects k disjoint top scoring pairs to construct an ensemble classifier. Experiments on twelve public datasets showed that LC-k-TSP outperforms k-TSP, which evaluates the relationship of every feature pair in the same way. The experiments also showed that LC-k-TSP achieved accuracy rates similar to those of SVM and random forest (RF). Because LC-k-TSP learns a unique linear combination for each feature pair and defines simple classification rules, its results are easy to interpret biomedically. Finally, we applied LC-k-TSP to hepatocellular carcinoma (HCC) metabolomics data to define simple classification rules for discriminating between different liver diseases. It obtained accuracy rates of 89.76% in distinguishing small HCC from hepatic cirrhosis (CIR) and 89.13% in distinguishing HCC from CIR, compared with 87.99% and 80.35%, respectively, for k-TSP. Hence, defining classification rules based on feature relationships is an effective way to analyze biological data.
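A minimal sketch of the LC-k-TSP pipeline described above, in standard-library Python. It is not the authors' implementation (that is available from the link in this record), and every function name here (`fit_pair_svm`, `make_rule`, `rule_predict`, `lc_k_tsp_fit`, `lc_k_tsp_predict`) is hypothetical. Assumptions: class labels are +1/-1; each pair's linear SVM is trained on z-scored features with a Pegasos-style stochastic subgradient solver, with the bias folded in as a regularized constant input; a pair's "discriminative ability" is taken to be the training accuracy of its SVM rule; and the ensemble is an unweighted majority vote.

```python
import itertools
import random
import statistics


def fit_pair_svm(za, zb, labels, lam=0.01, epochs=200, seed=0):
    """Linear SVM for one standardized feature pair (hypothetical helper).

    Pegasos-style stochastic subgradient descent on
    lam/2 * ||w||^2 + mean hinge loss. The bias is folded in as a constant
    third input, so it is regularized too (a common simplification).
    Returns the weights (w_a, w_b, w_0)."""
    rng = random.Random(seed)
    w = [0.0, 0.0, 0.0]
    order = list(range(len(labels)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for n in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing Pegasos step size
            x = (za[n], zb[n], 1.0)
            margin = labels[n] * sum(wk * xk for wk, xk in zip(w, x))
            w = [(1.0 - eta * lam) * wk for wk in w]  # regularizer shrink
            if margin < 1.0:  # hinge loss active: move toward this sample
                w = [wk + eta * labels[n] * xk for wk, xk in zip(w, x)]
    return tuple(w)


def make_rule(data, labels, i, j):
    """Z-score features i and j, then fit that pair's own linear combination."""
    scale = []
    for f in (i, j):
        col = [row[f] for row in data]
        scale.append((statistics.fmean(col), statistics.pstdev(col) or 1.0))
    (mu_i, sd_i), (mu_j, sd_j) = scale
    za = [(row[i] - mu_i) / sd_i for row in data]
    zb = [(row[j] - mu_j) / sd_j for row in data]
    return {"pair": (i, j), "scale": scale, "w": fit_pair_svm(za, zb, labels)}


def rule_predict(rule, row):
    """Sign of w_a * z_i + w_b * z_j + w_0 for one sample."""
    i, j = rule["pair"]
    (mu_i, sd_i), (mu_j, sd_j) = rule["scale"]
    w_a, w_b, w_0 = rule["w"]
    s = w_a * ((row[i] - mu_i) / sd_i) + w_b * ((row[j] - mu_j) / sd_j) + w_0
    return 1 if s >= 0 else -1


def lc_k_tsp_fit(data, labels, k=3):
    """Score every pair by its rule's training accuracy (an assumed score)
    and keep the k best pairs that share no feature."""
    scored = []
    for i, j in itertools.combinations(range(len(data[0])), 2):
        rule = make_rule(data, labels, i, j)
        hits = sum(rule_predict(rule, row) == y for row, y in zip(data, labels))
        scored.append((hits / len(labels), rule))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep pair order
    chosen, used = [], set()
    for _, rule in scored:
        i, j = rule["pair"]
        if i in used or j in used:
            continue  # enforce disjoint pairs
        chosen.append(rule)
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def lc_k_tsp_predict(chosen, row):
    """Unweighted majority vote of the chosen pair rules (odd k avoids ties)."""
    votes = sum(rule_predict(rule, row) for rule in chosen)
    return 1 if votes >= 0 else -1
```

In use, `lc_k_tsp_fit(rows, labels, k=3)` returns up to three disjoint pair rules and `lc_k_tsp_predict(chosen, row)` votes on a new sample. Scoring pairs by training accuracy is the simplest choice here; a cross-validated score would be less prone to overfitting when features far outnumber samples.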
LC-k-TSP, which evaluates each feature pair by its own best linear relationship, outperforms k-TSP, which evaluates every pair by the same linear relationship.

Availability and implementation: http://www.402.dicp.ac.cn/download_ok_4.htm.
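For contrast, a minimal sketch of the k-TSP baseline, again in standard-library Python with hypothetical function names (`tsp_score`, `k_tsp_fit`, `k_tsp_predict`) and +1/-1 labels. Each pair is scored by the usual TSP statistic delta = |P(x_i < x_j | +1) - P(x_i < x_j | -1)|, and every selected pair votes through the same fixed combination x_i - x_j compared against 0, whereas LC-k-TSP learns the weights of that combination separately for each pair. Tie-breaking refinements of the published k-TSP method are omitted.

```python
import itertools


def tsp_score(data, labels, i, j):
    """Return (delta, p_pos, p_neg) where p_c = P(x_i < x_j | class c)
    and delta = |p_pos - p_neg| is the TSP score of pair (i, j)."""
    pos = [row for row, y in zip(data, labels) if y == 1]
    neg = [row for row, y in zip(data, labels) if y == -1]
    p_pos = sum(row[i] < row[j] for row in pos) / len(pos)
    p_neg = sum(row[i] < row[j] for row in neg) / len(neg)
    return abs(p_pos - p_neg), p_pos, p_neg


def k_tsp_fit(data, labels, k=3):
    """Keep the k highest-scoring disjoint pairs, each with the class that
    the ordering x_i < x_j indicates."""
    scored = []
    for i, j in itertools.combinations(range(len(data[0])), 2):
        delta, p_pos, p_neg = tsp_score(data, labels, i, j)
        scored.append((delta, (i, j), 1 if p_pos >= p_neg else -1))
    scored.sort(key=lambda s: s[0], reverse=True)  # stable on ties
    chosen, used = [], set()
    for _, (i, j), cls_if_less in scored:
        if i in used or j in used:
            continue  # enforce disjoint pairs
        chosen.append(((i, j), cls_if_less))
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def k_tsp_predict(chosen, row):
    """Every pair uses the same fixed combination x_i - x_j, thresholded at 0."""
    votes = 0
    for (i, j), cls_if_less in chosen:
        votes += cls_if_less if row[i] - row[j] < 0 else -cls_if_less
    return 1 if votes >= 0 else -1
```

With rows (3, 5) and (5, 3) labeled +1 and (-3, -5) and (-5, -3) labeled -1, the classes differ in x_0 + x_1 rather than in the ordering of x_0 and x_1, so `tsp_score(rows, labels, 0, 1)` returns delta = 0.0 (both conditional probabilities are 0.5), while a per-pair learned combination can separate them.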

Media type:

E-article

Year of publication:

2019

Published:

2019

Contained in:

Parent record - volume: 94

Contained in:

Journal of biomedical informatics - 94(2019), dated 15 June, page 103173

Language:

English

Contributors:

Lin, Xiaohui [Author]
Zhang, Yanhui [Author]
Li, Chao [Author]
Wang, Jue [Author]
Luo, Ping [Author]
Zhou, Huiwei [Author]

Subjects:

Classification
Feature relationship
Journal Article
Metabolomics
Research Support, Non-U.S. Gov't

Notes:

Date Completed 21.09.2020

Date Revised 21.09.2020

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1016/j.jbi.2019.103173

funding:

Funding institution / project title:

PPN (catalog ID):

NLM295865733