Generating bulk RNA-Seq gene expression data based on generative deep learning models and utilizing it for data augmentation

Copyright © 2023 Elsevier Ltd. All rights reserved.

Large-scale, high-throughput transcriptome sequencing data holds significant value in biomedical research. However, practical challenges such as the difficulty of acquiring samples often limit sample sizes, which reduces the reliability of analysis results. Generative deep learning models, such as Generative Adversarial Networks (GANs) and Diffusion Models (DMs), have been shown to generate realistic data and may help address this problem. In this study, we used bulk RNA-Seq gene expression data to construct four generative models by combining the two model types with two data preprocessing methods (Min-Max and Z-Score normalization): Min-Max-GAN, Z-Score-GAN, Min-Max-DM, and Z-Score-DM. We demonstrated that data generated by the Min-Max-GAN model closely resembled real data, significantly outperforming the other models. Furthermore, we trained the models on the largest dataset available to date, achieving an MMD (Maximum Mean Discrepancy) of 0.030 on the training dataset and 0.033 on an independent dataset. SHAP (SHapley Additive exPlanations) analysis of our generative model further strengthened its credibility. Finally, we applied the generated data to data augmentation and observed a significant improvement in the performance of classification models. In summary, this study establishes a GAN-based approach for generating bulk RNA-Seq gene expression data, which helps improve the performance and reliability of downstream tasks in high-throughput transcriptome analysis.
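The two preprocessing methods and the MMD metric named in the abstract can be illustrated with a minimal NumPy sketch. It assumes a samples × genes expression matrix scaled per gene (per column), and a Gaussian (RBF) kernel with a hypothetical bandwidth `sigma`; the function names are also hypothetical. This record does not state the paper's scaling axis, kernel, bandwidth, or whether the reported 0.030/0.033 values are MMD or squared MMD, so numbers from this sketch are not directly comparable to them.

```python
import numpy as np

def min_max_scale(x):
    # Assumption: scale each gene (column) independently to [0, 1]
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0  # constant genes would otherwise divide by zero
    return (x - lo) / span

def z_score_scale(x):
    # Assumption: standardize each gene (column) to zero mean, unit variance
    mu = x.mean(axis=0)
    sd = x.std(axis=0)
    sd[sd == 0] = 1.0  # leave constant genes centered at zero
    return (x - mu) / sd

def rbf_mmd2(x, y, sigma=1.0):
    # Biased estimate of squared MMD between sample sets x and y,
    # using a Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 sigma^2))
    def k(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()
```

In a setting like the one described, `x` would hold real (preprocessed) expression profiles and `y` generated ones; a value near zero indicates the two distributions are hard to tell apart under the chosen kernel.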

Media type:

E-article

Year of publication:

2024

Published:

2024

Contained in:

Parent record - volume: 169

Contained in:

Computers in biology and medicine - 169(2024), 05 Feb., page 107828

Language:

English

Contributors:

Wang, Yinglun [Author]
Chen, Qiurui [Author]
Shao, Hongwei [Author]
Zhang, Rongxin [Author]
Shen, Han [Author]

Subjects:

Deep learning
Generative learning
Journal Article
Machine learning
Transcriptome

Notes:

Date Completed 08.02.2024

Date Revised 08.02.2024

published: Print-Electronic

Citation Status MEDLINE

doi:

10.1016/j.compbiomed.2023.107828

PPN (Catalog ID):

NLM36591813X