Publication details - Explainable deep neural networks for novel viral genome prediction

Explainable deep neural networks for novel viral genome prediction

Abstract: Viral infections cause a wide variety of human diseases, including cancer and COVID-19. Viruses invade host cells and associate with host molecules, potentially disrupting normal host function and leading to fatal diseases. Predicting novel viral genomes is crucial for understanding complex viral diseases such as AIDS and Ebola. While most existing computational techniques classify viral genomes, the efficiency of the classification depends solely on the structural features extracted. State-of-the-art DNN models achieve excellent performance by extracting classification features automatically, but their explainability is relatively poor. The proposed CNN and CNN-LSTM based methods (EdeepVPP and EdeepVPP-hybrid) extract features automatically during model training for viral prediction. EdeepVPP also supports model interpretation: it is an interpretable CNN model that uses its learned filters to extract the most important, biologically relevant patterns (features) from the feature maps of viral sequences. The EdeepVPP-hybrid predictor outperforms all existing methods, achieving a mean AUC-ROC of 0.992 and an AUC-PR of 0.990 on 19 human metagenomic contig experiment datasets using 10-fold cross-validation. We evaluate the ability of CNN filters to detect patterns across high average activation values. To further assess the robustness of the EdeepVPP model, we perform leave-one-experiment-out cross-validation. The model can work as a recommendation system for further analysis of raw sequences labeled as ‘unknown’ by alignment-based methods. We show that our interpretable model can, through its learned filters, extract patterns that are considered to be the most important features for predicting virus sequences.
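
The abstract describes reading sequence patterns out of learned CNN filters by looking at where they activate strongly. The following is a minimal sketch of that general idea, not the authors' EdeepVPP implementation: the function names, the toy filter weights, and the "half of the maximum activation" cutoff are all assumptions chosen for illustration. It slides one filter over one-hot encoded DNA, keeps the subsequences that activate it strongly, and tallies per-position base counts, which is the raw material for a motif logo.

```python
from collections import Counter

BASES = "ACGT"

def one_hot(seq):
    """Encode a DNA string as rows of 4 floats (A, C, G, T); other symbols become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]

def filter_activations(seq, weights):
    """Slide one convolution filter (k rows x 4 columns) along seq and apply ReLU."""
    k = len(weights)
    encoded = one_hot(seq)
    acts = []
    for start in range(len(seq) - k + 1):
        score = sum(
            weights[i][j] * encoded[start + i][j]
            for i in range(k)
            for j in range(4)
        )
        acts.append(max(0.0, score))
    return acts

def top_subsequences(seqs, weights, fraction=0.5):
    """Collect k-mers whose activation exceeds fraction * the maximum activation seen.

    The cutoff rule is an assumed heuristic, not taken from the paper.
    """
    k = len(weights)
    scored = []
    for seq in seqs:
        for start, act in enumerate(filter_activations(seq, weights)):
            scored.append((act, seq[start:start + k].upper()))
    if not scored:
        return []
    cutoff = fraction * max(act for act, _ in scored)
    return [kmer for act, kmer in scored if act > cutoff]

def position_counts(kmers):
    """Count bases at each position of the collected k-mers."""
    if not kmers:
        return []
    return [Counter(kmer[i] for kmer in kmers) for i in range(len(kmers[0]))]

# Hypothetical 3-wide filter that responds most strongly to "TAT".
toy_filter = [
    [0.0, 0.0, 0.0, 1.0],  # position 0 prefers T
    [1.0, 0.0, 0.0, 0.0],  # position 1 prefers A
    [0.0, 0.0, 0.0, 1.0],  # position 2 prefers T
]
toy_seqs = ["GGTATCC", "ACGTATA", "CCCCCCC"]

hits = top_subsequences(toy_seqs, toy_filter)
motif = position_counts(hits)
```

In a trained network the weights would come from the first convolutional layer rather than being written by hand, and the counts in `motif` would typically be normalized into a position frequency matrix before comparison with known motifs.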

Media type:	E-article

Year of publication:	2021
Published:	2021

Contained in:	Parent record - volume:52
Contained in:	Applied intelligence - 52(2021), no. 3, 25 June, pages 3002-3017

Language:	English

Contributors:	Dasari, Chandra Mohan [author]; Bhukya, Raju [author]

Links:	Full text [license required]

BKL:	54.72 Artificial intelligence; 30.20 Nonlinear dynamics
Subjects:	Convolution neural network; Interpretable; Learned filters; Motif; Splice sites; Splicing

RVK:	RVK classification: ELIB23; ELIB09

Notes:	© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021

doi:	10.1007/s10489-021-02572-3

PPN (catalog ID):	OLC2129367836

Internal format


LEADER	01000naa a22002652 4500
001	OLC2129367836
003	DE-627
005	20230505213419.0
007	cr uuu---uuuuu
008	230505s2021 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1007/s10489-021-02572-3 \|2 doi
035			\|a (DE-627)OLC2129367836
035			\|a (DE-He213)s10489-021-02572-3-e
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
082	0	4	\|a 004 \|q VZ
084			\|a ELIB23 \|q VZ \|2 rvk
084			\|a ELIB09 \|q VZ \|2 rvk
084			\|a 54.72$jKünstliche Intelligenz \|2 bkl
084			\|a 30.20$jNichtlineare Dynamik \|2 bkl
100	1		\|a Dasari, Chandra Mohan \|e verfasserin \|0 (orcid)0000-0002-0515-4979 \|4 aut
245	1	0	\|a Explainable deep neural networks for novel viral genome prediction
264		1	\|c 2021
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021
520			\|a Abstract: Viral infections cause a wide variety of human diseases, including cancer and COVID-19. Viruses invade host cells and associate with host molecules, potentially disrupting normal host function and leading to fatal diseases. Predicting novel viral genomes is crucial for understanding complex viral diseases such as AIDS and Ebola. While most existing computational techniques classify viral genomes, the efficiency of the classification depends solely on the structural features extracted. State-of-the-art DNN models achieve excellent performance by extracting classification features automatically, but their explainability is relatively poor. The proposed CNN and CNN-LSTM based methods (EdeepVPP and EdeepVPP-hybrid) extract features automatically during model training for viral prediction. EdeepVPP also supports model interpretation: it is an interpretable CNN model that uses its learned filters to extract the most important, biologically relevant patterns (features) from the feature maps of viral sequences. The EdeepVPP-hybrid predictor outperforms all existing methods, achieving a mean AUC-ROC of 0.992 and an AUC-PR of 0.990 on 19 human metagenomic contig experiment datasets using 10-fold cross-validation. We evaluate the ability of CNN filters to detect patterns across high average activation values. To further assess the robustness of the EdeepVPP model, we perform leave-one-experiment-out cross-validation. The model can work as a recommendation system for further analysis of raw sequences labeled as ‘unknown’ by alignment-based methods. We show that our interpretable model can, through its learned filters, extract patterns that are considered to be the most important features for predicting virus sequences.
650		4	\|a Splice sites
650		4	\|a Interpretable
650		4	\|a Convolution neural network
650		4	\|a Motif
650		4	\|a Splicing
650		4	\|a Learned filters
700	1		\|a Bhukya, Raju \|4 aut
773	0	8	\|i Enthalten in \|t Applied intelligence \|d Springer US, 1991 \|g 52(2021), 3 vom: 25. Juni, Seite 3002-3017 \|h Online-Ressource \|w (DE-627)271180919 \|w (DE-600)1479519-X \|w (DE-576)102669074 \|x 1573-7497 \|7 nnns
773	1	8	\|g volume:52 \|g year:2021 \|g number:3 \|g day:25 \|g month:06 \|g pages:3002-3017
856	4	0	\|u https://dx.doi.org/10.1007/s10489-021-02572-3 \|z lizenzpflichtig \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a SYSFLAG_A
912			\|a GBV_OLC
912			\|a GBV_ILN_11
912			\|a GBV_ILN_20
912			\|a GBV_ILN_22
912			\|a GBV_ILN_23
912			\|a GBV_ILN_24
912			\|a GBV_ILN_31
912			\|a GBV_ILN_32
912			\|a GBV_ILN_39
912			\|a GBV_ILN_40
912			\|a GBV_ILN_60
912			\|a GBV_ILN_62
912			\|a GBV_ILN_63
912			\|a GBV_ILN_69
912			\|a GBV_ILN_70
912			\|a GBV_ILN_73
912			\|a GBV_ILN_74
912			\|a GBV_ILN_90
912			\|a GBV_ILN_95
912			\|a GBV_ILN_100
912			\|a GBV_ILN_101
912			\|a GBV_ILN_105
912			\|a GBV_ILN_110
912			\|a GBV_ILN_120
912			\|a GBV_ILN_138
912			\|a GBV_ILN_150
912			\|a GBV_ILN_151
912			\|a GBV_ILN_152
912			\|a GBV_ILN_161
912			\|a GBV_ILN_170
912			\|a GBV_ILN_171
912			\|a GBV_ILN_187
912			\|a GBV_ILN_213
912			\|a GBV_ILN_224
912			\|a GBV_ILN_230
912			\|a GBV_ILN_250
912			\|a GBV_ILN_281
912			\|a GBV_ILN_285
912			\|a GBV_ILN_293
912			\|a GBV_ILN_370
912			\|a GBV_ILN_602
912			\|a GBV_ILN_636
912			\|a GBV_ILN_702
912			\|a GBV_ILN_2001
912			\|a GBV_ILN_2003
912			\|a GBV_ILN_2004
912			\|a GBV_ILN_2005
912			\|a GBV_ILN_2006
912			\|a GBV_ILN_2007
912			\|a GBV_ILN_2008
912			\|a GBV_ILN_2009
912			\|a GBV_ILN_2010
912			\|a GBV_ILN_2011
912			\|a GBV_ILN_2014
912			\|a GBV_ILN_2015
912			\|a GBV_ILN_2020
912			\|a GBV_ILN_2021
912			\|a GBV_ILN_2025
912			\|a GBV_ILN_2026
912			\|a GBV_ILN_2027
912			\|a GBV_ILN_2031
912			\|a GBV_ILN_2034
912			\|a GBV_ILN_2037
912			\|a GBV_ILN_2038
912			\|a GBV_ILN_2039
912			\|a GBV_ILN_2044
912			\|a GBV_ILN_2048
912			\|a GBV_ILN_2049
912			\|a GBV_ILN_2055
912			\|a GBV_ILN_2057
912			\|a GBV_ILN_2059
912			\|a GBV_ILN_2061
912			\|a GBV_ILN_2064
912			\|a GBV_ILN_2065
912			\|a GBV_ILN_2068
912			\|a GBV_ILN_2088
912			\|a GBV_ILN_2093
912			\|a GBV_ILN_2106
912			\|a GBV_ILN_2107
912			\|a GBV_ILN_2108
912			\|a GBV_ILN_2110
912			\|a GBV_ILN_2111
912			\|a GBV_ILN_2112
912			\|a GBV_ILN_2113
912			\|a GBV_ILN_2118
912			\|a GBV_ILN_2129
912			\|a GBV_ILN_2143
912			\|a GBV_ILN_2144
912			\|a GBV_ILN_2147
912			\|a GBV_ILN_2148
912			\|a GBV_ILN_2152
912			\|a GBV_ILN_2153
912			\|a GBV_ILN_2188
912			\|a GBV_ILN_2190
912			\|a GBV_ILN_2232
912			\|a GBV_ILN_2336
912			\|a GBV_ILN_2446
912			\|a GBV_ILN_2470
912			\|a GBV_ILN_2474
912			\|a GBV_ILN_2507
912			\|a GBV_ILN_2522
912			\|a GBV_ILN_2548
912			\|a GBV_ILN_4035
912			\|a GBV_ILN_4037
912			\|a GBV_ILN_4046
912			\|a GBV_ILN_4112
912			\|a GBV_ILN_4125
912			\|a GBV_ILN_4126
912			\|a GBV_ILN_4242
912			\|a GBV_ILN_4246
912			\|a GBV_ILN_4249
912			\|a GBV_ILN_4251
912			\|a GBV_ILN_4305
912			\|a GBV_ILN_4306
912			\|a GBV_ILN_4307
912			\|a GBV_ILN_4313
912			\|a GBV_ILN_4322
912			\|a GBV_ILN_4323
912			\|a GBV_ILN_4324
912			\|a GBV_ILN_4325
912			\|a GBV_ILN_4326
912			\|a GBV_ILN_4328
912			\|a GBV_ILN_4333
912			\|a GBV_ILN_4334
912			\|a GBV_ILN_4335
912			\|a GBV_ILN_4336
912			\|a GBV_ILN_4338
912			\|a GBV_ILN_4393
912			\|a GBV_ILN_4700
936	r	v	\|a ELIB23
936	r	v	\|a ELIB09
936	b	k	\|a 54.72$jKünstliche Intelligenz \|q VZ \|0 10641240X \|0 (DE-625)10641240X
936	b	k	\|a 30.20$jNichtlineare Dynamik \|q VZ \|0 106418947 \|0 (DE-625)106418947
951			\|a AR
952			\|d 52 \|j 2021 \|e 3 \|b 25 \|c 06 \|h 3002-3017
