Publication details - Estimating amino acid substitution models from genome datasets

Estimating amino acid substitution models from genome datasets : a simulation study on the performance of estimated models

© The Author(s) 2023. Published by Oxford University Press on behalf of the European Society of Evolutionary Biology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.

Estimating the parameters of amino acid substitution models is a crucial task in bioinformatics. The maximum likelihood (ML) approach has been proposed for estimating amino acid substitution models from large datasets. The quality of newly estimated models is normally assessed by comparing them with existing models in building ML trees. Two important questions remain: how well the estimated models correlate with the true models, and how large the training datasets must be to estimate reliable models. In this article, we performed a simulation study to answer these two questions. We simulated genome datasets with different numbers of genes/alignments based on predefined models (called true models) and predefined trees (called true trees). The simulated datasets were used to estimate amino acid substitution models using the ML estimation methods. Our experiments showed that models estimated by the ML methods from simulated datasets with more than 100 genes have high correlations with the true models. The estimated models performed well in building ML trees in comparison with the true models. The results suggest that amino acid substitution models estimated by the ML methods from large genome datasets are a reliable tool for analyzing amino acid sequences.

Media type:	E-article

Year of publication:	2024
Published:	2024

Contained in:	Go to the full record - volume:37
Contained in:	Journal of evolutionary biology - 37(2024), no. 2, 14 Feb., pages 256-265

Language:	English

Contributors:	Tinh, Nguyen Huy [author]; Dang, Cuong Cao [author]; Vinh, Le Sy [author]

Links:	Full text

Subjects:	Amino acid substitution models; Journal Article; Maximum likelihood estimation methods; Simulated amino acid data; Time-nonreversible models; Time-reversible models

Notes:	Date Completed 19.02.2024; Date Revised 19.02.2024; published: Print; Citation Status MEDLINE

doi:	10.1093/jeb/voad017

PPN (catalog ID):	NLM368563189

Internal format


LEADER	01000caa a22002652 4500
001	NLM368563189
003	DE-627
005	20240219232136.0
007	cr uuu---uuuuu
008	240217s2024 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1093/jeb/voad017 \|2 doi
028	5	2	\|a pubmed24n1299.xml
035			\|a (DE-627)NLM368563189
035			\|a (NLM)38366253
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Tinh, Nguyen Huy \|e verfasserin \|4 aut
245	1	0	\|a Estimating amino acid substitution models from genome datasets \|b a simulation study on the performance of estimated models
264		1	\|c 2024
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 19.02.2024
500			\|a Date Revised 19.02.2024
500			\|a published: Print
500			\|a Citation Status MEDLINE
520			\|a © The Author(s) 2023. Published by Oxford University Press on behalf of the European Society of Evolutionary Biology. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
520			\|a Estimating parameters of amino acid substitution models is a crucial task in bioinformatics. The maximum likelihood (ML) approach has been proposed to estimate amino acid substitution models from large datasets. The quality of newly estimated models is normally assessed by comparing with the existing models in building ML trees. Two important questions remained are the correlation of the estimated models with the true models and the required size of the training datasets to estimate reliable models. In this article, we performed a simulation study to answer these two questions based on simulated data. We simulated genome datasets with different numbers of genes/alignments based on predefined models (called true models) and predefined trees (called true trees). The simulated datasets were used to estimate amino acid substitution model using the ML estimation methods. Our experiments showed that models estimated by the ML methods from simulated datasets with more than 100 genes have high correlations with the true models. The estimated models performed well in building ML trees in comparison with the true models. The results suggest that amino acid substitution models estimated by the ML methods from large genome datasets are a reliable tool for analyzing amino acid sequences
650		4	\|a Journal Article
650		4	\|a amino acid substitution models
650		4	\|a maximum likelihood estimation methods
650		4	\|a simulated amino acid data
650		4	\|a time-nonreversible models
650		4	\|a time-reversible models
700	1		\|a Dang, Cuong Cao \|e verfasserin \|4 aut
700	1		\|a Vinh, Le Sy \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t Journal of evolutionary biology \|d 1995 \|g 37(2024), 2 vom: 14. Feb., Seite 256-265 \|w (DE-627)NLM087794160 \|x 1010-061X \|7 nnns
773	1	8	\|g volume:37 \|g year:2024 \|g number:2 \|g day:14 \|g month:02 \|g pages:256-265
856	4	0	\|u http://dx.doi.org/10.1093/jeb/voad017 \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 37 \|j 2024 \|e 2 \|b 14 \|c 02 \|h 256-265
