Mottle: Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent
Copyright: © 2024 Prusokiene et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Current tools for estimating the substitution distance between two related sequences struggle to remain accurate at a high divergence. Difficulties at distant homologies, such as false seeding and over-alignment, create a high barrier for the development of a stable estimator. This is especially true for viral genomes, which carry a high rate of mutation, small size, and sparse taxonomy. Developing an accurate substitution distance measure would help to elucidate the relationship between highly divergent sequences, interrogate their evolutionary history, and better facilitate the discovery of new viral genomes. To tackle these problems, we propose an approach that uses short-read mappers to create whole-genome maps, and gradient descent to isolate the homologous fraction and calculate the final distance value. We implement this approach as Mottle. With the use of simulated and biological sequences, Mottle was able to remain stable to 0.66-0.96 substitutions per base pair and identify viral outgroup genomes with 95% accuracy at the family-order level. Our results indicate that Mottle performs as well as existing programs in identifying taxonomic relationships, with more accurate numerical estimation of genomic distance over greater divergences. By contrast, one limitation is a reduced numerical accuracy at low divergences, and on genomes where insertions and deletions are uncommon, when compared to alternative approaches. We propose that Mottle may therefore be of particular interest in the study of viruses, viral relationships, and notably for viral discovery platforms, helping in benchmarking of homology search tools and defining the limits of taxonomic classification methods. The code for Mottle is available at https://github.com/tphoward/Mottle_Repo.
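To make the unit "substitutions per base pair" concrete, the sketch below relates it to the raw fraction of mismatched sites under the Jukes-Cantor (JC69) model. This is an illustrative assumption only: the abstract does not state which substitution model Mottle uses, and the function names `jc69_distance` and `jc69_expected_mismatch` are hypothetical. Under JC69 the observed mismatch fraction saturates at 0.75, which is one reason distance estimates become unstable as divergence grows.

```python
import math


def jc69_distance(p):
    """Substitutions per site under JC69, given the observed mismatch fraction p.

    Assumption: JC69 is used here purely for illustration; it is not
    claimed to be the model implemented in Mottle.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("JC69 is undefined for p >= 0.75 (saturation)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def jc69_expected_mismatch(d):
    """Inverse relation: expected mismatch fraction for a true distance d."""
    return 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))


# Round-trip a low distance and the two bounds quoted in the abstract.
for d in (0.10, 0.66, 0.96):
    p = jc69_expected_mismatch(d)
    print(f"d={d:.2f} -> mismatch fraction={p:.3f} -> recovered d={jc69_distance(p):.2f}")
```

Under this model the slope of distance against mismatch fraction, 1 / (1 - 4p/3), grows without bound as p approaches 0.75, so any error in the aligned (homologous) fraction is amplified more strongly at high divergence than at low divergence.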
Media type: | E-article |
---|---|
Year of publication: | 2024 |
Published: | 2024 |
Contained in: | PloS one - 19(2024), 3, dated 21., page e0298834 |
Language: | English |
Contributors: | Prusokiene, Alisa [author] |
Notes: | Date Completed 25.03.2024 Date Revised 25.03.2024 published: Electronic-eCollection Citation Status MEDLINE |
doi: | 10.1371/journal.pone.0298834 |
PPN (catalog ID): | NLM370025164 |

LEADER | 01000caa a22002652 4500 | ||
---|---|---|---|
001 | NLM370025164 | ||
003 | DE-627 | ||
005 | 20240325235330.0 | ||
007 | cr uuu---uuuuu | ||
008 | 240323s2024 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1371/journal.pone.0298834 |2 doi | |
028 | 5 | 2 | |a pubmed24n1346.xml |
035 | |a (DE-627)NLM370025164 | ||
035 | |a (NLM)38512939 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Prusokiene, Alisa |e verfasserin |4 aut | |
245 | 1 | 0 | |a Mottle |b Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent |
264 | 1 | |c 2024 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 25.03.2024 | ||
500 | |a Date Revised 25.03.2024 | ||
500 | |a published: Electronic-eCollection | ||
500 | |a Citation Status MEDLINE | ||
520 | |a Copyright: © 2024 Prusokiene et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. | ||
520 | |a Current tools for estimating the substitution distance between two related sequences struggle to remain accurate at a high divergence. Difficulties at distant homologies, such as false seeding and over-alignment, create a high barrier for the development of a stable estimator. This is especially true for viral genomes, which carry a high rate of mutation, small size, and sparse taxonomy. Developing an accurate substitution distance measure would help to elucidate the relationship between highly divergent sequences, interrogate their evolutionary history, and better facilitate the discovery of new viral genomes. To tackle these problems, we propose an approach that uses short-read mappers to create whole-genome maps, and gradient descent to isolate the homologous fraction and calculate the final distance value. We implement this approach as Mottle. With the use of simulated and biological sequences, Mottle was able to remain stable to 0.66-0.96 substitutions per base pair and identify viral outgroup genomes with 95% accuracy at the family-order level. Our results indicate that Mottle performs as well as existing programs in identifying taxonomic relationships, with more accurate numerical estimation of genomic distance over greater divergences. By contrast, one limitation is a reduced numerical accuracy at low divergences, and on genomes where insertions and deletions are uncommon, when compared to alternative approaches. We propose that Mottle may therefore be of particular interest in the study of viruses, viral relationships, and notably for viral discovery platforms, helping in benchmarking of homology search tools and defining the limits of taxonomic classification methods. The code for Mottle is available at https://github.com/tphoward/Mottle_Repo | ||
650 | 4 | |a Journal Article | |
700 | 1 | |a Boonham, Neil |e verfasserin |4 aut | |
700 | 1 | |a Fox, Adrian |e verfasserin |4 aut | |
700 | 1 | |a Howard, Thomas P |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t PloS one |d 2006 |g 19(2024), 3 vom: 21., Seite e0298834 |w (DE-627)NLM167327399 |x 1932-6203 |7 nnns |
773 | 1 | 8 | |g volume:19 |g year:2024 |g number:3 |g day:21 |g pages:e0298834 |
856 | 4 | 0 | |u http://dx.doi.org/10.1371/journal.pone.0298834 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 19 |j 2024 |e 3 |b 21 |h e0298834 |