Inferring the Probability of the Derived vs. the Ancestral Allelic State at a Polymorphic Site
Copyright © 2018 Keightley and Jackson.
It is known that the allele ancestral to the variation at a polymorphic site cannot be assigned with certainty, and that the most frequently used method to assign the ancestral state, maximum parsimony, is prone to misinference. Estimates of counts of sites that have a certain number of copies of the derived allele in a sample (the unfolded site frequency spectrum, uSFS) made by parsimony are therefore also biased. We previously developed a maximum likelihood method to estimate the uSFS for a focal species using information from two outgroups while assuming simple models of nucleotide substitution. Here, we extend this approach to allow multiple outgroups (implemented for three outgroups), potentially any phylogenetic tree topology, and more complex models of nucleotide substitution. We find, however, that two outgroups and the Kimura two-parameter model are adequate for uSFS inference in most cases. We show that using parsimony to infer the ancestral state at a specific site seriously breaks down in two situations. The first is where the outgroups provide no information about the ancestral state of variation in the focal species. In this case, nucleotide variation will be underestimated if such sites are excluded. The second is where the minor allele in the focal species agrees with the allelic state of the outgroups. In this situation, parsimony tends to overestimate the probability of the major allele being derived, because it fails to account for the fact that sites with a high frequency of the derived allele tend to be rare. We present a method that corrects this deficiency and is capable of providing nearly unbiased estimates of both the uSFS and ancestral state probabilities on a site-by-site basis.
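The second failure case in the abstract can be illustrated with a toy Bayesian calculation. This is only a sketch under stated assumptions, not the maximum likelihood method of the paper: it assumes a single outgroup, a neutral-equilibrium prior in which the probability of a derived allele count j is proportional to 1/j, and a hypothetical parameter `outgroup_error`, the probability that the outgroup shows the derived rather than the ancestral allele. The function name and parameters are invented for illustration.

```python
def prob_major_derived(n, minor_count, outgroup_error):
    """Toy posterior probability that the major allele is derived,
    given that the outgroup carries the minor allele.

    Assumptions (not taken from the paper): the prior on a derived
    allele count j is proportional to 1/j, and `outgroup_error` is the
    chance that the outgroup shows the derived allele.
    """
    if not 0 < minor_count < n:
        raise ValueError("minor_count must be between 1 and n - 1")
    major_count = n - minor_count

    # Hypothesis A: the outgroup is correct, so the minor allele is
    # ancestral and the derived allele is present in major_count copies.
    weight_major_derived = (1.0 - outgroup_error) / major_count

    # Hypothesis B: the outgroup is misleading, so the minor allele is
    # derived and is present in minor_count copies.
    weight_minor_derived = outgroup_error / minor_count

    return weight_major_derived / (weight_major_derived + weight_minor_derived)


# A singleton in a sample of 20, with a 5% chance of a misleading outgroup.
posterior = prob_major_derived(n=20, minor_count=1, outgroup_error=0.05)
print(round(posterior, 3))
```

With these illustrative values the posterior is about 0.5, whereas parsimony would treat the major allele as derived with certainty. The gap arises because a derived allele in 19 of 20 copies is much rarer a priori than a derived singleton.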
Media type: |
E-article |
---|
Year of publication: |
2018 |
---|---|
Published: |
2018 |
Contained in: |
Parent record - volume:209 |
---|---|
Contained in: |
Genetics - 209(2018), no. 3, 1 July, pp. 897-906 |
Language: |
English |
---|
Contributors: |
Keightley, Peter D [author] |
---|
Links: |
---|
Subjects: |
Ancestral allele |
---|
Notes: |
Date completed: 09.10.2018; date revised: 14.11.2018; published: Print-Electronic; citation status: MEDLINE |
---|
doi: |
10.1534/genetics.118.301120 |
---|
funding: |
|
---|---|
Funding institution / Project title: |
|
PPN (catalog ID): |
NLM284155446 |
---|
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM284155446 | ||
003 | DE-627 | ||
005 | 20231225042545.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231225s2018 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1534/genetics.118.301120 |2 doi | |
028 | 5 | 2 | |a pubmed24n0947.xml |
035 | |a (DE-627)NLM284155446 | ||
035 | |a (NLM)29769282 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Keightley, Peter D |e verfasserin |4 aut | |
245 | 1 | 0 | |a Inferring the Probability of the Derived vs. the Ancestral Allelic State at a Polymorphic Site |
264 | 1 | |c 2018 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computermedien |b c |2 rdamedia | ||
338 | |a Online-Ressource |b cr |2 rdacarrier | ||
500 | |a Date Completed 09.10.2018 | ||
500 | |a Date Revised 14.11.2018 | ||
500 | |a published: Print-Electronic | ||
500 | |a Citation Status MEDLINE | ||
520 | |a Copyright © 2018 Keightley and Jackson. | ||
520 | |a It is known that the allele ancestral to the variation at a polymorphic site cannot be assigned with certainty, and that the most frequently used method to assign the ancestral state-maximum parsimony-is prone to misinference. Estimates of counts of sites that have a certain number of copies of the derived allele in a sample (the unfolded site frequency spectrum, uSFS) made by parsimony are therefore also biased. We previously developed a maximum likelihood method to estimate the uSFS for a focal species using information from two outgroups while assuming simple models of nucleotide substitution. Here, we extend this approach to allow multiple outgroups (implemented for three outgroups), potentially any phylogenetic tree topology, and more complex models of nucleotide substitution. We find, however, that two outgroups and the Kimura two-parameter model are adequate for uSFS inference in most cases. We show that using parsimony to infer the ancestral state at a specific site seriously breaks down in two situations. The first is where the outgroups provide no information about the ancestral state of variation in the focal species. In this case, nucleotide variation will be underestimated if such sites are excluded. The second is where the minor allele in the focal species agrees with the allelic state of the outgroups. In this situation, parsimony tends to overestimate the probability of the major allele being derived, because it fails to account for the fact that sites with a high frequency of the derived allele tend to be rare. We present a method that corrects this deficiency and is capable of providing nearly unbiased estimates of ancestral state probabilities on a site-by-site basis and the uSFS | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
650 | 4 | |a ancestral allele | |
650 | 4 | |a derived allele | |
650 | 4 | |a misinference | |
650 | 4 | |a nucleotide polymorphism | |
650 | 4 | |a parsimony | |
650 | 4 | |a unfolded site frequency spectrum | |
700 | 1 | |a Jackson, Benjamin C |e verfasserin |4 aut | |
773 | 0 | 8 | |i Enthalten in |t Genetics |d 1916 |g 209(2018), 3 vom: 01. Juli, Seite 897-906 |w (DE-627)NLM000003506 |x 1943-2631 |7 nnns |
773 | 1 | 8 | |g volume:209 |g year:2018 |g number:3 |g day:01 |g month:07 |g pages:897-906 |
856 | 4 | 0 | |u http://dx.doi.org/10.1534/genetics.118.301120 |3 Volltext |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 209 |j 2018 |e 3 |b 01 |c 07 |h 897-906 |