Publication details - Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data

Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data

BACKGROUND: One goal of structural biology is to understand how a protein's 3-dimensional conformation determines its capacity to interact with potential ligands. In the case of small chemical ligands, deconstructing a static protein-ligand complex into its constituent atom-atom interactions is typically sufficient to rapidly predict ligand affinity with high accuracy (>70% correlation between predicted and experimentally-determined affinity), a fact that is exploited to support structure-based drug design. We recently found that protein-DNA/RNA affinity can also be predicted with high accuracy using extensions of existing techniques, but protein-protein affinity could not be predicted with >60% correlation, even when the protein-protein complex was available.

METHODS: X-ray and NMR structures of protein-protein complexes, their associated binding affinities and experimental conditions were obtained from different binding affinity and structural databases. Statistical models were implemented using a generalized linear model framework, including the experimental conditions as new model features. We evaluated the potential for new features to improve affinity prediction models by calculating the Pearson correlation between predicted and experimental binding affinities on the training and test data after model fitting and after cross-validation. Differences in accuracy were assessed using two-sample t test and nonparametric Mann-Whitney U test.
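
The evaluation described in METHODS can be illustrated with a minimal sketch. It is not the authors' code. It assumes a single made-up feature, toy affinity values, and an ordinary least-squares line (a Gaussian generalized linear model with identity link) standing in for the full model. The function names pearson, fit_line and cross_validated_predictions are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return my - slope * mx, slope

def cross_validated_predictions(xs, ys, k=5):
    """Predict every point from a model fitted on the other k-1 folds."""
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        test = [i for i in range(len(xs)) if i % k == fold]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = a + b * xs[i]
    return preds

# Made-up toy data (assumption): one structural feature per complex
# and a corresponding "experimental" affinity.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
affinity = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

a, b = fit_line(feature, affinity)
r_fit = pearson([a + b * x for x in feature], affinity)
r_cv = pearson(cross_validated_predictions(feature, affinity), affinity)
print(f"r after fitting: {r_fit:.3f}, r after cross-validation: {r_cv:.3f}")
```

Reporting the correlation both after fitting and after cross-validation, as the abstract does, helps separate how well a model describes its training data from how well it generalizes to held-out complexes.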

RESULTS: Here we evaluate a range of potential factors that may interfere with accurate protein-protein affinity prediction. We find that X-ray crystal resolution has the strongest single effect on protein-protein affinity prediction. Limiting our analyses to only high-resolution complexes (≤2.5 Å) increased the correlation between predicted and experimental affinity from 54% to 68% (p = 4.32 × 10⁻³). In addition, incorporating information on the experimental conditions under which affinities were measured (pH, temperature and binding assay) had significant effects on prediction accuracy. We also highlight a number of potential errors in large structure-affinity databases, which could affect both model training and accuracy assessment.
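
The resolution filter mentioned in RESULTS could be sketched as follows. This is an illustration only: the Complex record, its field names and all values are made up, and only the ≤2.5 Å cutoff comes from the abstract.

```python
from collections import namedtuple
from statistics import mean, pstdev

# Hypothetical record layout; the identifiers and values below are invented.
Complex = namedtuple("Complex", "complex_id resolution predicted experimental")

def pearson(xs, ys):
    """Pearson correlation from population moments."""
    mx, my = mean(xs), mean(ys)
    cov = mean((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / (pstdev(xs) * pstdev(ys))

def high_resolution(complexes, cutoff=2.5):
    """Keep complexes whose X-ray resolution is at or below the cutoff (in Å)."""
    return [c for c in complexes if c.resolution <= cutoff]

def prediction_correlation(complexes):
    """Correlation between predicted and experimental affinity."""
    return pearson([c.predicted for c in complexes],
                   [c.experimental for c in complexes])

complexes = [
    Complex("c1", 1.8, 8.0, 8.2),
    Complex("c2", 2.0, 6.5, 6.4),
    Complex("c3", 2.4, 9.1, 9.0),
    Complex("c4", 2.5, 7.2, 7.5),
    Complex("c5", 3.1, 6.0, 8.5),
    Complex("c6", 3.4, 9.0, 6.2),
    Complex("c7", 2.9, 7.5, 7.0),
]

subset = high_resolution(complexes)
r_all = prediction_correlation(complexes)
r_high = prediction_correlation(subset)
print(f"all: n={len(complexes)}, r={r_all:.2f}")
print(f"<=2.5 Å: n={len(subset)}, r={r_high:.2f}")
```

In this toy set the low-resolution entries were deliberately given poor predictions, so the filtered correlation is higher by construction. The sketch shows the mechanics of the comparison, not evidence for the reported effect.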

CONCLUSIONS: The results suggest that the accuracy of statistical models for protein-protein affinity prediction may be limited by the information present in databases used to train new models. Improving our capacity to integrate large-scale structural and functional information may be required to substantively advance our understanding of the general principles by which a protein's structure determines its function.

Media type:	E-article

Publication year:	2017
Published:	2017

Contained in:	Link to the overall record - volume:18
Contained in:	BMC bioinformatics - 18(2017), Suppl 5, 23 March, page 102

Language:	English

Contributors:	Dias, Raquel [author]; Kolaczkowski, Bryan [author]

Links:	Full text

Subjects:	Binding affinity; Intermolecular interactions; Journal Article; Machine learning; Protein-protein; Proteins; Scoring functions

Notes:	Date Completed 07.09.2017; Date Revised 02.12.2018; published: Electronic; Citation Status MEDLINE

doi:	10.1186/s12859-017-1533-z

PPN (catalog ID):	NLM270500162

Internal format


LEADER	01000naa a22002652 4500
001	NLM270500162
003	DE-627
005	20231224230703.0
007	cr uuu---uuuuu
008	231224s2017 xx \|\|\|\|\|o 00\| \|\|eng c
024	7		\|a 10.1186/s12859-017-1533-z \|2 doi
028	5	2	\|a pubmed24n0901.xml
035			\|a (DE-627)NLM270500162
035			\|a (NLM)28361672
040			\|a DE-627 \|b ger \|c DE-627 \|e rakwb
041			\|a eng
100	1		\|a Dias, Raquel \|e verfasserin \|4 aut
245	1	0	\|a Improving the accuracy of high-throughput protein-protein affinity prediction may require better training data
264		1	\|c 2017
336			\|a Text \|b txt \|2 rdacontent
337			\|a Computermedien \|b c \|2 rdamedia
338			\|a Online-Ressource \|b cr \|2 rdacarrier
500			\|a Date Completed 07.09.2017
500			\|a Date Revised 02.12.2018
500			\|a published: Electronic
500			\|a Citation Status MEDLINE
520			\|a BACKGROUND: One goal of structural biology is to understand how a protein's 3-dimensional conformation determines its capacity to interact with potential ligands. In the case of small chemical ligands, deconstructing a static protein-ligand complex into its constituent atom-atom interactions is typically sufficient to rapidly predict ligand affinity with high accuracy (>70% correlation between predicted and experimentally-determined affinity), a fact that is exploited to support structure-based drug design. We recently found that protein-DNA/RNA affinity can also be predicted with high accuracy using extensions of existing techniques, but protein-protein affinity could not be predicted with >60% correlation, even when the protein-protein complex was available
520			\|a METHODS: X-ray and NMR structures of protein-protein complexes, their associated binding affinities and experimental conditions were obtained from different binding affinity and structural databases. Statistical models were implemented using a generalized linear model framework, including the experimental conditions as new model features. We evaluated the potential for new features to improve affinity prediction models by calculating the Pearson correlation between predicted and experimental binding affinities on the training and test data after model fitting and after cross-validation. Differences in accuracy were assessed using two-sample t test and nonparametric Mann-Whitney U test
520			\|a RESULTS: Here we evaluate a range of potential factors that may interfere with accurate protein-protein affinity prediction. We find that X-ray crystal resolution has the strongest single effect on protein-protein affinity prediction. Limiting our analyses to only high-resolution complexes (≤2.5 Å) increased the correlation between predicted and experimental affinity from 54 to 68% (p = 4.32x10-3). In addition, incorporating information on the experimental conditions under which affinities were measured (pH, temperature and binding assay) had significant effects on prediction accuracy. We also highlight a number of potential errors in large structure-affinity databases, which could affect both model training and accuracy assessment
520			\|a CONCLUSIONS: The results suggest that the accuracy of statistical models for protein-protein affinity prediction may be limited by the information present in databases used to train new models. Improving our capacity to integrate large-scale structural and functional information may be required to substantively advance our understanding of the general principles by which a protein's structure determines its function
650		4	\|a Journal Article
650		4	\|a Binding affinity
650		4	\|a Intermolecular interactions
650		4	\|a Machine learning
650		4	\|a Protein-protein
650		4	\|a Scoring functions
650		7	\|a Proteins \|2 NLM
700	1		\|a Kolaczkowski, Bryan \|e verfasserin \|4 aut
773	0	8	\|i Enthalten in \|t BMC bioinformatics \|d 2000 \|g 18(2017), Suppl 5 vom: 23. März, Seite 102 \|w (DE-627)NLM109215982 \|x 1471-2105 \|7 nnns
773	1	8	\|g volume:18 \|g year:2017 \|g number:Suppl 5 \|g day:23 \|g month:03 \|g pages:102
856	4	0	\|u http://dx.doi.org/10.1186/s12859-017-1533-z \|3 Volltext
912			\|a GBV_USEFLAG_A
912			\|a GBV_NLM
951			\|a AR
952			\|d 18 \|j 2017 \|e Suppl 5 \|b 23 \|c 03 \|h 102
