Multi-modal deep learning improves grain yield prediction in wheat breeding by fusing genomics and phenomics
© The Author(s) 2023. Published by Oxford University Press.
MOTIVATION: Developing new crop varieties with superior performance is essential for ensuring robust and sustainable global food security. The speed of variety development is limited by long field cycles and advanced generation selections in plant breeding programs. While methods to predict yield from genotype or phenotype data have been proposed, improved performance and integrated models are needed.
RESULTS: We propose a machine learning model that leverages both genotype and phenotype measurements by fusing genetic variants with multiple data sources collected by unmanned aerial systems. We use a deep multiple instance learning framework with an attention mechanism that sheds light on the importance given to each input during prediction, enhancing interpretability. Our model reaches a Pearson correlation coefficient of 0.754 ± 0.024 when predicting yield under similar environmental conditions, a 34.8% improvement over the genotype-only linear baseline (0.559 ± 0.050). We further predict yield for new lines in an unseen environment using only genotypes, obtaining a prediction accuracy of 0.386 ± 0.010, a 13.5% improvement over the linear baseline. Our multi-modal deep learning architecture efficiently accounts for plant health and environment, distilling the genetic contribution and providing excellent predictions. Yield prediction algorithms that leverage phenotypic observations during training therefore promise to improve breeding programs, ultimately speeding up the delivery of improved varieties.
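The attention-based multiple instance learning step can be illustrated with a small sketch. The code below follows the attention pooling of Ilse et al. (2018): each instance embedding h_k receives a score wᵀtanh(V h_k), the scores are normalised with a softmax over the bag, and the bag embedding is the attention-weighted sum of the instances. It assumes that the genotype and each UAS-derived data source have already been embedded into a shared d-dimensional space and uses random, untrained weights. The names `attention_pool`, `matvec`, `rand_vec` and the linear yield head `beta` are hypothetical; this is not the PheGeMIL implementation from the linked repository.

```python
import math
import random

def matvec(M, x):
    """Multiply matrix M (a list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def attention_pool(instances, V, w):
    """Attention-based MIL pooling in the style of Ilse et al. (2018).

    instances: K instance embeddings, each a list of length d
    V: h x d projection matrix; w: attention vector of length h
    Returns the pooled bag embedding (length d) and the K attention weights.
    """
    # Unnormalised score per instance: w^T tanh(V h_k)
    scores = [sum(wi * math.tanh(v) for wi, v in zip(w, matvec(V, inst)))
              for inst in instances]
    # Softmax over instances (subtract the max for numerical stability)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag embedding: attention-weighted sum of the instance embeddings
    d = len(instances[0])
    bag = [sum(a * inst[j] for a, inst in zip(weights, instances)) for j in range(d)]
    return bag, weights

def rand_vec(n):
    """Random Gaussian vector, standing in for a learned embedding or weight."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]

random.seed(0)
d, h = 8, 4

# Hypothetical bag for one plot: one genotype embedding plus three UAS image
# embeddings, assumed to be already projected into the same d-dimensional space.
genotype = rand_vec(d)
uas_images = [rand_vec(d) for _ in range(3)]
bag_instances = [genotype] + uas_images

V = [rand_vec(d) for _ in range(h)]  # untrained placeholder attention weights
w = rand_vec(h)
beta = rand_vec(d)                   # hypothetical linear yield head

bag, weights = attention_pool(bag_instances, V, w)
yield_pred = sum(b * c for b, c in zip(bag, beta))
print("attention weights:", [round(a, 3) for a in weights])
print("untrained yield prediction:", round(yield_pred, 3))
```

Because the attention weights are non-negative and sum to one over the instances of a bag, inspecting them after training indicates how much each input contributed to the pooled representation, which is the kind of interpretability the abstract refers to. In a trained model, V, w, the yield head and the per-modality encoders would typically be fitted jointly by minimising a regression loss on observed yield.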
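The headline numbers can be checked with simple arithmetic. The sketch below computes a Pearson correlation between predicted and observed yields and the relative improvement of a model over a baseline. It assumes that the reported gains are relative, (model - baseline) / baseline, and that the 0.386 "prediction accuracy" is also a Pearson correlation, as is usual in genomic prediction. The helper names are hypothetical and the toy yield values are made up for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def relative_improvement(model, baseline):
    """Relative gain of model over baseline, in percent."""
    return 100.0 * (model - baseline) / baseline

# Made-up observed and predicted plot yields, only to show the metric
observed = [5.1, 4.8, 6.0, 5.5]
predicted = [5.0, 4.9, 5.8, 5.6]
print(f"toy Pearson r: {pearson(observed, predicted):.3f}")

# Rounded means reported for the same-environment setting
gain = relative_improvement(0.754, 0.559)
print(f"relative improvement: {gain:.2f}%")

# Unseen-environment setting: only the model's 0.386 and the 13.5% gain are
# reported, so solve model = baseline * (1 + gain) for the implied baseline
implied_baseline = 0.386 / 1.135
print(f"implied baseline correlation: {implied_baseline:.3f}")
```

From the rounded means, the first gain comes out at about 34.9%, close to the reported 34.8%; the small gap presumably reflects rounding in the published figures. The second setting implies a linear baseline correlation of roughly 0.34.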
AVAILABILITY AND IMPLEMENTATION: Available at https://github.com/BorgwardtLab/PheGeMIL (code) and https://doi.org/doi:10.5061/dryad.kprr4xh5p (data).
| Field | Value |
|---|---|
| Media type | E-article |
| Year of publication | 2023 |
| Published | 2023 |
| Part of | Complete record - volume:39 |
| Part of | Bioinformatics (Oxford, England) - 39(2023), 6, 1 June |
| Language | English |
| Contributors | Togninalli, Matteo [Author] |
| Subjects | Journal Article |
| Notes | Date Completed 15.06.2023; Date Revised 16.06.2023; published: Print; Dryad: 10.5061/dryad.kprr4xh5p; Citation Status MEDLINE |
| DOI | 10.1093/bioinformatics/btad336 |
| PPN (catalog ID) | NLM357233131 |
LEADER | 01000naa a22002652 4500 | ||
---|---|---|---|
001 | NLM357233131 | ||
003 | DE-627 | ||
005 | 20231226072137.0 | ||
007 | cr uuu---uuuuu | ||
008 | 231226s2023 xx |||||o 00| ||eng c | ||
024 | 7 | |a 10.1093/bioinformatics/btad336 |2 doi | |
028 | 5 | 2 | |a pubmed24n1190.xml |
035 | |a (DE-627)NLM357233131 | ||
035 | |a (NLM)37220903 | ||
035 | |a (PII)btad336 | ||
040 | |a DE-627 |b ger |c DE-627 |e rakwb | ||
041 | |a eng | ||
100 | 1 | |a Togninalli, Matteo |e author |4 aut | |
245 | 1 | 0 | |a Multi-modal deep learning improves grain yield prediction in wheat breeding by fusing genomics and phenomics |
264 | 1 | |c 2023 | |
336 | |a Text |b txt |2 rdacontent | ||
337 | |a Computer media |b c |2 rdamedia | ||
338 | |a Online resource |b cr |2 rdacarrier | ||
500 | |a Date Completed 15.06.2023 | ||
500 | |a Date Revised 16.06.2023 | ||
500 | |a published: Print | ||
500 | |a Dryad: 10.5061/dryad.kprr4xh5p | ||
500 | |a Citation Status MEDLINE | ||
650 | 4 | |a Journal Article | |
650 | 4 | |a Research Support, U.S. Gov't, Non-P.H.S. | |
650 | 4 | |a Research Support, Non-U.S. Gov't | |
700 | 1 | |a Wang, Xu |e author |4 aut | |
700 | 1 | |a Kucera, Tim |e author |4 aut | |
700 | 1 | |a Shrestha, Sandesh |e author |4 aut | |
700 | 1 | |a Juliana, Philomin |e author |4 aut | |
700 | 1 | |a Mondal, Suchismita |e author |4 aut | |
700 | 1 | |a Pinto, Francisco |e author |4 aut | |
700 | 1 | |a Govindan, Velu |e author |4 aut | |
700 | 1 | |a Crespo-Herrera, Leonardo |e author |4 aut | |
700 | 1 | |a Huerta-Espino, Julio |e author |4 aut | |
700 | 1 | |a Singh, Ravi P |e author |4 aut | |
700 | 1 | |a Borgwardt, Karsten |e author |4 aut | |
700 | 1 | |a Poland, Jesse |e author |4 aut | |
773 | 0 | 8 | |i Contained in |t Bioinformatics (Oxford, England) |d 1998 |g 39(2023), 6 of: 01 June |w (DE-627)NLM094620342 |x 1367-4811 |7 nnns |
773 | 1 | 8 | |g volume:39 |g year:2023 |g number:6 |g day:01 |g month:06 |
856 | 4 | 0 | |u http://dx.doi.org/10.1093/bioinformatics/btad336 |3 Full text |
912 | |a GBV_USEFLAG_A | ||
912 | |a GBV_NLM | ||
951 | |a AR | ||
952 | |d 39 |j 2023 |e 6 |b 01 |c 06 |