Fast activation maximization for molecular sequence design

© 2021. The Author(s).

BACKGROUND: Optimization of DNA and protein sequences based on machine learning models is becoming a powerful tool for molecular design. Activation maximization offers a simple design strategy for differentiable models: one-hot coded sequences are first approximated by a continuous representation, which is then iteratively optimized with respect to the predictor oracle by gradient ascent. While elegant, the current version of the method suffers from vanishing gradients and may cause predictor pathologies leading to poor convergence.
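The continuous relaxation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' code: it assumes a per-position softmax over learnable logits as the continuous representation, and it uses a hypothetical linear oracle `W` in place of a trained predictor. The recorded gradient norms show one way vanishing gradients can arise: as the softmax saturates toward a one-hot sequence, the gradient with respect to the logits shrinks toward zero.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax: one categorical distribution over A, C, G, T per position.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def relaxed_activation_maximization(W, n_iters=200, lr=1.0, seed=0):
    """Gradient ascent on logits theta (L x 4) through a softmax relaxation.

    W (L x 4) is a hypothetical linear oracle f(p) = sum(W * p) standing in
    for a differentiable predictor evaluated on the relaxed sequence p.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(scale=0.1, size=W.shape)
    grad_norms = []
    for _ in range(n_iters):
        p = softmax(theta)
        # Chain rule through the softmax: df/dtheta_ij = p_ij * (W_ij - sum_k p_ik W_ik).
        grad = p * (W - (p * W).sum(axis=1, keepdims=True))
        grad_norms.append(float(np.linalg.norm(grad)))
        theta = theta + lr * grad
    return theta, grad_norms
```

With a toy `W` whose rows favor A, G and C respectively, the per-row argmax of the optimized logits recovers the sequence "AGC", while the gradient norm at the last step is far smaller than at the first.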

RESULTS: Here, we introduce Fast SeqProp, an improved activation maximization method that combines straight-through approximation with normalization across the parameters of the input sequence distribution. Fast SeqProp overcomes bottlenecks in earlier methods arising from input parameters becoming skewed during optimization. Compared to prior methods, Fast SeqProp results in up to 100-fold faster convergence while also finding improved fitness optima for many applications. We demonstrate Fast SeqProp's capabilities by designing DNA and protein sequences for six deep learning predictors, including a protein structure predictor.
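A minimal sketch of the two ingredients named above follows, under assumptions stated plainly. The logits are standardized per nucleotide channel across positions with a learnable scale `gamma` and offset `beta` (in the spirit of instance normalization). A discrete one-hot sequence is sampled from the resulting softmax in the forward pass, and the straight-through estimator passes the oracle's gradient back through the softmax as if the sampling step were the identity. The linear oracle, the function names and all parameter values are hypothetical. For a linear oracle the gradient does not depend on the sampled sequence; a real predictor would be differentiated at the sampled one-hot input.

```python
import numpy as np

ALPHABET = "ACGT"

def softmax(logits):
    # Row-wise softmax over the four nucleotides at each position.
    shifted = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def linear_oracle(x, W):
    # Hypothetical differentiable predictor: f(x) = sum(W * x), so df/dx = W.
    return float((W * x).sum()), W

def fast_seqprop_sketch(W, n_iters=1000, lr=0.1, eps=1e-5, seed=0):
    """Straight-through gradient ascent on per-channel normalized logits."""
    rng = np.random.default_rng(seed)
    L, K = W.shape
    theta = rng.normal(size=(L, K))  # raw logits, one row per position
    gamma = np.ones(K)               # learnable per-nucleotide scale
    beta = np.zeros(K)               # learnable per-nucleotide offset
    expected_fitness = []
    for _ in range(n_iters):
        # Standardize each nucleotide channel across positions (mean 0, std 1).
        mu = theta.mean(axis=0)
        sigma = np.sqrt(theta.var(axis=0) + eps)
        z = (theta - mu) / sigma
        y = gamma * z + beta
        p = softmax(y)
        expected_fitness.append(float((W * p).sum()))
        # Forward pass: sample one discrete nucleotide per position from p.
        idx = np.array([rng.choice(K, p=row) for row in p])
        x = np.eye(K)[idx]
        _, g_x = linear_oracle(x, W)
        # Straight-through: reuse g_x as the gradient w.r.t. p, then softmax backward.
        g_y = p * (g_x - (p * g_x).sum(axis=1, keepdims=True))
        g_gamma = (g_y * z).sum(axis=0)
        g_beta = g_y.sum(axis=0)
        g_z = g_y * gamma
        # Backward pass through the per-channel standardization.
        g_theta = (g_z - g_z.mean(axis=0) - z * (g_z * z).mean(axis=0)) / sigma
        # Gradient ascent on all parameters of the sequence distribution.
        theta += lr * g_theta
        gamma += lr * g_gamma
        beta += lr * g_beta
    # Report the most likely nucleotide per position under the last distribution.
    design = "".join(ALPHABET[i] for i in y.argmax(axis=1))
    return design, expected_fitness
```

Because the scale `gamma` can grow while the standardized logits stay bounded, the distribution can sharpen without any single raw logit drifting far from the others, which is one reading of how normalization keeps the parameters from becoming skewed.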

CONCLUSIONS: Fast SeqProp offers a reliable and efficient method for general-purpose sequence optimization through a differentiable fitness predictor. As demonstrated on a variety of deep learning models, the method is widely applicable, and can incorporate various regularization techniques to maintain confidence in the sequence designs. As a design tool, Fast SeqProp may aid in the development of novel molecules, drug therapies and vaccines.

Media type:

Electronic article

Year of publication:

2021

Published:

2021

Published in:

BMC Bioinformatics, vol. 22 (2021), no. 1, 20 Oct., p. 510

Language:

English

Contributors:

Linder, Johannes [Author]
Seelig, Georg [Author]

Subjects:

Activation maximization
DNA
Deep learning
Design
Gradient ascent
Journal Article
Neural network
Optimization
Protein
RNA
Sequence design

Notes:

Date Completed 22.10.2021

Date Revised 03.04.2024

published: Electronic

Citation Status MEDLINE

doi:

10.1186/s12859-021-04437-5

PPN (catalog ID):

NLM332107124