Discovering complex molecular processes with machine learning


A new machine-guided path sampling algorithm learns the mechanism of molecular self-organization.

In a paper that has just appeared in Nature Computational Science, a joint team of researchers in Austria, Germany, and the Netherlands presents a machine learning algorithm that is instrumental in modelling rare events such as crystallisation, transport and folding. The new computer simulation method can be used to efficiently capture and analyze rare molecular processes, revealing how molecules self-organize and function.

The emergence of a new crystal from a spontaneously formed nucleus or the folding of a biopolymer are examples of rare molecular events that proceed rapidly after long waiting times in seeming stability. Due to their rarity as well as the complexity of the molecular systems involved, it is very difficult to obtain a thorough understanding of these processes, even when using computer models. Because it is impossible to predict when a rare event will occur, these models resort to calculating the dynamics of the molecular systems in a series of tiny molecular steps. A single simulation can take up to a billion steps, and often even this is not sufficient given the timescale of many relevant molecular processes.

The paper in Nature Computational Science reports how the research team, which included Prof. Christoph Dellago and his former graduate student Christian Leitold from the Faculty of Physics of the University of Vienna, found a solution to this challenge by combining computer simulations and artificial intelligence. They developed a machine learning algorithm that learns how to sample rare events. Using deep learning, it builds mathematical models, based on transition path sampling, that help identify tell-tale signs of impending molecular transitions. The algorithm is thus able to ‘home in’ on the brief moments of the actual transitions, which avoids wasting computational resources waiting for such events to occur.

Thus, the algorithm can be used to study rare events occurring on previously inaccessible timescales. By autonomously initialising and analysing the modelling data, the algorithm reduces the amount of input required from researchers. Furthermore, by distilling the learned models into a human-accessible form, via so-called symbolic regression, the algorithm aids researchers in understanding and generalizing the findings to broad classes of systems.

The research was carried out in cooperation between the Department of Theoretical Biophysics at the Max Planck Institute of Biophysics, the Institute for Advanced Studies (both in Frankfurt, Germany), the Institute of Biophysics at Goethe University Frankfurt, the Faculty of Physics at the University of Vienna (Austria) and the Van ’t Hoff Institute for Molecular Sciences at the University of Amsterdam (The Netherlands).



Hendrik Jung, Roberto Covino, A. Arjun, Christian Leitold, Christoph Dellago, Peter G. Bolhuis, Gerhard Hummer, “Machine-guided path sampling to discover mechanisms of molecular self-organization”, Nature Computational Science (2023).
DOI: 10.1038/s43588-023-00428-z


Scientific Contact:

Prof. Christoph Dellago

Faculty of Physics

University of Vienna

1090 Wien, Kolingasse 14-16

T +43-1-4277-512 60

M +43-664-817 5111

In an iterative learning and sampling cycle, the algorithm learns key features of the transition mechanism from molecular pathways, as illustrated here for the folding of a polymer. In turn, this information is then used to guide the generation of new transition pathways.

© Christian Leitold and Christoph Dellago