I-TASSER-MTD: a deep-learning-based platform for multi-domain protein structure and function prediction

X Zhou, W Zheng, Y Li, R Pearce, C Zhang, EW Bell… - Nature …, 2022 - nature.com
Nature Protocols, 2022
Abstract
Most proteins in cells are composed of multiple folding units (or domains) to perform complex functions in a cooperative manner. Relative to the rapid progress in single-domain structure prediction, there are few effective tools available for multi-domain protein structure assembly, mainly due to the complexity of modeling multi-domain proteins, which involves higher degrees of freedom in domain-orientation space and various levels of continuous and discontinuous domain assembly and linker refinement. To meet the challenge and the high demand of the community, we developed I-TASSER-MTD to model the structures and functions of multi-domain proteins through a progressive protocol that combines sequence-based domain parsing, single-domain structure folding, inter-domain structure assembly and structure-based function annotation in a fully automated pipeline. Advanced deep-learning models have been incorporated into each of the steps to enhance both the domain modeling and inter-domain assembly accuracy. The protocol allows for the incorporation of experimental cross-linking data and cryo-electron microscopy density maps to guide the multi-domain structure assembly simulations. I-TASSER-MTD is built on I-TASSER but substantially extends its ability and accuracy in modeling large multi-domain protein structures and provides meaningful functional insights for the targets at both the domain- and full-chain levels from the amino acid sequence alone.
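The distinction between continuous and discontinuous domains mentioned above can be made concrete with a small sketch. This is not I-TASSER-MTD's domain parser, which the abstract describes as deep-learning based. It only illustrates, under assumed inputs, what a parsed domain definition might look like once boundaries are known. The function names, the segment representation (1-based inclusive residue ranges) and the example boundaries are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a domain is a list of 1-based, inclusive
# residue ranges. One range = continuous domain; several = discontinuous.
Segment = Tuple[int, int]


def extract_domains(sequence: str, domains: Dict[str, List[Segment]]) -> Dict[str, str]:
    """Return each domain's sequence by concatenating its segments in order."""
    n = len(sequence)
    result = {}
    for name, segments in domains.items():
        parts = []
        for start, end in sorted(segments):
            if not (1 <= start <= end <= n):
                raise ValueError(f"segment {start}-{end} of {name} is out of range")
            parts.append(sequence[start - 1:end])
        result[name] = "".join(parts)
    return result


def is_discontinuous(segments: List[Segment]) -> bool:
    """A domain built from more than one sequence segment is discontinuous."""
    return len(segments) > 1


def unassigned_residues(sequence: str, domains: Dict[str, List[Segment]]) -> List[Segment]:
    """Return residue ranges covered by no domain (candidate linkers or termini)."""
    covered = [False] * len(sequence)
    for segments in domains.values():
        for start, end in segments:
            for i in range(start - 1, end):
                covered[i] = True
    gaps: List[Segment] = []
    gap_start = None
    for i, is_covered in enumerate(covered):
        if not is_covered and gap_start is None:
            gap_start = i + 1              # open a new gap (1-based)
        elif is_covered and gap_start is not None:
            gaps.append((gap_start, i))    # gap ended at previous residue
            gap_start = None
    if gap_start is not None:
        gaps.append((gap_start, len(sequence)))
    return gaps


if __name__ == "__main__":
    # Toy 20-residue sequence with made-up boundaries, for illustration only.
    seq = "ACDEFGHIKLMNPQRSTVWY"
    parsed = {"D1": [(1, 5), (16, 20)], "D2": [(7, 14)]}
    print(extract_domains(seq, parsed))
    print({name: is_discontinuous(segs) for name, segs in parsed.items()})
    print(unassigned_residues(seq, parsed))
```

In this toy case D1 is discontinuous because D2 is inserted between its two segments, and the residues left unassigned are the ones a linker-refinement step would have to place.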