The Theobald Lab


Evolution of macromolecular assemblies

Our interdisciplinary research coalesces at the junction of macromolecular crystallography, molecular evolution, biochemistry, and structural bioinformatics. The lab's main long-term goal is to elucidate the pathways and mechanisms by which the structures and functions of macromolecular complexes evolve. Several interrelated projects are currently underway, all of which involve studying the three-dimensional structures of macromolecular complexes by integrating empirical experimentation with bioinformatics.

[Images: malate dehydrogenase (MDH) ancestor crystals; MDH ancestor diffraction pattern]

Crystals and 2.8 Å diffraction of a resurrected ancestral enzyme, malate dehydrogenase from green sulfur bacteria (Chlorobia)

The structures and functions of biological macromolecules are the end products of a historical process, which in some cases may stretch back 3-4 billion years. Hence a precise molecular understanding of macromolecular assemblies must ultimately be informed by evolutionary mechanisms. To understand the macromolecular structure-function relationship, we consider it essential to explicitly incorporate modern developments in population genetics, phylogenetics, and probability theory. Conversely, structural knowledge also informs evolutionary inferences.

Our lab is interested in many diverse, basic, and unresolved problems in molecular evolution:

The answers to these questions have broad implications for understanding the protein structure-function relationship, including rational efforts to design (and redesign) proteins for particular functions.

Bayesian methods in structural bioinformatics

From a Bayesian viewpoint, probability is a measure of a degree of belief, and thus probability theory can formally be considered as an extension of classic Aristotelian logic in the presence of uncertainty. In recent years Bayesian statistics has experienced a great resurgence, due to theoretical advances, massive increases in computing power, and successful applications to complex and difficult scientific problems.

P(H | D) = P(D | H) P(H) / P(D)

Bayes' theorem, the universal acid relating empirical observations to theory (data D to a hypothesis H)
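As a small illustration of how Bayes' theorem updates belief in competing hypotheses, the sketch below computes posterior probabilities from priors P(H) and likelihoods P(D | H). The function name `posterior`, the hypothesis labels, and the numbers are hypothetical and chosen only for illustration; they are not taken from the lab's work.

```python
def posterior(priors, likelihoods):
    """Return P(H | D) for each hypothesis H using Bayes' theorem.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(D | H)
    """
    # Numerator of Bayes' theorem for each hypothesis: P(D | H) * P(H)
    joint = {h: likelihoods[h] * priors[h] for h in priors}
    # Denominator: P(D), the sum of the numerators over all hypotheses
    evidence = sum(joint.values())
    if evidence == 0:
        raise ValueError("the data have zero probability under every hypothesis")
    return {h: joint[h] / evidence for h in joint}


# Hypothetical example: two hypotheses that are equally plausible a priori,
# with data that are four times more probable under H1 than under H2.
priors = {"H1": 0.5, "H2": 0.5}
likelihoods = {"H1": 0.8, "H2": 0.2}
post = posterior(priors, likelihoods)
print(post)
```

With equal priors, the posterior simply follows the ratio of the likelihoods; unequal priors would shift it toward the hypothesis favored beforehand.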

Accurate analysis of structural differences and commonalities is of fundamental importance for understanding the structure, function, and evolution of biological macromolecules. For the past 40 years, structural analysis methods have relied on the biophysically unrealistic and restrictive least-squares criterion to find optimal superpositions. We are developing probabilistic models of structural change that can take advantage of powerful maximum likelihood (ML) and Bayesian techniques, which will greatly expand our ability to accurately superpose, align, and analyze structural conformations. While we concentrate specifically on the conformations of macromolecules, the methods we are developing have broad mathematical generality and will impact not only molecular structural biology but also an unusually wide range of scientific fields, including any that compare the shapes and conformations of objects.
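To make the conventional least-squares criterion concrete, here is a minimal sketch of a rigid-body least-squares superposition. It is a toy under stated assumptions: the points are two-dimensional (macromolecular coordinates are three-dimensional, where the optimal rotation is usually obtained by a matrix decomposition), the point-to-point correspondence is known, and every point is weighted equally. The function names and coordinates are hypothetical, and this is not the lab's ML or Bayesian method.

```python
import math


def centroid(points):
    """Return the mean (x, y) of a list of 2D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def superpose_2d(mobile, target):
    """Least-squares superposition of `mobile` onto `target` in 2D.

    Finds the rotation and translation that minimize the sum of squared
    distances between corresponding points, with all points weighted equally.
    Returns (fitted_points, rmsd).
    """
    if not mobile or len(mobile) != len(target):
        raise ValueError("point sets must be non-empty and of equal length")

    # Translation: the optimal fit matches the two centroids.
    cm, ct = centroid(mobile), centroid(target)
    m = [(x - cm[0], y - cm[1]) for x, y in mobile]
    t = [(x - ct[0], y - ct[1]) for x, y in target]

    # Rotation: in 2D the optimal angle has a closed form in terms of the
    # summed dot and cross products of the centered point pairs.
    dot = sum(mx * tx + my * ty for (mx, my), (tx, ty) in zip(m, t))
    cross = sum(mx * ty - my * tx for (mx, my), (tx, ty) in zip(m, t))
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)

    # Rotate the centered mobile points, then move them onto the target centroid.
    fitted = [(c * mx - s * my + ct[0], s * mx + c * my + ct[1]) for mx, my in m]

    sq_dev = sum((fx - tx) ** 2 + (fy - ty) ** 2
                 for (fx, fy), (tx, ty) in zip(fitted, target))
    return fitted, math.sqrt(sq_dev / len(target))


# Hypothetical example: the mobile set is the target rotated by 90 degrees
# and shifted, so a perfect superposition exists.
target = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]
mobile = [(5.0, -3.0), (5.0, -1.0), (4.0, -3.0)]
fitted, rmsd = superpose_2d(mobile, target)
print(fitted, rmsd)
```

Note that the criterion gives each point the same weight and treats the deviations independently; a probabilistic model can relax those assumptions, for example by allowing different variances for different parts of a structure.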

We are also interested in developing likelihood and Bayesian methods for single-molecule structural analysis and electron microscopy image reconstruction.

Mechanisms of single-stranded nucleic acid recognition

Specific recognition of single-stranded nucleic acids by proteins is crucial for maintaining, manipulating, and utilizing the genetic material contained in cells. Single-stranded nucleic acid recognition participates in a large variety of fundamental, dynamic biological functions, including telomere regulation, mRNA splicing, DNA replication, transcription regulation, DNA damage repair, stages in bacterial and viral life cycles, translation, and retrotransposition. More generally, non-specific binding of single-stranded nucleic acids is also essential for DNA damage repair, DNA replication, apoptosis, recombination, chromosomal condensation, and bacterial conjugation. Consequently, protein recognition of single-stranded nucleic acids has been clearly implicated in many pathological processes in humans, including cancer, aging, various infectious diseases, autoimmune illnesses, and drug dependency.

In the past few years, several high-resolution structures of sequence-specific ssDNA and ssRNA complexes have been determined. The intermolecular interactions observed in these structures hint at possible mechanisms for sequence-specific recognition, and several common themes have emerged. However, relative to double-stranded DNA complexes, very little is known regarding mechanisms of single-stranded nucleic acid recognition. We are using a combination of crystallographic studies, bioinformatics, and evolutionary analysis to explore mechanisms of single-stranded nucleic acid recognition in various model systems, including telomeric proteins and transcription regulators of toxin expression in pathogenic bacteria.

The hippogriff (part eagle, part lion, part horse) in the image above symbolizes the empirical testability of evolutionary theory: given what we know of the evolution and phylogeny of modern animals, we conclude that such a creature will never be found, either living or fossilized.