Statistical methods for protein structure prediction
The objective of our project is to develop quantitative methods for predicting protein structure. Our approach is to exploit the rapidly growing protein structure and sequence databases to extract information that relates protein sequence to structure. This sequence-structure information can then be used to predict a protein's structure from its sequence. The main focus is on finding new representations of protein sequence and structure and on employing robust statistical methods, such as estimating equations, to extract the sequence-structure relationships.

Over the past ten years, my second mentor, Dr. Zhao, and his colleagues have been actively developing methods based on estimating equations. They have successfully applied these methods to genetic epidemiology, including methods for assessing the familial aggregation of diseases, for characterizing underlying disease genes via segregation analysis, and for locating disease genes via linkage analysis. The success of these developments suggests that estimating equations can also serve as an appropriate framework for developing protein structure prediction models. In genetic epidemiology, the goal is to relate a disease phenotype to causal factors using data collected from family members. In protein structure prediction, the goal is to find, from a database of known 3D structures, a relationship that maps sequence to structure. The main difference between the two problems lies in the type of data. We believe that combining the 3D profile approach of the Zhang lab with the estimating equations techniques developed in the Zhao lab could yield a more accurate method of structure prediction.
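To make the idea of an estimating equation concrete, the following is a minimal sketch, not the method proposed here. It assumes a hypothetical setup in which each residue has a binary sequence feature x (for example, a hypothetical indicator of a particular residue type in a local window) and a binary structural outcome y (for example, whether the residue is in a helix). With a logistic mean model and an independence working covariance, the estimating equation U(beta) = sum_i d_i (y_i - mu_i(beta)) = 0, where d_i = (1, x_i), coincides with the logistic-regression score equation. The sketch solves it by Newton-Raphson. The function names `score` and `solve_estimating_equation` and the toy data are illustrative assumptions.

```python
import math
from typing import List, Tuple


def expit(t: float) -> float:
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def score(beta: Tuple[float, float], xs: List[float], ys: List[int]) -> Tuple[float, float]:
    """Estimating function U(beta) = sum_i d_i * (y_i - mu_i), with d_i = (1, x_i).

    Under a logistic mean and an independence working covariance this is
    the same as the logistic-regression score function.
    """
    u0 = u1 = 0.0
    for x, y in zip(xs, ys):
        r = y - expit(beta[0] + beta[1] * x)  # residual y_i - mu_i
        u0 += r
        u1 += r * x
    return u0, u1


def solve_estimating_equation(xs: List[float], ys: List[int],
                              tol: float = 1e-10, max_iter: int = 50) -> Tuple[float, float]:
    """Solve U(beta) = 0 for (intercept, slope) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        u0, u1 = score((b0, b1), xs, ys)
        # Information matrix sum_i w_i d_i d_i^T with w_i = mu_i * (1 - mu_i).
        i00 = i01 = i11 = 0.0
        for x in xs:
            mu = expit(b0 + b1 * x)
            w = mu * (1.0 - mu)
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: delta = I^{-1} U, using the explicit 2x2 inverse.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (-i01 * u0 + i00 * u1) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


# Hypothetical toy data: feature absent (x = 0) in 4 residues, 1 of them helical;
# feature present (x = 1) in 4 residues, 3 of them helical.
xs = [0, 0, 0, 0, 1, 1, 1, 1]
ys = [1, 0, 0, 0, 1, 1, 1, 0]
b0, b1 = solve_estimating_equation(xs, ys)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

With a single binary covariate the solution has a closed form: the intercept is the log-odds of y = 1 when x = 0, log(1/3), and the slope is the log odds ratio, 2 log 3. This makes the toy example easy to check. For correlated observations, such as members of the same family or, by analogy, residues of the same protein, the estimating-equation framework allows a non-identity working covariance in place of the independence assumption used in this sketch.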