Our work consists of devising new algorithms to solve problems associated with protein structure and creating the computer programs that use these algorithms to analyze experimental results and make new predictions.
1) Many protein structures have been determined experimentally, and several schemes for classifying them are in common use. We have been developing automatic classification methods that reflect 3D structure and oligomeric state but do not use sequence or function information. The ultimate aim is to use the classification tree as a structure prediction tool for sequences that bear little resemblance to those of proteins with known structure.
2) Predicting the native conformation of a protein from its amino acid sequence is a long-standing problem. In principle, it can be solved by an accurate description of the inter- and intramolecular forces at the atomic level, followed by extensive computation. However, that calculation remains infeasible by several orders of magnitude. Instead, we have been developing approximations to the true free energy of the protein solution, suited to a simplified representation of its structure and conformation, so that the calculations become feasible. This entails both fundamental reasoning and broad surveys of experimental data on protein structure and thermodynamics.
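To make the idea of a simplified representation concrete, here is a minimal sketch, not the functions developed in this work. It assumes each residue is reduced to a single point with a two-letter type alphabet (H for hydrophobic, P for polar), and it scores a conformation by summing pairwise contact terms. The names `CONTACT_ENERGY`, `pair_energy`, and `contact_energy`, the energy values, the 7.0 cutoff, and the minimum sequence separation are all hypothetical choices made for illustration.

```python
import math

# Hypothetical contact energies in arbitrary units. These values are
# illustrative only and are not fitted to any experimental data.
CONTACT_ENERGY = {
    ("H", "H"): -1.0,
    ("H", "P"): 0.0,
    ("P", "P"): 0.0,
}


def pair_energy(type_a, type_b):
    """Look up the contact energy for an unordered pair of residue types."""
    key = (type_a, type_b)
    if key not in CONTACT_ENERGY:
        key = (type_b, type_a)
    return CONTACT_ENERGY[key]


def contact_energy(coords, types, cutoff=7.0, min_separation=3):
    """Sum contact energies over residue pairs closer than `cutoff`.

    coords: one (x, y, z) point per residue (the simplified representation).
    types: one residue type per point, same length as coords.
    Pairs fewer than `min_separation` positions apart in sequence are
    skipped, since chain connectivity keeps them close in any conformation.
    """
    if len(coords) != len(types):
        raise ValueError("coords and types must have the same length")
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                total += pair_energy(types[i], types[j])
    return total


# Two conformations of the same four-residue chain, spacing 3.8 per bond.
types = ["H", "P", "P", "H"]
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
compact = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]

print(contact_energy(extended, types))  # 0.0: the two H residues are too far apart
print(contact_energy(compact, types))   # -1.0: the two H residues are in contact
```

The cost of this kind of function grows only with the square of the number of residues, which is what makes such approximations tractable. Whether it ranks the native conformation lowest depends entirely on how the terms are chosen and calibrated against experimental data.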
Recognizing Protein Folds by Cluster Distance Geometry, G.M. Crippen, Proteins: Struct. Func. Bioinf., 60, 82-89 (2005).
A novel approach to structural alignment using realistic structural and environmental information, Y. Chen and G.M. Crippen, Protein Science, 14, 2935-2946 (2005).
CASA: An efficient automated assignment of protein mainchain NMR data using an ordered tree search algorithm, J. Wang, T. Wang, E.R.P. Zuiderweg, and G.M. Crippen, J. Biomol. NMR, 33, 261-279 (2005).
An iterative refinement algorithm for consistency based multiple structural alignment methods, Y. Chen and G.M. Crippen, Bioinformatics, 22, 2087-2093 (2006).
Fold recognition via a tree, Y. Chen and G.M. Crippen, J. Comput. Biol., 13(9), 1565-1573 (2006).
A cheminformatic toolkit for mining biomedical knowledge, G.R. Rosania, G.M. Crippen, P. Woolf, D. States, K. Shedden, Pharmaceutical Research, 24(10), 1791-1802 (2007).
Chemical data mining of the NCI human tumor cell line database, H. Wang, J. Klinginsmith, X. Dong, A.C. Lee, R. Guha, Y. Wu, G.M. Crippen, and D.J. Wild, J. Chem. Inf. Model., 47, 2063-2076 (2007).
Data Mining the NCI60 to Predict Generalized Cytotoxicity, A.C. Lee, K. Shedden, G.R. Rosania, and G.M. Crippen, J. Chem. Inf. Model., 48, 1379-1388 (2008).