TY - JOUR
T1 - Structure-based function inference using protein family-specific fingerprints
AU - Bandyopadhyay, Deepak
AU - Huan, Jun
AU - Liu, Jinze
AU - Prins, Jan
AU - Snoeyink, Jack
AU - Wang, Wei
AU - Tropsha, Alexander
PY - 2006/6
Y1 - 2006/6
AB - We describe a method to assign a protein structure to a functional family using family-specific fingerprints. Fingerprints represent amino acid packing patterns that occur in most members of a family but are rare in the background, a nonredundant subset of PDB; their information is additional to sequence alignments, sequence patterns, structural superposition, and active-site templates. Fingerprints were derived for 120 families in SCOP using Frequent Subgraph Mining. For a new structure, all occurrences of these family-specific fingerprints may be found by a fast algorithm for subgraph isomorphism; the structure can then be assigned to a family with a confidence value derived from the number of fingerprints found and their distribution in background proteins. In validation experiments, we infer the function of new members added to SCOP families and we discriminate between structurally similar, but functionally divergent TIM barrel families. We then apply our method to predict function for several structural genomics proteins, including orphan structures. Some predictions have been corroborated by other computational methods and some validated by subsequent functional characterization.
KW - Almost-Delaunay
KW - Delaunay
KW - Orphan structures
KW - Protein classification
KW - Structural genomics
KW - Structure-based function inference
KW - Subgraph mining
UR - http://www.scopus.com/inward/record.url?scp=33744491290&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33744491290&partnerID=8YFLogxK
U2 - 10.1110/ps.062189906
DO - 10.1110/ps.062189906
M3 - Article
C2 - 16731985
AN - SCOPUS:33744491290
SN - 0961-8368
VL - 15
SP - 1537
EP - 1543
JO - Protein Science
JF - Protein Science
IS - 6
ER -