Mining of super-secondary structure motifs in 3D protein structures
Zeyar Aung and Jinyan Li
Institute for
Project Description
Super-secondary structure elements (super-SSEs) are structurally conserved
ensembles of secondary structure elements (SSEs) within a protein, and they are
of great biological interest. Here, we present a method to formally represent
and mine sequence-order-independent super-SSE motifs that occur repeatedly in
large data sets of protein structures. We represent each protein structure as a
graph and mine the cliques common to a set of protein graphs in order to find
the motifs.
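The graph-and-clique idea can be illustrated with a deliberately brute-force Python sketch; it is not the authors' algorithm, which is far more efficient. It assumes a hypothetical encoding in which each node is an SSE labelled "H" (helix) or "E" (strand) and each edge joins two spatially close SSEs with a coarse, made-up label ("a", "p", "x"). The functions `cliques_of_size`, `canonical_form`, and `mine_common_cliques` are illustrative names only. Sequence-order independence comes from keying each clique on its smallest encoding over all node orderings, so the order of the SSEs along the chain plays no role.

```python
from collections import defaultdict
from itertools import combinations, permutations

# Hypothetical toy encoding (an assumption, not the original representation):
#   nodes: dict  SSE id -> label ("H" helix, "E" strand)
#   edges: dict  frozenset({u, v}) -> coarse label for spatially close SSE pairs


def cliques_of_size(nodes, edges, k):
    """Yield every k-node subset whose nodes are pairwise connected."""
    for subset in combinations(sorted(nodes), k):
        if all(frozenset(pair) in edges for pair in combinations(subset, 2)):
            yield subset


def canonical_form(subset, nodes, edges):
    """Order-independent key: smallest (node labels, edge labels) over all orderings."""
    best = None
    for perm in permutations(subset):
        node_labels = tuple(nodes[v] for v in perm)
        edge_labels = tuple(edges[frozenset((perm[i], perm[j]))]
                            for i, j in combinations(range(len(perm)), 2))
        key = (node_labels, edge_labels)
        if best is None or key < best:
            best = key
    return best


def mine_common_cliques(graphs, k, min_support):
    """Return canonical k-cliques found in at least min_support protein graphs."""
    support = defaultdict(set)
    for gid, (nodes, edges) in graphs.items():
        for subset in cliques_of_size(nodes, edges, k):
            support[canonical_form(subset, nodes, edges)].add(gid)
    return {key: sorted(ids) for key, ids in support.items()
            if len(ids) >= min_support}


if __name__ == "__main__":
    toy = {
        "p1": ({1: "E", 2: "E", 3: "H"},
               {frozenset({1, 2}): "a", frozenset({1, 3}): "x", frozenset({2, 3}): "x"}),
        "p2": ({7: "H", 8: "E", 9: "E"},
               {frozenset({8, 9}): "a", frozenset({7, 8}): "x", frozenset({7, 9}): "x"}),
        "p3": ({1: "E", 2: "E"}, {frozenset({1, 2}): "p"}),
    }
    print(mine_common_cliques(toy, k=3, min_support=2))
    # {(('E', 'E', 'H'), ('a', 'x', 'x')): ['p1', 'p2']}
```

Proteins p1 and p2 list their SSEs in different orders, yet the same three-SSE clique is recognised in both. Enumerating every subset and permutation is exponential in k, so this only conveys the idea on tiny graphs.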
We mine two categories of super-SSE motifs: generic motifs, which occur
frequently across the entire database of protein structures, and fold-specific
motifs, which are concentrated in particular protein fold types. From an
experimental data set of 600 proteins belonging to 15 large SCOP folds, we have
discovered 21 generic motifs and 75 fold-specific motifs that are both
statistically significant and biologically relevant. A number of the discovered
motifs (both generic and fold-specific) resemble well-known super-SSE motifs in
the literature, such as beta hairpins, Greek keys, and zinc fingers. Some of the
discovered motifs have novel shapes that have not yet been documented. Our
method is time-efficient: it discovers all the motifs across the 600 proteins in
less than 14 minutes on a stand-alone PC.
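The page does not say which statistic separates generic from fold-specific motifs. One common way to test whether a motif is concentrated in a fold is a one-sided hypergeometric test, and the Python sketch below uses it purely as an assumed stand-in, together with a hypothetical support-fraction rule for generic motifs. The function names and the thresholds `min_generic_frac` and `alpha` are illustrative, not values from the study.

```python
from collections import Counter
from math import comb


def hypergeom_tail(N, K, n, k):
    """P(X >= k) for X ~ Hypergeometric(N proteins, K in the fold, n containing the motif)."""
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / total


def classify_motif(occurrences, fold_of, min_generic_frac=0.2, alpha=0.01):
    """Label a motif 'generic', 'fold-specific:<fold>', or None (hypothetical rules).

    occurrences: set of protein ids that contain the motif
    fold_of:     dict mapping every protein id in the data set to its fold
    """
    N, n = len(fold_of), len(occurrences)
    if n / N >= min_generic_frac:          # frequent across the whole database
        return "generic"
    fold_sizes = Counter(fold_of.values())
    hits = Counter(fold_of[p] for p in occurrences)
    best_fold, best_p = None, 1.0
    for fold, k in hits.items():           # most enriched fold wins
        p = hypergeom_tail(N, fold_sizes[fold], n, k)
        if p < best_p:
            best_fold, best_p = fold, p
    return f"fold-specific:{best_fold}" if best_p < alpha else None


if __name__ == "__main__":
    # Hypothetical even split: 600 proteins, 15 folds of 40 proteins each.
    fold_of = {f"p{i}": f"F{i // 40}" for i in range(600)}
    print(classify_motif({f"p{i}" for i in range(10)}, fold_of))       # fold-specific:F0
    print(classify_motif({f"p{i}" for i in range(200)}, fold_of))      # generic
    print(classify_motif({f"p{40 * i}" for i in range(10)}, fold_of))  # None
```

A motif found in 10 proteins that all come from one fold is flagged as fold-specific, while the same count spread one per fold is not significant. A real analysis would also need to correct for testing many motifs against many folds.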
Data Set
- List of 600 proteins (the same data set as the one used in Aung and Tan, J. Comp. Biol., 12: 1221-41, 2005)
Motifs Discovered
- Generic Motifs (Explanation of format)
- Fold-specific Motifs (Explanation of format)
Effects of Parameters
Color Figures
- Rank #44 Fold-specific Motif