Mining of super-secondary structure motifs in 3D protein structures
Data Mining Department
Institute for
NEW!!! (December 2008) We have launched our S4 server.
S4: Server for Super-Secondary Structure Motif Mining
(The web server is currently under maintenance. It will be up again in July 2009.)
Project Description
Super-secondary structure elements (super-SSEs) are structurally conserved ensembles of secondary structure elements (SSEs) within a protein, and they are of great biological interest. Here, we present a method to formally represent and mine sequence-order-independent super-SSE motifs that occur repeatedly in large data sets of protein structures. We represent each protein structure as a graph and mine the common cliques from a set of protein graphs in order to find the motifs.
We mine two categories of super-SSE motifs: generic motifs, which occur frequently across the entire database of protein structures, and fold-specific motifs, which are concentrated in particular protein fold types. From an experimental data set of 600 proteins belonging to 15 large SCOP folds, we have discovered 21 generic motifs and 75 fold-specific motifs that are both statistically significant and biologically relevant. A number of the discovered motifs (both generic and fold-specific) resemble well-known super-SSE motifs from the literature, such as beta hairpins, Greek keys, and zinc fingers. Some of the discovered motifs have novel shapes that have not yet been documented. Our method is time-efficient: it can discover all the motifs across the 600 proteins in less than 14 minutes on a stand-alone PC.
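The graph-and-clique idea described above can be illustrated with a small sketch. This is not the authors' implementation: the graph encoding (nodes labeled "H" for helix and "E" for strand, edges for spatially close SSE pairs), the brute-force clique enumeration, the function names, and the toy proteins and fold labels are all assumptions made for illustration. A motif here is identified only by the sorted multiset of SSE types in a clique, which makes it independent of sequence order; the published method is likely to use richer geometric labels and a more efficient search.

```python
from collections import Counter, defaultdict
from itertools import combinations


def make_graph(labels, edge_pairs):
    """Build a toy protein graph: SSE type per node, plus undirected edges."""
    return {"labels": labels, "edges": {frozenset(p) for p in edge_pairs}}


def cliques_of_size(graph, k):
    """Yield every k-node clique (all pairs connected) by brute force."""
    for combo in combinations(sorted(graph["labels"]), k):
        if all(frozenset(pair) in graph["edges"] for pair in combinations(combo, 2)):
            yield combo


def canonical_key(graph, clique):
    """Sequence-order-independent signature: sorted SSE types in the clique."""
    return tuple(sorted(graph["labels"][n] for n in clique))


def mine_motifs(proteins, k, min_support):
    """Map each motif signature to the proteins containing it, keeping only
    signatures found in at least min_support proteins."""
    support = defaultdict(set)
    for pid, graph in proteins.items():
        for clique in cliques_of_size(graph, k):
            support[canonical_key(graph, clique)].add(pid)
    return {key: pids for key, pids in support.items() if len(pids) >= min_support}


def dominant_fold_share(pids, fold_of):
    """Return the most common fold among supporting proteins and its share.
    A high share hints at a fold-specific motif (threshold left to the user)."""
    fold, count = Counter(fold_of[p] for p in pids).most_common(1)[0]
    return fold, count / len(pids)


# Hypothetical toy data: three tiny proteins and made-up fold labels.
proteins = {
    "p1": make_graph({0: "E", 1: "E", 2: "H"}, [(0, 1), (1, 2), (0, 2)]),
    "p2": make_graph({0: "H", 1: "E", 2: "E", 3: "H"},
                     [(0, 1), (1, 2), (0, 2), (2, 3)]),
    "p3": make_graph({0: "H", 1: "H", 2: "E"}, [(0, 1), (1, 2)]),
}
fold_of = {"p1": "fold_a", "p2": "fold_a", "p3": "fold_b"}

motifs = mine_motifs(proteins, k=3, min_support=2)
for key, pids in motifs.items():
    print(key, sorted(pids), dominant_fold_share(pids, fold_of))
```

In this toy run the only 3-node clique shared by at least two proteins is the strand-strand-helix triple, found in p1 and p2 even though the helix comes last in p1 and first in p2. Brute-force enumeration is only practical for very small graphs and small k; it is used here purely to keep the sketch short.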
Data Set
- List of 600 proteins (This is the same data set as the one used in Aung and Tan, J. Comp. Biol., 12: 1221-41, 2005.)
Motifs Discovered
- Generic Motifs (Explanation of format)
- Fold-specific Motifs (Explanation of format)
Effects of Parameters
Color Figures
- Rank #44 Fold-specific Motif
Publication
Z. Aung and J. Li, Mining super-secondary structure motifs from 3D protein structures: a sequence order independent approach, Genome Informatics, 19:15-26, 2007. (the 18th International Conference on Genome Informatics; GIW'07)
Server
S4: Server for Super-Secondary Structure Motif Mining
- Released in January 2009.
- This is a web server implementation, with some enhancements, of the method described in the above Genome Informatics paper.