Mining of super-secondary structure motifs in 3D protein structures

Zeyar Aung and Jinyan Li

Institute for Infocomm Research, Singapore

 

Project Description

Super-secondary structure elements (super-SSEs) are structurally conserved ensembles of secondary structure elements (SSEs), such as helices and strands, within a protein. They are of considerable biological interest. Here, we present a method to formally represent and mine sequence-order-independent super-SSE motifs that occur repeatedly in large data sets of protein structures. We represent each protein structure as a graph and find the motifs by mining the cliques common to a set of protein graphs.
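The graph-and-clique idea above can be illustrated with a small Python sketch. It is not the authors' implementation: the SSE encoding (type, centroid, unit axis vector), the 10 Å contact cutoff `CONTACT_CUTOFF`, the parallel/antiparallel edge labels, and the helper names `build_graph`, `cliques`, `canonical`, and `frequent_clique_motifs` are all hypothetical choices made for illustration. Each clique is reduced to an order-independent canonical key by brute force over vertex orderings, which is only practical for the small clique sizes assumed here.

```python
from collections import defaultdict
from itertools import combinations, permutations

# Hypothetical encoding: each SSE is (type, centroid, axis), where type is
# 'H' (helix) or 'E' (strand), centroid is an (x, y, z) tuple in angstroms
# and axis is a unit direction vector. The cutoff below is an assumed value.
CONTACT_CUTOFF = 10.0


def build_graph(sses):
    """Nodes are SSE types; an edge joins SSEs whose centroids are within
    CONTACT_CUTOFF and is labelled 'p' (axes roughly parallel, dot >= 0)
    or 'a' (roughly antiparallel, dot < 0)."""
    labels = [t for t, _, _ in sses]
    edges = {}
    for i, j in combinations(range(len(sses)), 2):
        _, ci, ai = sses[i]
        _, cj, aj = sses[j]
        dist = sum((x - y) ** 2 for x, y in zip(ci, cj)) ** 0.5
        if dist <= CONTACT_CUTOFF:
            dot = sum(x * y for x, y in zip(ai, aj))
            edges[(i, j)] = "p" if dot >= 0 else "a"
    return labels, edges


def cliques(n, edges, max_size):
    """Yield every clique (as an increasing list of node ids) up to max_size."""
    adj = defaultdict(set)
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)

    def extend(clique, candidates):
        yield clique
        if len(clique) == max_size:
            return
        # candidates = nodes adjacent to every member of the current clique
        for v in sorted(candidates):
            if v > clique[-1]:  # grow in increasing order so each clique appears once
                yield from extend(clique + [v], candidates & adj[v])

    for v in range(n):
        yield from extend([v], adj[v])


def canonical(clique, labels, edges):
    """Order-independent key: the lexicographically smallest (node labels,
    edge labels) encoding over all orderings of the clique's vertices."""
    best = None
    for perm in permutations(clique):
        nodes = tuple(labels[v] for v in perm)
        links = tuple(edges[(min(u, v), max(u, v))] for u, v in combinations(perm, 2))
        if best is None or (nodes, links) < best:
            best = (nodes, links)
    return best


def frequent_clique_motifs(proteins, min_support, min_size=2, max_size=4):
    """Map each canonical clique to the set of protein ids containing it,
    keeping only those found in at least min_support proteins."""
    support = defaultdict(set)
    for pid, sses in proteins.items():
        labels, edges = build_graph(sses)
        for c in cliques(len(labels), edges, max_size):
            if len(c) >= min_size:
                support[canonical(c, labels, edges)].add(pid)
    return {m: ids for m, ids in support.items() if len(ids) >= min_support}
```

For example, two proteins that each contain a pair of nearby antiparallel strands share the motif `(('E', 'E'), ('a',))`, a crude analogue of a beta hairpin; because the key ignores vertex order, the same motif is found regardless of where the SSEs fall in the sequence.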

We mine two categories of super-SSE motifs: generic motifs, which occur frequently across the entire database of protein structures, and fold-specific motifs, which are concentrated in particular protein fold types. From an experimental data set of 600 proteins belonging to 15 large SCOP folds, we discovered 21 generic motifs and 75 fold-specific motifs that are both statistically significant and biologically relevant. A number of the discovered motifs (both generic and fold-specific) resemble well-known super-SSE motifs from the literature, such as beta hairpins, Greek keys, and zinc fingers. Some have novel shapes that have not been documented before. The method is time-efficient: it discovers all the motifs across the 600 proteins in under 14 minutes on a stand-alone PC.
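One way to separate the two categories is sketched below, assuming a motif's support is the set of proteins that contain it. The source does not state which significance test was used; the hypergeometric enrichment test, the Bonferroni correction over folds, and the thresholds `generic_frac` and `alpha` are assumptions, and `hypergeom_sf` and `classify_motif` are hypothetical helpers.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N proteins, K in the fold, n containing the motif)."""
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / comb(N, n)


def classify_motif(support, fold_of, generic_frac=0.25, alpha=0.01):
    """support: set of protein ids containing the motif.
    fold_of: dict mapping every protein id in the data set to its fold.
    generic_frac and alpha are illustrative thresholds, not published values."""
    N, n = len(fold_of), len(support)
    fold_sizes = Counter(fold_of.values())
    hits = Counter(fold_of[p] for p in support)
    best_fold, best_p = None, 1.0
    for fold, K in fold_sizes.items():
        # Probability of seeing at least this many motif hits in the fold by chance
        p = hypergeom_sf(hits[fold], N, K, n)
        if p < best_p:
            best_fold, best_p = fold, p
    # Bonferroni correction for testing every fold
    corrected = min(1.0, best_p * len(fold_sizes))
    return {
        "generic": n / N >= generic_frac,
        "fold_specific": best_fold if corrected < alpha else None,
        "p_value": corrected,
    }
```

In a toy setting of four equal-size folds of ten proteins each, a motif confined to eight proteins of one fold is reported as fold-specific, while a motif found in three proteins of every fold is generic but not fold-specific.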

 

Data Set

- List of 600 proteins (This is the same data set as the one used in Aung and Tan, J. Comp. Biol., 12: 1221-41, 2005.)

 

Motifs Discovered

- Generic Motifs (Explanation of format)

- Fold-specific Motifs (Explanation of format)

 

Effects of Parameters

Click here

 

Color Figures

- Rank #1 Generic Motif

- Rank #1 Fold-specific Motif

- Rank #3 Fold-specific Motif

- Rank #44 Fold-specific Motif