eFamily

eFamily Project Members

The CATH database is a novel hierarchical classification of protein domain structures, which clusters proteins at four major levels: Class (C), Architecture (A), Topology (T) and Homologous superfamily (H). Class, derived from secondary structure content, is assigned automatically for more than 90% of protein structures. Architecture, which describes the gross orientation of secondary structures independent of their connectivity, is currently assigned manually. The topology level clusters structures according to their topological connections and the number of secondary structures. The homologous superfamilies cluster proteins with highly similar structures and functions. Assignments of structures to topology families and homologous superfamilies are made by sequence and structure comparisons.
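
CATH classifications are commonly written as dotted codes with one number per level (Class.Architecture.Topology.Homologous superfamily). Assuming that format, the minimal Python sketch below shows how the four-level hierarchy behaves as nested clusters: two domains share a cluster at a given level only if they agree at every level above it. The function names are hypothetical and not part of any CATH software, and the example codes are illustrative strings in the right format rather than looked-up assignments.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Level names in CATH order; a code is assumed to list one number per level, e.g. "3.40.50.300".
LEVELS = ("Class", "Architecture", "Topology", "Homologous superfamily")


def parse_cath_code(code: str) -> Tuple[int, ...]:
    """Split a dotted C.A.T.H code into its numeric levels (1 to 4 numbers)."""
    parts = tuple(int(p) for p in code.split("."))
    if not 1 <= len(parts) <= len(LEVELS):
        raise ValueError(f"expected 1-4 dot-separated numbers, got {code!r}")
    return parts


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Name of the deepest level at which two codes fall in the same cluster, or None."""
    pa, pb = parse_cath_code(a), parse_cath_code(b)
    shared = None
    for depth, name in enumerate(LEVELS, start=1):
        # Two domains share a node at `depth` only if every level down to it agrees.
        if len(pa) < depth or len(pb) < depth or pa[:depth] != pb[:depth]:
            break
        shared = name
    return shared


def cluster_at(codes: List[str], depth: int) -> Dict[Tuple[int, ...], List[str]]:
    """Group codes into the clusters they form at a given depth (1 = C ... 4 = H)."""
    groups: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for code in codes:
        groups[parse_cath_code(code)[:depth]].append(code)
    return dict(groups)


codes = ["3.40.50.300", "3.40.50.720", "3.30.70.330", "1.10.10.10"]
print(deepest_shared_level(codes[0], codes[1]))  # Topology
print(deepest_shared_level(codes[0], codes[2]))  # Class
print(deepest_shared_level(codes[0], codes[3]))  # None
print(cluster_at(codes, 2))
```

Representing each cluster by the prefix of its code keeps the grouping at any level to a single dictionary pass.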


InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
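
InterPro's identifiable features come from signatures supplied by its member databases, one of which (PROSITE) describes some sites as short sequence patterns. As a minimal sketch of how such a feature can be applied to an unknown sequence, the Python below translates a simplified PROSITE-style pattern into a regular expression and scans a sequence with it. It supports only single residues, x, [..] sets, {..} exclusions, repeat counts and the < and > terminus anchors; the example pattern is modelled on the classic P-loop motif and the sequence is made up. Real InterPro matching also uses profiles and hidden Markov models, which this sketch does not attempt.

```python
import re
from typing import List, Tuple

# One pattern element: a residue, "x" (any residue), [..] (allowed set) or {..} (forbidden set),
# optionally followed by a repeat count "(n)" or a range "(n,m)".
_ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a simplified PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        at_n_term, at_c_term = element.startswith("<"), element.endswith(">")
        match = _ELEMENT.fullmatch(element.strip("<>"))
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = match.groups()
        if residue == "x":
            rx = "."
        elif residue.startswith("{"):
            rx = "[^" + residue[1:-1] + "]"
        else:
            rx = residue  # a single letter or a [..] set is already valid regex syntax
        if low is not None:
            rx += "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if at_n_term else "") + rx + ("$" if at_c_term else ""))
    return "".join(parts)


def scan(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for every match, including overlapping ones."""
    # A zero-width lookahead lets consecutive matches overlap.
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


print(prosite_to_regex("[AG]-x(4)-G-K-[ST]."))  # [AG].{4}GK[ST]
print(scan("[AG]-x(4)-G-K-[ST].", "MKLAGESGSGKSTV"))  # [(5, 'GESGSGKS')]
```

Reporting overlapping matches mirrors the idea that every occurrence of a feature in the new sequence is of interest, not just the first.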



The Macromolecular Structure Database (MSD) is a resource for the collection, management and distribution of data about macromolecular structures, derived in part from the Protein Data Bank (PDB). MSD also provides a comprehensive mapping between protein sequences in UniProt and protein structures in the database.
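
How MSD stores its sequence-to-structure mapping is not described here. As a minimal sketch under stated assumptions, the Python below models the mapping as segments that align a contiguous range of UniProt residues with residue numbers in one chain of a structure, and answers two questions one might ask of such a mapping: which structures contain a given residue, and how much of a sequence has structural coverage. The Segment fields, the accession HYPO_1 and the structure identifiers are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Segment:
    """A stretch of a UniProt sequence observed in one structure chain (hypothetical record)."""
    uniprot_acc: str
    uniprot_start: int  # first mapped UniProt residue, 1-based, inclusive
    uniprot_end: int    # last mapped UniProt residue, inclusive
    structure_id: str
    chain: str
    chain_start: int    # chain residue number aligned with uniprot_start


class SequenceStructureMap:
    """Index of segments by UniProt accession for residue-level lookups."""

    def __init__(self, segments: List[Segment]) -> None:
        self._by_acc: Dict[str, List[Segment]] = defaultdict(list)
        for seg in segments:
            self._by_acc[seg.uniprot_acc].append(seg)

    def structures_at(self, acc: str, position: int) -> List[Tuple[str, str, int]]:
        """(structure, chain, chain residue number) for each structure covering a UniProt residue."""
        hits = []
        for seg in self._by_acc.get(acc, []):
            if seg.uniprot_start <= position <= seg.uniprot_end:
                # Assumes no gaps inside a segment, so the residue offset is constant.
                offset = position - seg.uniprot_start
                hits.append((seg.structure_id, seg.chain, seg.chain_start + offset))
        return hits

    def fraction_covered(self, acc: str, length: int) -> float:
        """Fraction of a sequence's residues that appear in at least one structure."""
        covered = set()
        for seg in self._by_acc.get(acc, []):
            covered.update(range(max(1, seg.uniprot_start), min(length, seg.uniprot_end) + 1))
        return len(covered) / length


mapping = SequenceStructureMap([
    Segment("HYPO_1", 1, 100, "STRUCT_A", "A", 5),
    Segment("HYPO_1", 80, 150, "STRUCT_B", "B", 1),
])
print(mapping.structures_at("HYPO_1", 90))  # [('STRUCT_A', 'A', 94), ('STRUCT_B', 'B', 11)]
print(mapping.fraction_covered("HYPO_1", 200))  # 0.75
```

Storing a start offset per segment, rather than one row per residue, keeps the mapping compact; a real mapping would also need to record insertions and deletions within a chain.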



Pfam is a database of multiple sequence alignments and hidden Markov models. There are two parts to Pfam, termed Pfam-A and Pfam-B. Pfam-A contains over 7,500 high-quality, manually curated protein families. Each family is accompanied by a description and appropriate links to other databases. The other part, Pfam-B, is derived from ProDom and represents sequence clusters that are not covered by Pfam-A regions. Together, Pfam-A and Pfam-B cover approximately 95% of the UniProt database.
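
Pfam-B is described as covering sequence regions that Pfam-A leaves uncovered, and the two parts together are credited with about 95% coverage of UniProt. The Python below is a minimal sketch of both ideas, assuming domain hits are given as 1-based inclusive residue ranges: it finds the leftover regions of a sequence once Pfam-A hits are removed, and measures coverage as the fraction of sequences with at least one hit. This is not how Pfam-B was actually built (it was derived from ProDom), the min_len cutoff is a hypothetical parameter, and the text does not say whether the 95% figure counts sequences or residues, so per-sequence coverage is only one possible reading. All hit data are made up.

```python
from typing import Dict, List, Sequence, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive residue range of one domain hit


def merge(hits: List[Interval]) -> List[Interval]:
    """Sort hits and fuse any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def uncovered_regions(length: int, hits: List[Interval], min_len: int = 1) -> List[Interval]:
    """Residue ranges of a sequence not covered by any hit, keeping gaps of at least min_len."""
    gaps: List[Interval] = []
    next_free = 1  # first residue not yet accounted for
    for start, end in merge(hits):
        if start - next_free >= min_len:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if length - next_free + 1 >= min_len:
        gaps.append((next_free, length))
    return gaps


def sequence_coverage(seq_ids: Sequence[str], *hit_sets: Dict[str, List[Interval]]) -> float:
    """Fraction of sequences with at least one hit in any of the given hit sets."""
    covered = sum(1 for s in seq_ids if any(hits.get(s) for hits in hit_sets))
    return covered / len(seq_ids)


pfam_a = {"seq1": [(10, 80), (70, 120), (200, 260)], "seq2": [(5, 40)]}
pfam_b = {"seq3": [(1, 30)]}
print(uncovered_regions(300, pfam_a["seq1"]))              # [(1, 9), (121, 199), (261, 300)]
print(uncovered_regions(300, pfam_a["seq1"], min_len=30))  # [(121, 199), (261, 300)]
ids = ["seq1", "seq2", "seq3", "seq4"]
print(sequence_coverage(ids, pfam_a), sequence_coverage(ids, pfam_a, pfam_b))  # 0.5 0.75
```

Merging overlapping hits first means the gap search is a single left-to-right sweep over disjoint intervals.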


The SCOP (Structural Classification of Proteins) database was developed as an evolutionary classification whose main focus is to place proteins in a coherent evolutionary framework based on their conserved structural features. The database aims to provide a comprehensive and detailed description of the relationships between all proteins whose 3D structures have been determined.