Database of Sequence-Similar, Structure-Dissimilar Protein Pairs in the PDB

Mickey Kosloff and Rachel Kolodny, Proteins: Structure, Function, and Bioinformatics (2008), Volume 71, Issue 2, Pages 891-902

Below is the number of pairs in each category of our dataset. For a full list of the pairs and their alignments in each subset, follow the links.

Sequence Identity    Total Pairs (sequence clusters)

= 100%               444 (184)        158 (60)
≥ 99%                757 (216)        278 (69)
≥ 70%                6873 (353)       1575 (126)
≥ 50%                11749 (401)      2653 (138)

(1) Excluding chains that are redundant in both sequence and structure; see the manuscript text for details.

(2) For a complete list of all chain pairs in the dataset, click here.


A manually annotated subset of pairs with 99%-100% sequence identity and RMSD ≥ 6 Å, clustered into biologically related "families" (and analyzed in Figures 4 and 5 of the manuscript), can be found here.
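
The selection criterion for this subset (sequence identity of at least 99% and RMSD of at least 6 Å) can be sketched as a simple filter. The `ChainPair` record, its field names, the function name, and the toy values below are assumptions made for illustration only; they are not the database's actual file format or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainPair:
    """Hypothetical record for one chain pair (not the database's real schema)."""
    chain_a: str
    chain_b: str
    seq_identity: float  # percent sequence identity, 0-100
    rmsd: float          # RMSD in angstroms


def select_similar_but_dissimilar(pairs, min_identity=99.0, min_rmsd=6.0):
    """Keep pairs that are near-identical in sequence yet far apart in structure."""
    return [
        p for p in pairs
        if p.seq_identity >= min_identity and p.rmsd >= min_rmsd
    ]


# Made-up placeholder pairs, used only to exercise the filter.
toy_pairs = [
    ChainPair("toy1_A", "toy2_A", 100.0, 7.5),  # kept
    ChainPair("toy3_A", "toy4_B", 99.2, 4.0),   # RMSD too low
    ChainPair("toy5_A", "toy6_A", 72.0, 9.1),   # identity too low
]
subset = select_similar_but_dissimilar(toy_pairs)
```

Lowering `min_identity` to 70 or 50 would correspond to the broader identity categories in the table above.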

Supplementary Material to the manuscript can be found here. Figure S1 details the differences between geometry-based alignments and sequence alignments in a subset of pairs whose sequence-based structural superposition gives an RMSD ≥ 6 Å. Figure S2 shows an example of the database entry for the cABL kinase (see the discussion in the text of how this entry can be used in homology modeling of cABL homologs).
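
To illustrate what an RMSD after sequence-based superposition measures, here is a minimal sketch. It assumes the Cα coordinates of the two chains have already been paired residue by residue through a sequence alignment, and it uses the standard Kabsch algorithm with NumPy. The function name `superposition_rmsd` is hypothetical, and the procedure used in the manuscript may differ in its details.

```python
import numpy as np


def superposition_rmsd(coords_a, coords_b):
    """RMSD (same units as the input) after optimal rigid superposition.

    coords_a and coords_b are (N, 3) arrays of paired atom coordinates,
    e.g. the C-alpha atoms matched by a sequence alignment.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")

    # Remove the translation by centering both point sets on their centroids.
    a_centered = a - a.mean(axis=0)
    b_centered = b - b.mean(axis=0)

    # Kabsch: SVD of the 3x3 covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(a_centered.T @ b_centered)
    # Flip the last axis if needed so the result is a proper rotation,
    # not a reflection.
    d = -1.0 if np.linalg.det(u @ vt) < 0 else 1.0
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt

    # Rotate A onto B and measure the remaining per-atom deviation.
    diff = a_centered @ rotation - b_centered
    return float(np.sqrt((diff ** 2).sum() / len(a)))


# Toy coordinates (not taken from the database).
ref = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
# The same shape rotated 90 degrees about z and translated: a rigid copy.
moved = np.column_stack([-ref[:, 1], ref[:, 0], ref[:, 2]]) + np.array([5.0, -3.0, 2.0])
# Mirror image: no proper rotation can superimpose it on ref exactly.
mirrored = ref * np.array([1.0, 1.0, -1.0])
```

Here `superposition_rmsd(ref, moved)` is zero up to floating-point error because the two sets differ only by a rigid motion, whereas `superposition_rmsd(ref, mirrored)` is clearly nonzero.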