Name                            Uploaded                        Size
foldseek/teddb.tar.gz           Wed, 26 Feb 2025 19:19:27 GMT   320.4 GB
foldseek/teddb_afdb50.tar.gz    Wed, 26 Feb 2025 21:16:51 GMT   59.5 GB
foldseek/teddymer.tar.gz        Mon, 02 Mar 2026 22:28:26 GMT   20.5 GB

Readme

This dataset provides 0.5 million non-singleton structural clusters generated by our Foldseek-Multimer-Clustering algorithm.
The clusters are derived from the TED (The Encyclopedia of Domains) database, which segments the structures of the entire AlphaFold Database (AFDB) into domains. From the entire TED database we generated a dimer database, treating each domain as a chain.
From the 10 million dimers whose two chains are both annotated by CATH and that are entries in afdb50 (AFDB clustered with MMseqs2 at 50% sequence identity), we obtained 3.5 million clusters: 0.5 million non-singleton and 3 million singleton.
Parameters used: coverage 0.6, chain TM-score threshold 0.7, interface LDDT threshold 0.3, cluster mode 0, cov-mode 0.
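The parameters above correspond to options of Foldseek's multimer clustering workflow. The sketch below shows how they might be passed on the command line. It assumes a Foldseek release that ships `easy-multimercluster` with the `--chain-tm-threshold` and `--interface-lddt-threshold` options (check `foldseek easy-multimercluster -h` for your version); the wrapper name `teddymer_cluster` and the `DRY_RUN` switch are hypothetical, and this is not necessarily the exact command used to build the dataset.

```shell
# Hypothetical wrapper around Foldseek multimer clustering with the Teddymer
# parameters. Assumption: your Foldseek build provides easy-multimercluster
# with these flag names (verify with: foldseek easy-multimercluster -h).
# Set DRY_RUN=1 to print the command instead of executing it.
teddymer_cluster() {
  # $1 = input structures, $2 = output prefix, $3 = temporary directory
  set -- foldseek easy-multimercluster "$1" "$2" "$3" \
    -c 0.6 \
    --cov-mode 0 \
    --cluster-mode 0 \
    --chain-tm-threshold 0.7 \
    --interface-lddt-threshold 0.3
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"   # show the command line only
  else
    "$@"                 # run Foldseek
  fi
}
```

For example, `DRY_RUN=1 teddymer_cluster dimer_pdbs/ teddymer_res tmp/` prints the full command so the flags can be checked before a long run.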

Teddymer

Data Description

teddb.tar.gz: Foldseek database for TED

  1. teddb_h: Header file containing the TED accession ID of each domain and its boundaries. The _h files of all the databases below contain the same kind of information.
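Foldseek databases use the MMseqs2 flat-file layout, in which each entry of a data file is terminated by a NUL byte and header entries also end in a newline. Assuming that layout, the headers in teddb_h can be printed one per line by deleting the NUL bytes. `print_headers` is a hypothetical helper name, and the exact header format should be checked against the real file.

```shell
# Print one header per line from a Foldseek/MMseqs2 header file (e.g. teddb_h).
# Assumption: entries are NUL-terminated and each header already ends in '\n',
# so deleting the NUL bytes leaves one header per line.
print_headers() {
  tr -d '\000' < "$1"
}

# Usage: show the first five TED headers (accession and domain boundaries).
# print_headers teddb_h | head -n 5
```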

teddb_afdb50.tar.gz: Foldseek database for TED, containing only entries that belong to afdb50

teddymer.tar.gz - dir_ted_afdb50_cath_dimerdb: Input Foldseek database for clustering

  1. ted_afdb50_cath_dimerdb: A dimer database created from teddb, containing only entries that belong to afdb50 and have both domains annotated by CATH.

teddymer.tar.gz - teddymer_repdb: Representative database

  1. teddymer_repdb: This representative database contains representatives of both non-singleton and singleton clusters.

teddymer.tar.gz - cluster.tsv: Cluster results

  1. repId: Name of the representative, e.g., 318687DI_AF-A0A016SRM1-F1-model_v4.
  2. memId: Name of the member.
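Because every line pairs a member with its representative, cluster sizes can be tallied directly from cluster.tsv. The sketch below assumes the file is tab-separated with no header row and that, as in other MMseqs2/Foldseek cluster TSVs, each representative is also listed as a member of its own cluster (so a singleton appears as a single repId/repId line). `cluster_sizes` and `count_clusters` are hypothetical helper names.

```shell
# Cluster sizes from cluster.tsv (repId<TAB>memId, assumed no header row).
# Assumption: each representative is also listed as a member of itself,
# so the count includes the representative.
cluster_sizes() {
  awk -F'\t' '{ n[$1]++ } END { for (r in n) print r "\t" n[r] }' "$1"
}

# Summary line: total, non-singleton and singleton cluster counts.
count_clusters() {
  cluster_sizes "$1" | awk -F'\t' '
    $2 == 1 { single++ }
    $2 > 1  { multi++ }
    END { printf "clusters=%d non-singleton=%d singleton=%d\n", single + multi, multi, single }'
}
```

Running `count_clusters cluster.tsv` and comparing the result with the totals stated in the Readme is a quick sanity check of these assumptions.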

teddymer.tar.gz - nonsingletonrep_metadata.tsv: Metadata for the non-singleton clusters.

  1. DimerIndex: Index of the representative dimer; corresponds to the entries in ted_afdb50_cath_dimerdb.source.
  2. UniProtID: UniProt ID of the representative.
  3. DomainPair: TED domains composing the representative.
  4. MemberCount: Number of members in the cluster.
  5. InterfaceLength: Length of the interface of the representative.
  6. AvgIntPAE: Average predicted aligned error (PAE) of the interface residues of the representative.
  7. AvgIntPlddt: Average pLDDT of the interface residues of the representative.
  8. IntPlddt: pLDDT of each interface residue (one value per residue), with the values of the two chains separated by a colon.
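The metadata columns make it straightforward to select confidently predicted interfaces. The sketch below assumes nonsingletonrep_metadata.tsv is tab-separated with a single header line and columns in the order listed above; the default thresholds (AvgIntPlddt >= 70, AvgIntPAE <= 10) are illustrative, not values recommended by the dataset authors. The separator between per-residue values inside IntPlddt is not documented here, so `chain_mean_plddt` assumes a comma. Both function names are hypothetical.

```shell
# Filter nonsingletonrep_metadata.tsv by interface confidence.
# Assumptions: tab-separated, one header line, columns in the documented order
# (6 = AvgIntPAE, 7 = AvgIntPlddt). Default thresholds are illustrative only.
filter_confident() {
  awk -F'\t' -v min_plddt="${2:-70}" -v max_pae="${3:-10}" '
    NR == 1 { print; next }                       # keep the header line
    $7 + 0 >= min_plddt + 0 && $6 + 0 <= max_pae + 0 { print }
  ' "$1"
}

# Per-chain mean of IntPlddt (column 8): chains are split on ":" as documented;
# per-residue values are ASSUMED to be comma-separated.
chain_mean_plddt() {
  awk -F'\t' 'NR > 1 {
    n = split($8, chains, ":")
    out = $1                                      # DimerIndex
    for (i = 1; i <= n; i++) {
      k = split(chains[i], v, ",")
      s = 0
      for (j = 1; j <= k; j++) s += v[j]
      out = out "\t" (k ? sprintf("%.2f", s / k) : "NA")
    }
    print out
  }' "$1"
}
```

For example, `filter_confident nonsingletonrep_metadata.tsv 80 5` keeps only representatives with AvgIntPlddt >= 80 and AvgIntPAE <= 5; if the real IntPlddt column uses another per-residue separator, change the `","` in `split(chains[i], v, ",")`.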

How to generate C-alpha-only .pdb files from a database (replace db with the database to convert, e.g. teddymer_repdb; the .pdb files are written to the directory resultPDBdir)

  1. mkdir resultPDBdir
  2. foldseek convert2pdb db resultPDBdir/ --pdb-output-mode 1

How to generate dimer .pdb files (example for domains TED01 and TED02 of AF-A0A005-F1-model_v4; requires wget, pdb-tools and bash)

  1. wget https://ted.cathdb.info/api/v1/files/AF-A0A005-F1-model_v4_TED01.pdb
  2. wget https://ted.cathdb.info/api/v1/files/AF-A0A005-F1-model_v4_TED02.pdb
  3. pdb_chain -A AF-A0A005-F1-model_v4_TED01.pdb | pdb_reres -1 > A0A005_TED01.pdb
  4. pdb_chain -B AF-A0A005-F1-model_v4_TED02.pdb | pdb_reres -1 > A0A005_TED02.pdb
  5. cat A0A005_TED01.pdb <(echo "TER") A0A005_TED02.pdb <(echo "END") > A0A005_TED01_TED02.pdb
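Steps 1 to 5 can be wrapped into a reusable function for any AFDB model and TED domain pair. This is a sketch: `ted_url`, `afdb_accession`, `dimer_name` and `make_ted_dimer` are hypothetical helpers that reuse the URL pattern and pdb-tools commands from the steps above. It is written for POSIX-style shells, with a brace group in place of the bash-only process substitution of step 5.

```shell
# Hypothetical helpers generalising steps 1-5 to any AFDB model and TED
# domain pair. Requires wget, pdb-tools (pdb_chain, pdb_reres) and network access.

# URL of a TED domain file, following the pattern used in steps 1-2.
ted_url() {
  # $1 = AFDB model name (e.g. AF-A0A005-F1-model_v4), $2 = domain (e.g. TED01)
  printf 'https://ted.cathdb.info/api/v1/files/%s_%s.pdb\n' "$1" "$2"
}

# Accession part of the model name: strip "AF-" and the "-F1-model_v4" suffix.
afdb_accession() {
  local acc="${1#AF-}"
  printf '%s\n' "${acc%%-F*}"
}

# Output file name, e.g. A0A005_TED01_TED02.pdb as in step 5.
dimer_name() {
  printf '%s_%s_%s.pdb\n' "$(afdb_accession "$1")" "$2" "$3"
}

make_ted_dimer() {
  local model="$1" d1="$2" d2="$3" acc out
  acc=$(afdb_accession "$model")
  out=$(dimer_name "$model" "$d1" "$d2")
  # Steps 1-2: download both domains under their original names.
  wget -q -O "${model}_${d1}.pdb" "$(ted_url "$model" "$d1")" || return 1
  wget -q -O "${model}_${d2}.pdb" "$(ted_url "$model" "$d2")" || return 1
  # Steps 3-4: first domain becomes chain A, second chain B; renumber from 1.
  pdb_chain -A "${model}_${d1}.pdb" | pdb_reres -1 > "${acc}_${d1}.pdb"
  pdb_chain -B "${model}_${d2}.pdb" | pdb_reres -1 > "${acc}_${d2}.pdb"
  # Step 5: concatenate with TER between the chains and END at the end.
  { cat "${acc}_${d1}.pdb"; echo "TER"; cat "${acc}_${d2}.pdb"; echo "END"; } > "$out"
  printf '%s\n' "$out"
}
```

With these helpers, `make_ted_dimer AF-A0A005-F1-model_v4 TED01 TED02` reproduces the five steps above and prints the name of the merged file.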