Data:
  • Complete release data is available at Zenodo.
  • Harmonized list of human transcription factors and their mouse orthologs based on the TFClass classification (extended with Codebook TFs for v14): tf_masterlist.tsv.

Many practical motif applications require a motif set with reduced redundancy, i.e., one where similar motifs of related transcription factors are grouped together and each group is represented by a single matrix. To this end, we have created the non-redundant set of HOCOMOCO v14 motifs, a derivative of the HOCOMOCO v14 CORE collection.

We estimated motif similarities with MacroAPE (see opera.autosome.org/macroape and doi:10.1186/1748-7188-8-23) at a motif P-value cutoff of 0.0005 and the default matrix discretization of 1 (upscaled to 10 for better precision in cases where the similarity estimate at the default discretization exceeded 0.01).

Using the pairwise motif similarity matrix, we performed hierarchical clustering with scikit-learn agglomerative clustering ('average' linkage). The number of clusters was chosen to maximize the silhouette score, which yielded 523 clusters at a silhouette score of 0.16.

For each cluster, a single representative motif was selected as the one with the best average similarity to the other motifs in the cluster. Only motifs of A, B, or C quality were considered as cluster representatives. The annotation lists the motifs that constitute each cluster and the respective TFs (UniProt IDs).
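The selection of representatives can be sketched as follows. This is an illustration under stated assumptions, not the original implementation: the function pick_representatives, the motif names m1 to m4, and the toy similarity values are hypothetical. The handling of singleton clusters and of clusters without any A, B, or C motif is also an assumption, since the description above does not cover those cases.

```python
from collections import defaultdict


def pick_representatives(names, labels, similarity, quality,
                         allowed=frozenset("ABC")):
    """Map each cluster label to the eligible motif with the best mean similarity."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    representatives = {}
    for label, members in clusters.items():
        best_idx, best_mean = None, -1.0
        for i in members:
            if quality[i] not in allowed:
                continue  # only A, B, or C quality motifs may represent a cluster
            others = [j for j in members if j != i]
            # Assumption: a singleton cluster is represented by its only motif.
            if others:
                mean_sim = sum(similarity[i][j] for j in others) / len(others)
            else:
                mean_sim = 1.0
            if mean_sim > best_mean:
                best_idx, best_mean = i, mean_sim
        # Assumption: a cluster with no eligible motif gets no representative.
        representatives[label] = names[best_idx] if best_idx is not None else None
    return representatives


# Toy data (hypothetical): three motifs in cluster 0, one motif in cluster 1.
names = ["m1", "m2", "m3", "m4"]
labels = [0, 0, 0, 1]
quality = ["D", "A", "B", "C"]
similarity = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.5, 0.1],
    [0.8, 0.5, 1.0, 0.2],
    [0.1, 0.1, 0.2, 1.0],
]

reps = pick_representatives(names, labels, similarity, quality)
print(reps)
```

In this toy example m1 has the highest average similarity within cluster 0 but is skipped because of its D quality, so the next best eligible motif is chosen instead.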


Tools:
  • MoLoTool - web interface for motif finding.
  • SPRY-SARUS tool for motif finding (Java): jar, readme
  • MACRO-APE tool for motif comparison, P-value and threshold estimation: jar, manual, website
  • PERFECTOS-APE tool for functional annotation of sequence variants overlapping TFBS: jar, manual, website
Citation:
Ilya E Vorontsov, Irina A Eliseeva, Arsenii Zinkevich, Mikhail Nikonov, Sergey Abramov, Alexandr Boytsov, Vasily Kamenets, Alexandra Kasianova, Semyon Kolmykov, Ivan S Yevshin, Alexander Favorov, Yulia A Medvedeva, Arttu Jolma, Fedor Kolpakov, Vsevolod J Makeev, Ivan V Kulakovskiy
Nucleic Acids Research, gkad1077 (16 November 2023)
doi: 10.1093/nar/gkad1077
License: HOCOMOCO motif collection is distributed under WTFPL. If you prefer more standard licenses, feel free to treat WTFPL as CC-BY.

HOCOMOCO v14 subcollections

H14CORE H14INVIVO H14INVITRO H14RSNP
Number of TFs 1107 (MOUSE subset: 809) 1107 (MOUSE subset: 809) 1107 (MOUSE subset: 809) 1107 (MOUSE subset: 809)
Number of motifs 1595 (MOUSE subset: 1245) 1595 (MOUSE subset: 1245) 1579 (MOUSE subset: 1229) 1595 (MOUSE subset: 1245)
Complete model annotation (including gene ID mapping)
All motifs H14CORE_annotation.jsonl H14INVIVO_annotation.jsonl H14INVITRO_annotation.jsonl H14RSNP_annotation.jsonl
MOUSE subset H14CORE-MOUSE_annotation.jsonl H14INVIVO-MOUSE_annotation.jsonl H14INVITRO-MOUSE_annotation.jsonl H14RSNP-MOUSE_annotation.jsonl
PWM One file per matrix
H14CORE_pwm.tar.gz H14INVIVO_pwm.tar.gz H14INVITRO_pwm.tar.gz H14RSNP_pwm.tar.gz
Flat file H14CORE_pwms.txt H14INVIVO_pwms.txt H14INVITRO_pwms.txt H14RSNP_pwms.txt
PCM One file per matrix
H14CORE_pcm.tar.gz H14INVIVO_pcm.tar.gz H14INVITRO_pcm.tar.gz H14RSNP_pcm.tar.gz
Flat file H14CORE_pcms.txt H14INVIVO_pcms.txt H14INVITRO_pcms.txt H14RSNP_pcms.txt
PFM One file per matrix
H14CORE_pfm.tar.gz H14INVIVO_pfm.tar.gz H14INVITRO_pfm.tar.gz H14RSNP_pfm.tar.gz
Flat file H14CORE_pfms.txt H14INVIVO_pfms.txt H14INVITRO_pfms.txt H14RSNP_pfms.txt
Threshold to P-value map
H14CORE_thresholds.tar.gz H14INVIVO_thresholds.tar.gz H14INVITRO_thresholds.tar.gz H14RSNP_thresholds.tar.gz
Matrices in other formats JASPAR H14CORE_jaspar_format.txt H14INVIVO_jaspar_format.txt H14INVITRO_jaspar_format.txt H14RSNP_jaspar_format.txt
MEME H14CORE_meme_format.meme H14INVIVO_meme_format.meme H14INVITRO_meme_format.meme H14RSNP_meme_format.meme
TRANSFAC H14CORE_transfac_format.txt H14INVIVO_transfac_format.txt H14INVITRO_transfac_format.txt H14RSNP_transfac_format.txt
HOMER