FAQ - HOmo sapiens COmprehensive MOdel COllection

Q: I am a bit lost, which (sub-)collection should I use in my analysis?

A: In most scenarios, you can safely use the main CORE collection. For better results, however, consult the following graphical scheme.

Q: But there are so many extra motif annotations such as subtype or quality, should I somehow pre-filter motifs based on those?

A: In the CORE collection all motifs are reliable. C-quality motifs have only a single supporting experiment but were nonetheless manually curated and benchmarked. Also, in the CORE collection, there are only a handful of D-quality motifs representing a few rare subtypes, which were not rediscovered when updating v11 to v12 and later to v13-v14 but were retained for consistency.
In the sub-collections, D quality denotes non-benchmarked motifs, e.g. in the ‘invivo’ sub-collection the D quality motifs were not tested on ChIP-Seq data.
Don't hesitate to consult the scheme for further hints.
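
If you do want to drop non-benchmarked motifs from a sub-collection, the sketch below shows one way to do it from the motif IDs alone. It assumes that the quality letter is the last dot-separated field of the ID, as in AHR.H14CORE.0.P.B. The function names and the IDs in the usage note are hypothetical placeholders, not part of any HOCOMOCO tool or motif set.

```python
from typing import Iterable, List


def quality_of(motif_id: str) -> str:
    """Return the quality letter, assumed to be the last dot-separated field."""
    return motif_id.rsplit(".", 1)[-1]


def keep_benchmarked(motif_ids: Iterable[str],
                     excluded: Iterable[str] = ("D",)) -> List[str]:
    """Keep motif IDs whose quality letter is not in `excluded`.

    By default only D-quality motifs are dropped, which in the
    sub-collections are the non-benchmarked ones.
    """
    excluded_set = set(excluded)
    return [m for m in motif_ids if quality_of(m) not in excluded_set]
```

For example, keep_benchmarked(["TFA.H14INVIVO.0.P.B", "TFB.H14INVIVO.0.P.D"]) keeps only the first of these two made-up IDs. For the CORE collection this filtering is normally unnecessary, as explained above.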

Q: Can you explain the structure of motif IDs?

A: Let's use AHR.H14CORE.0.P.B as an example.
Here AHR denotes the UniProt ID prefix (most of the time identical between human and mouse orthologs, e.g. AHR_HUMAN and AHR_MOUSE).
H14CORE denotes the sub-collection; in downloadable motif sets it can also be H14RSNP, H14INVIVO, or H14INVITRO.
0 is the subtype number, where 0 denotes the most common motifs scoring the best across all benchmarking datasets.
P is the type of experiment that yielded the motifs assigned to this subtype during expert curation; here, P means that the motif was found in ChIP-Seq experiments. For details, see the Experimental data types section of the help page.
B is the motif quality on the ABCD scale; see the Quality ratings section of the help page.
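
As an illustration, the five fields described above can be split programmatically. This is a minimal sketch that assumes every motif ID has exactly five dot-separated fields, as in the example. The names MotifId and parse_motif_id are hypothetical, not part of any HOCOMOCO tool. The experiment-type field is kept as a plain string and not validated, since the full list of codes is given in the help page rather than here.

```python
from typing import NamedTuple


class MotifId(NamedTuple):
    tf: str          # UniProt ID prefix, e.g. "AHR"
    collection: str  # sub-collection tag, e.g. "H14CORE"
    subtype: int     # subtype number; 0 is the most common motif
    data_type: str   # experiment type code, e.g. "P" for ChIP-Seq
    quality: str     # quality on the ABCD scale


def parse_motif_id(motif_id: str) -> MotifId:
    """Split a motif ID such as 'AHR.H14CORE.0.P.B' into its fields."""
    parts = motif_id.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated fields, got {len(parts)}: {motif_id!r}")
    tf, collection, subtype, data_type, quality = parts
    if quality not in ("A", "B", "C", "D"):
        raise ValueError(f"unexpected quality {quality!r} in {motif_id!r}")
    # int() raises ValueError if the subtype field is not a number
    return MotifId(tf, collection, int(subtype), data_type, quality)
```

With this sketch, parse_motif_id("AHR.H14CORE.0.P.B").quality gives "B", and an ID with a missing field raises ValueError instead of being silently misread.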

Q: How do I select HOCOMOCO motifs for mouse or other species?

A: Please consult the scheme.

Q: In HOCOMOCO v11 there were separate mouse and human collections. Why is there only one joint collection in v13?

A: Even in HOCOMOCO v11 we relied on cross-validation between human and mouse datasets. As human and mouse TFs share highly similar and often identical DNA-binding domains, we have taken the next step and selected the most reliable motifs that perform well across the whole range of available data for both species. Based on benchmarking results, we consider v13 motifs to be generally more informative and more reliable than those in previous releases. Please check the HOCOMOCO v12 paper for more details.

Q: Where is the dinucleotide motif collection of HOCOMOCO v11?

A: In this release, we have focused our efforts on expanding the fraction of TFs covered by reliable motifs, rigorous benchmarking, and comprehensive annotation of motif subtypes. Dinucleotide motifs will continue to be available in HOCOMOCO v11, see Downloads.

Q: Where are the sequence alignments that were used to construct the weight matrices?

A: We decided to discard this information as being of limited value. For particular TFs and motifs, these data are available from older HOCOMOCO releases, see Downloads.