The Unknome project

The human genome encodes ~20,000 proteins, many still uncharacterised. Scientific and social factors have resulted in a focus on well-studied proteins, leading to a concern that poorly understood genes are unjustifiably neglected. To address this, we have developed an “Unknome database” that ranks proteins based on how little is known about them.

The database is intended to aid the selection of poorly characterised proteins from humans or model organisms so that they can be targeted for investigation. We welcome feedback! Please email Tim Stevens.


The Unknome database is described in this publication, along with our use of it to investigate a set of poorly understood proteins in Drosophila:

Functional unknomics: closing the knowledge gap to accelerate biomedical research
Joao Rocha, Satish Arcot Jayaram, Tim J Stevens, Nadine Muschalik, Rajen D Shah, Sahar Emran, Cristina Robles, Matthew Freeman, Sean Munro

Technical details

The overall principle of the Unknome database is to assign a knownness score to proteins. Each protein is placed in a cluster of orthologues based on the Panther database. The knownness score of a cluster is defined as the largest number of Gene Ontology (GO) terms that has been assigned to any one member of that cluster. Because GO annotations vary in confidence and relevance to function, different types of evidence can be assigned different weights when calculating the score. The list of scored clusters can also be restricted to those containing proteins from humans and/or the main model organisms.
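The scoring rule described above can be sketched in Python. This is only an illustration under stated assumptions, not the code behind the database: the weight values, the protein names and the example annotations are invented, and it assumes that each GO term is counted once per protein using its best-weighted evidence.

```python
from typing import Dict, List, Tuple

# Hypothetical weights per GO evidence code; not the database's real defaults.
EVIDENCE_WEIGHTS: Dict[str, float] = {
    "IDA": 1.0,  # inferred from direct assay
    "IMP": 1.0,  # inferred from mutant phenotype
    "ISS": 0.5,  # inferred from sequence or structural similarity
    "IEA": 0.0,  # inferred from electronic annotation
}

Annotation = Tuple[str, str]  # (GO term ID, evidence code)


def protein_score(annotations: List[Annotation], weights: Dict[str, float]) -> float:
    """Sum the weights of a protein's GO terms, counting each term once."""
    best: Dict[str, float] = {}
    for go_id, evidence in annotations:
        weight = weights.get(evidence, 0.0)  # unknown codes contribute nothing
        best[go_id] = max(best.get(go_id, 0.0), weight)
    return sum(best.values())


def cluster_knownness(cluster: Dict[str, List[Annotation]],
                      weights: Dict[str, float]) -> float:
    """Return the highest protein score found in an orthologue cluster."""
    return max((protein_score(a, weights) for a in cluster.values()), default=0.0)


# Invented cluster: protein names and annotations are for illustration only.
example_cluster: Dict[str, List[Annotation]] = {
    "human_protein": [("GO:0006468", "IDA"), ("GO:0016301", "ISS")],
    "fly_protein": [("GO:0006468", "IEA")],
    "yeast_protein": [("GO:0006468", "IMP"), ("GO:0006468", "IDA"),
                      ("GO:0016301", "IDA")],
}

print(cluster_knownness(example_cluster, EVIDENCE_WEIGHTS))  # 2.0
```

In this example the cluster takes the score of its best-annotated member (the yeast protein), so a cluster only ranks as poorly known when none of its orthologues is well annotated.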

See the Ranked Clusters section for a list of protein clusters ranked by their knownness score, with links to further information on each cluster and the proteins it contains.

See Cluster details for information about each cluster, showing the GO terms assigned to its members and how its knownness has changed over time.

Settings shows the weights applied to different types of GO annotation. Our default settings give most weight to manual curation and experimental evidence. We excluded the “Cellular component” domain because it provides limited functional information. It is possible to alter these settings and calculate a custom unknome, but be patient, as the calculation can take some time!
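How such settings might change a score can be sketched as follows. This is a simplified illustration, not the site's implementation: the `Settings` and `Annotation` classes, the weight values and the example annotations are all hypothetical. The single-letter domain codes (P, F, C) follow the aspect column of GO annotation files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class Annotation:
    go_id: str
    aspect: str    # "P" biological process, "F" molecular function, "C" cellular component
    evidence: str  # GO evidence code, e.g. "IDA" or "IEA"


@dataclass
class Settings:
    # Hypothetical structure: one weight per evidence code, plus excluded domains.
    evidence_weights: Dict[str, float]
    excluded_aspects: Set[str] = field(default_factory=lambda: {"C"})


def weighted_score(annotations: List[Annotation], settings: Settings) -> float:
    """Add up evidence weights, skipping annotations in excluded domains."""
    total = 0.0
    for ann in annotations:
        if ann.aspect in settings.excluded_aspects:
            continue
        total += settings.evidence_weights.get(ann.evidence, 0.0)
    return total


# Invented annotations for a single protein.
annotations = [
    Annotation("GO:0005634", "C", "IDA"),
    Annotation("GO:0016301", "F", "IDA"),
    Annotation("GO:0006468", "P", "IEA"),
]

weights = {"IDA": 1.0, "IEA": 0.25}  # illustrative values only
default_settings = Settings(weights)                         # "C" excluded
custom_settings = Settings(weights, excluded_aspects=set())  # "C" included

print(weighted_score(annotations, default_settings))  # 1.25
print(weighted_score(annotations, custom_settings))   # 2.25
```

Keeping the weights and exclusions in one settings object means a custom unknome amounts to re-running the same scoring with different parameters.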

Unknome Downloads

The Unknome database may be downloaded as: