the reliability of electronic annotations

Think all electronic annotations are crap? You may want to think again.

This recent article and the subsequent blog posts highlight how *smart* non-curated electronic annotation methods can compete with (and augment) curated efforts.

Here is Iddo Friedberg’s insightful analysis

and Christophe Dessimoz’s inside scoop

Using a controlled, computer-parsable vocabulary (with some sort of structure) enables ‘smart’ electronic annotation that can consider both the information content of your annotation and the confidence of the assignment. For electronic assignment this is much better than picking between “I don’t know” and “a highly specific guess that could be wrong.”

Here is an analysis from the paper on the performance of annotations across the sources used at UniProt-GOA:

Notice that larger circles tend to do better in terms of coverage and reliability (with respect to their particular category). Larger circles mean more frequent terms, and more frequent often means more general (higher in the ontology). I believe this is an important aspect of ‘smart’ automated function prediction. If you are using an ontology and multiple annotation sources, you can design your electronic annotation method to make a tradeoff. The assignment can be more general and more confident (by accumulating evidence as you move up the ontology), while still improving the knowledge about a gene/protein. You just can’t do that with dumb text!
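To make the tradeoff concrete, here is a minimal sketch of the idea: walk up a (toy) is-a hierarchy, pooling evidence from multiple sources at each level, and return the most specific term whose pooled confidence clears a threshold. The parent map, evidence scores, threshold, and noisy-OR pooling rule are all my own illustrative assumptions, not anything from the paper or UniProt-GOA.

```python
# Toy is-a hierarchy (child -> parent), a tiny slice of a GO-like DAG.
# Terms and scores are hypothetical, for illustration only.
PARENT = {
    "DNA helicase complex": "intracellular part",
    "intracellular part": "intracellular",
    "intracellular": None,  # root of this toy slice
}

# Hypothetical per-term evidence scores from independent annotation sources.
# More general terms accumulate evidence from more sources.
EVIDENCE = {
    "DNA helicase complex": [0.3],
    "intracellular part": [0.4, 0.5],
    "intracellular": [0.6, 0.7, 0.8],
}

def best_assignment(term, threshold=0.9):
    """Return the most specific term (starting at `term` and walking up)
    whose pooled confidence meets the threshold, plus that confidence."""
    while term is not None:
        scores = EVIDENCE.get(term, [])
        # Noisy-OR pooling: probability at least one source is correct,
        # naively assuming the sources are independent.
        miss = 1.0
        for s in scores:
            miss *= (1.0 - s)
        confidence = 1.0 - miss
        if confidence >= threshold:
            return term, confidence
        term = PARENT[term]
    return None, 0.0
```

Starting from “DNA helicase complex”, the sketch backs off to “intracellular” — a more general but much more confident assignment, which is exactly the tradeoff described above.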
Also, I want to point out: this doesn’t take away from the importance of manual curation. The paper actually highlights its critical role.

“This is not to say that the curators have made themselves redundant.
On the contrary, as we highlight above, most electronic annotations
heavily rely on manually curated UniProtKB keywords and InterPro
entries. Moreover, given the essential role of curators in embedding
experimental results into ontologies, so does the present study.”

Rebroadcasting some quality signal here.

I can’t tell you how many times I have heard “electronic annotations for Gene Ontology can’t be trusted” as a condemnation of using GO assignments. As we keep accumulating genome sequence, encountering a manually curated gene will be like hitting the lottery (especially for microbes). It is great to see that some intelligent assignments are being made and steps are being taken to characterize their reliability. This article and the subsequent blog posts get a lot of things right. I especially appreciate the use of time-lapse validation, the clear understanding and accounting for the “open-world” assumption, and the creation of performance metrics *that make sense* with respect to the Gene Ontology.

In fact, this performance evaluation has got me thinking about my old annotation code, GRC (*shameless plug*).

The annotation code had some unsophisticated use of thresholds, but it also had what I thought were some interesting metrics for evaluating the performance of GO assignment against a reference. It classified annotations as confirmed, compatible, or incompatible. It went like this: a particular annotation is evaluated against a reference annotation ‘r’. A term assignment is labelled confirmed if it coincides with, or is an ancestor of, a GO term belonging to r. A term is labelled compatible if it has one of the specific GO terms assigned to r as an ancestor; these represent potential refinements of the current annotation of the gene. A term is labelled incompatible if it meets the requirements for neither confirmed nor compatible.

With respect to organizing ‘live’ non-curated annotations, before time-lapse validation can take place, I think it might be interesting to know if an electronic annotation is ‘compatible’ with the most specific curated/experimental term assigned to a protein.

Here the term “intracellular part” represents a reference function assigned to the reference gene. The terms “intracellular”, “membrane”, and “DNA helicase complex” represent possible GO term assignments and their evaluation with respect to the reference term.
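The classification scheme can be sketched in a few lines of Python using the example terms above, with a toy ancestor map standing in for a real GO DAG. The map and function names are my own illustration, not code from GRC.

```python
# Toy ancestor map: term -> set of all its ancestors in a GO-like DAG.
# In a real implementation these sets would come from traversing the
# Gene Ontology's is-a/part-of relationships.
ANCESTORS = {
    "intracellular": set(),
    "membrane": set(),
    "intracellular part": {"intracellular"},
    "DNA helicase complex": {"intracellular part", "intracellular"},
}

def classify(term, reference):
    """Classify a candidate `term` against a single reference term."""
    if term == reference or term in ANCESTORS[reference]:
        # Coincides with or is an ancestor of the reference.
        return "confirmed"
    if reference in ANCESTORS[term]:
        # The reference is an ancestor of the term: a potential refinement.
        return "compatible"
    return "incompatible"
```

With “intracellular part” as the reference, “intracellular” comes out confirmed, “DNA helicase complex” compatible, and “membrane” incompatible, matching the example above.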

Also, this ‘status’ might be a useful, additional piece of evidence to factor in when determining whether to make an electronic assignment in the first place. If the program ‘knows’ that the annotation you are about to make is ‘compatible’ with an existing experimentally derived annotation, shouldn’t that count for something?


2 Responses to “the reliability of electronic annotations”

  1. Deniz Kural Says:

    Great post. Do you think community / voting-based efforts to “rate” the annotations would be valuable? One can think of up/down voting, or stack-overflow like point systems. It can even be weighted by user influence, etc.

    • anwarren Says:

      Thanks. Yes, frankly I think a crowd sourcing infrastructure for community annotations hooked into a “typed” metadata system would be of extremely high value. It creates a feedback loop of knowledge, reputation, and value for the resource while “letting go” of the notion of a “correct” or single annotation. This is especially useful in explicitly considering and representing multiple experimental data sources that are seemingly in conflict without dismissal or exclusion.
