WikiProteins: a collaborative space for biologists to annotate proteins

The WikiProfessional project (like Wikipedia, but built for narrow, deep exploration of highly specialized domains) has just launched its first beta wiki, WikiProteins, a place where biologists can collectively annotate an enormous database of proteins culled from the best open science journals in the field.

A paper describing WikiProteins lays out a major advantage of this approach. Traditionally, biological information has been split between two approaches: data mining, which parses existing information to identify semantic content and the connections within it, and curation, which relies on expert, manual analysis of data. By importing information from both kinds of sources, WikiProteins should, in theory, combine the strengths of each: reliable information supplied by experts, plus potential connections among data that haven't been explored before.
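As a rough illustration of what combining the two kinds of sources involves, the sketch below merges a curated interaction list with a text-mined one while recording which source reported each interaction. It is not WikiProteins' actual import pipeline: the protein IDs and the `merge_sources` helper are hypothetical, and it assumes interactions are undirected pairs of protein identifiers.

```python
from collections import defaultdict


def normalize(pair):
    """Store an undirected interaction with its two protein IDs in sorted order."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def merge_sources(sources):
    """Map each interaction to the set of source names that reported it."""
    provenance = defaultdict(set)
    for source_name, pairs in sources.items():
        for pair in pairs:
            provenance[normalize(pair)].add(source_name)
    return dict(provenance)


# Toy placeholder data, not taken from the paper.
curated = [("P1", "P2"), ("P2", "P3")]
mined = [("P2", "P1"), ("P3", "P4")]

merged = merge_sources({"curated": curated, "mined": mined})
for pair, sources in sorted(merged.items()):
    print(pair, sorted(sources))
```

Keeping provenance per interaction means an entry backed by both an expert and the text miner can be distinguished from one that only the miner produced, which is exactly the kind of entry a reviewer would want to check.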

The paper offers several measures of how well this approach is working. For one, the import process has identified over a million individual authors, along with a similar number of concepts that link them to one another and to the other items stored in the database. Drawing on different data sources also seems to have paid off: the authors found that well over half of the protein-protein interactions imported from curated databases could not have been identified by data-mining PubMed abstracts.
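The comparison behind that "well over half" figure is essentially a set difference: of the interactions found in curated databases, what share never turns up among the interactions mined from abstracts? A minimal sketch, with toy pairs standing in for real data; `fraction_not_mined` is a hypothetical helper, and the paper's own method for matching interactions may differ.

```python
def normalize(pair):
    """Treat interactions as undirected: (B, A) and (A, B) are the same pair."""
    a, b = pair
    return (a, b) if a <= b else (b, a)


def fraction_not_mined(curated, mined):
    """Fraction of curated interactions that are absent from the mined set."""
    curated_set = {normalize(p) for p in curated}
    mined_set = {normalize(p) for p in mined}
    if not curated_set:
        return 0.0
    return len(curated_set - mined_set) / len(curated_set)


# Toy placeholder data: one of three curated pairs is also found by mining.
curated = [("A", "B"), ("B", "C"), ("C", "D")]
mined = [("B", "A"), ("D", "E")]
print(fraction_not_mined(curated, mined))  # 0.6666666666666666
```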

In inviting biologists to join the beta, the people behind WikiProteins have several roles in mind. For starters, they expect the data-mining process to have generated a significant number of spurious connections, and they hope the community will help prune them. For example, they noted that the gene abbreviation "CLB2" mapped to at least five different genes (depending on the organism), as well as to a material used in dentistry, Clearfil Liner Bond 2; manual intervention may be needed to sort these out. They're also hoping that contributors will simply dump sentences from the literature into WikiProteins so that they can be indexed and mined for further connections.
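To show why a term like "CLB2" needs a human in the loop, here is a small sketch of a term-to-concept dictionary that flags any abbreviation mapping to more than one concept, then queues ambiguous tokens in a submitted sentence for manual review. Everything here is an assumption for illustration: the concept IDs and descriptions are placeholders (only two of the several CLB2 gene senses are shown), "GENEX" is an invented unambiguous term, and the whitespace tokenizer is deliberately naive.

```python
from collections import defaultdict

# Hypothetical (term, concept_id, description) entries; IDs are placeholders,
# not identifiers from any real database.
ENTRIES = [
    ("CLB2", "gene:clb2_organism_a", "CLB2 gene, organism A"),
    ("CLB2", "gene:clb2_organism_b", "CLB2 gene, organism B"),
    ("CLB2", "material:clearfil_liner_bond_2", "Clearfil Liner Bond 2 (dental material)"),
    ("GENEX", "gene:genex", "invented unambiguous term"),
]


def build_index(entries):
    """Map each upper-cased term to the set of concepts it can refer to."""
    index = defaultdict(set)
    for term, concept_id, _description in entries:
        index[term.upper()].add(concept_id)
    return dict(index)


def ambiguous_terms(index):
    """Terms that map to more than one concept and need disambiguation."""
    return sorted(term for term, concepts in index.items() if len(concepts) > 1)


def review_queue(sentence, index):
    """Return (token, candidate concepts) for each ambiguous token in a sentence."""
    queue = []
    for token in sentence.replace(",", " ").replace(".", " ").split():
        concepts = index.get(token.upper(), set())
        if len(concepts) > 1:
            queue.append((token, sorted(concepts)))
    return queue


index = build_index(ENTRIES)
print(ambiguous_terms(index))  # ['CLB2']
print(review_queue("We measured CLB2 expression.", index))
```

A dictionary lookup alone cannot tell which sense a sentence means; it can only surface the candidates, which is where a contributor's judgment comes in.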