WikiProteins: a collaborative space for biologists to annotate proteins
The new paper describes a major advantage to this approach. Traditionally, biological information has been divided between two approaches: data mining, which involves parsing existing information to identify semantic content and connections within it, and curating, which involves expert, manual analysis of data. By importing information from both types of sources, WikiProteins should theoretically contain the best properties of both types of data: reliable information supplied by experts and potential connections among data that haven't previously been explored.
The paper provides a number of measures of the success of this approach. For one, the import process has identified over a million individual authors, and a similar number of concepts that connect them and the other items stored in the database. The different data sources also seem to have paid off, as the authors determined that well over half of the protein-protein interactions brought in from curated databases could not have been identified by data-mining PubMed abstracts.
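To make the comparison between curated and mined interactions concrete, here is a minimal sketch of how such a figure could be computed. It is not the authors' method: the function names, the protein identifiers, and the toy data are all hypothetical, and it assumes an interaction can be treated as an unordered pair of identifiers.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair of protein identifiers."""
    return {frozenset(pair) for pair in pairs}


def curated_only_fraction(curated, mined):
    """Fraction of curated interactions that the mined set does not contain."""
    curated_set = normalize(curated)
    mined_set = normalize(mined)
    if not curated_set:
        return 0.0
    return len(curated_set - mined_set) / len(curated_set)


# Hypothetical toy data, not taken from the paper.
curated = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P6", "P7")]
mined = [("P2", "P1"), ("P8", "P9")]

fraction = curated_only_fraction(curated, mined)
print(f"{fraction:.0%} of curated interactions were not found by mining")
```

Using unordered pairs means that ("P1", "P2") and ("P2", "P1") count as the same interaction, which is the assumption that matters most in this sketch.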
In calling for biologists to get involved in the beta process, the people who generated WikiProteins have a number of roles in mind. For starters, they expect that the data mining process has generated a significant number of spurious connections, and hope that the community will help in pruning those. For example, they noted that the gene abbreviation "CLB2" mapped to at least five different genes (depending on the organism), as well as a material used in dentistry, Clearfil Liner Bond 2; manual intervention may be needed to sort these out. They're also hoping that contributors will simply dump sentences from the literature into WikiProteins in order for them to be indexed and further connections mined.
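The "CLB2" problem can be illustrated with a short sketch that flags abbreviations mapping to more than one meaning, so that a human can review them. This is only an illustration under stated assumptions: the mapping list, the placeholder organism labels, and the `find_ambiguous` helper are hypothetical and do not reflect how WikiProteins stores or processes its data.

```python
from collections import defaultdict

# Hypothetical (symbol, meaning) pairs, as a text-mining pass might emit them.
# The organism labels are placeholders, not data from the paper.
mined_mappings = [
    ("CLB2", "gene in organism A"),
    ("CLB2", "gene in organism B"),
    ("CLB2", "Clearfil Liner Bond 2 (dental material)"),
    ("GENE_X", "gene in organism A"),
]


def find_ambiguous(mappings):
    """Return the symbols that map to more than one distinct meaning."""
    meanings = defaultdict(set)
    for symbol, meaning in mappings:
        meanings[symbol].add(meaning)
    return {s: sorted(m) for s, m in meanings.items() if len(m) > 1}


ambiguous = find_ambiguous(mined_mappings)
for symbol, candidates in ambiguous.items():
    print(symbol, "needs manual review:", candidates)
```

A check like this only surfaces candidates. Deciding which meaning a given sentence intends is the step where community curation would come in.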

In the past I've browsed through the PDB wiki, a project to annotate Protein Data Bank entries.
http://pdbwiki.org/index.php/Main_Page
Efforts like this will never reach their full potential unless we:
a) Vastly improve the ability to automatically mine known publications
b) Train biologists in some sort of new nomenclature and descriptive language to describe proteins and their functions/interactions
I really do not see most scientists having time to visit and maintain efficient entries on external websites while simultaneously fulfilling the requirements of the journals they publish in (a core part of their work). It's going to take an enforcement effort by the journals. It worked with the PDB... journals began requiring their submitting authors to deposit their structures in the PDB as a precondition to their publication being accepted.