BioGRID Curation Guide
Introduction
The Biological General Repository for Interaction Datasets (BioGRID) is a database of physical and genetic interactions developed by Mike Tyers' group. It contains high-throughput (HTP) interaction data for Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, and other organisms. For yeast, BioGRID also contains many genetic and protein interactions curated from the primary literature. SGD has incorporated BioGRID's interaction data and will also continue to manually curate yeast interactions. What follows is a description of the BioGRID curation procedure along with some issues and details to keep in mind while curating this information.
The following sections contain detailed information on BioGRID curation guidelines.
Adding Interactions
Curation Questions and Answers
- Should we curate all interactions in a paper?
- We are capturing all interactions in a paper, even if they are not the main focus of an experiment and have been curated previously. For example, in Fig. 3 of PMID: 15608614, the authors were looking at the effects on the TOM complex after depletion of MIM1. Even if there aren't any interactions being added for MIM1 based on this figure, the interactions between the subunits of the TOM complex (e.g. TOM70, TOM40, etc.) should be added under 'co-purification'.
- Why is it only possible to add protein-RNA interactions and not DNA-protein interactions?
- Initially, the interactions project was only going to include protein-protein and genetic interactions, not nucleic acid-protein interactions. However, during the course of the curation task, a useful paper containing hundreds of protein-mRNA interactions was found, so the option to add protein-RNA interactions was added to the curation tool. This was done to allow the database to distinguish between protein-protein and protein-mRNA interactions. The same option was not added for DNA-protein interactions because it would have required going back and re-reading too many papers. It was decided that DNA-protein interactions would be curated as part of a separate project.
- How should protein-RNA interactions be curated?
- For an example of how to curate protein-RNA interactions, refer to Figs. 1B and 3B of PMID: 10747033. The U4, U5 and U6 snRNAs were detected in the Lsm8p-TAP and Lsm3p-TAP affinity-purified fractions. Note that these interactions have to be entered using the “force interactions” option for now because the IMS system needs to be updated to recognize genes that only encode RNA products. Note also that we are not curating all types of protein-RNA interactions. For example, see Fig. 7 in PMID: 7862160. This paper shows that Ref2p binds RNA by using the following substrates: CYC1 pre-mRNA, poly-G and poly-A ribonucleotide homopolymers, and various Ty RNA substrates. These types of protein-RNA interactions would not be curated using the IMS system because none of them are specific substrates for Ref2p.
- What about genes that are 'not physically mapped' in SGD or genes that only encode RNA products?
- Currently, the tool will not allow you to enter genes that are 'not physically mapped' in SGD. This is also true for genes that only encode RNA products (e.g. Tlc1, Snr1). However, you can use the “force interactions” option to enter the gene names for now. The IMS system will eventually be updated to recognize these features.
- What if an interaction can't be assigned one of the available experimental system categories?
- Occasionally, an interaction cannot be readily assigned a protein or genetic interaction category, in which case, the closest substitute should be chosen and an explanation of the exact experimental context should be given in the qualification text box.
- How should non-allelic non-complementation entries be categorized?
- For any “non-allelic non-complementation” results, enter them as 'synthetic lethal' interactions and include a description in the qualification box.
- What about very large-scale data sets, such as those found in supplementary Excel tables or text files?
- If you come across a paper that has a list of interactions in a supplementary Excel table or text file, make a note of the PubMed ID so that the interactions can be added to the database in bulk using a loading script rather than via the curation interface. You can also send the relevant data files via email, especially if you feel they need to be interpreted and modified for proper loading. Later on, the curator tool will include the option to upload large HTP data files, but until then they will be uploaded using a script.
- How often are new papers added to the database?
- New papers will be added on a weekly basis.
- What if I'm not sure how to curate a paper?
- If you are having trouble curating a difficult paper, then it is a good idea to email other curators for advice.
- What if a paper also contains interactions for an organism I'm not working on?
- If a paper contains interactions in other organisms, then we can go ahead and curate all of the interactions for all relevant organisms if they're straightforward. However, if there's any confusion about the genes/proteins involved, then it is best to only curate the relevant interactions. When you are finished, mark the paper as read and select the 'Full Text' option for the project you're in, then email the PubMed ID to other curators who may be able to help out with the curation of the other organisms.
- If a paper shows that the presence of a protein is required for the formation of a protein complex, is this enough evidence to enter an interaction between the required protein and all the subunits of the complex (e.g. Fig. 1, PMID: 12377768)?
- In Fig. 1 of this example, the authors show that Tim11p is required for dimerization of the F1F0-ATP synthase, but they only immunoblotted with an antibody against one subunit of the F1 sector of the complex. This would not warrant entering an interaction with all of the subunits of the complex because we are not going to add inferences based on this type of in vivo data. It is also important to note that none of the existing experimental systems in the curation tool would apply to this type of experiment.
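Two of the rules in the questions above (entering non-allelic non-complementation as 'synthetic lethal' with a description in the qualification box, and using “force interactions” for genes that only encode RNA products or are not physically mapped) can be summarized as a minimal Python sketch. This is only an illustration of the guidelines, not the IMS curation tool itself: the names `CurationEntry`, `prepare_entry` and `FORCE_REQUIRED` are hypothetical, `GENE1` and `GENE2` are placeholders, and the gene set contains only the two examples named above.

```python
from dataclasses import dataclass

# Hypothetical, illustrative set: only the two RNA-only genes named in this guide.
# The real list of RNA-only or 'not physically mapped' features would come from SGD.
FORCE_REQUIRED = {"TLC1", "SNR1"}

NANC = "non-allelic non-complementation"


@dataclass
class CurationEntry:
    """Hypothetical record mirroring the fields discussed in this guide."""
    bait: str
    hit: str
    experimental_system: str
    qualification: str = ""
    force: bool = False


def prepare_entry(bait, hit, observed_result, note=""):
    """Apply two of the Q&A rules to a single observed interaction."""
    system = observed_result
    qualification = note

    # Rule: non-allelic non-complementation is entered as 'synthetic lethal',
    # with the actual experimental context described in the qualification box.
    if observed_result.strip().lower() == NANC:
        system = "synthetic lethal"
        qualification = f"{NANC}; {note}" if note else NANC

    # Rule: RNA-only or unmapped genes are not recognized by the tool yet,
    # so the entry has to be made with the "force interactions" option.
    force = any(gene.upper() in FORCE_REQUIRED for gene in (bait, hit))

    return CurationEntry(bait, hit, system, qualification, force)


if __name__ == "__main__":
    print(prepare_entry("GENE1", "GENE2", NANC, "see Fig. 2"))
    print(prepare_entry("GENE1", "Tlc1", "co-purification"))
```

A curator would still choose the experimental system by reading the paper; the sketch only shows where the two special cases change what gets entered.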
BioGRID Team Members, References, and Funding Details
More information on BioGRID and its history, along with a full list of our publications, team members, and funding sources, can be found on our About Us page.