BioGRID Curation Guide
The following sections contain detailed information on BioGRID curation guidelines.
Curation Questions and Answers
- Should we curate all interactions in a paper?
- We capture all interactions in a paper, even if they are not the main focus of an experiment and have been curated previously. For example, in Fig. 3 of PMID: 15608614, the authors were looking at the effects on the TOM complex after depletion of MIM1. Even if there aren't any interactions being added for MIM1 based on this figure, the interactions between the subunits of the TOM complex (e.g. TOM70, TOM40, etc.) should be added under 'co-purification'. We also curate interactions found in supplementary files.
- Why is it only possible to add protein-RNA interactions and not DNA-protein interactions?
- Initially, the interactions project was only going to include protein-protein and genetic interactions, not nucleic acid-protein interactions. However, during the course of the curation task, a useful paper containing hundreds of protein-mRNA interactions was found, so the option to add protein-RNA interactions was added to the curation tool. This was done to allow the database to distinguish between protein-protein and protein-mRNA interactions. It was decided that DNA-protein interactions might be curated as part of a separate project.
- How should protein-RNA interactions be curated?
- For an example of how to curate protein-RNA interactions, refer to figures 1B and 3B of PMID: 10747033. The U4, U5 and U6 snRNAs were detected in the Lsm8p-TAP and Lsm3p-TAP affinity-purified fractions. Note that these interactions have to be entered using the “force interactions” option for now because the IMS system needs to be updated to recognize genes that only encode RNA products. Note also that we are not curating all types of protein-RNA interactions. For example, see Fig. 7 in PMID: 7862160. This paper shows that Ref2p binds RNA by using the following substrates: CYC1 pre-mRNA, poly-G and poly-A ribonucleotide homopolymers, and various Ty RNA substrates. These types of protein-RNA interactions would not be curated using the IMS system because none of them are specific substrates for Ref2p.
- What about genes that are 'not physically mapped' in SGD or genes that only encode RNA products?
- Currently, the tool will not allow you to enter genes that are 'not physically mapped' in SGD. This is also true for genes that only encode RNA products (e.g. Tlc1, Snr1). However, you can use the “force interactions” option to enter the gene names for now. The IMS system will eventually be updated to recognize these features.
- What is the difference between “Affinity Capture-MS/Western” and “Reconstituted Complex”?
- If the relevant proteins are co-expressed in the cell, then the interaction is considered to occur in vivo and can be classified as “Affinity Capture-Western” or “Affinity Capture-MS”. If the bait is not co-expressed in the cell, but merely purified and incubated with a cell lysate in vitro or even purified recombinant proteins (e.g. GST pull-down), then we use the in vitro option of “Reconstituted Complex”.
- What if an interaction can't be assigned one of the available experimental system categories?
- Occasionally, an interaction cannot be readily assigned a protein or genetic interaction category, in which case, the closest substitute should be chosen and an explanation of the exact experimental context should be given in the qualification text box.
- How should non-allelic non-complementation entries be categorized?
- For any “non-allelic non-complementation” results, enter them as 'synthetic lethal' interactions and include a description in the qualification box.
- What about very large-scale data sets, such as those found in supplementary Excel tables or text files?
- If you come across a paper that has a list of interactions in a supplementary Excel table or text file, make a note of the PubMed ID so that the interactions can be added to the database in bulk using a loading script rather than via the curation interface. You can also send the relevant data files via email, especially if you feel they need to be interpreted and modified for proper loading. Later on, the curator tool will include the option to upload large HTP data files, but until then they will be uploaded using a script. Note that we are only loading the high confidence interactions, so if some have been invalidated by the authors, then those do not need to be curated.
- When is a protein considered to interact with itself as a bait and hit?
- This type of interaction is reserved for situations where clear evidence has been shown for the presence of a dimer or multimer. The most common experiment involves tagging a single protein with two different tags, pulling down with one and then blotting for the other. “Affinity Capture-Western” applies if it's an in vivo experiment and “Reconstituted Complex” if it's an in vitro one. Crystal structures may also show dimerization.
- If I have to arbitrarily choose a directionality for an interaction should I enter the interaction twice (once in each direction)?
- No. We curate using a spoke model, and entering interactions in both directions would artificially inflate the number of interactions in the database (making it appear as if an interaction has been shown twice in a paper when it has not). Choose the directionality that makes the most sense based on the BioGRID guidelines in the Directionality of Interactions (Baits/Hits) section and only enter the interaction once.
- What if I'm not sure how to curate a paper?
- If you are having trouble curating a difficult paper, then it is a good idea to email other curators for advice.
- What if a paper also contains interactions for an organism I'm not working on?
- If the paper only contains interactions from an organism that isn’t in the organism pulldown menu in the IMS, then move it to the “Miscellaneous curation” project. If the paper contains interactions from a variety of organisms, then go ahead and curate all of the interactions for all relevant organisms if they're straightforward. However, if there's any confusion about the genes/proteins involved, then it is best to only curate the relevant interactions for your organism of interest. When you are finished, mark the paper as read and select the 'Full Text' option for the project you're in, then email the PubMed ID to other curators who may be able to help out with the curation of interactions from other organisms in that paper.
- If a paper shows that the presence of a protein is required for the formation of a protein complex, is this enough evidence to enter an interaction between the required protein and all the subunits of the complex (e.g. Fig. 1, PMID: 12377768)?
- In Fig. 1 of this example, the authors show that Tim11p is required for dimerization of the F1F0-ATP synthase, but they only immunoblotted with an antibody against one subunit of the F1 sector of the complex. This would not warrant entering an interaction with all of the subunits of the complex because we are not going to add inferences based on this type of in vivo data. It is also important to note that none of the existing experimental systems in the curation tool would apply to this type of experiment.
- Do we enter “data not shown” (DNS) interactions?
- In general, we do not curate DNS interactions, but occasionally we do and this can be left up to curator judgment. If a curator thinks they're important, then the DNS interactions can be entered in the IMS. However, it is a good idea to add a “Data not shown” note in the qualifications. For example, if an experiment was carried out with four different genes but only one representative gel was shown and the authors state that “similar results were also observed for X, Y, and Z”, then these DNS interactions can be curated.
- What if only the double mutant (or multiple mutant) is shown and not each of the single mutants?
- As a rule, we only curate genetic interactions when each of the single mutants is shown in comparison to the double or multiple mutants within the same paper. However, exceptions are made when authors refer to another paper in which they show the single and double mutants (e.g. PMID 20111601 references single mutants in PMID 15961414, fig 4). In some cases, the single mutants might be mentioned as “data not shown”. Curator judgment can be used when curating these types of interactions. However, for rescue experiments of any type (phenotypic suppression/synthetic rescue) you can infer the interaction with only one phenotype shown.
- How much detail are we capturing (e.g. different alleles, distinct phenotypes)?
- Sometimes papers show more complicated results, with suppression or synthetic sick/lethal interactions depending on the alleles used. For example, in table 1 of PMID 15657399, different phenotypes (i.e. salt sensitive, formamide sensitive, temperature sensitive, SS, SL) are listed depending on the alleles of ARP2 and ARP3 used in the interaction experiment. If it is easy to capture the allele information and distinct phenotypes, then we should go ahead and do so. However, in more complicated situations involving too much detail, as shown in table 1 of this paper, it is okay to only curate the most severe phenotype, which would be synthetic lethality. Again, curator judgment applies.
- How do we deal with conflicting data?
- BioGRID curates interaction evidence as it is presented in each publication independently. This can lead to data conflicts, which are simply a reflection of the literature. Occasionally an author will fail to reproduce the result of a previous publication and will convey this either in their own subsequent publications or as a personal communication to another author. Since we have no systematic method to find and remove all such partial retractions, in the interests of data consistency we do not remove any of these interactions, even if requested by the author. We do, however, systematically remove interactions from retracted publications (as tagged in MEDLINE with PT-Retracted Publication).
- Do we curate cross-species interactions?
- We curate cross-species interactions, e.g. between a yeast and a human protein. The curation tool allows us to select separate species for the bait and hit using the relevant pulldown menus. However, this does not include cross-species complementation experiments because they are not really genetic interactions between two genes, but rather a test to determine functional orthologs in other species.
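The in vivo versus in vitro rule described above for choosing between “Affinity Capture-Western”, “Affinity Capture-MS” and “Reconstituted Complex” can be sketched as a small Python function. This is an illustration only and is not part of the IMS: the function name `experimental_system`, the `Detection` enum and the `co_expressed_in_cell` flag are hypothetical names chosen for this sketch. Only the three experimental system labels come from this guide.

```python
from enum import Enum


class Detection(Enum):
    """How the hit was detected (hypothetical helper type for this sketch)."""
    WESTERN = "Western"
    MS = "MS"


def experimental_system(co_expressed_in_cell: bool, detection: Detection) -> str:
    """Pick an experimental system label following the rule in this guide.

    co_expressed_in_cell: True if the relevant proteins were co-expressed in
        the cell (in vivo); False if the bait was purified and incubated with
        a lysate or with purified recombinant proteins (in vitro, for example
        a GST pull-down).
    """
    if co_expressed_in_cell:
        # In vivo: the label depends on the detection method.
        return f"Affinity Capture-{detection.value}"
    # In vitro: use the in vitro option described in this guide.
    return "Reconstituted Complex"
```

The same rule applies when a protein is entered as both bait and hit (a dimer or multimer): an in vivo double-tag experiment maps to “Affinity Capture-Western” and an in vitro one to “Reconstituted Complex”. Cases that fit none of the categories still need the closest substitute plus an explanation in the qualification text box.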
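The rule above about entering an interaction only once, in a single direction, can be checked with a short script before submitting a batch of entries. The sketch below is a hypothetical helper, not an IMS feature: the `Interaction` record, its field names and the function `find_reciprocal_entries` are all assumptions made for illustration. It only flags candidates for curator review, since a reciprocal entry is legitimate when the paper actually shows the experiment in both directions.

```python
from typing import Iterable, List, NamedTuple


class Interaction(NamedTuple):
    """One curated entry (hypothetical record layout for this sketch)."""
    bait: str
    hit: str
    system: str
    pubmed_id: int


def find_reciprocal_entries(entries: Iterable[Interaction]) -> List[Interaction]:
    """Return entries whose reverse (hit as bait) was already seen.

    Two entries count as reciprocal only if they share the same experimental
    system and PubMed ID. Self-interactions (bait == hit) are never flagged.
    """
    seen = set()
    flagged = []
    for entry in entries:
        reverse = (entry.hit, entry.bait, entry.system, entry.pubmed_id)
        if entry.bait != entry.hit and reverse in seen:
            flagged.append(entry)
        seen.add((entry.bait, entry.hit, entry.system, entry.pubmed_id))
    return flagged


# Made-up placeholder records; gene names and PubMed IDs are not real data.
batch = [
    Interaction("GENE_A", "GENE_B", "Affinity Capture-Western", 1),
    Interaction("GENE_B", "GENE_A", "Affinity Capture-Western", 1),
    Interaction("GENE_A", "GENE_C", "Affinity Capture-Western", 1),
    Interaction("GENE_D", "GENE_D", "Reconstituted Complex", 1),
]
to_review = find_reciprocal_entries(batch)
```

Here `to_review` contains only the second record, the reverse of the first. The curator then decides whether the paper really shows both directions or whether the second entry should be dropped.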
BioGRID Team Members, References, and Funding Details
For more information on BioGRID and its history, including a full list of our publications, team members, and funding sources, see our About Us page.