BioGRID PROJECTINDEX Formatted Downloads

The BioGRID PROJECTINDEX format is a custom BioGRID tab-delimited format designed to provide the annotated gene list associated with a given BioGRID curation project in a way that is simple to read in Microsoft Excel or to script into bioinformatics applications. All columns are separated by tabs.

Back to BioGRID Download Formats

How to Detect a BioGRID PROJECTINDEX file

BioGRID PROJECTINDEX files are denoted by the extension .projectindex.txt or .projectindex.zip.
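
As a minimal sketch, a script could recognise these files by their extension. The function name is hypothetical, and the case-insensitive comparison is an assumption rather than something the format specifies.

```python
# The two extensions listed above for PROJECTINDEX downloads.
PROJECTINDEX_EXTENSIONS = (".projectindex.txt", ".projectindex.zip")


def is_projectindex_file(filename: str) -> bool:
    """Return True if the file name ends with a PROJECTINDEX extension.

    Lowercasing the name first is an assumption made for convenience.
    """
    return filename.lower().endswith(PROJECTINDEX_EXTENSIONS)
```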

Header Definitions

The first line of a BioGRID PROJECTINDEX file is the heading line and starts with a hash (#). This line is purely for informational purposes and gives a brief description of the content contained in each column. If you are scripting the use of this file, you can simply ignore it.
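
The sketch below shows one way a script might skip the heading line while reading the file. The function name and the two-column sample are made up for illustration; a real file has at least the eleven columns described in the next section.

```python
import io


def iter_data_rows(handle):
    """Yield the tab-separated fields of every data line.

    The first line is skipped when it starts with '#', because it is
    the informational heading line.
    """
    for line_number, line in enumerate(handle):
        line = line.rstrip("\r\n")
        if line_number == 0 and line.startswith("#"):
            continue  # heading line, informational only
        if line:
            yield line.split("\t")


# Made-up, shortened sample standing in for a real download.
sample = io.StringIO("#BioGRID ID\tEntrez Gene ID\n31623\t-\n")
rows = list(iter_data_rows(sample))
```

The same function should work on a file opened with `open(path, encoding="utf-8")`, assuming the download is UTF-8 text.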

Column Definitions

BioGRID PROJECTINDEX files always contain the following columns:

  1. BioGRID ID is the identifier in the BioGRID database that corresponds to the gene. These identifiers are best used for creating links to the BioGRID from your own websites or applications. To link to a page within our site, simply append each ID to the URL http://www.thebiogrid.org/ so that the link has the form http://www.thebiogrid.org/ID/. For example, http://www.thebiogrid.org/31623/.
  2. Entrez Gene ID is the identifier from the Entrez-Gene database that corresponds to the gene. If no Entrez Gene ID is available, this will be a “-”.
  3. Systematic name is a plain text systematic name if known for the gene. Will be a “-” if no name is available.
  4. Official symbol is a common gene name/official symbol for the gene. Will be a “-” if no name is available.
  5. Synonyms/Aliases is a “|” separated list of alternate identifiers for the gene. Will be “-” if no aliases are available.
  6. Organism ID is the NCBI taxonomy ID for the gene.
  7. Organism Name is the official name of the organism for the gene.
  8. Interaction Count is the number of interactions in the BioGRID for this gene.
  9. PTM Count is the number of post-translational modifications in the BioGRID for this gene.
  10. Chemical Interaction Count is the number of chemical interactions in the BioGRID for this gene.
  11. Source is the source database for the curation of this gene within this project.
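
The eleven fixed columns could be mapped to named fields roughly as follows. This is a sketch under stated assumptions: the key names are hypothetical, the counts are assumed to be integers, and every value in the sample row except the BioGRID ID 31623 is made up.

```python
# Hypothetical key names for the eleven fixed columns, in file order.
FIXED_COLUMNS = [
    "biogrid_id", "entrez_gene_id", "systematic_name", "official_symbol",
    "synonyms", "organism_id", "organism_name", "interaction_count",
    "ptm_count", "chemical_interaction_count", "source",
]
COUNT_COLUMNS = ("interaction_count", "ptm_count", "chemical_interaction_count")


def parse_fixed_columns(fields):
    """Turn the first eleven fields of a data row into a dictionary."""
    if len(fields) < len(FIXED_COLUMNS):
        raise ValueError("expected at least 11 tab-separated fields")
    record = dict(zip(FIXED_COLUMNS, fields))
    # "-" marks a missing value, so map it to None.
    for key in ("entrez_gene_id", "systematic_name", "official_symbol"):
        if record[key] == "-":
            record[key] = None
    # Synonyms/Aliases is a "|" separated list.
    aliases = record["synonyms"]
    record["synonyms"] = [] if aliases == "-" else aliases.split("|")
    # Assumption: counts are plain integers when present.
    for key in COUNT_COLUMNS:
        record[key] = None if record[key] == "-" else int(record[key])
    # Link format described for the BioGRID ID column.
    record["url"] = "http://www.thebiogrid.org/" + record["biogrid_id"] + "/"
    return record


# Made-up row; only the BioGRID ID comes from the example above.
example = parse_fixed_columns(
    ["31623", "-", "-", "GENE1", "ALIAS1|ALIAS2", "0", "Example organism",
     "12", "3", "0", "EXAMPLE"]
)
```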

Additional Column Definitions (Repeating)

In addition to the above, PROJECTINDEX files can contain any number of additional columns, depending on the project being represented. Since these columns may differ significantly from one project to the next, it's best to examine the project page within BioGRID for more detailed information about the project-specific columns. Any additional columns come in the following sets of 7 columns, which are repeated N times, once for each unique dataset:

  1. <COLUMN_NAME>_values - This column contains the values used on the web version for the corresponding column on our project pages. This will contain one or more values (separated by “|”) and are usually a type of classification or ontology reference.
  2. <COLUMN_NAME>_ids - If a value in the <COLUMN_NAME>_values column also has an accompanying official id, this column will contain matching “|” separated ID values associated with those in the <COLUMN_NAME>_values column. For example, if <COLUMN_NAME>_values contains Gene Ontology Terms, this column would contain corresponding Gene Ontology Term IDs. If not applicable, this column will contain “-”.
  3. <COLUMN_NAME>_tags - Tags are another level of associated qualifications on an original value. A use for this column might be to further subclassify a major category or to simply add more information about a mapped value. If not applicable, this column will contain “-”.
  4. <COLUMN_NAME>_evidence_values - This column contains human readable values used as evidence for corresponding values in the <COLUMN_NAME>_values column. For example, if <COLUMN_NAME>_values contains a Gene Ontology Term, this column might contain a reference to a publication where that mapping was first indicated. If there are multiple evidence values for one <COLUMN_NAME>_values entry, they will be separated by “,” with unique sets being separated by “|”. For example, if <COLUMN_NAME>_values contains the terms “nucleus|cell wall”, this column may contain: Costessi A (2011),Wee S (2002)|Okumura F (2012) indicating that Costessi A (2011) and Wee S (2002) both refer to nucleus, and Okumura F (2012) refers to cell wall. If no evidence values are provided, this will be “-”.
  5. <COLUMN_NAME>_evidence_ids - These are ids that correspond to the values in <COLUMN_NAME>_evidence_values. For example, if <COLUMN_NAME>_evidence_values contains Costessi A (2011),Wee S (2002)|Okumura F (2012), this column might contain: 138657,58230|140283 corresponding to BioGRID publication IDs. If no evidence ids are provided, this will be “-”.
  6. <COLUMN_NAME>_evidence_classes - This column contains the class of evidence being presented. It contains generic terms representing sources of evidence such as PUBLICATION|INTERPRO|PFAM. If no evidence classes are provided, this will be “-”.
  7. <COLUMN_NAME>_evidence_methods - This column indicates whether the evidence is considered to be experimentally derived or inferred. Each piece of evidence can have an associated method. If no evidence methods are provided, this will be “-”.
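
The nested separators used by the evidence columns can be unpacked as in the sketch below, which reuses the nucleus/cell wall example from the list above. The function names are hypothetical, and the code assumes that no individual value contains a “,” or “|” of its own.

```python
def split_values(cell):
    """Split a '|' separated cell into a list; '-' means no values."""
    return [] if cell == "-" else cell.split("|")


def split_evidence(cell):
    """Split an evidence cell into one list per value.

    '|' separates the sets and ',' separates entries within a set.
    """
    if cell == "-":
        return []
    return [group.split(",") for group in cell.split("|")]


def pair_values_with_evidence(values_cell, evidence_cell, evidence_ids_cell):
    """Map each value to its (evidence value, evidence id) pairs."""
    values = split_values(values_cell)
    evidence = split_evidence(evidence_cell)
    evidence_ids = split_evidence(evidence_ids_cell)
    paired = {}
    for index, value in enumerate(values):
        names = evidence[index] if index < len(evidence) else []
        ids = evidence_ids[index] if index < len(evidence_ids) else []
        paired[value] = list(zip(names, ids))
    return paired


paired = pair_values_with_evidence(
    "nucleus|cell wall",
    "Costessi A (2011),Wee S (2002)|Okumura F (2012)",
    "138657,58230|140283",
)
```

Here `paired["nucleus"]` holds two evidence pairs and `paired["cell wall"]` holds one, matching the description of the example.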

All columns are mandatory, so columns with no values are filled with “-”.
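
Because every column is always present, the additional columns can be sliced into groups of seven after the eleven fixed columns. The sketch below assumes that the heading line names the additional columns using the <COLUMN_NAME>_values pattern described above; the function name and the "localization" dataset are hypothetical.

```python
FIXED_COLUMN_COUNT = 11
SET_SUFFIXES = [
    "values", "ids", "tags", "evidence_values",
    "evidence_ids", "evidence_classes", "evidence_methods",
]


def group_additional_columns(header_fields, row_fields):
    """Group the columns after the fixed ones into named sets of seven."""
    extra_headers = header_fields[FIXED_COLUMN_COUNT:]
    extra_values = row_fields[FIXED_COLUMN_COUNT:]
    set_size = len(SET_SUFFIXES)
    if len(extra_headers) % set_size != 0:
        raise ValueError("additional columns are not a multiple of 7")
    if len(extra_values) != len(extra_headers):
        raise ValueError("row and heading line have different lengths")
    datasets = {}
    for start in range(0, len(extra_headers), set_size):
        first = extra_headers[start]
        # Assumption: the first column of each set ends in "_values".
        if first.endswith("_values"):
            name = first[: -len("_values")]
        else:
            name = first
        cells = extra_values[start:start + set_size]
        datasets[name] = dict(zip(SET_SUFFIXES, cells))
    return datasets


# Made-up heading line and row with one hypothetical dataset.
header = ["#BioGRID ID"] + ["fixed"] * 10 + ["localization_" + s for s in SET_SUFFIXES]
row = ["31623"] + ["-"] * 10 + ["nucleus", "-", "-", "-", "-", "-", "-"]
grouped = group_additional_columns(header, row)
```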

 
biogrid_projectindex.txt · Last modified: 2018/09/13 01:47 by biogridadmin