

BioGRID PROJECTINDEX Formatted Downloads

The BioGRID PROJECTINDEX format is a custom BioGRID tab-delimited format designed to provide the annotated gene list associated with a given BioGRID curation project, in a form that is easy to read in Microsoft Excel or to parse in bioinformatics scripts and applications. All columns are separated by tabs.


How to Detect a BioGRID PROJECTINDEX file

BioGRID PROJECTINDEX files are denoted by the extension .projectindex.txt or .projectindex.zip.
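Since the extension is the only documented marker, detection can be a simple suffix check. The following is a minimal Python sketch, not an official BioGRID tool: the names `is_projectindex_file` and `open_projectindex` are hypothetical, the UTF-8 encoding is an assumption, and the .zip branch assumes the archive holds exactly one .projectindex.txt member.

```python
import io
import zipfile
from typing import TextIO

# Suffixes named in the format description.
TXT_SUFFIX = ".projectindex.txt"
ZIP_SUFFIX = ".projectindex.zip"


def is_projectindex_file(name: str) -> bool:
    """Return True if the file name carries a PROJECTINDEX extension."""
    lower = name.lower()
    return lower.endswith(TXT_SUFFIX) or lower.endswith(ZIP_SUFFIX)


def open_projectindex(path: str) -> TextIO:
    """Open a PROJECTINDEX file as text, unpacking it first if it is zipped.

    Assumptions: the text is UTF-8, and a .projectindex.zip archive
    contains exactly one .projectindex.txt member.
    """
    if not is_projectindex_file(path):
        raise ValueError(f"not a PROJECTINDEX file: {path}")
    if path.lower().endswith(ZIP_SUFFIX):
        with zipfile.ZipFile(path) as archive:
            members = [n for n in archive.namelist() if n.lower().endswith(TXT_SUFFIX)]
            if len(members) != 1:
                raise ValueError(f"expected one {TXT_SUFFIX} member, found {len(members)}")
            # Read the member fully so the archive can be closed right away.
            text = archive.read(members[0]).decode("utf-8")
        return io.StringIO(text)
    return open(path, encoding="utf-8")
```

Reading the zipped member into memory keeps the code short; for very large files, streaming the member instead would be the better choice.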

Header Definitions

The first line of a BioGRID PROJECTINDEX file is the header line and starts with a hash (#). This line is purely informational and gives a brief description of the content of each column. Scripts that process the file can simply skip it.
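As a rough sketch of what that means for a parser, the Python generator below splits each line on tabs and skips any line starting with "#", which covers the header line. The name `iter_records` is hypothetical, and skipping blank lines is an assumption, since the format description does not mention them.

```python
from typing import Iterable, Iterator, List


def iter_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield each data line of a PROJECTINDEX file as a list of fields.

    Lines starting with '#' (the informational header line) are skipped,
    as are blank lines (an assumption). Fields are split on tab characters.
    """
    for line in lines:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        yield line.split("\t")
```

Any iterable of strings works as input, including an open text file, so the same function serves both files on disk and text already in memory.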

Column Definitions

BioGRID PROJECTINDEX files always contain the following columns:

  1. BioGRID ID is the identifier in the BioGRID database that corresponds to the gene. These identifiers are best used for creating links to BioGRID from your own websites or applications. To link to the gene's page on our site, insert the ID into the URL http://www.thebiogrid.org/ID/. For example, http://www.thebiogrid.org/31623/.
  2. Entrez Gene ID is the identifier from the Entrez Gene database that corresponds to the gene. If no Entrez Gene ID is available, this will be “-”.
  3. Systematic Name is the plain-text systematic name for the gene, if known. This will be “-” if no name is available.
  4. Official Symbol is the common gene name or official symbol for the gene. This will be “-” if no name is available.
  5. Synonyms/Aliases is a “|”-separated list of alternate identifiers for the gene. This will be “-” if no aliases are available.
  6. Organism ID is the NCBI taxonomy ID for the gene.
  7. Organism Name is the official name of the organism for the gene.
  8. Interaction Count is the number of interactions in the BioGRID for this gene.
  9. PTM Count is the number of post-translational modifications in the BioGRID for this gene.
  10. Chemical Interaction Count is the number of chemical interactions in the BioGRID for this gene.
  11. Source is the source database for the curation of this gene within this project.
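The core columns map naturally onto a record type. The Python sketch below is one way to do it, under stated assumptions: `ProjectGene` and `parse_core_columns` are hypothetical names, the three count columns are assumed to hold plain integers, and “-” is mapped to None (or an empty alias list). The link pattern is the http://www.thebiogrid.org/ID/ form described for the BioGRID ID column.

```python
from dataclasses import dataclass
from typing import List, Optional

BIOGRID_URL = "http://www.thebiogrid.org/{}/"
CORE_COLUMN_COUNT = 11


def _opt(value: str) -> Optional[str]:
    """Map the '-' placeholder used for missing values to None."""
    return None if value == "-" else value


@dataclass
class ProjectGene:
    biogrid_id: str
    entrez_gene_id: Optional[str]
    systematic_name: Optional[str]
    official_symbol: Optional[str]
    synonyms: List[str]
    organism_id: str  # NCBI taxonomy ID
    organism_name: str
    interaction_count: int
    ptm_count: int
    chemical_interaction_count: int
    source: str

    @property
    def url(self) -> str:
        """Link to the gene's BioGRID page, built from the BioGRID ID."""
        return BIOGRID_URL.format(self.biogrid_id)


def parse_core_columns(fields: List[str]) -> ProjectGene:
    """Build a ProjectGene from the first 11 columns of a data row."""
    if len(fields) < CORE_COLUMN_COUNT:
        raise ValueError(f"expected at least {CORE_COLUMN_COUNT} columns, got {len(fields)}")
    aliases = _opt(fields[4])
    return ProjectGene(
        biogrid_id=fields[0],
        entrez_gene_id=_opt(fields[1]),
        systematic_name=_opt(fields[2]),
        official_symbol=_opt(fields[3]),
        synonyms=aliases.split("|") if aliases else [],
        organism_id=fields[5],
        organism_name=fields[6],
        # Assumption: the count columns always hold integers.
        interaction_count=int(fields[7]),
        ptm_count=int(fields[8]),
        chemical_interaction_count=int(fields[9]),
        source=fields[10],
    )
```

Identifiers are kept as strings rather than integers so that they can be used verbatim in links and lookups.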

Additional Column Definitions (Repeating)

In addition to the columns above, PROJECTINDEX files can contain any number of additional columns, depending on the project being represented. Because these columns can differ significantly from one project to the next, it is best to consult the project page within BioGRID for details about the project-specific columns. Any additional columns come in sets of 7, with one set for each unique dataset in the project (N sets for N datasets):

  1. <COLUMN_NAME>_values - This column contains the values shown in the corresponding column on our project pages. It holds one or more values (separated by “|”), which are usually a classification or an ontology reference.
  2. <COLUMN_NAME>_ids - If the values in <COLUMN_NAME>_values also have accompanying official IDs, this column contains the matching “|”-separated IDs. For example, if <COLUMN_NAME>_values contains Gene Ontology terms, this column contains the corresponding Gene Ontology term IDs. If not applicable, this column will contain “-”.
  3. <COLUMN_NAME>_tags - Tags are an additional level of qualification associated with an original value.
  4. <COLUMN_NAME>_evidence_values - This column contains easy-to-read, common reference values, “|”-separated when there are multiple entries. These are usually terms that classify or annotate a gene, such as group classifications or ontology terms. This column will contain “-” if not applicable.
  5. <COLUMN_NAME>_evidence_ids - This column contains IDs that correspond to the values in <COLUMN_NAME>_values. For example, if <COLUMN_NAME>_values contains the Gene Ontology term “nucleus”, this column may contain the corresponding GO ID, GO:0005634. This column will contain “-” if not applicable.
  6. <COLUMN_NAME>_evidence_classes -
  7. <COLUMN_NAME>_evidence_methods -
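Because the repeating sets always have the same seven-column shape, they can be handled without knowing the project in advance. The Python sketch below groups the columns after the 11 core columns into sets of seven, keyed by the suffixes listed above, and pairs a set's _values entries with its _ids entries. The function names are hypothetical, and pairing values with IDs position by position is an assumption drawn from the description of <COLUMN_NAME>_ids, not a documented guarantee.

```python
from typing import Dict, List, Optional, Tuple

CORE_COLUMN_COUNT = 11
SET_SIZE = 7
# The seven suffixes of each repeating set, in the documented order.
SET_SUFFIXES = (
    "values",
    "ids",
    "tags",
    "evidence_values",
    "evidence_ids",
    "evidence_classes",
    "evidence_methods",
)


def split_dataset_columns(fields: List[str]) -> List[Dict[str, str]]:
    """Group the columns after the 11 core columns into sets of seven."""
    extra = fields[CORE_COLUMN_COUNT:]
    if len(extra) % SET_SIZE != 0:
        raise ValueError(f"{len(extra)} additional columns is not a multiple of {SET_SIZE}")
    return [
        dict(zip(SET_SUFFIXES, extra[start:start + SET_SIZE]))
        for start in range(0, len(extra), SET_SIZE)
    ]


def pair_values_with_ids(values: str, ids: str) -> List[Tuple[str, Optional[str]]]:
    """Pair each '|'-separated value with the ID at the same position.

    Assumption: the two lists line up position by position. An ids field
    of '-' means no IDs apply, so each value is paired with None.
    """
    if values == "-":
        return []
    value_list = values.split("|")
    if ids == "-":
        return [(v, None) for v in value_list]
    id_list = ids.split("|")
    if len(id_list) != len(value_list):
        raise ValueError("values and ids have different lengths")
    return list(zip(value_list, id_list))
```

A row with N datasets has 11 + 7 × N columns, so `split_dataset_columns` returns N dictionaries; for a set whose values field is “nucleus” and whose IDs field is “GO:0005634”, `pair_values_with_ids` returns `[("nucleus", "GO:0005634")]`.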

All columns are mandatory, so columns with no value are filled with “-”.
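A row-level check of these rules might look like the following Python sketch, where `check_row` is a hypothetical name. It rejects empty fields, since a missing value should appear as “-”, and confirms that the row holds the 11 core columns plus whole sets of seven additional columns.

```python
from typing import List

CORE_COLUMN_COUNT = 11
SET_SIZE = 7


def check_row(fields: List[str]) -> int:
    """Validate one data row and return its number of dataset sets.

    Raises ValueError if any field is empty (every column is mandatory)
    or if the column count is not 11 plus a multiple of 7.
    """
    empty = [i + 1 for i, field in enumerate(fields) if field == ""]
    if empty:
        raise ValueError(f"empty mandatory column(s) at position(s) {empty}")
    extra = len(fields) - CORE_COLUMN_COUNT
    if extra < 0 or extra % SET_SIZE != 0:
        raise ValueError(f"unexpected column count: {len(fields)}")
    return extra // SET_SIZE
```

Positions in the error message are 1-based to match the column numbering used on this page.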

biogrid_projectindex.1536816984.txt.gz · Last modified: 2018/09/13 01:36 by biogridadmin