CRISPR Screen Metadata

Many of the manually curated fields in ORCS use controlled vocabularies developed by curators after a survey of the literature. These vocabularies are provided below.

| Field Name | Description |
|---|---|
| Screen Name | Each screen is assigned a number followed by the PubMed ID (e.g. 1-PMID26627737) so that every screen in a publication with multiple screens has a unique identifier. |
| Organism | Controlled vocabulary found here. |
| Screen Rationale | A free-text field that succinctly describes the purpose of the screen, providing additional context for the significance of hits. Common rationales include “Cell essential genes” and “Increased drug resistance”, but the field allows added specificity when needed, such as “Tumor response to drug treatment”. |
| Experimental Setup | Controlled vocabulary based on a survey of the data. It can easily be modified to accommodate new screen types as they are developed. |
| Duration | Controlled vocabulary of hours, days or doublings. |
| Condition Name/Condition Dosage | Ontology IDs are added for various conditions: drug/chemical (ChEBI ID), protein ligand (UniProt ID), virus or bacteria (Taxon ID), mutation (relevant gene ID). If a condition does not have an applicable ID, such as a toxin or a relevant culture condition, the ID field is left blank. Currently, if multiple conditions need to be specified for a screen (usually a mutation in addition to another condition), they can be separated using a pipe character. The dosage is a numeric field with an associated controlled vocabulary for units. A list of ontology terms currently in use in ORCS can be found here. |
| MOI | Multiplicity of infection. Optional; entered if provided in the publication. |
| CRISPR Library Name/Type | Controlled vocabulary terms generated by a survey of the literature. Can easily be modified as new libraries are developed. Accession IDs are associated when possible. |
| Screen Format | Controlled vocabulary terms generated by a survey of the literature. Can easily be modified as new screen formats are developed. |
| Enzyme | Controlled vocabulary terms generated by a survey of the literature. Can easily be modified as new enzymes, such as base editors, are developed. |
| Cell Line/Type | Ontology terms from BTO (first choice), EFO (second choice) or CLO (third choice). Additional guidelines for this curation can be found here. |
| Phenotype | Ontology terms from APO, CMPO or a controlled vocabulary as needed. Terms currently in use in ORCS can be found here. |
| Analysis Method | Controlled vocabulary terms generated by a survey of the literature. Can easily be modified as new analysis methods are developed. |
| Significance Threshold | When a significance threshold is given in a paper, it can be applied to any score column for a screen, and even to multiple score columns, depending on the analysis method. If only a list of significant hits was published, all of the gene hits are uploaded as “all significant”. Alternatively, if a yes/no hit value was used to identify gene hits in a paper, the genes can be uploaded with their hit status specified in a boolean column. |
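
The Screen Name and Condition Name conventions described above can be sketched in a few lines of Python. This is only an illustration, not ORCS code: the function names are hypothetical, the screen name pattern is inferred from the single example 1-PMID26627737, and the condition names in the usage note are placeholders.

```python
import re
from typing import List, NamedTuple

# Assumed pattern: "<screen number>-PMID<PubMed ID>", inferred from 1-PMID26627737.
SCREEN_NAME_RE = re.compile(r"^(\d+)-PMID(\d+)$")


class ScreenName(NamedTuple):
    screen_number: int  # position of the screen within the publication
    pubmed_id: int      # PubMed ID of the publication


def parse_screen_name(name: str) -> ScreenName:
    """Split a screen name into its screen number and PubMed ID."""
    match = SCREEN_NAME_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not of the form N-PMID<id>: {name!r}")
    return ScreenName(int(match.group(1)), int(match.group(2)))


def split_conditions(field: str) -> List[str]:
    """Split a pipe-separated condition field into individual condition names."""
    return [part.strip() for part in field.split("|") if part.strip()]
```

For example, `parse_screen_name("1-PMID26627737")` yields screen number 1 and PubMed ID 26627737, and `split_conditions("condition A | condition B")` yields two separate condition names.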

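The three Significance Threshold cases (thresholds on one or more score columns, “all significant”, and a yes/no hit column) could be modelled roughly as follows. This is a sketch under stated assumptions, not the ORCS implementation: the column names, cutoffs and gene names are invented, and requiring every listed column to pass is an assumption, since the rule for combining columns depends on the analysis method.

```python
import operator
from typing import Callable, Dict, List, Tuple

# (score column, comparison, cutoff); all values used below are illustrative.
Threshold = Tuple[str, Callable[[float, float], bool], float]


def is_hit(scores: Dict[str, float], thresholds: List[Threshold]) -> bool:
    """Return True when every listed score column passes its cutoff.

    An empty threshold list models the "all significant" case, in which
    only the significant hits were published. Combining columns with a
    logical AND is an assumption made for this sketch.
    """
    return all(compare(scores[column], cutoff)
               for column, compare, cutoff in thresholds)


def parse_hit_flag(value: str) -> bool:
    """Convert a published yes/no hit value into a boolean."""
    normalized = value.strip().lower()
    if normalized not in {"yes", "no"}:
        raise ValueError(f"expected yes or no, got {value!r}")
    return normalized == "yes"


# Hypothetical screen with two score columns per gene.
rows = {
    "GENE_A": {"score_1": 0.01, "score_2": -2.5},
    "GENE_B": {"score_1": 0.20, "score_2": -3.0},
}
thresholds = [("score_1", operator.lt, 0.05), ("score_2", operator.lt, -1.0)]
hits = {gene: is_hit(scores, thresholds) for gene, scores in rows.items()}
```

Here GENE_A passes both cutoffs and is marked as a hit, while GENE_B fails the first cutoff and is not.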

 
orcs/curation_guide/metadata_terms.txt · Last modified: 2022/07/12 13:54 by jenn