
Cancer Genetics (CG) task

The primary CG task is completed. The task received final submissions from six teams. Thank you for your participation!

Please note that the CG task continues as a challenge open to any interested party.

Introduction

The Cancer Genetics (CG) task is an information extraction task organized as part of the BioNLP Shared Task 2013. The CG task aims to advance the automatic extraction of information from statements on the biological processes relating to the development and progression of cancer.




The scientific literature on cancer is enormous, and our understanding of the molecular mechanisms of cancer is developing rapidly: a PubMed query for "cancer" returns 2.7 million scientific article citations, including 140,000 from 2011. To build and maintain comprehensive, up-to-date knowledge bases on cancer genetics, automatic support for managing the literature is required.

The BioNLP Shared Task series has been instrumental in encouraging the development of methods and resources for the automatic extraction of bio-processes from text, but efforts within this framework have focused almost exclusively on molecular and sub-cellular level entities and events. To be relevant to cancer biology, event extraction technology must be generalized to address physical entities and processes at higher levels of biological organization, such as cell proliferation, apoptosis, blood vessel development, and organ growth. The CG task aims to advance the development of such event extraction methods and the capacity for automatic analysis of texts on cancer biology.

CG task evaluation results

Results for full task, primary evaluation criteria

             -----------------------------------------------------------
              gold (match)   answer (match)   recall    prec.   fscore
             -----------------------------------------------------------
TEES-2.1      5972 ( 2912)     4518 ( 2899)    48.76    64.17    55.41
NaCTeM        5972 ( 2916)     5192 ( 2898)    48.83    55.82    52.09
NCBI          5972 ( 2286)     3878 ( 2282)    38.28    58.84    46.38
RelAgent      5972 ( 2492)     5014 ( 2486)    41.73    49.58    45.32
UET-NII       5972 ( 1174)     1862 ( 1168)    19.66    62.73    29.94
ISI           5972 (  982)     2049 (  980)    16.44    47.83    24.47

These are the primary results for the task.
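The recall, precision, and F-score columns can be recomputed from the count columns. The following is a minimal sketch, assuming that recall is matched gold events over all gold events, precision is matched answer events over all answer events, and the F-score is their harmonic mean. The function name `prf` and the `results` dictionary are illustrative and are not part of the task's evaluation tools.

```python
def prf(gold, gold_match, answer, answer_match):
    """Return (recall, precision, F-score) as percentages.

    Assumed definitions: recall = gold_match / gold,
    precision = answer_match / answer, F = harmonic mean of the two.
    """
    recall = 100.0 * gold_match / gold if gold else 0.0
    precision = 100.0 * answer_match / answer if answer else 0.0
    total = recall + precision
    fscore = 2.0 * recall * precision / total if total else 0.0
    return recall, precision, fscore


# Counts copied from the results table above:
# (gold, gold match, answer, answer match)
results = {
    "TEES-2.1": (5972, 2912, 4518, 2899),
    "NaCTeM": (5972, 2916, 5192, 2898),
    "NCBI": (5972, 2286, 3878, 2282),
    "RelAgent": (5972, 2492, 5014, 2486),
    "UET-NII": (5972, 1174, 1862, 1168),
    "ISI": (5972, 982, 2049, 980),
}

for team, counts in results.items():
    r, p, f = prf(*counts)
    print(f"{team:<10} {r:6.2f} {p:6.2f} {f:6.2f}")
```

The guards against zero denominators only matter for empty submissions; they do not affect any row of the table.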

Task definition

The CG task is an event extraction task following the representation and task setting of the ST’09 and ST’11 main tasks. The representation involves two primary categories of annotation: (physical) entity annotation and event annotation. Participants in the CG task will be provided with gold standard annotations for entity mentions, including for the test data. The task thus focuses efforts on the primary event extraction task.
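To make the two annotation categories concrete, here is a minimal sketch of how entity and event annotations might be held in memory. It assumes the tab-separated standoff layout used in the earlier BioNLP Shared Tasks (text-bound lines starting with `T`, event lines starting with `E`); this page does not spell out the file format, so the layout, the underscore-joined type names, the class names, and the example sentence are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TextBound:
    """An entity mention or event trigger anchored to a text span."""
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    """An event: a typed trigger plus (role, target id) arguments."""
    id: str
    type: str
    trigger: str
    args: list = field(default_factory=list)


def parse_standoff(lines):
    """Parse assumed standoff lines into text-bound and event dicts."""
    text_bounds, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        if line.startswith("T"):
            ann_id, head, text = line.split("\t")
            ann_type, start, end = head.split(" ")
            text_bounds[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), text)
        elif line.startswith("E"):
            ann_id, body = line.split("\t")
            parts = body.split(" ")
            ev_type, trigger = parts[0].split(":")
            args = [tuple(p.split(":")) for p in parts[1:] if p]
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
    return text_bounds, events


# Hypothetical example, not taken from the CG corpus.
sample_text = "VEGF promotes angiogenesis"
sample_lines = [
    "T1\tGene_or_gene_product 0 4\tVEGF",
    "T2\tPositive_regulation 5 13\tpromotes",
    "T3\tBlood_vessel_development 14 26\tangiogenesis",
    "E1\tBlood_vessel_development:T3",
    "E2\tPositive_regulation:T2 Theme:E1 Cause:T1",
]

text_bounds, events = parse_standoff(sample_lines)
print(events["E2"])
```

In this sketch the regulation event `E2` takes another event (`E1`) as its Theme and an entity (`T1`) as its Cause, which is how nested event structures would be expressed under the assumed format.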

The entity and event types defined in the CG task are detailed below.

Entities

The CG task entity types are defined with reference to domain standard databases and ontologies, specifically the Gene Ontology (GO), the Cell Ontology (CL), the Common Anatomy Reference Ontology (CARO) and the Chemical Entities of Biological Interest (ChEBI) ontology. (Labels in gray in the following table are included for organization only and are not included in the annotated types in the task.)

Type                                      Scope                                          Reference
---------------------------------------------------------------------------------------------------------
Anatomical entity                         structural organization of organism            CARO
   Material anatomical entity             anatomical entities (ASs) with mass            CARO
      Anatomical structure                material ASs with structure                    CARO
         Organism                         organism mentions                              taxonomy DBs
         Organism subdivision             fiat parts of multicellular organism           CARO
         Anatomical system                ASs of multiple organs                         CARO
         Organ                            ASs of multiple multi-tissue structs.          CARO
         Multi-tissue structure           AS of multiple tissues                         CARO
         Tissue                           ASs of similar cells and ECM                   CARO
         Developing anatomical structure  ASs varying in granularity due to development  CARO
         Cell                             ASs of cell compartment, surrounded by PM      CL
         Cellular component               ASs that are parts of cells                    GO-CC
   Organism substance                     gaseous, liquid or semisolid material ASs      CARO
   Immaterial anatomical entity           anatomical entities without mass               CARO
Molecular entity
   Gene or gene product                   genes, RNA and proteins                        gene/protein DBs
   Simple chemical                        simple, non-repetitive chemical entities       ChEBI
   Protein domain or region               parts of protein molecules
   Amino acid                             amino acid residues                            ChEBI
   DNA domain or region                   short, specifically identified spans of DNA
Pathological formation                    pathological material organism parts
   Cancer                                 cancerous pathological formations

Events

The CG task event types are defined primarily with reference to the Gene Ontology (GO) Biological process subontology. (Labels in gray in the following table are included for organization only and are not included in the annotated types in the task.)

Type                                Arguments
----------------------------------------------------------------------------------------------------
Anatomical
   Development                      Theme(Anatomical/Pathological)
      Blood vessel development      Theme?(Anatomical/Pathological), AtLoc?(Anatomical/Pathological)
   Growth                           Theme(Anatomical/Pathological)
   Death                            Theme(Anatomical/Pathological)
      Cell death                    Theme?(Cell)
   Breakdown                        Theme(Anatomical/Pathological)
   Cell proliferation               Theme(Cell)
   Cell division                    Theme(Cell)
   Cell differentiation             Theme(Cell), AtLoc?(Anatomical/Pathological)
   Remodeling                       Theme(Tissue)
   Reproduction                     Theme(Organism)
Pathological
   Mutation                         Theme(GGP), AtLoc?(Anatomical/Pathological), Site?
   Carcinogenesis                   Theme?(Anatomical/Pathological), AtLoc?(Anatomical/Pathological)
   Cell transformation              Theme(Cell), AtLoc?(Anatomical/Pathological)
   Metastasis                       Theme?(Anatomical/Pathological), ToLoc?(Anatomical/Pathological)
   Infection                        Theme?(Anatomical/Pathological), Participant?(Organism)
Molecular
   Metabolism                       Theme(Molecule)
      Synthesis                     Theme(Simple chemical)
      Catabolism                    Theme(Molecule)
         Amino_acid_catabolism      Theme?(Molecule)
         Glycolysis                 Theme?(Molecule)
      Gene expression               Theme+(GGP)
         Transcription              Theme(GGP)
         Translation                Theme(GGP)
         Protein processing         Theme(GGP)
   Phosphorylation                  Theme(Molecule), Site?(Protein domain/region)
   Dephosphorylation                Theme(Molecule), Site?(Protein domain/region)
   (etc. for other modifications)
   DNA methylation                  Theme(GGP), Site?(Protein or DNA domain/region)
   DNA demethylation                Theme(GGP), Site?(Protein or DNA domain/region)
   Pathway                          Participant?(Molecule)
General
   Binding                          Theme+(Molecule), Site?(Protein or DNA domain/region)
   Dissociation                     Theme(Molecule), Site?(Protein or DNA domain/region)
   Localization                     Theme+(Molecule), (At/From/To)Loc?(Anatomical/Pathological)
Regulation                          Theme(Any), Cause?(Any)
   Positive regulation              Theme(Any), Cause?(Any)
   Negative regulation              Theme(Any), Cause?(Any)
Planned process                     Theme*(Any), Instrument*(Entity)

Here, “GGP” abbreviates Gene or gene product, “Molecule” refers to an entity annotation of the type Simple chemical or Gene or gene product, “Anatomical” (“Pathological”) to any of the anatomical (pathological) entity types (see entity definitions above), “Entity” to any physical entity type, and “Any” to an annotation of any type, either entity or event.
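The argument specifications in the table can be read as a small schema. The sketch below encodes a few rows and checks candidate events against them. It assumes that the suffixes follow the usual regular-expression convention (no suffix: exactly one, `?`: at most one, `+`: one or more, `*`: any number), which this page does not state explicitly. The names `SCHEMA`, `BOUNDS`, and `check_event` are hypothetical, and only a subset of the event types is included.

```python
# Entity type groups as described in the text above.
MOLECULE = {"Simple chemical", "Gene or gene product"}
GGP = {"Gene or gene product"}
CELL = {"Cell"}
SITE = {"Protein domain or region", "DNA domain or region"}

# Assumed meaning of the multiplicity suffixes: (minimum, maximum or None).
BOUNDS = {"": (1, 1), "?": (0, 1), "+": (1, None), "*": (0, None)}

# A few rows of the event table: role -> (multiplicity, allowed filler types).
SCHEMA = {
    "Cell proliferation": {"Theme": ("", CELL)},
    "Cell death": {"Theme": ("?", CELL)},
    "Gene expression": {"Theme": ("+", GGP)},
    "Binding": {"Theme": ("+", MOLECULE), "Site": ("?", SITE)},
}


def check_event(event_type, args):
    """Return a list of problems for an event given (role, filler type) pairs."""
    spec = SCHEMA[event_type]
    problems = []
    counts = {role: 0 for role in spec}
    for role, filler in args:
        if role not in spec:
            problems.append(f"unexpected role {role}")
            continue
        counts[role] += 1
        if filler not in spec[role][1]:
            problems.append(f"{role} cannot be filled by {filler}")
    for role, (mult, _) in spec.items():
        low, high = BOUNDS[mult]
        n = counts[role]
        if n < low or (high is not None and n > high):
            problems.append(f"{role} occurs {n} times, outside the allowed range")
    return problems


print(check_event("Binding", [("Theme", "Gene or gene product"),
                              ("Theme", "Simple chemical")]))
print(check_event("Cell proliferation", []))
```

An empty list means the event fits the encoded row. Extending the sketch to the “Any” filler type would require allowing event types as well as entity types among the fillers.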

For understanding the annotations, it may be helpful to see a visualization of a small sample of CG task annotations (provided using the brat rapid annotation tool). You can also download the small sample data below.

Evaluation

Submissions to the CG task are evaluated using the event instance-based evaluation criteria established in the BioNLP Shared Task 2009 (see the BioNLP ST'09 overview paper for details). The script evaluation-CG.py is used for evaluation. 

Task organization

The CG task is organized by the University of Manchester and the National Centre for Text Mining (NaCTeM).
  • Sampo Pyysalo: task chair
  • Tomoko Ohta: primary annotation
  • Rafal Rak: workflows, annotation support
  • Sophia Ananiadou: principal investigator
The CG task annotation builds in part on the Multi-Level Event Extraction (MLEE) corpus annotation, which in turn builds on a previously released corpus of angiogenesis domain abstracts annotated by Wang et al. (2011). We wish to acknowledge the contribution of our colleagues to the annotation of this part of the CG task corpus.
BioNLP-ST_2013_CG_sample-1.0.tar.gz (22k), Sampo Pyysalo, Oct 23, 2012, 9:39 PM