Cancer Genetics (CG) task

The primary CG task has been completed. The task received final submissions from six teams. Thank you for your participation!

Please note that the CG task continues as a challenge open to any interested party.

Introduction

The Cancer Genetics (CG) task is an information extraction task organized as part of the BioNLP Shared Task 2013. The CG task aims to advance the automatic extraction of information from statements on the biological processes relating to the development and progression of cancer.

The scientific literature on cancer is enormous, and our understanding of the molecular mechanisms of cancer is developing rapidly: a PubMed query for "cancer" returns 2.7 million scientific article citations, 140,000 of which are from 2011. To build and maintain comprehensive, up-to-date knowledge bases on cancer genetics, automatic support for managing the literature is required.

The BioNLP Shared Task series has been instrumental in encouraging the development of methods and resources for the automatic extraction of bio-processes from text, but efforts within this framework have focused almost exclusively on molecular and sub-cellular level entities and events. To be relevant to cancer biology, event extraction technology must be generalized to address physical entities and processes at higher levels of biological organization, such as cell proliferation, apoptosis, blood vessel development, and organ growth. The CG task aims to advance the development of such event extraction methods and the capacity for automatic analysis of texts on cancer biology.

CG task evaluation results

Results for the full task, primary evaluation criteria

-----------------------------------------------------------------
Team        gold (match)     answer (match)    recall   prec.   F-score
-----------------------------------------------------------------
TEES-2.1    5972 (2912)      4518 (2899)       48.76    64.17    55.41
NaCTeM      5972 (2916)      5192 (2898)       48.83    55.82    52.09
NCBI        5972 (2286)      3878 (2282)       38.28    58.84    46.38
RelAgent    5972 (2492)      5014 (2486)       41.73    49.58    45.32
UET-NII     5972 (1174)      1862 (1168)       19.66    62.73    29.94
ISI         5972 ( 982)      2049 ( 980)       16.44    47.83    24.47
-----------------------------------------------------------------

These are the primary results for the task.
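In the table, "gold (match)" gives the total number of gold-standard events and, in parentheses, the number matched by the submission; "answer (match)" gives the corresponding counts for submitted events. Recall, precision and F-score follow directly from these counts. The minimal sketch below (plain Python, using the TEES-2.1 row as input) reproduces the reported figures, assuming the standard harmonic-mean F-score used in the shared task series.

    # Recomputing the TEES-2.1 row of the results table.
    gold, gold_matched = 5972, 2912        # gold events; gold events matched by the submission
    answer, answer_matched = 4518, 2899    # submitted events; submitted events matching gold

    recall = 100.0 * gold_matched / gold                      # 48.76
    precision = 100.0 * answer_matched / answer               # 64.17
    fscore = 2 * precision * recall / (precision + recall)    # 55.41
    print(f"recall={recall:.2f} prec.={precision:.2f} fscore={fscore:.2f}")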

Task definition

The CG task is an event extraction task following the representation and task setting of the ST’09 and ST’11 main tasks. The representation involves two primary categories of annotation: (physical) entity annotation and event annotation. Participants in the CG task are provided with gold standard annotations for entity mentions, including for the test data. The task thus focuses efforts on the primary event extraction task.
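To illustrate the representation, the following is a minimal sketch of reading ST-style standoff annotations; the example lines, character offsets and helper code are hypothetical and for illustration only, with type names drawn from those mentioned in this task description. Entity annotations (.a1 data) are typed text spans, and event annotations (.a2 data) combine a typed trigger span with role-labelled arguments.

    # Hypothetical ST-style standoff annotations (tab-separated fields).
    a1_lines = [
        "T1\tGene_or_gene_product 0 4\tVEGF",                # entity: ID, type + offsets, text
        "T2\tCell 27 44\tendothelial cells",
    ]
    a2_lines = [
        "T3\tBlood_vessel_development 57 69\tangiogenesis",  # event trigger span
        "E1\tBlood_vessel_development:T3 Theme:T2",          # event: type:trigger + role:argument pairs
    ]

    entities = {}
    for line in a1_lines:
        tid, type_and_span, text = line.split("\t")
        etype, start, end = type_and_span.split(" ")
        entities[tid] = (etype, int(start), int(end), text)

    events = {}
    for line in a2_lines:
        if line.startswith("E"):
            eid, body = line.split("\t")
            type_trigger, *args = body.split(" ")
            events[eid] = (type_trigger, dict(a.split(":") for a in args))

    print(entities)
    print(events)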

The entity and event types defined in the CG task are detailed below.

Entities

The CG task entity types are defined with reference to domain standard databases and ontologies, specifically the Gene Ontology (GO), the Cell Ontology (CL), the Common Anatomy Reference Ontology (CARO) and the Chemical Entities of Biological Interest (ChEBI) ontology. (Labels in gray in the following table are included for organization only and are not included in the annotated types in the task.)

Events

The CG task event types are defined primarily with reference to the Gene Ontology (GO) Biological process subontology. (Labels in gray in the following table are included for organization only and are not included in the annotated types in the task.)

Here, “Molecule” refers to an entity annotation of the type Simple chemical or Gene or gene product, “Anatomical” (“Pathological”) to one of any of the anatomical (pathological) entity types (see entity definitions above), “Entity” to any physical entity type, and “Any” to an annotation of any type, either entity or event.

For understanding the annotations, it may be helpful to see a visualization of a small sample of CG task annotations (provided using the brat rapid annotation tool). You can also download the small sample data below.

Evaluation

Submissions to the CG task are evaluated using the event instance-based evaluation criteria established in the BioNLP Shared Task 2009 (see the BioNLP ST'09 overview paper for details). The script evaluation-CG.py is used for evaluation.
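As a rough illustration of instance-based evaluation, the simplified sketch below counts an event as matched only when its type, trigger span and arguments agree exactly; the official evaluation-CG.py applies the full ST'09 criteria, including relaxations such as approximate span matching, which are not reproduced here. The matched counts correspond to the parenthesized figures in the results table above, from which recall, precision and F-score follow as in the earlier sketch.

    # Simplified instance-based matching: exact agreement on type, trigger and arguments.
    def event_key(event):
        # event: {"type": str, "trigger": (start, end), "args": {role: argument key}}
        return (event["type"], event["trigger"], frozenset(event["args"].items()))

    def match_counts(gold_events, answer_events):
        gold_keys = {event_key(e) for e in gold_events}
        answer_keys = {event_key(e) for e in answer_events}
        matched = len(gold_keys & answer_keys)
        # Under the official relaxed criteria the gold-side and answer-side match
        # counts can differ (as in the table above); under this strict sketch they coincide.
        return len(gold_keys), len(answer_keys), matched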

Task organization

The CG task is organized by the University of Manchester and the National Centre for Text Mining (NaCTeM).

    • Sampo Pyysalo: task chair

    • Tomoko Ohta: primary annotation

    • Rafal Rak: workflows, annotation support

    • Sophia Ananiadou: principal investigator

The CG task annotation builds in part on the Multi-Level Event Extraction (MLEE) corpus annotation, which in turn builds on a previously released corpus of angiogenesis-domain abstracts annotated by Wang et al. (2011). We wish to acknowledge the contribution of our colleagues toward the annotation of this part of the CG task corpus.