Pathway Curation (PC) task

The primary PC task is completed. The task received final submissions from two teams. Thank you for your participation!

Please note that the CG task continues as a challenge open to any interested party:

Introduction

The Pathway Curation (PC) task is a main task of the BioNLP Shared Task 2013. The PC task aims to evaluate the applicability of event extraction systems to support the curation, evaluation and maintenance of biomolecular pathway models and to encourage the further development of methods for these tasks.

Despite more than a decade of work in biomedical text mining on tasks under headings such as “automatic pathway extraction”, natural language processing and information extraction methods have not been widely embraced by biomedical pathway curation communities. Until recently, biomedical domain IE efforts concentrated on simple representations (e.g. physical entity pairs) that were not sufficiently expressive to address pathway curation, and most work also involved different semantics from those applied in curation efforts. We believe that the structured event representation applied in BioNLP Shared Task main tasks offers many opportunities to make a significant contribution to practical pathway curation efforts. The PC task is proposed as a step toward realizing these opportunities.

To ensure that the task and its data are relevant to the needs of pathway curation efforts, the PC task defines its extraction targets and their semantics with reference to physical entity and reaction types applied in pathway model standardization efforts and relevant ontologies such as the Systems Biology Ontology (SBO). Further, the corpus texts are selected on the basis of relevance to a selection of pathway models from Panther Pathway DB [13] and BioModels [11], covering both signaling and metabolic pathways. The texts include both PubMed publication abstracts and extracts from PMC Open Access full-text papers.

PC task evaluation results

Results for full task, primary evaluation criteria

---------------------------------------------------------------------
Group       gold (match)    answer (match)    recall   prec.   fscore
---------------------------------------------------------------------
NaCTeM      4178 ( 2182)    4039 ( 2160)      52.23    53.48   52.84
TEES-2.1    4178 ( 1970)    3521 ( 1964)      47.15    55.78   51.10
---------------------------------------------------------------------

These are the primary results for the task.
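The recall, precision and F-score figures in the table are consistent with the usual definitions: recall is the number of matched gold events over the total number of gold events, precision is the number of matched answer events over the total number of answer events, and F-score is their harmonic mean. The sketch below recomputes the scores from the counts in the table. The function name `prf` and the layout of the `RESULTS` dictionary are illustrative assumptions, not part of the official evaluation tools.

```python
def prf(gold_total, gold_matched, answer_total, answer_matched):
    """Return (recall, precision, F-score) as percentages rounded to 2 places."""
    recall = gold_matched / gold_total
    precision = answer_matched / answer_total
    # Harmonic mean of recall and precision; defined as 0 when both are 0.
    if recall + precision == 0:
        fscore = 0.0
    else:
        fscore = 2 * recall * precision / (recall + precision)
    return tuple(round(100 * x, 2) for x in (recall, precision, fscore))


# Counts copied from the results table:
# (gold total, gold matched, answer total, answer matched)
RESULTS = {
    "NaCTeM": (4178, 2182, 4039, 2160),
    "TEES-2.1": (4178, 1970, 3521, 1964),
}

if __name__ == "__main__":
    for group, counts in RESULTS.items():
        print(group, prf(*counts))
```

Note that the matched counts on the gold side and on the answer side are tracked separately, so recall and precision are each computed from their own pair of columns.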

Task definition

The PC task is an event extraction task following the representation and task setting of the ST’09 and ST’11 main tasks. The representation involves two primary categories of annotation: (physical) entity annotation and event annotation. Participants in the PC task will be provided with gold standard annotations for entity mentions, including for the test data. The task thus focuses efforts on the primary event extraction task.
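To make the two annotation categories concrete, the following is a minimal sketch of reading entity and event annotations, assuming the tab-separated standoff format used in the ST’09 and ST’11 main tasks (text-bound `T` lines carrying a type and character offsets, and event `E` lines carrying a typed trigger and role:target arguments). The sample sentence, identifiers and the underscore spelling of type names are invented for illustration and are not taken from the PC corpus.

```python
from dataclasses import dataclass


@dataclass
class TextBound:
    id: str
    type: str
    start: int
    end: int
    text: str


@dataclass
class Event:
    id: str
    type: str
    trigger: str  # id of the text-bound trigger annotation
    args: list    # list of (role, target id) pairs


def parse_standoff(lines):
    """Split standoff lines into text-bound annotations and events."""
    text_bounds, events = {}, {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        ann_id, body = line.split("\t", 1)
        if ann_id.startswith("T"):
            head, text = body.split("\t", 1)
            ann_type, start, end = head.split(" ")
            text_bounds[ann_id] = TextBound(ann_id, ann_type, int(start), int(end), text)
        elif ann_id.startswith("E"):
            parts = body.split()
            ev_type, trigger = parts[0].split(":", 1)
            args = [tuple(p.split(":", 1)) for p in parts[1:]]
            events[ann_id] = Event(ann_id, ev_type, trigger, args)
    return text_bounds, events


# Hypothetical example, not from the PC corpus.
TEXT = "MEK1 phosphorylates ERK2"
SAMPLE = [
    "T1\tGene_or_gene_product 0 4\tMEK1",
    "T2\tGene_or_gene_product 20 24\tERK2",
    "T3\tPhosphorylation 5 19\tphosphorylates",
    "E1\tPhosphorylation:T3 Theme:T2",
]

if __name__ == "__main__":
    text_bounds, events = parse_standoff(SAMPLE)
    for ev in events.values():
        print(ev.type, text_bounds[ev.trigger].text, ev.args)
```

In this setting the entity lines (here `T1` and `T2`) would be given to participants, while the trigger and event lines (`T3` and `E1`) are what a system is expected to produce.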

The entity and event types defined in the PC task are detailed below.

Entities

Events

Here, “Molecule” refers to an entity annotation of any of the types Simple chemical, Gene or gene product, or Complex, and “Any” refers to an annotation of any type, either physical entity or event. Event types are defined with reference to the Systems Biology Ontology (SBO) and the Gene Ontology (GO) Biological process subontology.
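The “Molecule” and “Any” shorthands can be read as constraints on which annotations may fill an event argument. The sketch below illustrates that reading under stated assumptions: the underscore spelling of the type names and the helper `satisfies` are hypothetical, and the event type used in the usage note is only an example.

```python
# Entity types covered by the "Molecule" shorthand (spelling assumed).
MOLECULE_TYPES = frozenset({"Simple_chemical", "Gene_or_gene_product", "Complex"})


def satisfies(constraint, ann_type, is_event=False):
    """Check whether an annotation of ann_type meets an argument constraint."""
    if constraint == "Any":
        # "Any" accepts every annotation, physical entity or event.
        return True
    if constraint == "Molecule":
        # "Molecule" accepts only the three listed physical entity types.
        return not is_event and ann_type in MOLECULE_TYPES
    # Otherwise the constraint names one specific type.
    return ann_type == constraint
```

For example, `satisfies("Molecule", "Complex")` holds, while an event annotation such as a phosphorylation event would satisfy only an “Any” constraint.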

For understanding the annotations, it may be helpful to see a visualization of a small sample of PC task annotations (provided using the brat rapid annotation tool). You can also download the small sample data below.

Corpus annotation

Initial physical entity annotation for the PC corpus is created automatically using state-of-the-art entity mention taggers for each of the targeted entity types, integrated in the Argo workflow system. To ensure that the quality and consistency of the event annotation, the target of the extraction task, are as high as possible, the event annotation will be created entirely manually, without automatic support. This annotation effort will be carried out using the brat annotation tool by a group of biologists in a collaboration between NaCTeM and KISTI.

Task organization

The PC task is jointly organized by the University of Manchester, the National Centre for Text Mining (NaCTeM), and the Korea Institute of Science and Technology Information (KISTI), with support from annotators from various groups.

University of Manchester and NaCTeM

    • Sophia Ananiadou

    • Sampo Pyysalo

    • Tomoko Ohta

    • Rafal Rak

KISTI

    • Sung-Pil Choi

    • Hong-woo Chun

    • Sung-jae Jung

Annotators

    • Hyun Uk Kim (KAIST)

    • Jinki Kim (KAIST)

    • Kyusang Hwang (KAIST)

    • Yonghwa Jo

    • Hyeyeon Choi

References