Introduction

The BioNLP Shared Task (BioNLP-ST) series represents a community-wide trend in text-mining for biology toward fine-grained information extraction (IE). The two previous events, BioNLP-ST 2009 and 2011, attracted wide attention, with over 30 teams submitting final results. The tasks and their data have since served as the basis of numerous studies, released event extraction systems, and published datasets.

As in previous events, the results of BioNLP-ST 2013 were presented at the ACL/HLT BioNLP-ST workshop, co-located with the BioNLP workshop in Sofia, Bulgaria (9 August 2013). The proceedings are available in the ACL archive.

A Special Issue of BMC Bioinformatics on BioNLP-ST'13 has been published. See the dedicated Web page for more information.

BioNLP-ST 2013 follows the general outline and goals of the previous tasks. It identifies biologically relevant extraction targets and proposes a linguistically motivated approach to event representation. The tasks in BioNLP-ST 2013 cover many new topics of current interest in biology that are close to biologists' needs. BioNLP-ST 2013 broadens the scope of text-mining application domains in biology by introducing new issues in cancer genetics and pathway curation. It also builds on the well-known datasets GENIA, LLL/BI and BB to propose more realistic tasks than those considered previously, closer to the actual needs of biological data integration.

The first event in 2009 triggered active research in the community on a specific fine-grained IE task. Expanding on this, the second BioNLP-ST was organized under the theme "Generalization", which was well received by participants, who introduced numerous systems that could be straightforwardly applied to multiple tasks. This time, BioNLP-ST goes a step further and pursues the grand theme of "Knowledge base construction", which is addressed in various ways: semantic web (GE, GRO), pathways (PC), molecular mechanisms of cancer (CG), regulation networks (GRN) and ontology population (GRO, BB).

As in previous events, manually annotated data are provided for training, development and evaluation of information extraction methods. Depending on their relevance for biological studies, the annotations are either bound to specific expressions in the text or represented as structured knowledge. Many tools for the detailed evaluation and graphical visualization of annotations and system outputs are available to participants. Support for linguistic processing is provided to participants in the form of analyses of the dataset texts produced by various state-of-the-art tools.
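The distinction between text-bound and structured annotations can be illustrated with a minimal sketch. It assumes a simplified standoff-style representation (tab-separated lines, `T` identifiers for text spans with character offsets, `E` identifiers for events with role-labelled arguments). The sentence, the class names and the `parse_line` helper are hypothetical and chosen for illustration only; the actual formats are defined on the individual task pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBound:
    """An annotation anchored to a character span of the source text."""
    id: str
    type: str
    start: int
    end: int


@dataclass(frozen=True)
class Event:
    """A structured annotation: a typed trigger plus role-labelled arguments."""
    id: str
    type: str
    trigger: str
    args: tuple  # tuple of (role, target_id) pairs


def parse_line(line):
    """Parse one line of the assumed, simplified standoff format."""
    ident, rest = line.split("\t", 1)
    if ident.startswith("T"):
        # e.g. "T1\tProtein 0 4\tIL-2": type, offsets, then the surface string
        info, _surface = rest.split("\t", 1)
        type_, start, end = info.split(" ")
        return TextBound(ident, type_, int(start), int(end))
    if ident.startswith("E"):
        # e.g. "E1\tGene_expression:T2 Theme:T1": trigger, then arguments
        parts = rest.split()
        type_, trigger = parts[0].split(":")
        args = tuple(tuple(p.split(":")) for p in parts[1:])
        return Event(ident, type_, trigger, args)
    raise ValueError(f"unsupported annotation line: {line!r}")


# Toy sentence and annotations, invented for this example.
text = "IL-2 expression was measured."
lines = [
    "T1\tProtein 0 4\tIL-2",
    "T2\tGene_expression 5 15\texpression",
    "E1\tGene_expression:T2 Theme:T1",
]
annotations = [parse_line(line) for line in lines]
```

Here `T1` and `T2` are bound to expressions in the text through their offsets, while `E1` carries no offsets of its own and only relates other annotations to one another.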

Participation in the task was open to academia, industry, and all other interested parties. Access to the online evaluation services remains open on each individual task page after the end of the official test period. For instructions on their use, please refer to the participation page.

Tasks

BioNLP-ST 2013 features the six event extraction tasks listed below. Descriptions and sample data can be found on the main task page and the individual task pages.

    • [GE] Genia Event Extraction for NFkB knowledge base construction

    • [CG] Cancer Genetics

    • [PC] Pathway Curation

    • [GRO] Corpus Annotation with Gene Regulation Ontology

    • [GRN] Gene Regulation Network in Bacteria

    • [BB] Bacteria Biotopes (semantic annotation by an ontology)

Contact