Test evaluation results
The BioNLP-ST 2013 is completed. Thanks to all participating teams. The final results are given in the table below. Detailed results are available in the GRN Test Results page.

Goals in IE
Assess the performance of information extraction (IE) systems in extracting a genic regulation network.

Motivation in biology
The gene regulation network (GRN) task evaluates how well IE systems extract gene interactions with respect to goals in biology. The automatic design of gene regulation networks is one of the main challenges in biology, because it is a crucial step forward in understanding the cellular regulation system.

The goal is to retrieve all the genic interactions of the reference network, with at least one occurrence per interaction, independently of where they are mentioned in the literature. Compared to the BI task of BioNLP-ST'11, the evaluation measures the capability of IE systems to reconstruct the reference regulatory network.

The corpus is an extension of the BioNLP-ST 2011 BI corpus derived from the LLL challenge corpus. For GRN, it has been expanded to cover more extensively the regulation network of a specific cellular function in Bacillus subtilis: sporulation. This phenomenon is an adaptation of the bacterium to conditions of scarce resources (e.g. low nutrients). It has been thoroughly studied, and its regulation network is stable and uncontroversial.

The annotation was revised and enriched by a joint effort of the Bibliome team of the MIG Laboratory at the Institut National de Recherche Agronomique (INRA) and the Laboratoire d'Informatique de Paris Nord at the Université Paris 13. The annotation was carried out and validated by a senior bioinformatics/Bacillus subtilis specialist and by a bioinformatics/NLP engineer, using AlvisAE, the Annotation Editor. The annotation guidelines will be available in English.

The following picture is the regulation network corresponding to the training data.

Representation and Task Setting
The GRN task is a relation extraction task that follows the BioNLP-ST 2013 frame of representation.
The participants are provided with a manually curated annotation of the training corpus that includes entities, events and relations, among them genic interactions. For training, the participants are also provided with the genic regulation network that can be reconstructed from the interactions mentioned in sentences of the training corpus. The network is a directed graph where vertices represent genes and arcs represent interactions between genes extracted from the text. The arcs are labeled with an interaction type along two distinct axes:
Effect axis:
Mechanism axis:
When no mechanism or effect can be inferred, the arc is labeled Regulation.

Text-bound entity types
All text-bound entities are given as input in the train and test phases, except for event triggers (Action), which are given only in the train phase. For genic entities, only those belonging to Bacillus subtilis are annotated; genes and proteins of other organisms are not annotated.

Event and relation types
All events and relations are given in the train phase. In the test phase they are not given; however, participants are evaluated only on the prediction of Interaction.* relations. The other events and relations are provided as guidance for training the systems. The following types of events and arguments are given, along with the valid types for each argument.

Gene identifiers
In the annotated corpus, genic entities that can potentially interact are assigned a Gene Identifier. This identifier is the name of the gene, operon or family denoted by the entity. For instance, the Gene Identifier of a Protein entity is the name of the gene that encodes the annotated protein. The provided Gene Identifier saves the participants the inconvenience of searching through nomenclatures of B. subtilis genes.

Inference of the regulation network
The genic regulation network corresponding to a corpus is inferred from the set of Interaction relations (manually annotated or predicted). The inference is done in two steps: resolution of Interaction relations, and removal of redundant arcs. The training data is distributed with a script that automatically performs these two steps.

Step 1: Resolution of Interaction relations
The Agent and the Target of an Interaction relation are not necessarily entities with a Gene Identifier. They can be secondary events or relations (Action_Target, Transcription_by, or even another Interaction), or auxiliary entities (Promoter). The resolution of an Interaction looks for the entity with a Gene Identifier in order to infer the node concerned by the Interaction relation. The resolution of Interaction arguments is performed with the following rules:
These rules are applied iteratively. In other words, the resolution of Interaction arguments is a traversal of the graph of annotations: event and relation arguments are walked through, and Promoter entities are walked through according to rules 4 and 5. If the resolution of the Agent or the Target yields more than one node, then the Interaction resolves to as many arcs as there are pairs in the Cartesian product of the resolved nodes. For instance, if both the Agent and the Target resolve to two nodes, the Interaction relation resolves into four arcs.

Step 2: Removal of redundant arcs
In this step, arcs with the same Agent, Target and type are merged into a single arc. This means that if the same interaction is annotated several times in the corpus, it resolves into a single arc. In terms of prediction, this also means that predicting only one of the occurrences of an interaction in the corpus is enough to reconstruct the arc.

Moreover, Interaction types are ordered according to the following hierarchy:
For a given arc, if there is another arc for the same node pair with a more specialized type, then the given arc is removed. For instance, the arcs (A, Regulation, B) and (A, Transcription, B) are simplified into (A, Transcription, B), since the former arc conveys no additional information compared with the latter.

Submission and Evaluation
Participants can submit predictions in two ways:
The predicted network is compared to the reference network using a Slot Error Rate (SER) [Makhoul et al., 1999]:
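In its standard form, the SER is (S + D + I) / N, where S, D and I are the numbers of substitutions, deletions and insertions, and N is the number of arcs in the reference network. The following is a minimal Python sketch of that computation, not the official evaluation script. It assumes arcs are (agent, type, target) triples, and it assumes that a substitution is a predicted arc whose node pair matches an unmatched reference arc but whose type differs; the function name `slot_error_rate` and this matching policy are illustrative assumptions.

```python
from collections import Counter


def slot_error_rate(reference, predicted):
    """Return (S + D + I) / N for two collections of (agent, type, target) arcs."""
    reference, predicted = set(reference), set(predicted)
    if not reference:
        raise ValueError("reference network must contain at least one arc")

    # Arcs present in both sets are correct and contribute no error.
    ref_left = reference - predicted
    pred_left = predicted - reference

    # Assumption: a leftover predicted arc counts as a substitution when an
    # unmatched reference arc exists for the same (agent, target) pair.
    open_pairs = Counter((agent, target) for agent, _, target in ref_left)
    substitutions = 0
    for agent, _, target in pred_left:
        if open_pairs[(agent, target)] > 0:
            open_pairs[(agent, target)] -= 1
            substitutions += 1

    deletions = len(ref_left) - substitutions    # reference arcs never predicted
    insertions = len(pred_left) - substitutions  # predicted arcs with no counterpart
    return (substitutions + deletions + insertions) / len(reference)


reference = {("A", "Transcription", "B"), ("B", "Regulation", "C")}
predicted = {("A", "Regulation", "B"), ("C", "Regulation", "D")}
print(slot_error_rate(reference, predicted))  # 1.5: S=1, D=1, I=1, N=2
```

The example yields a value above 1 because insertions are counted in the numerator but not in the denominator, which is why the measure has no upper bound.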
Since this measure is an error rate, lower is better: an SER of zero means a perfect prediction. The SER has no upper bound, but a value below 1 is expected for decent predictions. The participants are provided with a script that performs Interaction resolution and evaluation against a reference.

Illustrative examples in BioNLP format
Illustrative examples can be downloaded here: Sample Data
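
As a further illustration, the end of Step 1 (Cartesian expansion of resolved nodes) and Step 2 (removal of redundant arcs) described under "Inference of the regulation network" can be sketched in Python as follows. This is a hedged sketch, not the distributed script: the function names `expand` and `prune` are hypothetical, arcs are assumed to be (agent, type, target) triples, and the `MORE_SPECIFIC` map is deliberately partial, containing only the one ordering stated in the text (Transcription is more specialized than Regulation).

```python
from itertools import product

# Assumption: partial specialization map. Only the Regulation/Transcription
# ordering is taken from the task description; the full hierarchy is not
# reproduced here.
MORE_SPECIFIC = {"Regulation": {"Transcription"}}


def expand(agents, interaction_type, targets):
    """End of Step 1: one arc per (agent, target) pair of resolved nodes."""
    return {(a, interaction_type, t) for a, t in product(agents, targets)}


def prune(arcs):
    """Step 2: merge duplicate arcs, then drop any arc superseded by a
    more specialized arc on the same node pair."""
    arcs = set(arcs)  # identical (agent, type, target) triples collapse here
    return {
        (a, typ, t)
        for (a, typ, t) in arcs
        if not any((a, s, t) in arcs for s in MORE_SPECIFIC.get(typ, ()))
    }


# Two resolved Agent nodes and two resolved Target nodes give four arcs.
four_arcs = expand(["A1", "A2"], "Regulation", ["B1", "B2"])
print(len(four_arcs))  # 4

# The example from the text: the Regulation arc is subsumed.
print(prune([("A", "Regulation", "B"), ("A", "Transcription", "B")]))
# {('A', 'Transcription', 'B')}
```

Using a set for the arcs makes the first part of Step 2 implicit, since repeated annotations of the same interaction produce identical triples.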