BB Test Results

Task 1

Main evaluation results


Participant S I D M P SER Recall Precision F1
LIPN 98.92 136 100 308.08 507 0.661 0.61 0.61 0.61
Boun 112.70 141 89 305.30 520 0.676 0.60 0.59 0.60
LIMSI 187.66 12 144 175.34 283 0.678 0.35 0.62 0.44
IRISA-TexMex 95.38 331 46 365.62 767 0.932 0.72 0.48 0.57

Legend

  • S: substitutions; see Evaluation algorithm below
  • D: deletions; there is no predicted habitat corresponding to the reference habitat (false negative)
  • I: insertions; there is no reference habitat corresponding to the predicted habitat (false positive)
  • M: matches; see Evaluation algorithm below
  • P: predicted; number of predicted habitats
  • SER (Slot Error Rate) = (S + D + I) / N, where N is the number of habitats in the reference
  • Recall = M / N
  • Precision = M / P
  • F1: harmonic mean of Precision and Recall
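
The legend's formulas can be checked against a row of the table. The sketch below recomputes the LIPN row; the function name `summarize` is hypothetical, and N = 507 is an assumption derived from S + M + D for that row rather than a figure stated on this page.

```python
def summarize(s, d, i, m, p, n):
    """Return (SER, recall, precision, F1) following the legend's definitions."""
    ser = (s + d + i) / n
    recall = m / n
    precision = m / p
    # Harmonic mean of precision and recall; 0 when both are 0.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return ser, recall, precision, f1

# LIPN row of the Task 1 main results; n=507 is assumed (S + M + D of that row).
ser, recall, precision, f1 = summarize(s=98.92, d=100, i=136, m=308.08, p=507, n=507)
print(f"{ser:.3f} {recall:.2f} {precision:.2f} {f1:.2f}")  # prints "0.661 0.61 0.61 0.61"
```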

Evaluation algorithm

The evaluation pairs each reference habitat with a predicted habitat. The pairing maximizes a score defined as:

J . W
  • J is the Jaccard index between the reference and predicted entity as defined in [Bossy et al, 2012]. J measures the boundary accuracy of the predicted entity.
  • W is the semantic similarity between the ontology concepts attributed to the reference entity and to the predicted entity. We use the semantic similarity described in [Wang et al, 2006]. This similarity is based exclusively on the is-a relationships between concepts. We set the w_is-a parameter to 0.65 in order to favor ancestor/descendant predictions over sibling predictions.
Habitat entities in the reference that have no corresponding entity in the prediction are Deletions (D column).
Habitat entities in the prediction that have no corresponding entity in the reference are Insertions (I column).
The sum of the scores for all successful pairings is the Matches (M column). The difference between the number of pairings and the Matches is the Substitutions (S column).
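
The following is a minimal sketch of how S, D, I and M could be derived from such a pairing, not the official evaluation code. It makes several assumptions: entities are `(start, end, concept)` tuples with half-open character offsets, the pairing is one-to-one and found by brute force (workable only for a handful of entities), a pairing with a zero score is treated as no pairing, and `toy_similarity` is a made-up stand-in for W.

```python
from itertools import permutations

def jaccard(a, b):
    """Jaccard index of two half-open character spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def evaluate(reference, predicted, similarity):
    """Brute-force the one-to-one pairing maximizing the sum of J * W.

    Returns (S, D, I, M). Exponential in the number of entities.
    """
    n_ref, n_pred = len(reference), len(predicted)
    # Pad with None so that any reference entity may stay unpaired.
    slots = list(range(n_pred)) + [None] * n_ref
    best_total, best_scores = -1.0, []
    for choice in permutations(slots, n_ref):
        scores = []
        for ref, idx in zip(reference, choice):
            if idx is None:
                continue
            pred = predicted[idx]
            score = jaccard(ref[:2], pred[:2]) * similarity(ref[2], pred[2])
            if score > 0:  # assumption: a zero score is not a successful pairing
                scores.append(score)
        if sum(scores) > best_total:
            best_total, best_scores = sum(scores), scores
    matches = sum(best_scores)                   # M: sum of pairing scores
    substitutions = len(best_scores) - matches   # S: pairings minus M
    deletions = n_ref - len(best_scores)         # D: unpaired references
    insertions = n_pred - len(best_scores)       # I: unpaired predictions
    return substitutions, deletions, insertions, matches

def toy_similarity(a, b):
    """Hypothetical stand-in for W: 1 for identical concepts, 0.5 otherwise."""
    return 1.0 if a == b else 0.5

reference = [(0, 10, "soil"), (20, 30, "gut")]
predicted = [(0, 8, "soil"), (40, 50, "water")]
s, d, i, m = evaluate(reference, predicted, toy_similarity)
print(s, d, i, m)
```

In this toy case only the first reference entity finds a partner (J = 0.8, W = 1), so M is about 0.8, S about 0.2, and there is one deletion and one insertion. For realistic sizes, an assignment solver would replace the brute-force loop.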

Entity boundaries evaluation

Participant S M SER Recall Precision F1
LIMSI 80.91 282.09 0.47 0.56 1.00 0.71
Boun 82.71 335.29 0.62 0.66 0.64 0.65
LIPN 82.91 324.09 0.63 0.64 0.64 0.64
IRISA-TexMex 76.77 384.23 0.90 0.76 0.50 0.60

In this evaluation, Matches is redefined as the sum of the J component of the score over all pairings. The scores thus measure the boundary accuracy of predicted entities without taking the semantic categorization into account.
Note however that the pairing still maximizes J.W. Therefore columns I, D and P remain unchanged.

Ontology categorization evaluation

Participant S M SER Recall Precision F1
LIPN 42.88 364.12 0.550 0.72 0.72 0.72
Boun 50.95 367.05 0.554 0.72 0.71 0.71
LIMSI 167.13 195.87 0.637 0.39 0.69 0.50
IRISA-TexMex 35.68 425.32 0.814 0.84 0.55 0.67

In this evaluation, Matches is redefined as the sum of the W component of the score over all pairings. The scores thus measure the semantic categorization accuracy of predicted entities without taking entity boundaries into account.
Note however that the pairing still maximizes J.W. Therefore columns I, D and P remain unchanged.
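
The main, boundaries and categorization evaluations differ only in which component of the score is summed over a fixed set of pairings. The sketch below illustrates this; the function name `matches`, the `mode` labels and the (J, W) values are all made up for the example.

```python
def matches(pairings, mode="both"):
    """Sum pairing scores; `pairings` is a list of (J, W) tuples, one per pairing."""
    if mode == "both":        # main evaluation: J * W
        return sum(j * w for j, w in pairings)
    if mode == "boundaries":  # entity boundaries evaluation: J only
        return sum(j for j, _ in pairings)
    if mode == "categories":  # ontology categorization evaluation: W only
        return sum(w for _, w in pairings)
    raise ValueError(f"unknown mode: {mode}")

# Made-up (J, W) values for three pairings.
pairings = [(1.0, 0.5), (0.5, 1.0), (0.8, 0.9)]
for mode in ("both", "boundaries", "categories"):
    m = matches(pairings, mode)
    # The number of pairings is the same in every mode, so only S and M move.
    print(mode, "M =", round(m, 2), "S =", round(len(pairings) - m, 2))
```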

In the following evaluations, the semantic weight attributed to the is-a relation has been altered:

w = 1

Participant S M SER Recall Precision F1
Boun  38.64 379.36 0.34 0.75 0.73 0.74
IRISA-TexMex  30.72 430.28 0.34 0.85 0.56 0.68
LIPN 33.19 373.81 0.36 0.74 0.74 0.74
LIMSI 142.05 220.95 0.57 0.44 0.78 0.56

With a weight of 1, the score approaches a "Manhattan distance" between the reference category and the predicted category; it is nearly equivalent to semantic distances based on step counting. It is more forgiving if the prediction is "in the vicinity" of the reference, even though it is neither an ancestor nor a descendant. It is more severe for predictions that are further from the reference.

w = 0.1

Participant S M SER Recall Precision F1
IRISA-TexMex 46.91 414.09 0.37 0.82 0.54 0.65
Boun 70.78 347.22 0.40 0.68 0.67 0.68
LIPN 57.35 349.65 0.41 0.69 0.69 0.69
LIMSI 187.80 175.20 0.66 0.35 0.62 0.44

With a weight of 0.1, the score favours predictions in the "lineage" of the reference, that is to say ancestors and descendants. It severely penalizes predictions of siblings. However, since the ontology root is the ancestor of all possible concepts, this score does not penalize predictions that are too general.
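
The effect of the weight can be illustrated on a toy ontology. The sketch below follows the similarity of Wang et al. as it is usually stated: each ancestor of a concept receives the best product of w along is-a edges, and two concepts are compared through their shared ancestors. The concept names and the `parents` map are hypothetical, and the evaluation's actual implementation may differ in details.

```python
def s_values(term, parents, w):
    """S-values of `term` and its ancestors: 1 for the term itself, then the
    highest product of w over the is-a edges leading up to each ancestor."""
    values = {term: 1.0}
    frontier = [term]
    while frontier:
        node = frontier.pop()
        for parent in parents.get(node, ()):
            candidate = w * values[node]
            if candidate > values.get(parent, 0.0):
                values[parent] = candidate
                frontier.append(parent)
    return values

def wang_similarity(a, b, parents, w):
    """Contributions of shared ancestors over the total of both S-value sets."""
    sa, sb = s_values(a, parents, w), s_values(b, parents, w)
    shared = sa.keys() & sb.keys()
    return sum(sa[t] + sb[t] for t in shared) / (sum(sa.values()) + sum(sb.values()))

# Hypothetical is-a hierarchy: mammal and bird are siblings under animal.
parents = {"mammal": ["animal"], "bird": ["animal"], "animal": ["root"]}

for w in (0.1, 0.65, 0.8, 1.0):
    parent_sim = wang_similarity("mammal", "animal", parents, w)
    sibling_sim = wang_similarity("mammal", "bird", parents, w)
    print(f"w={w}: parent {parent_sim:.2f}, sibling {sibling_sim:.2f}")
```

On this hierarchy the sibling similarity drops to roughly 0.10 at w = 0.1 while the parent similarity stays near 0.55, whereas at w = 1 the two are much closer (about 0.67 and 0.80).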

w = 0.8

Participant S M SER Recall Precision F1
IRISA-TexMex  36.07 424.93 0.35 0.84 0.55 0.67
Boun  44.90 373.10 0.35 0.74 0.72 0.73
LIPN  38.77 368.23 0.37 0.73 0.73 0.73
LIMSI 156.76 206.24 0.60 0.41 0.73 0.52

0.8 is the value recommended by the authors of the semantic distance. It is shown for reference and bears no particular interest for the task.

Task 2

Main evaluation results

Participant Recall Precision F1
TEES-2.1 0.28 0.82 0.42
IRISA-TexMex 0.36 0.46 0.40
Boun 0.21 0.38 0.27
LIMSI 0.04 0.19 0.06

Localization

Participant Recall Precision F1
TEES-2.1 0.35 0.82 0.49
IRISA-TexMex 0.44 0.46 0.45
Boun 0.23 0.38 0.29
LIMSI 0.04 0.29 0.07

This evaluation takes into account only Localization relations.

PartOf

Participant Recall Precision F1
Boun 0.15 0.40 0.22
LIMSI 0.02 0.03 0.02
TEES-2.1 0.01 1.00 0.02
IRISA-TexMex 0.00 0.00 0.00

This evaluation takes into account only PartOf relations.

Task 3

Main evaluation results

Participant Recall Precision F1
TEES-2.1 0.12 0.18 0.14
LIMSI 0.04 0.12 0.06

Evaluation algorithm

The evaluation pairs each relation in the reference with a predicted relation. The pairing maximizes the following score for relations of type Localization:
B . J
  • B is the Bacterium boundaries match. It is equal to 1 if the Bacterium arguments of the reference and the prediction have the exact same boundaries, otherwise 0.
  • J is the Localization boundaries match. It is the Jaccard index between the Localization arguments of the reference and the prediction.
For relations of type PartOf, the score is 1 if the Host arguments overlap and if the Part arguments overlap, otherwise 0. Boundaries are not taken into account in PartOf relations in order not to penalize boundary mismatches twice; boundaries are already factored into the score of Localization relations.

In the case of equivalence between entities, the pairing uses the equivalent entity that maximizes the score. If several equivalent (and redundant) relations are found then the one that has the highest score is used.

The Recall is the sum of the scores of reference to prediction pairing divided by the number of relations in the reference.
The Precision is the sum of the scores of prediction to reference pairing divided by the number of relations in the prediction.
The F1 is the harmonic mean of Recall and Precision.
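
The relation scores and the resulting measures can be sketched as follows. This is an illustration under assumptions, not the official scorer: relations are pairs of half-open character spans, `(bacterium, localization)` for Localization and `(host, part)` for PartOf, and all function names are hypothetical. The pairing step and the handling of equivalent entities are left out.

```python
def jaccard(a, b):
    """Jaccard index of two half-open character spans (start, end)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0

def localization_score(ref, pred):
    """B * J for relations given as (bacterium_span, localization_span)."""
    b = 1.0 if ref[0] == pred[0] else 0.0  # exact Bacterium boundaries
    return b * jaccard(ref[1], pred[1])

def partof_score(ref, pred):
    """1 if the Host spans overlap and the Part spans overlap, otherwise 0."""
    hosts_overlap = jaccard(ref[0], pred[0]) > 0
    parts_overlap = jaccard(ref[1], pred[1]) > 0
    return 1.0 if hosts_overlap and parts_overlap else 0.0

def recall_precision_f1(ref_to_pred_scores, pred_to_ref_scores, n_reference, n_predicted):
    """Recall and Precision as sums of pairing scores over the relation counts."""
    recall = sum(ref_to_pred_scores) / n_reference if n_reference else 0.0
    precision = sum(pred_to_ref_scores) / n_predicted if n_predicted else 0.0
    f1 = 2 * recall * precision / (recall + precision) if recall + precision else 0.0
    return recall, precision, f1

ref = ((0, 7), (20, 30))
exact = ((0, 7), (22, 30))    # same Bacterium span, shorter Localization span
shifted = ((0, 6), (20, 30))  # Bacterium span differs by one character
print(localization_score(ref, exact))    # B = 1, J = 8/10
print(localization_score(ref, shifted))  # B = 0, so the score is 0
```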

Alternate evaluations

Description of parameters

  • Localization only: PartOf relations have been removed from the pairing, the evaluation only measures the accuracy of Localization relations.
  • PartOf only: Localization relations have been removed from the pairing, the evaluation only measures the accuracy of PartOf relations.
  • No boundaries: the Recall and the Precision are computed from the number of pairings, as if J had been removed from the score formula. Note however that the pairing still maximizes "B . J"; only the scores are altered.
  • Relaxed bacteria: B has been redefined as: 1 if the Bacterium arguments overlap, otherwise 0. Both scores and pairing are affected. The pairing maximizes the overlap between Bacterium arguments.
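
To make the "No boundaries" and "Relaxed bacteria" parameters concrete, here is a small sketch of how the score of one candidate Localization pairing might change under each setting. The function `variant_score` and its arguments are hypothetical; it takes the Bacterium comparison results and J as precomputed inputs, and it assumes a pairing is counted only when its B . J score is positive.

```python
def variant_score(exact_bacterium, overlapping_bacterium, j,
                  no_boundaries=False, relaxed_bacteria=False):
    """Score of one Localization pairing under the alternate evaluations.

    exact_bacterium: True if the Bacterium arguments have identical boundaries.
    overlapping_bacterium: True if the Bacterium arguments overlap.
    j: Jaccard index between the Localization arguments.
    """
    # Relaxed bacteria: B only requires an overlap instead of exact boundaries.
    b = overlapping_bacterium if relaxed_bacteria else exact_bacterium
    score = (1.0 if b else 0.0) * j
    if no_boundaries:
        # Count the pairing itself rather than its J-weighted score.
        return 1.0 if score > 0 else 0.0
    return score

# A prediction whose Bacterium span overlaps the reference without matching it
# exactly, with J = 0.6 on the Localization argument.
for no_b in (False, True):
    for relaxed in (False, True):
        s = variant_score(False, True, 0.6, no_boundaries=no_b, relaxed_bacteria=relaxed)
        print(f"no_boundaries={no_b}, relaxed_bacteria={relaxed}: {s}")
```

The pairing scores 0 under the strict settings, 0.6 with relaxed bacteria alone, and 1 when both relaxations are combined.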

No boundaries

Participant Recall Precision F1
TEES-2.1 0.14 0.21 0.17
LIMSI 0.04 0.12 0.06

Relaxed bacteria

Participant Recall Precision F1
TEES-2.1 0.28 0.52 0.36
LIMSI 0.07 0.71 0.10

No boundaries, relaxed bacteria

Participant Recall Precision F1
TEES-2.1 0.37 0.64 0.47
LIMSI 0.07 0.72 0.12

PartOf only

Participant Recall Precision F1
TEES-2.1 0.27 0.77 0.40
LIMSI 0.03 0.17 0.05

Localization only

Participant Recall Precision F1
TEES-2.1 0.05 0.07 0.06
LIMSI 0.04 0.12 0.06

Localization only, no boundaries

Participant Recall Precision F1
TEES-2.1 0.08 0.10 0.09
LIMSI 0.04 0.12 0.06

Localization only, relaxed bacteria

Participant Recall Precision F1
TEES-2.1 0.29 0.47 0.35
LIMSI 0.08 0.81 0.15

Localization only, no boundaries, relaxed bacteria

Participant Recall Precision F1
TEES-2.1 0.41 0.61 0.49
LIMSI 0.09 0.82 0.15

