The ivtmetrics library provides a Python implementation of metrics for benchmarking surgical action triplet detection and recognition.
The following are available with ivtmetrics:
- Recognition Evaluation: Provides AP metrics to measure the performance of a model on action triplet recognition.
- Detection Evaluation: Supports Intersection over Union (IoU) measures of triplet localization with respect to the instruments.
- Flexible Analysis:
  - Supports switching between frame-wise and video-wise averaging of the AP.
  - Supports disentangling predictions to obtain filtered performance for the individual components of the triplets, as well as their association performance at various levels.
To install ivtmetrics, use pip:

    pip install ivtmetrics

or conda:

    conda install -c nwoye ivtmetrics

Python 3.5-3.9, numpy, and scikit-learn are required.
The metrics are aligned with those reported by the CholecT50 benchmark. ivtmetrics can be imported as follows:

    import ivtmetrics

The metrics implement both recognition and detection evaluation. Internally, they use a disentangle function to filter the triplet components as well as the different levels of triplet association.
The Recognition metrics can be used as follows:

    metric = ivtmetrics.Recognition(num_class)

This takes an argument num_class, which defaults to 100.

The following functions are available in the Recognition class:
Name | Description |
---|---|
update(targets, predictions) | Takes in a (batch of) vector predictions and their corresponding ground truth. The vector size must match num_class in the class initialization. |
video_end() | Call to mark the end of one video sequence. |
reset() | Reset current records. Useful during training; can be called at the beginning of each epoch to avoid overlapping epoch performances. |
reset_global() | Reset all records. Useful for switching between training/validation/testing; can be called at the beginning of a new experiment. |
compute_AP(component, ignore_null) | Obtain the average precision on the fly. This gives the AP only on examples seen after the last reset() call. Useful for epoch performance during training. |
compute_video_AP(component, ignore_null) | (RECOMMENDED) Compute video-wise AP performance as used in CholecT50 benchmarks. |
compute_global_AP(component, ignore_null) | Compute frame-wise AP performance for all seen samples. |
topK(k, component) | Obtain top K performance on action triplet recognition for all seen examples. The argument k can be any int between 1 and 99. k = [5,10,15,20] have been used in benchmark papers. |
topClass(k, component) | Obtain the top K recognized classes on action triplet recognition for all seen examples. The argument k can be any int between 1 and 99. k = 10 has been used in benchmark papers. |
- The argument component can be any of ('i', 'v', 't', 'iv', 'it', 'ivt') to compute performance for (instrument, verb, target, instrument-verb, instrument-target, instrument-verb-target) respectively. The default is 'ivt' for triplets.
- The argument ignore_null (optional, default=False) ignores null triplet classes in the evaluation. This option is enabled in the CholecTriplet2021 challenge.
- The output is a dict with keys ("AP", "mAP") for per-class and mean AP respectively.
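To make the idea behind topK more concrete, here is a minimal plain-Python sketch of a top-K accuracy. It assumes that a ground-truth class counts as recognized when it is among the K highest-scored classes of its frame; the function name top_k_accuracy and the toy data are illustrative only, and the library's own computation may differ in detail.

```python
def top_k_accuracy(targets, predictions, k=5):
    """Fraction of ground-truth classes found among the k highest-scored
    classes of their frame (illustrative definition, not the library code)."""
    hits, total = 0, 0
    for target, scores in zip(targets, predictions):
        # Class indices sorted by decreasing score; keep the first k.
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        top_k = set(ranked[:k])
        for cls, present in enumerate(target):
            if present:
                total += 1
                hits += cls in top_k
    return hits / total if total else float("nan")


# Toy data: 2 frames, 4 classes (real triplet vectors have num_class entries).
targets = [[1, 0, 0, 0], [0, 0, 1, 1]]
predictions = [[0.9, 0.1, 0.3, 0.2], [0.8, 0.1, 0.7, 0.2]]

top1 = top_k_accuracy(targets, predictions, k=1)
top2 = top_k_accuracy(targets, predictions, k=2)
print("top-1:", top1, "top-2:", top2)
```

With real data, the same targets and predictions would simply be passed to update() and the library would do this bookkeeping itself.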
    import ivtmetrics

    recognize = ivtmetrics.Recognition(num_class=100)
    network = MyModel(...)  # your model here

    # training
    for epoch in range(number_of_epochs):
        recognize.reset()  # start each epoch with fresh records
        for images, labels in dataloader(...):  # your data loader
            predictions = network(images)
            recognize.update(labels, predictions)
        results_i = recognize.compute_AP('i')
        print("instrument per class AP", results_i["AP"])
        print("instrument mean AP", results_i["mAP"])
        results_ivt = recognize.compute_AP('ivt')
        print("triplet mean AP", results_ivt["mAP"])

    # evaluation
    recognize.reset_global()
    for video in videos:
        for images, labels in dataloader(video, ...):  # your data loader
            predictions = network(images)
            recognize.update(labels, predictions)
        recognize.video_end()  # mark the end of this video

    results_i = recognize.compute_video_AP('i')
    print("instrument per class AP", results_i["AP"])
    print("instrument mean AP", results_i["mAP"])
    results_it = recognize.compute_video_AP('it')
    print("instrument-target mean AP", results_it["mAP"])
    results_ivt = recognize.compute_video_AP('ivt')
    print("triplet mean AP", results_ivt["mAP"])
Any nan value in the results corresponds to a class with no occurrence in the data sample.
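The sketch below illustrates where such nan values come from and how video-wise averaging can skip them. It is a plain-Python approximation written for this README, not the library's implementation: the helper names (average_precision, nanmean, per_class_ap, video_wise_ap) and the toy data are assumptions, and the AP definition used here is the simple "mean precision at each positive" form.

```python
import math


def average_precision(labels, scores):
    """AP of one class: mean of the precision at each positive, ranked by
    score. Returns nan when the class never occurs in the labels."""
    if not any(labels):
        return float("nan")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, precisions = 0, []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def nanmean(values):
    """Mean over the non-nan entries; nan if nothing is left."""
    kept = [v for v in values if not math.isnan(v)]
    return sum(kept) / len(kept) if kept else float("nan")


def per_class_ap(targets, predictions):
    """targets/predictions: list of frames, each a list of per-class values."""
    num_class = len(targets[0])
    return [
        average_precision([t[c] for t in targets], [p[c] for p in predictions])
        for c in range(num_class)
    ]


def video_wise_ap(videos):
    """Average the per-class AP of each video, skipping nan entries."""
    per_video = [per_class_ap(t, p) for t, p in videos]
    ap = [nanmean(column) for column in zip(*per_video)]
    return {"AP": ap, "mAP": nanmean(ap)}


# Two toy videos with 3 classes; class 2 never occurs, so its AP is nan.
videos = [
    ([[1, 0, 0], [0, 1, 0], [1, 0, 0]],
     [[0.9, 0.2, 0.1], [0.8, 0.7, 0.3], [0.3, 0.1, 0.2]]),
    ([[0, 1, 0], [1, 0, 0]],
     [[0.6, 0.4, 0.1], [0.5, 0.9, 0.2]]),
]
result = video_wise_ap(videos)
print(result)
```

Frame-wise (global) AP would instead pool the frames of all videos before computing the per-class AP once.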
The Detection metrics can be used as follows:

    metric = ivtmetrics.Detection(num_class, num_tool, threshold=0.5)

This takes the arguments num_class, which defaults to 100, and num_tool, which defaults to 6.

The following functions are available in the Detection class:
Name | Description |
---|---|
update(targets, predictions, format) | Takes in a (batch of) list/dict predictions and their corresponding ground truth. Each frame prediction/ground truth can be either a list of list or a list of dict (more details below). |
video_end() | Call to mark the end of one video sequence. |
reset() | Reset current records. Useful during training; can be called at the beginning of each epoch to avoid overlapping epoch performances. |
reset_global() | Reset all records. Useful for switching between training/validation/testing; can be called at the beginning of a new experiment. |
compute_AP(component) | Obtain the average precision on the fly. This gives the AP only on examples seen after the last reset() call. Useful for epoch performance during training. |
compute_video_AP(component) | (RECOMMENDED) Compute video-wise AP performance as used in CholecT50 benchmarks. |
compute_global_AP(component) | Compute frame-wise AP performance for all seen samples. |
- list of list format: [[tripletID, toolID, toolProbs, x, y, w, h], [tripletID, toolID, toolProbs, x, y, w, h], …], where:
  - tripletID = triplet unique identity
  - toolID = instrument unique identity
  - toolProbs = instrument detection confidence
  - x = bounding box x1 coordinate
  - y = bounding box y1 coordinate
  - w = width of the box
  - h = height of the box
  - The [x, y, w, h] are scaled between 0..1
- list of dict format: [{"triplet": tripletID, "instrument": [toolID, toolProbs, x, y, w, h]}, {"triplet": tripletID, "instrument": [toolID, toolProbs, x, y, w, h]}, …].
- The argument format describes the input format with either of the values ("list", "dict").
- The argument component can be any of ('i', 'v', 't', 'iv', 'it', 'ivt') to compute performance for (instrument, verb, target, instrument-verb, instrument-target, instrument-verb-target) respectively. The default is 'ivt' for triplets.
- The output is a dict with keys ("AP", "mAP", "Rec", "mRec", "Pre", "mPre") for per-class AP, mean AP, per-class Recall, mean Recall, per-class Precision and mean Precision respectively.
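As an illustration of the two input formats, the following sketch converts one frame between the list-of-list and list-of-dict layouts and scales a pixel box to the 0..1 range. The helper names (list_to_dict, dict_to_list, normalize_box), the image size, and all IDs and scores are made up for the example; they are not part of ivtmetrics.

```python
def list_to_dict(frame):
    """[[tripletID, toolID, toolProbs, x, y, w, h], ...] -> list of dict."""
    return [{"triplet": row[0], "instrument": list(row[1:7])} for row in frame]


def dict_to_list(frame):
    """[{"triplet": ..., "instrument": [...]}, ...] -> list of list."""
    return [[item["triplet"], *item["instrument"]] for item in frame]


def normalize_box(x, y, w, h, image_width, image_height):
    """Scale a box given in pixels to the 0..1 range."""
    return [x / image_width, y / image_height, w / image_width, h / image_height]


# One frame with two detections; IDs, scores and image size are invented.
box = normalize_box(96, 54, 192, 108, image_width=960, image_height=540)
frame_as_list = [
    [17, 0, 0.92, *box],
    [60, 2, 0.81, 0.55, 0.40, 0.25, 0.30],
]
frame_as_dict = list_to_dict(frame_as_list)
print(frame_as_dict)
```

Either layout can then be passed to update() together with the matching format value ("list" or "dict").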
    import ivtmetrics

    detect = ivtmetrics.Detection(num_class=100)
    network = MyModel(...)  # your model here

    # training
    input_format = "list"
    for epoch in range(number_of_epochs):
        for images, labels in dataloader(...):  # your data loader
            predictions = network(images)
            labels, predictions = formatYourLabels(labels, predictions)
            detect.update(labels, predictions, format=input_format)
        results_i = detect.compute_AP('i')
        print("instrument per class AP", results_i["AP"])
        print("instrument mean AP", results_i["mAP"])
        results_ivt = detect.compute_AP('ivt')
        print("triplet mean AP", results_ivt["mAP"])
        detect.reset()  # clear records before the next epoch

    # evaluation
    input_format = "dict"
    for video in videos:
        for images, labels in dataloader(video, ...):  # your data loader
            predictions = network(images)
            labels, predictions = formatYourLabels(labels, predictions)
            detect.update(labels, predictions, format=input_format)
        detect.video_end()  # mark the end of this video

    results_ivt = detect.compute_video_AP('ivt')
    print("triplet mean AP", results_ivt["mAP"])
    print("triplet mean recall", results_ivt["mRec"])
    print("triplet mean precision", results_ivt["mPre"])
Any nan value in the results corresponds to a class with no occurrence in the data sample.
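The threshold passed to Detection (0.5 in the constructor shown earlier) is presumably an IoU cutoff above which a predicted box counts as localizing a ground-truth box. As a rough illustration only, IoU for two [x, y, w, h] boxes could be computed as in the sketch below. The helper iou_xywh and the sample boxes are made up, and (x, y) is assumed to be the top-left corner of the box.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given as [x, y, w, h], with (x, y) assumed to be
    the top-left corner and all values scaled between 0 and 1."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (0 if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


threshold = 0.5
ground_truth = [0.10, 0.10, 0.40, 0.40]
prediction = [0.20, 0.10, 0.40, 0.40]

iou = iou_xywh(ground_truth, prediction)
is_localized = iou >= threshold
print("IoU:", round(iou, 3), "localized:", is_localized)
```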
Although the Detection() and Recognition() classes use Disentangle() internally, it can still be used independently for component filtering as follows:

    filter = ivtmetrics.Disentangle()

Afterwards, each component's predictions/labels can be filtered from the main triplet predictions/labels as follows:

    i_labels = filter.extract(inputs=ivt_labels, component="i")
    v_preds = filter.extract(inputs=ivt_preds, component="v")
    t_preds = filter.extract(inputs=ivt_preds, component="t")
    iv_labels = filter.extract(inputs=ivt_labels, component="iv")
    it_labels = filter.extract(inputs=ivt_labels, component="it")
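Conceptually, this kind of filtering projects a per-triplet vector onto one of its components. The sketch below shows one plausible way to do that, assuming each component class takes the maximum value over the triplets that contain it. The mapping table TRIPLET_MAP, the function extract, and its num_component_class parameter are hypothetical and written for this example; the real mapping ships with the library and its internals may differ.

```python
# Hypothetical mapping from triplet ID to (instrument, verb, target) IDs.
# The real mapping covers all triplet classes; these four rows are invented.
TRIPLET_MAP = {
    0: (0, 0, 0),
    1: (0, 1, 0),
    2: (1, 1, 2),
    3: (1, 2, 1),
}
COMPONENT_COLUMN = {"i": 0, "v": 1, "t": 2}


def extract(ivt_vector, component, num_component_class):
    """Project a per-triplet score/label vector onto one component by taking,
    for each component class, the max over the triplets that contain it."""
    column = COMPONENT_COLUMN[component]
    out = [0.0] * num_component_class
    for triplet_id, value in enumerate(ivt_vector):
        cls = TRIPLET_MAP[triplet_id][column]
        out[cls] = max(out[cls], value)
    return out


ivt_preds = [0.1, 0.7, 0.4, 0.2]  # one score per (toy) triplet class
i_preds = extract(ivt_preds, "i", num_component_class=2)
v_preds = extract(ivt_preds, "v", num_component_class=3)
print("instrument scores:", i_preds)
print("verb scores:", v_preds)
```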
The TAS metrics assess the quality of the association between bounding boxes and triplet ids. They are:
- LM: localize and match: percentage of triplets localized at threshold (θ) and matched with correct triplet ids.
- PLM: partially localize and match: percentage of triplets matched with correct triplet ids but with a localization overlap below θ.
- IDS: identity switch: percentage of triplets localized at θ but with swapped ids within the frame.
- IDM: identity miss: percentage of triplets localized at θ but with incorrect ids (not swapped).
- MIL: missed localization: percentage of triplets matched with correct triplet ids but with no matching localization bounding boxes.
- RFP: remaining false positive: remaining false alarms after all other scores have been considered.
- RFN: remaining false negative: remaining missed predictions after all other scores have been considered.
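To give a feel for how these categories relate, here is a deliberately simplified sketch that labels a single ground-truth/prediction pair from its IoU and whether the triplet ids agree. It is an interpretation of the definitions above, not the library's algorithm: the function association_outcome is hypothetical, "no matching localization" is assumed to mean zero overlap, and identity switch and identity miss are not separated because that requires looking at all pairs in a frame.

```python
from collections import Counter


def association_outcome(iou, ids_match, threshold=0.5):
    """Label one ground-truth/prediction pair with a TAS-style outcome."""
    if ids_match:
        if iou >= threshold:
            return "LM"   # localized and matched
        if iou > 0:
            return "PLM"  # matched, but overlap below the threshold
        return "MIL"      # matched, but no overlapping box (assumption)
    if iou >= threshold:
        return "ID_ERROR"  # localized with a wrong id (identity switch or miss)
    return "UNMATCHED"     # left for the remaining false positive/negative counts


# Toy pairs: (IoU between the boxes, whether the triplet ids agree).
pairs = [(0.8, True), (0.3, True), (0.0, True), (0.7, False), (0.1, False)]
counts = Counter(association_outcome(iou, match) for iou, match in pairs)
rates = {name: 100.0 * n / len(pairs) for name, n in counts.items()}
print(rates)
```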
The TAS metrics are automatically computed within the Detection class of ivtmetrics. The results are accessed using the lowercase TAS metric acronyms as keys, for example:

    import ivtmetrics

    detect = ivtmetrics.Detection(num_class=100)

    # ... after a series of detect.update() calls ...

    results_ivt = detect.compute_video_AP('ivt')
    print("triplet localized and matched", results_ivt["lm"])
    print("triplet identity switched", results_ivt["ids"])
    print("triplet missed localization", results_ivt["mil"])
If you use these metrics in your project or research, please consider citing the associated publication:
- Nwoye, C. I. and Padoy, N. (2022). Data Splits and Metrics for Benchmarking Methods on Surgical Action Triplet Datasets. arXiv preprint arXiv:2204.05235.

Bibtex:

    @article{nwoye2022data,
      title={Data Splits and Metrics for Benchmarking Methods on Surgical Action Triplet Datasets},
      author={Nwoye, Chinedu Innocent and Padoy, Nicolas},
      journal={arXiv preprint arXiv:2204.05235},
      year={2022}
    }

- Nwoye, C. I., Yu, T., Gonzalez, C., Seeliger, B., Mascagni, P., Mutter, D., … & Padoy, N. (2021). Rendezvous: Attention Mechanisms for the Recognition of Surgical Action Triplets in Endoscopic Videos. arXiv preprint arXiv:2109.03223.
- Nwoye, C. I., Gonzalez, C., Yu, T., Mascagni, P., Mutter, D., Marescaux, J., & Padoy, N. (2020, October). Recognition of instrument-tissue interactions in endoscopic videos via action triplets. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 364-374). Springer, Cham.
- http://camma.u-strasbg.fr/datasets
- https://cholectriplet2022.grand-challenge.org
- https://cholectriplet2021.grand-challenge.org
BSD 2-Clause License

Copyright (c) 2022, Research Group CAMMA
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.