Annotating new datasets for machine learning tasks is tedious, time-consuming, and costly. For segmentation applications the burden is particularly high, as manual delineation of relevant image content is often extremely expensive or possible only for experts with domain-specific knowledge. Thanks to developments in transfer learning and training with weak supervision, segmentation models can now also greatly benefit from annotations of different kinds. However, for any new domain application looking to use weak supervision, the dataset builder still needs a strategy for distributing the budget between full segmentation annotations and other, weaker annotations. This is challenging because the right split is a priori unknown for a new dataset.
We propose a novel approach to determine annotation strategies for segmentation datasets by estimating what proportion of segmentation and classification annotations should be collected given a fixed budget. To do so, our method sequentially determines the proportions of segmentation and classification annotations to collect for each budget fraction by modeling the expected improvement of the final segmentation model. Our experiments show that the resulting annotation strategies perform close to optimal across a range of annotation budgets and datasets.
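The sequential allocation described above can be illustrated with a minimal sketch. Here `log_mean` is a stand-in surrogate for the model's expected performance (not the paper's actual Gaussian Process), and the greedy loop, parameter values, and the cost ratio `alpha_s` between a segmentation and a classification annotation are hypothetical choices for illustration only:

```python
import numpy as np

def log_mean(C, S, gamma_c=1.0, beta_c=0.01, gamma_s=2.0, beta_s=0.01):
    # Stand-in surrogate for expected segmentation performance given
    # C classification and S segmentation annotations (hypothetical values).
    return gamma_c * np.log(beta_c * C + 1) + gamma_s * np.log(beta_s * S + 1)

def allocate(budget, n_steps, alpha_s=12.0):
    """Greedily split each budget fraction between classification annotations
    (unit cost) and segmentation annotations (cost alpha_s each), choosing the
    split that maximizes the surrogate performance model at each step."""
    C = S = 0.0
    step = budget / n_steps
    for _ in range(n_steps):
        # Candidate fractions of this budget step spent on segmentation.
        candidates = np.linspace(0.0, 1.0, 11)
        scores = [log_mean(C + (1 - p) * step, S + p * step / alpha_s)
                  for p in candidates]
        p = candidates[int(np.argmax(scores))]
        C += (1 - p) * step
        S += p * step / alpha_s
    return C, S
```

In the actual method the surrogate is refit from the segmentation model's measured performance after each budget fraction is spent, which is what makes the strategy adaptive rather than fixed.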
We compared our method on four datasets against different fixed strategies and an estimated best fixed strategy with $\alpha_s = 12$. We can make the following observations:
We assume that segmentation performance grows logarithmically with data, and we impose that in the mean prior of the Gaussian Process: \[ \mu(C, S) = \gamma_c\log(\beta_cC+1) + \gamma_s\log(\beta_sS+1) \]
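As a quick sanity check, the mean prior above can be written directly; the parameter values below are arbitrary placeholders, not the fitted values from the paper:

```python
import numpy as np

def mean_prior(C, S, gamma_c, beta_c, gamma_s, beta_s):
    """Logarithmic GP mean prior mu(C, S) over counts of classification (C)
    and segmentation (S) annotations."""
    return gamma_c * np.log(beta_c * C + 1) + gamma_s * np.log(beta_s * S + 1)
```

By construction `mean_prior(0, 0, ...) = 0`, and the logarithm gives diminishing returns: each additional annotation improves the expected performance less than the previous one.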
We see that this assumption holds in general for all four datasets except SUIM, which explains the drop in performance at high budgets on that dataset.
Some datasets require different expertise and domain knowledge to annotate, which is reflected in different values of $\alpha_s$ in our model. We see that our method is robust regardless of the value of $\alpha_s$.
@inproceedings{tejero2023full,
  title     = {Full or Weak annotations? An adaptive strategy for budget-constrained annotation campaigns},
  author    = {Tejero, Javier Gamazo and Zinkernagel, Martin S and Wolf, Sebastian and Sznitman, Raphael and Neila, Pablo M{\'a}rquez},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year      = {2023}
}