Transformer Utilization in Medical Image Segmentation Networks

Journal article

Saikat Roy, Gregor Koehler, M. Baumgartner, Constantin Ulrich, Jens Petersen, Fabian Isensee, K. Maier-Hein
arXiv.org, 2023

Cite

APA
Roy, S., Koehler, G., Baumgartner, M., Ulrich, C., Petersen, J., Isensee, F., & Maier-Hein, K. (2023). Transformer Utilization in Medical Image Segmentation Networks. ArXiv.org.

Chicago/Turabian
Roy, Saikat, Gregor Koehler, M. Baumgartner, Constantin Ulrich, Jens Petersen, Fabian Isensee, and K. Maier-Hein. “Transformer Utilization in Medical Image Segmentation Networks.” arXiv.org (2023).

MLA
Roy, Saikat, et al. “Transformer Utilization in Medical Image Segmentation Networks.” ArXiv.org, 2023.

BibTeX

@article{saikat2023a,
  title = {Transformer Utilization in Medical Image Segmentation Networks},
  year = {2023},
  journal = {arXiv.org},
  author = {Roy, Saikat and Koehler, Gregor and Baumgartner, M. and Ulrich, Constantin and Petersen, Jens and Isensee, Fabian and Maier-Hein, K.}
}

Abstract

Owing to their success in the data-rich domain of natural images, Transformers have recently become popular in medical image segmentation. However, the pairing of Transformers with convolutional blocks in varying architectural permutations leaves their relative effectiveness open to interpretation. We introduce Transformer Ablations that replace the Transformer blocks with plain linear operators to quantify this effectiveness. With experiments on 8 models across 2 medical image segmentation tasks, we explore four points: 1) the replaceable nature of Transformer-learnt representations; 2) Transformer capacity alone cannot prevent representational replaceability and works in tandem with effective design; 3) the mere existence of explicit feature hierarchies in Transformer blocks is more beneficial than accompanying self-attention modules; 4) major spatial downsampling before Transformer modules should be used with caution.
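
The ablation idea in the abstract, swapping each Transformer block for a plain linear operator, can be illustrated with a small sketch. This is not the authors' implementation: the class names `ToySelfAttention` and `TokenLinear`, the helper `ablate_transformers`, and the use of NumPy with random untrained weights are all assumptions made for illustration. It only shows the structural swap, with tensor shapes preserved so the rest of a network would be unaffected.

```python
import numpy as np


class ToySelfAttention:
    """Hypothetical stand-in for a Transformer block: single-head self-attention plus residual."""

    def __init__(self, dim, rng):
        self.dim = dim
        scale = 1.0 / np.sqrt(dim)
        self.wq = rng.standard_normal((dim, dim)) * scale
        self.wk = rng.standard_normal((dim, dim)) * scale
        self.wv = rng.standard_normal((dim, dim)) * scale

    def __call__(self, tokens):
        # tokens has shape (n_tokens, dim)
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        scores = q @ k.T / np.sqrt(self.dim)
        scores = scores - scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return tokens + weights @ v  # every output token mixes information from all tokens


class TokenLinear:
    """Plain linear operator applied to each token independently (no mixing across tokens)."""

    def __init__(self, dim, rng):
        self.dim = dim
        self.weight = rng.standard_normal((dim, dim)) / np.sqrt(dim)
        self.bias = np.zeros(dim)

    def __call__(self, tokens):
        return tokens @ self.weight + self.bias


def ablate_transformers(blocks, rng):
    """Return a new block list in which every attention block is replaced by a linear operator."""
    return [
        TokenLinear(block.dim, rng) if isinstance(block, ToySelfAttention) else block
        for block in blocks
    ]


def run(blocks, tokens):
    """Apply the blocks in sequence."""
    for block in blocks:
        tokens = block(tokens)
    return tokens


rng = np.random.default_rng(0)
dim = 8
original = [ToySelfAttention(dim, rng), ToySelfAttention(dim, rng)]
ablated = ablate_transformers(original, rng)

tokens = rng.standard_normal((16, dim))
out_original = run(original, tokens)
out_ablated = run(ablated, tokens)
print(out_original.shape, out_ablated.shape)
```

In an actual study, the original and ablated networks would each be trained and then compared on segmentation metrics. The sketch stops at the swap itself and makes no claim about how the paper trains or evaluates either variant.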