Code of "Few-shot microscopy image cell segmentation", https://arxiv.org/abs/2007.01671
Automatic cell segmentation in microscopy images works well with the support of deep neural networks trained with full supervision. Collecting and annotating images, though, is not a sustainable solution for every new microscopy database and cell type. Instead, we assume that we can access a plethora of annotated image data sets from different domains (sources) and a limited number of annotated image data sets from the domain of interest (target), where each domain denotes not only a different image appearance but also a different type of cell segmentation problem. We pose this problem as meta-learning, where the goal is to learn a generic and adaptable few-shot learning model from the available source domain data sets and cell segmentation tasks. The model can afterwards be fine-tuned on the few annotated images of the target domain, which contains a different image appearance and a different cell type. In our meta-learning training, we propose the combination of three objective functions to segment the cells, move the segmentation results away from the classification boundary using cross-domain tasks, and learn an invariant representation between tasks of the source domains. Our experiments on five public databases show promising results from 1- to 10-shot meta-learning using standard segmentation neural network architectures.
Microscopy image analysis involves many procedures, including cell counting, detection and segmentation. Cell segmentation is particularly important for studying cell morphology, i.e. identifying the shape, structure, and size of cells. Manually segmenting cells from microscopy images is a time-consuming and costly process. For that reason, many methods have been developed to automate the process of cell segmentation, as well as counting and detection.
Current approaches in cell segmentation deliver promising results based on encoder-decoder network architectures trained with supervised learning. However, collecting and annotating large amounts of images is practically infeasible for every new microscopy problem. Furthermore, new problems may contain different types of cells, for which a segmentation model pre-trained on different data sets may not deliver good performance.
To address this limitation, methods based on domain generalization [12, 19], domain adaptation and few-shot learning have been developed. In these approaches, it is generally assumed that we have access to a collection of annotated images from different domains (source domains), and either no access to the target domain (domain generalisation), access to a large data set of un-annotated images (domain adaptation), or access to a limited number (fewer than 10) of annotated images from the target domain (few-shot learning). Domain generalisation and adaptation generally involve the same task in the source and target domains, such as the same cell type segmentation problem with images coming from different sites and having different appearances. However, our setup is different because we consider that the source and target domains consist of different types of cell segmentation problems. Each domain contains a different cell structure, such as mitochondria or nuclei. In this way, we form a typical real-life scenario where a variety of microscopy images, containing different cell structures, are leveraged from various resources to learn a cell segmentation model. Therefore, the challenge in our setup is to cope with different image and cell appearances, as well as different types of cells for each domain, as illustrated in Fig. 1. In such a setup, we argue that few-shot learning is more appropriate, where we aim to learn a generic and adaptable few-shot learning model from the available source domain data sets. This model can afterwards be fine-tuned on the few annotated images of the new target domain. Such a problem can be formulated as an optimization-based meta-learning approach [11, 14, 27].
In this paper, we present a new few-shot meta-learning cell segmentation model. For meta-training the model, we propose the combination of three loss functions to 1. impose pixel-level segmentation supervision, 2. move the segmentation predictions away from the classification boundary using cross-domain tasks and 3. learn an invariant representation between tasks. In our evaluations on five microscopy data sets, we demonstrate promising results compared to the related work on settings from 1- to 10-shot learning, employing standard network architectures [30, 36]. To the best of our knowledge, this is the first work to introduce few-shot task generalisation using meta-learning and to apply few-shot learning to microscopy image cell segmentation.
Cell segmentation is a well-established problem that is often combined with cell counting [1, 10] and classification. We discuss the prior work that is relevant to our approach, as well as the related work on few-shot learning.
Automatic cell detection and counting in microscopy images was earlier studied with the support of image processing and computer vision techniques [13, 21, 34]. In the past few years, advances in deep neural networks have changed the type of approaches developed for microscopy images. In particular, fully convolutional neural networks (FCNs) make predictions at the same or a similar spatial resolution as the input image. FCN approaches have been widely adopted in medical imaging, with applications to nuclei segmentation, brain tumor segmentation from magnetic resonance imaging (MRI) and, of course, segmentation in microscopy images. For instance, histology image segmentation relies on FCNs to perform mitosis detection. Among the FCN models, U-Net
is a very popular architecture. It was developed for segmenting neuronal structures in electron microscopy images, but it is presently used for any kind of medical imaging that demands spatial predictions. Similarly, the fully convolutional regression network (FCRN) is another encoder-decoder architecture for cell segmentation and counting in microscopy images. In the evaluation, we consider both the U-Net and FCRN architectures.
The main difference between our approach and existing cell segmentation approaches lies in the training algorithm. While the aforementioned approaches deliver promising results on the examined data sets, they all require a large amount of annotated data and a fully supervised training process. In this work, we address the problem as a more realistic few-shot learning problem, where we have access to relatively large data sets from different domains covering several types of cell segmentation problems, but the data set for the target segmentation domain is limited to just a handful of training samples. We present an approach that reaches promising segmentation performance despite the small target segmentation data set.
Few-shot learning deals with making use of existing knowledge, in terms of data and annotation, to build a generic model that can be adapted to different (but related) target problems with limited training data. Although there are classic approaches from the past [23, 29], deep neural networks and meta-learning have significantly contributed to improving the state of the art of few-shot learning. In meta-learning, the goal is to train a model to learn from a number of tasks using a limited number of training samples per task. Meta-learning, in general, consists of a meta-training phase where multiple tasks adapt a base learner to work with different problems (where each task uses a small training set); then the learners of those multiple tasks are pooled together to update the base learner. After the meta-training process converges, the model of the base learner is fine-tuned on a limited number of annotated data sampled from the target problem. Meta-learning approaches can be categorized into metric-learning [32, 35], model-based [8, 24] and optimization-based learning [14, 27, 28]. Optimization-based meta-learning approaches rely on gradient-based learning, which is simple to implement and computationally efficient. For that reason, we rely on Reptile, an optimization-based approach, to develop our algorithm.
In this section, we define the few-shot segmentation problem. Afterwards, we pose cell segmentation from microscopy images as a few-shot meta-learning problem.
Let $\mathcal{D} = \{\mathcal{D}_1, \dots, \mathcal{D}_M\}$ be a collection of microscopy cell data sets. Each data set is defined as $\mathcal{D}_i = \{(\mathbf{x}_j, \mathbf{y}_j)\}_{j=1}^{N_i}$, where $(\mathbf{x}_j, \mathbf{y}_j)$ is a pair of the microscopy cell image $\mathbf{x}_j : \Omega \to \mathbb{R}$ ($\Omega$ denotes the image lattice with dimensions $H \times W$) and the respective ground-truth segmentation mask $\mathbf{y}_j : \Omega \to \{0, 1\}$; and $N_i$ is the number of samples in $\mathcal{D}_i$. Note that each data set corresponds to a different task and domain, each representing a new type of image and segmentation task. All data sets in $\mathcal{D}$ compose the source data sets. We further assume another data set, which we refer to as the target, defined by $\mathcal{D}_T = \{(\mathbf{x}_j, \mathbf{y}_j)\}_{j=1}^{K}$, where the number $K$ of training images with ground-truth segmentation masks is limited, e.g. between 1 and 10 training image and segmentation mask pairs. Also, we assume that the target data set comes from a different task and domain.
Our goal is to perform cell segmentation in the images belonging to the target data set through a segmentation model $f(\mathbf{x}; \theta)$, where $\theta$ denotes the model parameters (i.e. the weights of the deep neural network). However, the limited number of annotated training images prohibits us from learning a typical fully-supervised data-driven model. This is a common problem in real life when working with cell segmentation problems, where annotating new data sets, i.e. the target data sets, does not represent a sustainable solution. To address this limitation, we propose to learn a generic and adaptable model from the source data sets in $\mathcal{D}$, which is then fine-tuned on the limited annotated images of the target data set $\mathcal{D}_T$.
We propose to learn a generic and adaptable few-shot learning model with gradient-based meta-learning [14, 27]. The learning process is characterized by a sequence of episodes, where an episode consists of the meta-training and meta-update steps. In meta-training, the model parameter $\tilde{\theta}$ (this is known as the meta-parameter) initialises the segmentation model $f(\cdot; \theta_i)$, with $\theta_i = \tilde{\theta}$ (defined in Sec. 3.1), for each task $\mathcal{T}_i$, where each task is modeled with a training set $\mathcal{T}_i \subset \mathcal{D}_i$. Next, the model meta-parameter $\tilde{\theta}$ is meta-updated from the learned task parameters $\{\theta_i\}_{i=1}^{M}$.
The segmentation model that uses the meta-parameter $\tilde{\theta}$, defined as $f(\mathbf{x}; \tilde{\theta})$, is denoted as the base learner. To train this base learner, we propose three objective functions: one that accounts for segmentation supervision, and two regularisation terms. In our implementation, we rely on the Reptile algorithm to develop our approach because of its simplicity and efficiency. Our approach is described in Algorithm 1. Next, we present each part of the approach.
During meta-training, i.e. lines 5 to 12 in Algorithm 1, tasks are generated by sampling images from each source domain in $\mathcal{D}$. Hence, a task $\mathcal{T}_i$ is represented by a subset of images and segmentation masks from $\mathcal{D}_i$. In our experiments, we work with 1- to 10-shot learning problems. After sampling a task, the base learner is trained with the three objective functions. The first objective function consists of the standard binary cross entropy loss $\ell_{BCE}$ that uses the images and segmentation masks of the sampled task. The second loss function is based on the entropy regularization $\ell_{ER}$ that moves the segmentation results away from the classification boundary using a task $\mathcal{T}_k$, with $k \neq i$, without the segmentation masks of that task – such regularisation makes the segmentation results for task $\mathcal{T}_k$ more confident, i.e. more binary-like. The third objective function consists of extracting an invariant representation between tasks by enforcing the learning of a common feature representation across different tasks with the knowledge distillation loss $\ell_{KD}$. The use of the entropy regularisation and knowledge distillation losses in few-shot meta-learning represents the main technical contribution of our approach. The learned task parameters $\theta_i$ are optimized using the three objective functions above during meta-training, as follows:
$$\theta_i = \arg\min_{\theta} \; \ell_{BCE}(\theta; \mathcal{T}_i) + \lambda_1 \, \ell_{ER}(\theta; \mathcal{T}_k) + \lambda_2 \, \ell_{KD}(\theta; \mathcal{T}_i, \mathcal{T}_k), \quad (1)$$

where the loss is a combination of the three objectives that depend on the $K$-shot source domain training sets $\mathcal{T}_i$ and $\mathcal{T}_k$ (with $i \neq k$) and the weights $\lambda_1$ and $\lambda_2$. In Sec. 3.4, we present the objective functions in detail. Finally, the parameters of the base learner are learned with stochastic gradient descent and back-propagation.
At last, the meta-update step takes place after iterating over the tasks during meta-training. The meta-update, i.e. line 11 in Algorithm 1, updates the model parameter $\tilde{\theta}$ using the following rule:

$$\tilde{\theta} \leftarrow \tilde{\theta} + \epsilon \, \frac{1}{M} \sum_{i=1}^{M} \left( \theta_i - \tilde{\theta} \right), \quad (2)$$
where $\epsilon$ is the step-size for the parameter update. The episodic training takes place based on the meta-training and meta-update steps until convergence.
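The meta-update rule above can be sketched in a few lines. Below is a minimal illustration in plain Python, with lists standing in for the network weight vectors; the function name and the toy values are hypothetical, not part of the released implementation.

```python
def reptile_meta_update(meta_params, task_params_list, step_size=1.0):
    # Move each meta-parameter towards the average of the
    # task-adapted parameters (the Reptile meta-update rule).
    n_tasks = len(task_params_list)
    return [
        theta + step_size * (sum(task[j] for task in task_params_list) / n_tasks - theta)
        for j, theta in enumerate(meta_params)
    ]

# Toy episode: two tasks adapted the (scalar) weights differently.
meta = [0.0, 0.0, 0.0]
adapted = [[1.0, 0.0, 2.0],
           [3.0, 2.0, 0.0]]
meta = reptile_meta_update(meta, adapted)  # -> [2.0, 1.0, 1.0]
```

With a step-size of 1.0, the meta-parameters land exactly on the average of the task parameters; smaller step-sizes interpolate between the old meta-parameters and that average.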
We design three objective functions for the meta-training stage to update the parameters of the base learner.
Every sampled task $\mathcal{T}_i$ includes pairs of input image and segmentation mask. We rely on the pixel-wise binary cross entropy (BCE) as the main objective to learn predicting the ground-truth mask. In addition, we weight the pixel contributions, since we often observe an unbalanced ratio between foreground and background pixels. Given the foreground probability prediction $\hat{\mathbf{y}} = f(\mathbf{x}; \theta)$ of the input image $\mathbf{x}$ and the segmentation mask $\mathbf{y}$ that belong to the $K$-shot training set for task $\mathcal{T}_i$, the pixel-wise BCE loss is given by:

$$\ell_{BCE}(\theta; \mathcal{T}_i) = -\sum_{(\mathbf{x}, \mathbf{y}) \in \mathcal{T}_i} \sum_{\omega \in \Omega} \left[ \alpha \, \mathbf{y}(\omega) \log \hat{\mathbf{y}}(\omega) + (1 - \mathbf{y}(\omega)) \log \left(1 - \hat{\mathbf{y}}(\omega)\right) \right], \quad (3)$$
where $\omega$ denotes the spatial pixel position in the image lattice $\Omega$, and $\alpha$ is the weighting factor of the foreground class probability, which equals the ratio of background to foreground pixels in $\mathcal{T}_i$. This is the standard loss function for segmentation-related problems, which we also employ for our binary problem.
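The weighted BCE of Eq. (3) can be sketched for a single flattened mask as follows; this is an illustrative stand-in for the per-pixel loss of a real network, and the mask values are hypothetical.

```python
import math

def weighted_bce(pred, mask, eps=1e-7):
    # Pixel-wise BCE with the foreground term weighted by alpha,
    # the background-to-foreground pixel ratio of the training set.
    n_fg = sum(mask)
    n_bg = len(mask) - n_fg
    alpha = n_bg / max(n_fg, 1)
    loss = 0.0
    for p, y in zip(pred, mask):
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        loss -= alpha * y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return loss

# Flattened 2x2 mask with one foreground pixel (hypothetical values).
mask = [1.0, 0.0, 0.0, 0.0]
confident = weighted_bce([0.99, 0.01, 0.01, 0.01], mask)
uncertain = weighted_bce([0.5, 0.5, 0.5, 0.5], mask)
```

Confident correct predictions yield a much lower loss than uncertain ones, and the factor `alpha` prevents the abundant background pixels from dominating the gradient.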
The BCE loss can easily result in over-fitting the base learner to the task $\mathcal{T}_i$. We propose to use Shannon's entropy regularization on a different task to prevent this behavior. More specifically, while minimizing the BCE loss on a sampled task of one source domain, e.g. task $\mathcal{T}_i$, we sample a second task from a different domain, e.g. task $\mathcal{T}_k$; and seek to minimize Shannon's entropy loss for task $\mathcal{T}_k$ without using the segmentation masks of that task. As a result, while minimizing the BCE loss for task $\mathcal{T}_i$, we are also aiming to make confident predictions for task $\mathcal{T}_k$ by minimizing Shannon's entropy. The regularization is defined as:

$$\ell_{ER}(\theta; \mathcal{T}_k) = -\sum_{\mathbf{x} \in \mathcal{T}_k} \sum_{\omega \in \Omega} \left[ \hat{\mathbf{y}}(\omega) \log \hat{\mathbf{y}}(\omega) + \left(1 - \hat{\mathbf{y}}(\omega)\right) \log \left(1 - \hat{\mathbf{y}}(\omega)\right) \right], \quad (4)$$
where Shannon's entropy is computed for the foreground pixel probability $\hat{\mathbf{y}}(\omega)$, with $\hat{\mathbf{y}} = f(\mathbf{x}; \theta)$, and the background probability $1 - \hat{\mathbf{y}}(\omega)$. Our motivation for the entropy regularizer originates from the field of semi-supervised learning. The same loss has been recently applied to few-shot classification.
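The behavior of the entropy term in Eq. (4) is easy to verify numerically: it is smallest for near-binary predictions and largest at the 0.5 classification boundary. A minimal sketch (function name and values are hypothetical):

```python
import math

def entropy_reg(pred, eps=1e-7):
    # Shannon entropy of the per-pixel foreground probabilities:
    # small for confident (near-binary) predictions, maximal at 0.5.
    h = 0.0
    for p in pred:
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        h -= p * math.log(p) + (1.0 - p) * math.log(1.0 - p)
    return h

confident = entropy_reg([0.99, 0.01, 0.95])  # near-binary predictions
boundary = entropy_reg([0.5, 0.5, 0.5])      # maximally uncertain: 3*ln(2)
```

Minimizing this quantity therefore pushes the predictions on the unlabeled task away from the decision boundary, which is exactly the "more binary-like" effect described above.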
Although we deal with different source domains and cell segmentation problems, we can argue that microscopy images of cells must have common features in terms of cell morphology (shape and texture), regardless of the cell type. By learning from as many source data sets as possible, we aim to obtain a common representation that addresses the target data set to some extent. We explore this idea during meta-training by constraining the base learner to learn a common representation between different source data sets. To that end, two tasks from two source data sets ($\mathcal{T}_i$, $\mathcal{T}_k$) are sampled in every meta-training iteration. Then, the Euclidean distance between the representations of two images (one from each task) at the $l$-th layer of the segmentation model is minimised. This idea is common in knowledge distillation and network compression, where the student network learns to predict the same representation as the teacher network. Recently, the idea of distillation between tasks has also been used for few-shot learning. We sample an image $\mathbf{x}_i$ from data set $\mathcal{D}_i$ and $\mathbf{x}_k$ from data set $\mathcal{D}_k$. Then, we define the distillation loss as:

$$\ell_{KD}(\theta; \mathcal{T}_i, \mathcal{T}_k) = \left\| z_l(\mathbf{x}_i) - z_l(\mathbf{x}_k) \right\|_2^2, \quad (5)$$
where $z_l(\mathbf{x}_i)$ and $z_l(\mathbf{x}_k)$ correspond to the $l$-th layer activation maps of the base learner for images $\mathbf{x}_i$ and $\mathbf{x}_k$, respectively. Furthermore, the $l$-th layer feature of the base learner representation is the latent code of an encoder-decoder architecture [30, 36].
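The distillation term of Eq. (5) reduces to a squared Euclidean distance between two flattened latent codes; a minimal sketch with hypothetical values:

```python
def distillation_loss(z_i, z_k):
    # Squared Euclidean distance between the latent codes (l-th layer
    # activations) of two images from different source tasks.
    return sum((a - b) ** 2 for a, b in zip(z_i, z_k))

# Hypothetical flattened latent codes of two images.
loss = distillation_loss([1.0, 2.0, 0.0], [0.0, 2.0, 2.0])  # -> 5.0
```

Driving this distance down encourages the encoder to map images from different source domains to nearby points in the latent space, i.e. a task-invariant representation.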
After the meta-learning process is finished, the segmentation model is fine-tuned using a $K$-shot subset of the target data set, denoted by $\mathcal{T}_T \subset \mathcal{D}_T$, with $K \in \{1, \dots, 10\}$ (see line 16 in Algorithm 1). This fine-tuning process is achieved with the following optimisation:

$$\theta^{*} = \arg\min_{\theta} \; \ell_{BCE}(\theta; \mathcal{T}_T), \quad (6)$$
Here, we only need the binary cross entropy loss for fine-tuning. We also rely on Adam optimization and back-propagation for fine-tuning, though we use different hyper-parameters, as we later report in the implementation details (Sec. 4.1). In the end, our model with the updated parameters $\theta^{*}$ is evaluated on the target test set (see line 17 in Algorithm 1).
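The fine-tuning step is plain supervised BCE minimization on the few target shots. The toy sketch below illustrates this with a hypothetical one-weight logistic "segmenter" and vanilla gradient descent standing in for Adam on the full network; the data and names are illustrative only.

```python
import math

def finetune_bce(xs, ys, w=0.0, lr=0.1, epochs=200):
    # Gradient descent on the (unweighted) BCE loss over the K-shot
    # target subset; a toy stand-in for fine-tuning the full model.
    for _ in range(epochs):
        grad = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-w * x))  # sigmoid prediction
            grad += (p - y) * x                  # dBCE/dw for a logistic output
        w -= lr * grad
    return w

# Tiny two-"pixel" target set (hypothetical values).
w = finetune_bce([1.0, -1.0], [1.0, 0.0])
pred_fg = 1.0 / (1.0 + math.exp(-w))  # confident foreground after tuning
```

Starting from the meta-learned parameters rather than `w=0.0` is what makes this adaptation effective with so few annotated samples.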
We evaluate our approach on five microscopy image data sets using two standard encoder-decoder network architectures for image segmentation [30, 36]. All evaluations take place for 1-, 3-, 5-, 7- and 10-shot learning, where we also compare with transfer learning. In transfer learning, the model is trained on all available source data sets and then fine-tuned on the few shots available from the target data set. This work is the first to explore few-shot microscopy image cell segmentation, so we propose the assessment protocol and data sets.
We implement two encoder-decoder network architectures. The first architecture is the fully convolutional regression network (FCRN). We moderately modify the decoder by using transposed convolutions instead of bi-linear up-sampling, as we observed better performance; we also rely on sigmoid activation functions for making binary predictions instead of heat-map predictions, again because of the better performance. The second architecture is the well-established U-Net. We empirically adapted the original architecture to a lightweight variant, where the number of layers is reduced from 23 to 12. We train both networks in meta-training with the Adam optimizer, using a learning rate of 0.001 and a weight decay of 0.0005. In meta-learning, we set the step-size $\epsilon$ from Eq. (2) to 1.0. Both networks contain batch-normalization. However, batch-normalization is known to face challenges in gradient-based meta-learning due to the task-based learning. The problem lies in learning global scale and bias parameters of batch-normalization based on the tasks. For that reason, we only make use of the mean and variance during meta-training. During fine-tuning, we observed that the scale and bias parameters can be easily learned for the FCRN architecture. For U-Net, we do not rely on the scale and bias parameters in meta-learning. To allow a fair comparison with transfer learning, we follow the same learning protocol for the batch-normalization parameters. During fine-tuning, we rely on the Adam optimizer with 20 epochs, a learning rate of 0.0001, and a weight decay of 0.0005. The same parameters are used for transfer learning. Our implementation and evaluation protocol are publicly available at https://github.com/Yussef93/FewShotCellSegmentation.
We selected five data sets of different cell domains. First, we rely on the Broad Bioimage Benchmark Collection (BBBC), which is a collection of various microscopy cell image data sets. We use BBBC005 and BBBC039 from BBBC, which we refer to as B5 and B39. B5 contains 1200 fluorescent synthetic stain cell images, while B39 has 200 images of fluorescent nuclei. Second, the Serial Section Transmission Electron Microscopy (ssTEM) database has 165 images of mitochondria in neural tissue. Next, the Electron Microscopy (EM) data set contains 165 electron microscopy images of mitochondria and synapses. Finally, the Triple Negative Breast Cancer (TNBC) database has 50 histology images of breast biopsies. We summarize the cell type, image resolution and number of images for all data sets in Table 1. Moreover, we crop the training images to fixed-size patches, while during testing, the images are used in full resolution.
We rely on the mean intersection over union (IoU) as the evaluation metric for all experiments. This is a standard performance measure in image segmentation [11, 20]. We conduct a leave-one-dataset-out cross-validation, which is a common domain generalization experimental setup. This means that we use four microscopy image data sets for meta-training and the remaining unseen data set for fine-tuning. From the remaining data set, we randomly select $K$ samples for fine-tuning. Since the selection of the $K$-shot images can significantly affect the final result, we repeat the random selection of the $K$-shot samples ten times and report the mean and standard deviation over these ten experiments. The same evaluation is performed for 1-, 3-, 5-, 7- and 10-shot learning. Similarly, we evaluate the transfer learning approach as well.
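The mean IoU metric used throughout the evaluation can be computed as below; this is a generic sketch over flattened binary masks, not the repository's evaluation code.

```python
def mean_iou(preds, masks):
    # Mean intersection over union between binary prediction and
    # ground-truth masks (flattened to 1-D lists of 0/1 values).
    ious = []
    for pred, mask in zip(preds, masks):
        inter = sum(1 for p, m in zip(pred, mask) if p and m)
        union = sum(1 for p, m in zip(pred, mask) if p or m)
        ious.append(inter / union if union else 1.0)
    return sum(ious) / len(ious)

# One prediction with one false-positive pixel (hypothetical values).
score = mean_iou([[1, 1, 0, 0]], [[1, 0, 0, 0]])  # -> 0.5
```

IoU penalises both false positives and false negatives, which makes it stricter than pixel accuracy on the heavily background-dominated microscopy masks.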
We examine the impact of each objective, as presented in Sec. 3.4, on the final segmentation result. In particular, we first meta-train our model only with the binary cross entropy from Eq. (3). Second, we meta-train with the binary cross entropy jointly with the entropy regularization from Eq. (4). Next, we meta-train with the binary cross entropy and the distillation loss from Eq. (5). At last, we meta-train with all loss functions together, where we weight $\ell_{ER}$ with $\lambda_1$ and $\ell_{KD}$ with $\lambda_2$. We also noticed that the values of these two hyper-parameters depend on the complete loss function. As a result, we empirically tune $\lambda_1$ when the complete loss is composed of $\ell_{BCE}$ and $\ell_{ER}$, and $\lambda_2$ when the complete loss is composed of $\ell_{BCE}$ and $\ell_{KD}$. We later analyze the sensitivity of the hyper-parameters in Sec. 4.4. Overall, the values have been found with grid search. A visual comparison of our objectives is shown in Fig. 2.
We present the results of our approach with different combinations of the objective function, as described in Sec. 4.3, as well as of the complete model, as presented in Algorithm 1. The quantitative evaluation is summarized in Fig. 3 and Fig. 4. In Fig. 3 we present the mean intersection over union (IoU) over all data sets, while in Fig. 4 we present the mean IoU and the standard deviation over our ten random selections of the $K$-shot samples for each data set individually. We also provide a visual comparison with transfer learning in Fig. 1.
At first, relying only on the binary cross entropy loss for meta-training represents the baseline result for our approach. Adding the entropy regularization has a positive impact on some $K$-shot learning experiments. This can be seen in Fig. 4, which depicts meta-training with $\ell_{BCE}$ alone and meta-training with $\ell_{BCE}$ and $\ell_{ER}$. Similarly, the use of the distillation loss together with the binary cross entropy, i.e. $\ell_{BCE}$ and $\ell_{KD}$ in Fig. 4, generally improves the mean IoU. The complete loss function gives the best results on average. This is easiest to notice in Fig. 3(a) and Fig. 3(b), where we compare the different objective combinations; the results show that the complete loss produces the best results for most shot problems. In addition, it is clear (Fig. 3) that our contributions, the $\ell_{ER}$ and $\ell_{KD}$ objectives, have a positive impact on the outcome as more $K$-shot samples are added.
Comparing the FCRN and U-Net architectures in Fig. 3(a) and Fig. 3(b), we observe that U-Net delivers slightly better results for the complete loss combination. Nevertheless, the overall behavior is similar for both architectures. The standard binary cross entropy without our regularization and distillation objectives is already more accurate than transfer learning. Fig. 4 also shows that the transfer learning performance can be comparable to ours (complete model) for 10-shot learning in some cases, e.g. Fig. 4(g) and Fig. 4(h) for FCRN and U-Net. In 10-shot learning, transfer learning benefits from the larger number of training data, resulting in performance closer to meta-learning. Besides the better performance, we also make two consistent observations when comparing to transfer learning. First, we argue that the main reason for the better performance is the parameter update rule: transfer learning averages gradients over shuffled samples, while meta-learning relies on task-based learning and then on averaging over the tasks. Second, meta-learning shows faster convergence. In particular, meta-learning converges 40x faster for FCRN and around 60x faster for U-Net. This is an important advantage when working with multiple training sets. The disadvantage of meta-learning is the selection of the hyper-parameters. Our experience with the selection of the optimizer, including its hyper-parameters, as well as the selection of $\lambda_1$ and $\lambda_2$, demonstrates that we need to be careful in this process. However, the optimizer and hyper-parameters are consistent for all tasks of cell segmentation. On the other hand, transfer learning involves only the selection of the optimizer.
We can conclude the following points from the experiments. Meta-learning yields an overall stronger performance than transfer learning when we average across our data sets; in addition, the new loss functions boost the meta-learning performance in many cases, as shown in Figures 3(a) and 3(b). Both approaches have a lower standard deviation when we add more shots in the fine-tuning step, as shown in Fig. 4. The same overall behavior is observed regardless of the network architecture. Moreover, both architectures exhibit a similar mean IoU. Besides, we notice that our regularization loss functions are not necessary for results around 90% or more; in this case, only the additional shots make an impact. Finally, we notice the difference in performance across the data sets. For example, the mean IoU on the TNBC target domain at 10-shot learning is within the 50% range, while on EM we are able to reach around the 70% range. The reason could be that the source domains are more correlated with the EM target domain in terms of features than with TNBC. Overall, our approach produces promising results when having at least 5 shots, i.e. annotated images, from the target domain for fine-tuning. This is a reasonably low amount of required annotation to reach good generalization.
We have presented a few-shot meta-learning approach for microscopy image cell segmentation. The experiments show that our approach enables the learning of a generic and adaptable few-shot model from the available annotated images and cell segmentation problems of the source data sets. We fine-tune this model on the target data set using a limited number of annotated images. In the context of meta-training, we proposed the combination of three objectives to segment the cells and regularize the predictions based on cross-domain tasks. Our experiments on five microscopy data sets show promising results from 1- to 10-shot meta-learning. As future work, we will include more data sets in our study to explore their correlation and impact on the target data set. Also, we plan to pose our problem in the context of domain generalization, where the target data set lacks annotation.
This work was partially funded by Deutsche Forschungsgemeinschaft (DFG), Research Training Group GRK 2203: Micro- and nano-scale sensor technologies for the lung (PULMOSENS), and the Australian Research Council through grant FT190100525. G.C. acknowledges the support by the Alexander von Humboldt-Stiftung for the renewed research stay sponsorship.
de Brebisson, A., Montana, G.: Deep neural networks for anatomical brain segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops. pp. 20–28 (2015)
Dijkstra, K., van de Loosdrecht, J., Schomaker, L., Wiering, M.A.: Centroidnet: A deep neural network for joint object localization and counting. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. pp. 585–601. Springer (2018)
Li, D., Yang, Y., Song, Y.Z., Hospedales, T.M.: Learning to generalize: Meta-learning for domain generalization. In: Thirty-Second AAAI Conference on Artificial Intelligence (2018)