Genetic Neural Architecture Search for automatic assessment of human sperm images

09/20/2019 ∙ by Erfan Miahi, et al. ∙ University of Guilan

Male infertility is a disease which affects approximately 7% of men. Sperm morphology analysis (SMA) is one of the main diagnosis methods for this problem. Manual SMA is an inexact, subjective, non-reproducible, and hard-to-teach process. In this paper, we therefore introduce a novel automatic SMA method based on a neural architecture search algorithm termed Genetic Neural Architecture Search (GeNAS). For this purpose, we used the MHSMA dataset, a collection of 1,540 sperm images collected from 235 patients with infertility problems. GeNAS is a genetic algorithm that acts as a meta-controller which explores the constrained search space of plain convolutional neural network architectures. Every individual of the genetic algorithm is a convolutional neural network trained to predict morphological deformities in different segments of human sperm (head, vacuole, and acrosome), and its fitness is calculated by a novel method named GeNAS-WF, especially designed for noisy, low-resolution, and imbalanced datasets. In addition, a hashing method is used to save the fitness of each trained neural architecture, so that it can be reused during fitness evaluation to speed up the algorithm. In terms of running time and computation power, our proposed architecture search method is far more efficient than most other existing neural architecture search algorithms. Additionally, other proposed methods have been evaluated on balanced datasets, whereas GeNAS is built specifically for noisy, low-quality, and imbalanced datasets, which are common in the field of medical imaging. In our experiments, the best neural architecture found by GeNAS has reached an accuracy of 92.66 acrosome abnormality detection, respectively. In comparison to other proposed algorithms for the MHSMA dataset, GeNAS achieved state-of-the-art results.




I Introduction

Approximately 15% of couples suffer from infertility, and in 30 to 40% of cases the cause is male sperm abnormalities [24, 62]. One of the key methods for male infertility diagnosis is sperm morphology analysis (SMA), which consists of classifying the sperm head, vacuole, and acrosome as normal or abnormal.

In this paper, we propose an automatic SMA system based on Convolutional Neural Networks (CNNs). CNNs are a natural choice for such a task because they have shown very good performance in image classification [33, 64, 63]. Unfortunately, CNNs usually exhibit complex architectures and need a large amount of expert knowledge and computing resources to tune. Recently, several methods have been proposed to optimize CNN architectures that go beyond brute-force grid search. A large part of these methods fall into three categories: Evolutionary [50, 67, 61, 43, 45], Reinforcement Learning [40, 49, 47, 11], and Bayesian Optimization [59, 57, 60, 27]. The success of most of the existing algorithms relies on high computational power (e.g., hundreds of GPUs) [72, 50], which makes them very hard to use for small companies or individual researchers. Besides, most of them were applied to balanced datasets such as ImageNet [16] or CIFAR-10 [32].

The algorithm proposed in this paper, which we call Genetic Neural Architecture Search (GeNAS), is a genetic algorithm for searching through CNN architectures. In this framework, an individual is a specific CNN architecture (its hyper-parameters and the structure of its layers) described by a string, which constitutes the genome of the individual. The initial population is made of randomly generated genomes of different lengths, which are then evolved through three genetic operations: tournament selection, crossover, and mutation. The crossover operation explores the depth of neural architectures by combining the parents' genomes, while mutation explores the search space of filter-size and stride-size of the layers of each neural network by randomly selecting new gene values for each individual. To compute the fitness of an individual, it is trained on the training set and its accuracy on the validation set is recorded after each training iteration. The final fitness value is then computed from these validation accuracies using our proposed weighting technique.

Our experiments show that GeNAS can find CNN architectures that reach state-of-the-art accuracy, precision, and score on head, acrosome, and vacuole classification for the MHSMA dataset [25], with fewer parameters and layers than hand-designed models. Besides, our proposed method works automatically without any human intervention.

The salient features of GeNAS are:

  1. A neural architecture encoding which can explore an optimal constrained search space of CNN architectures.

  2. A crossover operation for exploring the depth of neural architecture.

  3. A hashing method, for saving pairs of architecture and fitness of each chromosome; and then, reusing them in the fitness evaluation stage to speed up the algorithm.

  4. The ability to find the optimal architecture with just 1 Nvidia GPU in less than 10 days which is a great improvement in comparison to other proposed methods.

  5. A pruning algorithm, applied during genotype-to-phenotype conversion, which prevents the phenotype (neural architecture) from having negative output height and width values.

  6. A new fitness computation method called GeNAS-WF, which is specially designed to work with noisy, low-quality, and imbalanced datasets.

  7. A neural architecture search algorithm specially designed to work with a challenging dataset that is highly imbalanced, does not have enough training examples, and consists of unstained, noisy images recorded with a low-magnification microscope, in which details are not clear.

The paper is structured as follows: in section II, previous neural architecture search and sperm assessment methods are introduced. Our proposed algorithm is presented in detail in section III. In section IV, the MHSMA dataset, the augmentation technique, our sampling method, and the results of our experiments with both Random Search and GeNAS are reported and compared to other results on this dataset. The last section summarizes our proposed method and its results on the MHSMA dataset.

II Related Work

Existing research in neural architecture search and automatic sperm processing is reviewed in this section. Recent work in neural architecture search has produced a large number of methods for automating the design of neural network architectures [72, 5, 27, 26, 39, 62, 61, 43, 45, 63, 55]. Most of these approaches can be divided into three groups: Reinforcement Learning, Evolutionary, and Bayesian Optimization. We describe them in the following subsections and conclude with a discussion of automated sperm processing methods.

II-A Reinforcement Learning

Our proposed method is inspired by Neural Architecture Search (NAS) [72]. NAS applies a reinforcement learning algorithm, using accuracy on the validation set as the reward, to train a recurrent neural network capable of generating a variable-length string. This variable-length string describes the structure and connectivity of a neural architecture. Another line of research in this group is the MetaQNN [5] algorithm, which employs a model-free reinforcement learning algorithm, Q-learning, with an ε-greedy exploration strategy to find the optimal convolutional neural architecture.

While these methods search for the whole architecture of the convolutional neural network, other methods search for the best convolutional cell, i.e., a set of convolutional and pooling layers that are connected to each other and arranged in a certain way [71, 73]. To evaluate a convolutional cell, they repeat it a specified number of times to create a convolutional neural architecture, and then use the accuracy of this architecture on the validation set as its objective value. One of these methods, termed BlockQNN [71], also uses reinforcement learning, namely Q-learning with ε-greedy exploration.

One of the problems of the previous methods is that they need high computational power (hundreds of GPUs) to work properly. Recent methods have tried to solve this problem in various ways. One of them uses regression models to predict the final validation accuracy of partially trained models [5]. Other proposed algorithms apply techniques such as early stopping [6], layer-level network transformation [11], path-level network transformation [12], and parameter sharing [47]. In this paper, a hashing method, which saves each generated neural architecture together with its corresponding accuracy, and a special pruning technique are proposed to reduce the required computation.

II-B Bayesian Optimization

Bayesian optimization algorithms have also been used for neural architecture search and hyperparameter optimization [60]. To speed up the search and reduce the required computation, researchers have proposed using an optimal transport program [27] and network morphism [66, 26]. The main drawback of Bayesian optimization methods is that they are not scalable and cannot search through a large number of hyperparameters, whereas our proposed method can search through any number of hyperparameters.

II-C Evolutionary

During the 1980s and 1990s, some researchers used genetic algorithms to search through neural architectures in order to find the optimal neural architecture and its weights [45, 29, 54]. To the best of our knowledge, however, most of these algorithms were only used to find a neural network that could fit a simple function such as XOR [45], and they failed to compete with the hand-designed neural networks of the time. A few years later, in 2002, an evolutionary algorithm termed NEAT was proposed to incrementally grow a neural network and update its weights.

Recently, new evolutionary algorithms have been proposed to search through more complex neural architectures, such as convolutional and recurrent neural networks. Their results were competitive with state-of-the-art hand-designed architectures and with architectures generated by reinforcement learning [43, 67, 50, 40, 49]. One of these methods, called AmoebaNet-A [49], modified the tournament selection of the evolutionary algorithm with an age property, which considers younger genotypes more valuable; it set a new state-of-the-art accuracy on ImageNet and is competitive with reinforcement learning-based methods. While most of these methods need a tremendous amount of computational power to reach the optimal neural architecture [50, 49], another paper proposed an efficient neural architecture search [40], which uses less computational power to discover a near-optimal neural architecture; its efficiency, however, is not comparable to that of algorithms based on reinforcement learning.

All the methods introduced above were evaluated on balanced datasets such as CIFAR-10, CIFAR-100, and ImageNet. To the best of our knowledge, no neural architecture search algorithm has been evaluated on an imbalanced dataset. Our main goal in this paper is therefore to propose a novel and efficient genetic algorithm that is specifically designed to find the best neural architecture for imbalanced datasets, which are very common in the field of medicine.

II-D Automatic Sperm Processing

Automatic selection of sperms has been the subject of many studies. In one of these studies, [48] combined a computerized karyometric image analysis (CKIA) system and a DNA-specific stain (Feulgen) for the evaluation of ICSI-selected epididymal sperms. They used a high-magnification (1000×) microscope.

In another study, the fraction of boar spermatozoa heads was measured and a pattern for this part was trained [53, 52]. In this method, a deviation model is proposed and calculated for each sperm's head. After that, an optimal value is obtained for the classification of each sperm. Then, sperm tails are removed by morphological closing and the holes in the contours of the heads are filled. Finally, by applying Otsu's method [46], the head of each sperm is separated from the background.

In [65], sperm nuclear morphometric subpopulations of different species, including goat, sheep, pig, and cattle, were processed using ImageJ [3], and the results were used for multivariate cluster analyses. Another work reported the effects of different staining methods on the human sperm head [41]; in that study, stained and fresh sperms were compared. ImageJ was also used in another method to assess ram sperm morphology on stained images [68]. [70] has also proposed a novel method for animal sperm morphology analysis, which utilizes different algorithms such as the active contour model, K-means, a thinning algorithm, and image moments.

In another study, a Bayesian classifier was applied in order to extract different parts of the sperm: acrosome, nucleus, midpiece, and tail [8]. This segmentation was done using a Markov random field model and the entropy-based expectation-maximization algorithm on stained human semen smears. The images were captured with a high-magnification (1000×) microscope.

[2] proposed a method for classification of sperms into normal and abnormal classes. Their method proceeds in four steps: 1) image preprocessing: RGB to grayscale conversion and noise removal by applying median filter; 2) sperm detection and extraction using Sobel edge detection algorithm; 3) segmentation of each sperm; and 4) applying classification to detect normal and abnormal sperms.

In another study, a combination of learning vector quantization (LVQ) and digital image processing was used for the classification of boar sperm acrosomes [4]. The images were captured using a phase-contrast microscope. This method works on stained images, and the experimental results showed an error of 6.8% on the classification task.

A combination of histogram statistical analysis and clustering techniques is another method that has been applied to sperm detection and segmentation [14]. In another study, principal component analysis (PCA) was applied in order to extract features from sperm images. K-nearest neighbors (KNN) was also used for the classification of normal sperms. There are also some methods that focus on microscopic videos for segmenting sperms and calculating their motility [20, 10, 23].

One of the successful methods for normal sperm selection, which is able to work with fresh human sperms, is the algorithm of [19]. This method works with non-stained images from a low-magnification microscope (400× and 600×). Another advantage of this method is that it runs in real time.

To the best of our knowledge, only a few studies have applied deep learning to normal sperm classification. In one of these studies, sperm DNA integrity was predicted from sperm images using a deep CNN [42]. The authors trained a CNN on a collection of about 1,000 sperm cells of known DNA quality in order to predict DNA quality from brightfield images. A pre-trained CNN architecture (i.e., VGG16) was used in this study, and some additional layers were added after the last convolutional layer. The achieved results were acceptable in terms of DNA integrity prediction.

In another deep learning method, sperm images were classified into World Health Organization (WHO) shape-based categories (i.e., Normal, Tapered, Pyriform, Small, and Amorphous) [51]. The authors also used VGG16 in order to avoid excessive neural network computation. They applied their method to two freely available sperm head datasets (HuSHeM [58] and SCIAN [13]). The achieved results on sperm classification were superior to the other existing methods on these two datasets; however, this method cannot classify different parts of each sperm.

One of the most successful deep learning algorithms in sperm classification is the work of [25]. In their method, after applying data augmentation techniques and a sampling method, a deep neural network architecture was designed and trained. This architecture is able to detect morphological abnormalities in different parts of human sperm (i.e., acrosome, head, and vacuole). This algorithm was trained and evaluated on the MHSMA dataset, and the trained models achieved high accuracy. It should be noted that GeNAS is far more precise than this method, which is discussed in more detail in section IV-F.


In this section, we first present the overall algorithm of GeNAS, then we focus on the chromosome structure, and the fitness function. Finally, the primary operations of selection, crossover, and mutation are described.

III-A The overall structure of GeNAS

The overall scheme of GeNAS is shown in Fig. 1. The algorithm starts by initializing the population with a fixed number of chromosomes. Generating each chromosome consists of two steps: first, the length of the chromosome is set by sampling a random value from the feasible set; then, the value of each gene is selected from the constrained search space, which is discussed in subsection IV-D. Since every four consecutive genes encode a convolutional cell, the length of each generated chromosome must be a multiple of four; apart from that, chromosomes can have different lengths. Each chromosome's phenotype corresponds to a CNN architecture that consists of convolutional cells, and every convolutional cell is made of a convolutional layer followed by a max-pooling layer.

During the genotype-to-phenotype translation, a pruning operation takes place if the genotype is not feasible, i.e., if the output of the corresponding convolutional neural architecture would have negative height and width values. Briefly, the pruning operation cuts some of the convolutional cells at the chromosome head to make it feasible.

Next, the phenotype of each individual is trained on a fixed number of mini-batches, and the accuracy on the validation set is evaluated and saved after training on each mini-batch. These accuracies are used at a later stage to compute the fitness of the individual. Then, parent selection is performed on the population using tournament selection with a fixed tournament size. Finally, a special crossover operation with a given crossover probability, followed by a mutation operation with a given mutation probability, is applied to the selected parents to produce a new population. The crossover operation can change the length of each child, which helps to explore the search space of neural architectures of different lengths; it also helps to exploit the best individuals in the population. The mutation operation, on the other hand, changes gene values in the genotype, so it is responsible for exploring different numbers of filters, filter-sizes, and stride-sizes in the phenotype.

These steps are repeated for a fixed number of generations; then the architecture with the best fitness among all populations is chosen as the optimal neural architecture.

The overall structure of GeNAS is summarized as follows:

  1. Randomly initialize each individual of the first generation from the constrained search space.

  2. Prune the genotype of each individual and translate it to the corresponding phenotype, as shown in Fig. 2.

  3. Train the individual for a fixed number of mini-batches, then use the GeNAS-WF method to compute its fitness value.

  4. If the maximum number of generations is reached, go to step 8.

  5. Perform tournament selection with a fixed tournament size to select parents for the crossover operation.

  6. Perform the special crossover with the crossover probability.

  7. Perform mutation with the mutation probability, then go to step 2.

  8. Select the individual with the maximum fitness value among all populations as the optimal individual.

  9. Train the optimal individual for a fixed number of mini-batches and, during training, save the model which has the maximum accuracy on the validation set.

  10. Search for the classification threshold which maximizes the score of the trained model on the validation set.

  11. Evaluate the optimal trained model with the selected threshold on the test set and report the resulting test measures.

Fig. 1: Structure of GeNAS
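The search loop in steps 1 to 8 can be sketched as a small driver function. This is a minimal illustration under stated assumptions, not the authors' implementation: the name `run_search` and its callable parameters are hypothetical, pruning, training, and GeNAS-WF are assumed to happen inside `fitness`, and the crossover and mutation probabilities are assumed to be handled inside `crossover` and `mutate`.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Genotype = Tuple[int, ...]


def run_search(
    population: List[Genotype],
    fitness: Callable[[Genotype], float],
    select: Callable[[Sequence[Genotype], Sequence[float]], Genotype],
    crossover: Callable[[Genotype, Genotype], Tuple[Genotype, Genotype]],
    mutate: Callable[[Genotype], Genotype],
    generations: int,
) -> Tuple[Optional[Genotype], float]:
    """Return the best genotype seen in any generation and its fitness."""
    best: Optional[Genotype] = None
    best_fit = float("-inf")
    for gen in range(generations):
        # Steps 2-3: evaluate every individual of the current population.
        scores = [fitness(g) for g in population]
        for g, s in zip(population, scores):
            if s > best_fit:
                best, best_fit = g, s
        # Step 4: stop once the last generation has been evaluated.
        if gen == generations - 1:
            break
        # Steps 5-7: build the next population from selected parents.
        children: List[Genotype] = []
        while len(children) < len(population):
            a = select(population, scores)
            b = select(population, scores)
            c1, c2 = crossover(a, b)
            children.extend([mutate(c1), mutate(c2)])
        population = children[: len(population)]
    # Step 8: the best individual among all populations.
    return best, best_fit
```

Passing the operators as callables keeps the loop independent of how selection, crossover, and mutation are defined in the later subsections.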

III-B Chromosome Structure

The proposed chromosome is a linear chromosome with discrete gene values. Each chromosome consists of multiple convolutional cells with different features. A convolutional cell is composed of a convolutional layer followed by a max-pooling layer. In our solution, every four genes of the chromosome represent the features of one convolutional cell, so the number of convolutional cells in a chromosome is a quarter of its length. Therefore, the length of every chromosome must be a multiple of four. The first three genes of a convolutional cell encode, respectively, the number of filters, the filter-size (every filter has the same width and height), and the stride-size of a convolutional layer. The fourth gene represents the stride-size of a max-pooling layer (the stride-size and the max-pool window size are equal). The structure of each genotype and its translation to a phenotype is shown in Fig. 2.

By construction, our proposed linear chromosome can describe any plain convolutional neural architecture that lies within our constrained search space. For example, if we want to have consecutive max-pooling layers in our convolutional neural architecture, we can allow our search space to assign the value 0 to the filter height and width, which is equivalent to not having a convolutional layer. Likewise, if we want to have consecutive convolutional layers, we can allow our search space to assign the value 1 to the stride-size (pooling-size) of a max-pooling layer, which is equivalent to not having a max-pooling layer. Fig. 3 shows an example of genotype decoding.

Fig. 2: Translation of genotype to phenotype in GeNAS
Fig. 3: Translation of genotype to phenotype of a chromosome with 2 convolutional cells. Note that the last max-pooling layer is removed because its stride-size is 1.
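The decoding described above can be illustrated with a short sketch. The function name `decode` and the dictionary layout of each layer are assumptions made for illustration; only the gene order (number of filters, filter-size, convolutional stride-size, pooling stride-size) and the two special values (filter-size 0, pooling stride 1) come from the text.

```python
from typing import List

GENES_PER_CELL = 4  # filters, filter-size, conv stride-size, pool stride-size


def decode(genotype: List[int]) -> List[dict]:
    """Translate a flat genotype into an ordered list of layer descriptions."""
    if len(genotype) % GENES_PER_CELL != 0:
        raise ValueError("genotype length must be a multiple of four")
    layers: List[dict] = []
    for start in range(0, len(genotype), GENES_PER_CELL):
        n_filters, filter_size, conv_stride, pool_stride = genotype[
            start : start + GENES_PER_CELL
        ]
        # A filter-size of 0 is read as "no convolutional layer in this cell".
        if filter_size > 0:
            layers.append(
                {
                    "type": "conv",
                    "filters": n_filters,
                    "size": filter_size,
                    "stride": conv_stride,
                }
            )
        # A pooling stride of 1 is read as "no max-pooling layer in this cell".
        if pool_stride > 1:
            layers.append(
                {"type": "maxpool", "size": pool_stride, "stride": pool_stride}
            )
    return layers
```

For a chromosome with two cells such as `[32, 3, 1, 2, 64, 5, 1, 1]` (hypothetical gene values), the sketch yields a convolution, a max-pooling layer, and a second convolution; the second pooling layer is dropped because its stride-size is 1.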

III-C Pruning

Before the fitness evaluation step, a pruning process is carried out to make sure that the chromosome is feasible. A chromosome is considered infeasible when stacking its corresponding convolutional cells yields a negative output size in the constructed CNN. This can happen during genotype-to-phenotype translation because each convolutional or max-pooling layer added to the phenotype decreases its output dimensions, according to equations 1, 2, 3, and 4. For clarification, an example of the pruning process on a chromosome with four convolutional cells is illustrated in Fig. 4. In this example, the padding size is zero for all convolutional cells, so only equations 3 and 4 are used for calculating the output dimensions.

In simple terms, when the output of a phenotype becomes negative, we cut enough convolutional cells from its head until the output is positive. To perform this process in parallel with genotype-to-phenotype translation, we calculate the output dimensions with equations 1, 2, 3, and 4 before adding each convolutional or max-pooling layer. If the result is negative, the translation process is stopped and the CNN constructed so far is sent for fitness evaluation; otherwise, the layer is added to the top of the phenotype and the process is repeated. Since this process reduces the number of layers, it also accelerates the fitness evaluation procedure and thereby speeds up the whole algorithm.


In equations 1 to 4, the new width and height are computed from the current width and height, where F is the filter size, S is the stride size, and P is the padding size. These equations apply to the output dimensions of both convolutional and max-pooling layers. Equations 1 and 2 are used when padding is applied; otherwise, equations 3 and 4 are used.

Fig. 4: Pruning of a chromosome with four convolutional cells
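The pruning check can be sketched as follows. Since the equations themselves are not reproduced here, the sketch assumes the standard output-size formula floor((size - F + 2P) / S) + 1 for both convolutional and max-pooling layers, which matches the symbols F, S, and P described above. The names `out_size` and `prune`, and the representation of a layer as a `(filter_size, stride)` pair, are hypothetical. The sketch also treats an output size of zero as infeasible, which is an assumption.

```python
from typing import List, Tuple


def out_size(size: int, f: int, s: int, p: int = 0) -> int:
    """Assumed standard formula: floor((size - f + 2p) / s) + 1."""
    return (size - f + 2 * p) // s + 1


def prune(
    layers: List[Tuple[int, int]], height: int, width: int
) -> List[Tuple[int, int]]:
    """Keep layers, in order, until adding one would give a non-positive output.

    Each layer is a (filter_size, stride) pair with zero padding.
    """
    kept: List[Tuple[int, int]] = []
    for f, s in layers:
        new_h = out_size(height, f, s)
        new_w = out_size(width, f, s)
        if new_h <= 0 or new_w <= 0:
            # Translation stops here; the CNN built so far is evaluated.
            break
        kept.append((f, s))
        height, width = new_h, new_w
    return kept
```

With a 64x64 input and the hypothetical layer list `[(3, 1), (2, 2), (3, 1), (2, 2), (7, 1), (2, 2), (5, 1)]`, the feature map shrinks to 4x4 after six layers, so the final 5x5 convolution is cut.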

III-D Fitness Evaluation

One of the indispensable components of a genetic algorithm is the formulation of its fitness function. In single-objective optimization problems, a good fitness function acts as an objective function that maps a feasible solution to a scalar value summarizing how close the solution is to the aims we want to achieve. In the field of machine learning, the scalar value commonly used to evaluate the generalization performance of a neural network is the accuracy of the learned model on the validation set. Since we want to obtain a model with high generalization power, and since validation accuracy has worked well in prior research in the NAS domain, we have opted to employ the accuracy on the validation set as the basis of the fitness value.

To obtain this fitness value for a solution, we first train it on a fixed number of mini-batches. During the training phase, after training on each mini-batch, the accuracy of the model on the validation set is appended to a vector named B. That is, the first element of vector B carries the validation accuracy after training on the first mini-batch, the second element carries the validation accuracy after training on the second mini-batch, and so on until the last element, which corresponds to the last mini-batch.

Our experiments revealed that the validation accuracy fluctuates strongly during the training phase. The roots of this fluctuation are the noisy dataset and the oversampling method that we proposed for generating balanced mini-batches from the imbalanced dataset. To reduce the effects of these fluctuations, we propose a method named GeNAS Weighting Factor (GeNAS-WF), which levels off these fluctuations by mapping vector B to a smoother vector named G. For clarification, both vectors B and G are visualized in Fig. 5 for a single training phase.

Fig. 5: Fluctuations of a sample CNN accuracy on validation set through iterations on head label

In detail, GeNAS-WF works as follows: first, after training the model and acquiring vector B, we decide on a customized window W, a vector of weights whose size is the window length; then, the cross-correlation between B and W is calculated. This operation results in a smoother vector G. For illustration, the first element of G is the weighted mean of the first window of elements of B, the second element of G is the weighted mean of the window that starts at the second element of B, and so on, using W as the weights. Finally, the maximum element of vector G is taken as the fitness value of the model. The equations are as follows:


In the equations above, characters in square brackets refer to a specific element of the corresponding vector. For example, in the notation G[i], the character i indicates the i-th element of vector G.

Fig. 6: GeNAS-WF: An example of calculating the fitness of a trained neural architecture

It should be noted that based on our experiments, we assigned one to all elements of the vector W; but, it has the potential to get other values for improving the performance on other datasets.
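A minimal sketch of the GeNAS-WF computation, based only on the verbal description above, is given here. The function names `smooth` and `genas_wf` are hypothetical, and dividing each window by the sum of the weights (so that each element of G is a weighted mean) is an assumption.

```python
from typing import List, Sequence


def smooth(b: Sequence[float], w: Sequence[float]) -> List[float]:
    """Cross-correlate B with window W, normalised so each entry is a weighted mean."""
    if not w or len(w) > len(b):
        raise ValueError("window must be non-empty and no longer than B")
    total = sum(w)
    return [
        sum(wj * b[i + j] for j, wj in enumerate(w)) / total
        for i in range(len(b) - len(w) + 1)
    ]


def genas_wf(b: Sequence[float], w: Sequence[float]) -> float:
    """Fitness is the maximum element of the smoothed vector G."""
    return max(smooth(b, w))
```

For the hypothetical accuracies `[0.5, 0.9, 0.5, 0.7, 0.7, 0.7]` and a window of three ones, the isolated spike of 0.9 no longer determines the fitness; the sustained plateau around 0.7 does.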

To increase the speed of GeNAS and avoid redundant evaluation of individuals, a hashing method is also proposed. Before the translation process takes place, it checks whether the fitness of the genotype has already been computed. If so, the previously computed fitness is retrieved; otherwise, the phenotype is produced and its fitness is computed. Retrieving the fitness of a previously evaluated chromosome has a time complexity of O(1).
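The lookup described above can be sketched with a dictionary keyed by the genotype. The class name `FitnessCache` and the `hits` counter are hypothetical; the expensive train-and-score step is assumed to be supplied as the `evaluate` callable. Python dictionaries give average-case constant-time lookup.

```python
from typing import Callable, Dict, Sequence, Tuple


class FitnessCache:
    """Memoise fitness values so each distinct genotype is trained only once."""

    def __init__(self, evaluate: Callable[[Tuple[int, ...]], float]) -> None:
        self._evaluate = evaluate
        self._table: Dict[Tuple[int, ...], float] = {}
        self.hits = 0  # number of evaluations that were avoided

    def __call__(self, genotype: Sequence[int]) -> float:
        key = tuple(genotype)  # tuples are hashable, lists are not
        if key in self._table:
            self.hits += 1
            return self._table[key]
        value = self._evaluate(key)
        self._table[key] = value
        return value
```

Because the fitness function is noisy, caching also means that a repeated genotype always keeps the fitness of its first evaluation.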

III-E Selection

The parent selection step is performed using tournament selection with a tournament size defined in terms of the population size. In this selection method, a tournament-sized group of individuals is first chosen at random from the population, and the individual with the maximum fitness value among them is selected.

Tournament selection has been chosen because it allows the selection pressure to be controlled by means of the tournament size. As the tournament size gets larger, the selection pressure gets higher, and vice versa. The tournament size therefore allows the balance between exploration and exploitation to be adjusted. In addition, tournament selection is frequently applied in conjunction with noisy fitness functions [44]. Noisy fitness functions are functions that can return different output values for the same input values in subsequent evaluations. The fitness function used in this paper is noisy as well, because there is no guarantee that training the same convolutional neural architecture multiple times yields the same fitness value.
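Tournament selection as described above can be sketched in a few lines. The function name `tournament_select` and the parameter `k` for the tournament size are hypothetical, and drawing the contestants without replacement is an assumption.

```python
import random
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def tournament_select(
    population: Sequence[T],
    fitness: Sequence[float],
    k: int,
    rng: Optional[random.Random] = None,
) -> T:
    """Pick k distinct individuals at random and return the fittest of them."""
    rng = rng or random.Random()
    contestants = rng.sample(range(len(population)), k)
    winner = max(contestants, key=lambda i: fitness[i])
    return population[winner]
```

Setting `k` to the population size always returns the best individual (maximum selection pressure), while `k = 1` reduces to uniform random selection.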

III-F Crossover

The crossover operation combines the genotypes of two parents to form the genotype of an offspring. The main contribution of the crossover operation in our proposed algorithm is that it allows GeNAS to change the length of the chromosomes of the new population, i.e., the number of convolutional cells. In our case, the crossover operation combines genotypes of different sizes and produces a genotype of yet another size.

Crossover is applied to a pair of parent genotypes selected through tournament selection. First, a number between zero and one is randomly chosen. If it is less than the crossover probability threshold, the crossover operation is performed on the two parents; otherwise, both parents are added to the new generation after a mutation step.

Keeping in mind that the number of genes in each generated child must be a multiple of four, the crossover is designed as follows: a crossover point is randomly chosen for the first parent. Next, a crossover point is computed for the second parent according to equation 7:


where the RandomInteger function generates a random integer from a uniform distribution over a range that starts at 0. Equation 7 guarantees that, in the subsequent steps, crossover generates children whose length, i.e., the number of their genes, is a multiple of four.

Once both crossover points are defined, the lengths of the two children can be computed as follows:


If the computed lengths fall outside the minimum and maximum individual length allowed by the constrained search space, the crossover point is calculated again until both children have valid lengths. Once valid crossover points have been determined, each genome is cut in two at its crossover point. The left part of the first parent is concatenated with the right part of the second parent to produce the first child, and the left part of the second parent is concatenated with the right part of the first parent to produce the second child. An example of this operation is shown in Fig. 7.

Fig. 7: A crossover example which generates children with different lengths.
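One way to satisfy the multiple-of-four constraint is sketched below. It is not a reconstruction of equation 7: it simply chooses the second cut point so that it has the same remainder modulo four as the first, which is sufficient for both children to have valid lengths. The function name `crossover`, the `max_tries` limit, redrawing both cut points on failure, and falling back to unchanged copies of the parents are all assumptions.

```python
import random
from typing import List, Optional, Tuple


def crossover(
    p1: List[int],
    p2: List[int],
    min_len: int,
    max_len: int,
    rng: Optional[random.Random] = None,
    max_tries: int = 100,
) -> Tuple[List[int], List[int]]:
    """One-point crossover on parents whose lengths are multiples of four."""
    rng = rng or random.Random()
    for _ in range(max_tries):
        cut1 = rng.randint(0, len(p1))  # cut point in the first parent
        offset = cut1 % 4
        # Choose cut2 with cut2 % 4 == cut1 % 4, so that
        # cut1 + (len(p2) - cut2) and cut2 + (len(p1) - cut1)
        # are both multiples of four.
        cut2 = offset + 4 * rng.randint(0, (len(p2) - offset) // 4)
        child1 = p1[:cut1] + p2[cut2:]
        child2 = p2[:cut2] + p1[cut1:]
        if all(min_len <= len(c) <= max_len for c in (child1, child2)):
            return child1, child2
    # Assumed fallback: return copies of the parents if no valid cut is found.
    return list(p1), list(p2)
```

The two children always share the genes of the parents between them, so their lengths sum to the combined length of the parents, while each individual length can differ from both parents.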

III-G Mutation

The mutation operation modifies the number of filters, the filter-size, and the stride-size of the convolutional and max-pooling layers of an individual. It is applied to each individual with the mutation probability.

Since every four consecutive genes represent a convolutional cell, we divide the genes into four distinct groups: number of filters, filter-size, convolutional stride-size, and pooling stride-size. If mutation takes place for a specific chromosome, one of its genes is first picked at random. If the selected gene belongs to the number-of-filters or filter-size group, a random element is selected from the feasible values associated with that group. Otherwise, if the selected gene belongs to the convolutional or pooling stride-size group, an intermediate value is first calculated as follows: the current stride-size is added to a floating-point value randomly drawn from a normal distribution with a mean of zero and a variance of one, and the minimum of the obtained value and the maximum constrained value of the respective gene group is taken. Finally, the mutated stride-size is obtained as the maximum of this intermediate value and the minimum constrained value of the respective gene group. The equations are as follows:


Where RandomNormal function will generate a random value with mean 0 and variance 1 from the normal distribution. MaxConstraintStrideSize and MinConstraintStrideSize are the maximum and minimum values permitted to use as stride size, respectively.
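The mutation rule can be illustrated with the following Python sketch. It is an approximation under stated assumptions: the function name `mutate` is hypothetical, the gene order inside a cell (filters, filter size, convolutional stride, pooling stride) is assumed from the order in which the groups are listed, the feasible values are those of the constrained search space described in the experiments, and no rounding of the mutated stride is applied because the text does not specify one.

```python
import random

FILTER_COUNTS = [4, 8, 16, 32, 64, 128, 256]   # feasible numbers of filters
FILTER_SIZES = [1, 3, 5, 7, 11]                # feasible filter sizes
MIN_STRIDE, MAX_STRIDE = 1.0, 2.0              # stride-size constraints


def mutate(genome, mutation_prob, rng=random):
    """Return a copy of `genome` in which at most one gene has been mutated."""
    mutated = list(genome)
    if rng.random() >= mutation_prob:
        return mutated                          # no mutation for this individual
    i = rng.randrange(len(mutated))             # pick one gene at random
    group = i % 4                               # ASSUMED gene order inside a cell
    if group == 0:
        mutated[i] = rng.choice(FILTER_COUNTS)
    elif group == 1:
        mutated[i] = rng.choice(FILTER_SIZES)
    else:
        # Stride genes: add N(0, 1) noise, then clip to the allowed range.
        temp = min(mutated[i] + rng.gauss(0.0, 1.0), MAX_STRIDE)
        mutated[i] = max(temp, MIN_STRIDE)
    return mutated
```

The min-then-max pair is simply a clip of the perturbed stride into the permitted interval.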

IV Experiments and Results

In the following sections, the experimental part of our work is described. First, the Modified Human Sperm Morphology Analysis dataset (MHSMA), which contains annotated images of human sperm cells, is introduced. Next, the data augmentation techniques and the oversampling method that we employed are described. Then, the constrained search space and the modules it contains are explained. Finally, the implementation details of GeNAS and the results of GeNAS, random search, and previous benchmarks are discussed.

IV-A Dataset

The MHSMA dataset [25] is composed of 1,540 grayscale sperm images, provided in two sizes: 64x64 and 128x128 pixels. This dataset is derived from the Human Sperm Morphology Analysis dataset (HSMA-DS) [19], introduced in 2015. All images have been labeled by specialists with four binary labels (tail and neck, vacuole, head, and acrosome), each of which can be either normal (positive) or abnormal (negative). The distribution of negative and positive values for the four labels is shown in Table 1. This table reveals that the data is highly imbalanced in favor of the positive class, whose share of the data depends on the label.

Label # Positive # Negative % positive % Negative
Tail and neck
Table 1: Data distribution of the MHSMA dataset

We used the data split proposed by [25], which divides the data into three subsets: training, validation, and test, containing 1,000, 240, and 300 images, respectively.

Fig. 8: Sample images from the MHSMA dataset. As can be seen, the images are non-stained and low-resolution.

IV-B Data augmentation

In this task, each trained convolutional neural network maps an input sperm image to a 1-bit label (i.e., abnormal or normal). Most neural architectures generated by GeNAS have parameters on the order of millions. As the number of parameters increases, more training examples are needed to tune them properly. However, collecting more human sperm images is extremely costly and arduous. To the best of our knowledge, MHSMA is one of the largest datasets in the field of sperm morphology analysis.

As mentioned before, the MHSMA dataset has only 1,000 training examples. To address this shortage, data augmentation is used to prevent overfitting and to virtually increase the training set size. Before each training example (sperm image) is fed to the model, a 64x64-pixel crop is extracted from the 128x128-pixel image. After crop extraction, the following random modifications are applied to each training example:

  • Flipping: Every image is flipped vertically and horizontally with a probability of 0.5.

  • Rotating: The cropped area of every image is rotated by a random angle (in degrees) drawn from a uniform distribution.

  • Shifting: The crop region is shifted along the vertical and horizontal axes by y and x pixels, where y and x are drawn from a uniform distribution.

  • Scaling: The pixel values of each image are multiplied by a random factor drawn from a uniform distribution.

The output of these random modifications on one training example is a 64x64 grayscale image. Finally, the image is normalized by subtracting its mean and dividing the result by 255, as shown in equation 12 (x is a single sperm image):

$$ x \leftarrow \frac{x - \mathrm{mean}(x)}{255} \quad (12) $$

Our augmentation settings are the same as those used in [25].
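As a rough illustration of two of the steps above, the following Python sketch implements random flipping and the final normalization on an image stored as a list of rows. It is a simplified sketch, not the pipeline of [25]: the function names are hypothetical, the vertical and horizontal flips are assumed to be decided independently, normalization is assumed to mean "subtract the image mean, then divide by 255", and rotation, shifting, and scaling are omitted because their ranges are not restated here.

```python
import random


def random_flip(image, rng=random):
    """Flip a 2-D image vertically and/or horizontally, each with probability 0.5."""
    if rng.random() < 0.5:
        image = image[::-1]                      # vertical flip: reverse row order
    if rng.random() < 0.5:
        image = [row[::-1] for row in image]     # horizontal flip: reverse each row
    return [list(row) for row in image]


def normalize(image):
    """ASSUMED reading of the normalization step: (x - mean(x)) / 255."""
    pixels = [p for row in image for p in row]
    mean = sum(pixels) / len(pixels)
    return [[(p - mean) / 255.0 for p in row] for row in image]
```

After normalization the pixel values of an image sum to zero, so every training example is centered regardless of its overall brightness.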

IV-C Sampling

As shown in Table 1, the MHSMA dataset is highly imbalanced. To confront this challenge, the oversampling method proposed by [25] is used. Its main goal is to generate balanced mini-batches from an imbalanced dataset such as MHSMA. In this method, the negative and positive samples are divided into two distinct shuffled lists. One sample is added to a mini-batch as follows: first, one of the two lists is chosen at random; next, the sample at the top of the chosen list is selected and added to the mini-batch; finally, the selected sample is moved to the end of the chosen list. This process is repeated until the mini-batch is full. In addition, once all samples in a list have been used, that list is reshuffled. With this approach, the classes in every mini-batch are, with high probability, approximately balanced.
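The oversampling procedure can be sketched as follows. This is an illustrative sketch rather than the code of [25]: the class name `BalancedSampler` is hypothetical, the probability of picking the positive list is assumed to be 0.5, and a moving cursor is used in place of physically moving each sample to the end of its list, which yields the same order of samples.

```python
import random


class BalancedSampler:
    """Draw mini-batches whose two classes are balanced in expectation."""

    def __init__(self, positives, negatives, pick_positive_prob=0.5, seed=None):
        self.rng = random.Random(seed)
        self.lists = {True: list(positives), False: list(negatives)}
        self.cursor = {True: 0, False: 0}
        self.pick_positive_prob = pick_positive_prob   # ASSUMED value: 0.5
        for items in self.lists.values():
            self.rng.shuffle(items)

    def _next(self, is_positive):
        items = self.lists[is_positive]
        sample = items[self.cursor[is_positive]]
        self.cursor[is_positive] += 1
        if self.cursor[is_positive] == len(items):
            # Every sample of this class has been used once: reshuffle and restart.
            self.rng.shuffle(items)
            self.cursor[is_positive] = 0
        return sample

    def mini_batch(self, size):
        return [self._next(self.rng.random() < self.pick_positive_prob)
                for _ in range(size)]
```

With 90 positive and 10 negative samples, for example, each negative sample is simply reused about nine times as often as each positive one.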

IV-D GeNAS Search Space

Our search space contains plain convolutional architectures (max-pooling and convolutional layers) with Scaled Exponential Linear Units (SELUs) [30] as non-linearities. SELU is employed because it keeps the neuron activations close to zero mean and unit variance, which lets us increase the depth of our convolutional architectures while mitigating the vanishing and exploding gradient problems.

As the search space grows, finding the optimal convolutional architecture becomes harder and requires more time and computational power. Therefore, inspired by popular convolutional architectures [36, 34, 69], a constrained search space is designed.

For each convolutional layer, the meta-controller (the genetic algorithm) selects a filter size in {1, 3, 5, 7, 11}, a number of filters in {4, 8, 16, 32, 64, 128, 256}, and a convolutional stride size in the range [1, 2]. For each max-pooling layer, it selects a stride size in the range [1, 2]. In addition to these constraints, between 2 and 50 convolutional cells are permitted (i.e., an individual has a length of at least 8 and at most 200 genes). Note that the same constrained search space is employed to discover a sub-optimal neural architecture for each label.
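To make the encoding concrete, the sketch below draws one random individual from this constrained search space. It is only an illustration: the function name `random_individual` is hypothetical, the order of the four genes within a cell is an assumption, and stride sizes are drawn as the integers 1 or 2 although the text only gives the range [1, 2].

```python
import random

FILTER_COUNTS = [4, 8, 16, 32, 64, 128, 256]
FILTER_SIZES = [1, 3, 5, 7, 11]
STRIDES = [1, 2]                 # ASSUMED integer strides within the range [1, 2]
MIN_CELLS, MAX_CELLS = 2, 50     # 8 to 200 genes


def random_individual(rng=random):
    """Build a flat genome with four genes per convolutional cell."""
    genome = []
    for _ in range(rng.randint(MIN_CELLS, MAX_CELLS)):
        genome += [
            rng.choice(FILTER_COUNTS),   # number of filters
            rng.choice(FILTER_SIZES),    # filter size
            rng.choice(STRIDES),         # convolutional stride size
            rng.choice(STRIDES),         # max-pooling stride size
        ]
    return genome
```

Under these assumptions a single cell has 7 x 5 x 2 x 2 = 140 possible configurations, which shows how quickly the space grows with the number of cells.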

IV-E GeNAS Implementation Details

The initial population consists of 30 individuals, each generated with a random length and random gene values from the constrained search space. During the genotype-to-phenotype transformation of each individual, after the input layer and the convolutional cells are created, one average-pooling layer with a stride size of 2 is added, followed by two fully connected layers with 1024 neurons and SELU activations, and finally an output layer of 1 neuron with a sigmoid activation. The SELU activation function, defined in equation 13, is an Exponential Linear Unit (ELU) activation function, defined in equation 14, that is scaled so that the mean and variance of the inputs are preserved between two consecutive layers. For GeNAS-WF, a window of size 5 is used, the same as the window shown in Fig. 6.

$$ \mathrm{SELU}(x) = \lambda \, \mathrm{ELU}(x) \quad (13) $$

$$ \mathrm{ELU}(x) = \begin{cases} x & \text{if } x > 0 \\ \alpha \, (e^{x} - 1) & \text{if } x \le 0 \end{cases} \quad (14) $$
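For reference, the two activation functions can be written directly in Python. The constants below are the standard SELU values from the self-normalizing networks paper [30]; the function names are chosen here for illustration only.

```python
import math

SELU_ALPHA = 1.6732632423543772   # alpha from Klambauer et al. [30]
SELU_SCALE = 1.0507009873554805   # lambda (scale) from Klambauer et al. [30]


def elu(x, alpha=1.0):
    """Exponential Linear Unit: identity for x > 0, saturating exponential otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def selu(x):
    """Scaled ELU: an ELU with alpha = SELU_ALPHA multiplied by SELU_SCALE."""
    return SELU_SCALE * elu(x, SELU_ALPHA)
```

For large negative inputs `selu` saturates at about -1.758, while positive inputs are scaled by slightly more than one.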


Afterward, the weights of each chromosome's phenotype are initialized using the LeCun normal initializer [35], and its biases are initialized to zero. In the LeCun normal initializer, samples are drawn from a truncated normal distribution centered at zero with a standard deviation of $\sqrt{1/n}$, where $n$ is the number of inputs to the weight matrix. The network is then trained for 2,000 mini-batches. For optimization, the ADAM optimizer [28] is employed with a constant learning rate and fixed exponential decay rates for the moment estimates. ADAM is an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments. Binary cross-entropy is used as the loss function, and every mini-batch contains the same number of training images. Next, the fitness of each individual is calculated by the GeNAS-WF method. Then, tournament selection with a tournament size of 10 selects 30 parents for the next generation, and the crossover and mutation operations are applied to the selected individuals. After experimenting with various mutation and crossover probabilities for each task, the best values we found were 0.7 for mutation and 0.3 for crossover. These steps are repeated for 20 iterations. The same algorithm with the same settings is used for all three tasks, namely finding the optimal convolutional architectures that predict abnormality in the sperm head, vacuole, and acrosome on the MHSMA dataset.
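The selection step can be sketched as below. This is a generic tournament selection under stated assumptions, not necessarily the exact variant used in GeNAS: the function name is hypothetical, contenders within one tournament are assumed to be drawn without replacement, and higher fitness is assumed to be better.

```python
import random


def tournament_selection(population, fitness, n_parents=30, tournament_size=10, rng=random):
    """Select `n_parents` individuals; each is the fittest of a random tournament."""
    parents = []
    for _ in range(n_parents):
        # Draw `tournament_size` distinct contenders (ASSUMED: without replacement).
        contenders = rng.sample(range(len(population)), tournament_size)
        winner = max(contenders, key=lambda i: fitness[i])
        parents.append(population[winner])
    return parents
```

With a tournament size of 10 out of 30 individuals, the nine least fit individuals can never be selected, so selection pressure is fairly strong.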

Our experiments were carried out using Keras [15] with a TensorFlow [1] backend on one Nvidia GPU.

IV-F Results

Before discussing GeNAS, we describe the measurements employed to evaluate the models it found. In our experiments, accuracy, recall, precision, and the $F_{\beta}$ score are used as evaluation metrics. Their formulations are shown in equations 15-18, where FN, FP, TN, and TP indicate false negatives (i.e., regular sperms predicted wrongly), false positives (i.e., irregular sperms predicted wrongly), true negatives (i.e., irregular sperms predicted correctly), and true positives (i.e., regular sperms predicted correctly), respectively. In our experiments, the value of $\beta$ in equation 18 is set to 0.5 (i.e., $F_{0.5}$) for two reasons: first, previous papers used this same metric for their evaluations; second, the $F_{0.5}$ measure depends more on precision than on recall. Since precision is more important than recall in sperm morphology analysis, this is a suitable metric.

$$ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (15) $$

$$ \mathrm{Recall} = \frac{TP}{TP + FN} \quad (16) $$

$$ \mathrm{Precision} = \frac{TP}{TP + FP} \quad (17) $$

$$ F_{\beta} = (1 + \beta^{2}) \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}{\beta^{2} \cdot \mathrm{Precision} + \mathrm{Recall}} \quad (18) $$


We employed GeNAS to discover, independently for each label, the best architectures for predicting abnormality in the sperm head, vacuole, and acrosome. After the meta-controller had trained and evaluated 600 architectures (i.e., 30 individuals over 20 generations), we extracted the architecture with the best fitness. The search process for each label is shown in Fig. 9. According to this figure, the overall fitness gradually increased during the evolution process for every label. For clarity, a linear regression over all the points is calculated and shown in each chart of Fig. 9. According to these charts, each of the three searches (head, vacuole, and acrosome) took a number of hours on just one GPU. Therefore, considering that designing such architectures takes a human expert plenty of time, using GeNAS is less time-consuming and less arduous than designing them by hand.

Fig. 9: Fitness of individuals through each generation and time for (a) the head, (b) the vacuole, and (c) the acrosome label, arranged vertically (colors are only for clarity and do not represent any value).

After the best chromosome was extracted, it was trained for 20,000 mini-batches. The validation accuracies for each label over the training iterations are illustrated in Fig. 10. Note that the curves in Fig. 10 are smoothed using Simple Exponential Smoothing [18].

Fig. 10: Training and validation accuracy over iterations of training for (a) the acrosome, (b) the head, and (c) the vacuole label.
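Simple exponential smoothing, mentioned above for the curves of Fig. 10, replaces each value with a weighted average of the current value and the previous smoothed value. The following sketch shows the standard recurrence; the function name and the smoothing factor `alpha` are illustrative, since the factor actually used for the figure is not stated.

```python
def exponential_smoothing(values, alpha):
    """Return s with s[0] = values[0] and s[t] = alpha * values[t] + (1 - alpha) * s[t - 1]."""
    if not values:
        return []
    smoothed = [values[0]]
    for v in values[1:]:
        smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed
```

Smaller values of `alpha` give smoother curves at the cost of a longer lag behind the raw accuracy values.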

During the training stage, accuracy on the validation set was computed after training on each mini-batch, and the model checkpoint with the highest validation accuracy was saved. Then, a search was conducted over a fixed range of classification thresholds for the one that maximizes the $F_{0.5}$ score of the saved model on the validation set, and the resulting threshold was selected. Note that increasing the threshold raises the probability of classifying sperms as abnormal and therefore reduces recall; however, since the main concern in sperm morphology analysis is increasing precision, this does not harm the usefulness of our models. The search range was bounded because a broader range could bias the models too strongly toward precision and severely harm recall, whereas a narrower range would not be sufficient to favor precision over recall. A separate classification threshold was found in this way for each of the vacuole, head, and acrosome abnormality detection models.
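A threshold search of this kind can be sketched as follows. The sketch makes several assumptions: the function names are hypothetical, label 1 denotes a normal (positive) sperm, an image is predicted normal when the model output is at least the threshold, and the list of candidate thresholds is supplied by the caller because the range used in the experiments is not restated here.

```python
def f_beta_score(y_true, y_pred, beta=0.5):
    """F-beta computed from binary labels, with 1 as the positive (normal) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def best_threshold(y_true, scores, candidates, beta=0.5):
    """Return the candidate threshold with the highest F-beta, and that F-beta."""
    best_t, best_f = None, -1.0
    for t in candidates:
        y_pred = [1 if s >= t else 0 for s in scores]
        f = f_beta_score(y_true, y_pred, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

For example, with labels `[1, 1, 1, 0, 0]`, scores `[0.9, 0.8, 0.6, 0.7, 0.2]`, and candidates `[0.5, 0.65, 0.75]`, the highest threshold wins because it removes the only false positive.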

The best discovered architectures are shown in Fig. 11, Fig. 12, and Fig. 13. For the head label, illustrated in Fig. 11, the neural architecture found by GeNAS consists of 12 convolutional and three max-pooling layers.

Fig. 11: Architecture of best model for sperm head label

For the acrosome label, represented in Fig. 12, the neural architecture discovered by GeNAS includes 18 convolutional and five max-pooling layers.

Fig. 12: Architecture of best model for sperm acrosome label

For the vacuole label, shown in Fig. 13, the neural architecture found by GeNAS consists of 10 convolutional and five max-pooling layers.

Fig. 13: Architecture of best model for sperm vacuole label

Finally, the discovered neural architectures were evaluated on the test set. The resulting confusion matrices are presented in Table 2.

Label      Actual class: Normal   Actual class: Abnormal   Predicted class
Acrosome   197 true positives     16 false positives       Normal
Acrosome   51 false negatives     36 true negatives        Abnormal
Head       255 true positives     7 false positives        Normal
Head       15 false negatives     23 true negatives        Abnormal
Vacuole    203 true positives     16 false positives       Normal
Vacuole    52 false negatives     29 true negatives        Abnormal
Table 2: Confusion matrix for evaluation of best models found by GeNAS on test set, for each label
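The evaluation metrics can be recomputed from such counts with a few lines of Python. The sketch below uses the standard definitions of accuracy, precision, recall, and $F_{\beta}$ with $\beta = 0.5$; the function name `metrics` is illustrative.

```python
def metrics(tp, fp, fn, tn, beta=0.5):
    """Return (accuracy, precision, recall, F-beta) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    f_beta = (1 + b2) * precision * recall / (b2 * precision + recall)
    return accuracy, precision, recall, f_beta


# Counts 255 / 7 / 15 / 23 as listed in Table 2.
acc, prec, rec, f05 = metrics(tp=255, fp=7, fn=15, tn=23)
print(f"{acc:.4f} {prec:.4f} {rec:.4f} {f05:.4f}")
```

For these counts the accuracy is 278/300, the precision 255/262, and the recall 255/270, which agrees with the 92.66 / 97.32 / 94.44 / 96.73 row of Table 4 up to truncation of the last digit.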

To the best of our knowledge, the only paper that designs a convolutional neural network for the MHSMA dataset is [25]. As shown in Table 3, that paper proposes a single neural architecture, composed of 24 convolutional and two max-pooling layers followed by one average-pooling layer and two fully connected layers, for predicting abnormality on every label. GeNAS, on the other hand, identified a distinct neural architecture for each label. Compared with [25], the best architecture found by GeNAS has fewer than half the convolutional layers and three more max-pooling layers for the vacuole label; half the convolutional layers and one more max-pooling layer for the head label; and six fewer convolutional layers and three more max-pooling layers for the acrosome label. Note that all architectures use the same number of average-pooling and fully connected layers, with the same number of neurons. The results of [25] are shown as “Javadi et al.” in Table 4.

Label      Method          Convolutional   Max-pooling   Fully-connected   Average-pooling   Total
Acrosome   GeNAS           18              5             2                 1                 26
Acrosome   Javadi et al.   24              2             2                 1                 29
Head       GeNAS           12              3             2                 1                 18
Head       Javadi et al.   24              2             2                 1                 29
Vacuole    GeNAS           10              5             2                 1                 18
Vacuole    Javadi et al.   24              2             2                 1                 29
Table 3: Comparison of number of different layers in GeNAS and Javadi et al. paper [25]

In another experiment, we ran a random search to find the best architectures for predicting abnormality in the sperm head, vacuole, and acrosome. First, random search trained 600 architectures (the same number as our proposed method), randomly generated from the constrained search space. Then, every trained architecture was evaluated on the validation set, and the architecture with the highest validation accuracy was selected. Next, the selected architecture was trained for 20,000 mini-batches (i.e., the same as for GeNAS). After training on each mini-batch, accuracy on the validation set was computed, and the checkpoint with the highest validation accuracy was saved. Finally, the trained model was evaluated on the test set, and its test accuracy, precision, recall, and $F_{0.5}$ score were calculated. The results are shown in Table 4.

Another paper [19] proposed an image-processing-based algorithm to predict abnormality on the Human Sperm Morphology Analysis dataset (HSMA-DS), the dataset on which MHSMA is based. Their algorithm was assessed on two of the labels: head and vacuole. The results of this method are shown as “Ghasemian et al.” in Table 4.

Label      Method             Accuracy   Precision   Recall   F0.5 score   Parameters
Acrosome   GeNAS              77.66      93.67       81.09    90.85        5,756,553
Acrosome   Random Search      69.66      86.8        74.5     84.09        1,185,209
Acrosome   Javadi et al.      76.67      85.93       80.28    84.74        5,637,649
Acrosome   Ghasemian et al.   N/A        N/A         N/A      N/A          N/A
Head       GeNAS              77.33      92.69       79.60    89.74        1,908,261
Head       Random Search      76.00      88.58       80.49    86.83        3,032,401
Head       Javadi et al.      77.00      83.48       85.39    83.86        5,637,649
Head       Ghasemian et al.   61.00      76.71       71.79    75.68        N/A
Vacuole    GeNAS              92.66      97.32       94.44    96.73        2,211,461
Vacuole    Random Search      89.00      93.12       94.20    93.34        4,715,861
Vacuole    Javadi et al.      91.33      94.36       95.80    94.65        5,637,649
Vacuole    Ghasemian et al.   80.33      83.21       93.56    85.09        N/A
Table 4: Comparison of results of best models found by GeNAS with other proposed methods on test set (all values are in percent except parameters)

Finally, as shown in Table 4, GeNAS found convolutional architectures with higher accuracy, precision, and $F_{0.5}$ score on the test set for all three labels and, furthermore, achieved better recall on the acrosome label than random search and previous methods. In addition, the models discovered by GeNAS for the vacuole and head labels have far fewer parameters than those of the other methods. On the acrosome label, the random-search model has far fewer parameters than the GeNAS model, but its accuracy, recall, precision, and $F_{0.5}$ score cannot compete with those of GeNAS. As mentioned before, the critical measure for this dataset is precision, so the lower recall on the head and vacuole labels does not harm the usefulness of our models.

Based on our experiments, the inference time of every discovered model is less than 1 second, which is adequate for treatment purposes.

IV-G Visual explanation

To make sure that the best neural architectures found by GeNAS attend to the relevant parts of sperm images when making predictions, we applied a visual explanation technique named Gradient-weighted Class Activation Mapping (Grad-CAM) [31, 56]. This technique uses class-specific gradient information to generate a heatmap that highlights the areas of the input image that are important for the prediction, which gives us a better understanding of how our neural architectures function. Visual explanations of the models discovered by GeNAS are shown in Fig. 14 for three sample images that were classified correctly. These visual explanations indicate that the discovered models have learned to focus on the image areas that are in fact important for the sperm abnormality prediction task. For clarification, comparing the visual explanations in Fig. 14 with the diagram of a sperm in Fig. 15 shows that our models attend to the relevant parts of the sperm image for each classification task, i.e., the head, acrosome, and vacuole.

Fig. 14: Grad-CAM visual explanations for (a) the head, (b) the vacuole, and (c) the acrosome label (warmer colors illustrate more attention)
Fig. 15: Diagram of distinct parts of a human sperm cell [25].

IV-H Discussion

We not only achieved better test accuracy on three distinct labels of the MHSMA dataset but also reached higher precision than the models previously proposed in [19] and [25] and than random search. Furthermore, GeNAS found neural architectures with far fewer parameters for the vacuole and head labels, and with approximately the same number of parameters for the acrosome label, compared with the architecture introduced in [25]. The discovered models did reach lower recall on the head and vacuole labels than previous papers; however, precision is considerably more important than recall on this dataset.

In the domain of sperm morphology analysis, especially in abnormality detection for different segments of sperms, precision is more significant than other measurements, because finding even one normal sperm and using it in the intracytoplasmic sperm injection (ICSI) process can be enough for the treatment to be completed successfully. Therefore, since the discovered models achieved high precision on all three labels, they could be used to take sperm morphology analysis a step further; and, by collecting more data through clinical procedures, this approach could be used to tackle real-world sperm abnormality detection problems. Furthermore, because our models are trained on low-quality and noisy images, they could be used with cheaper medical tools, such as microscopes that can only capture low-quality images. In addition, GeNAS could be integrated into an AutoML pipeline [21] designed specifically for imbalanced datasets to enhance its functionality. Since imbalanced datasets are common in the medical industry [17], an AutoML algorithm for imbalanced datasets could eventually be employed to solve medical classification problems.

To the best of our knowledge, no other neural architecture search algorithm has been designed for and benchmarked on imbalanced datasets, specifically for sperm morphology analysis. More broadly, to the best of our knowledge, this is the first paper that introduces neural architecture search to the field of medical imaging [38]. Therefore, in the future, GeNAS and new neural architecture search algorithms could be used to tackle various problems in the medical imaging domain, such as breast cancer diagnosis [7], tissue classification of interstitial lung diseases [9], and classification and detection of tumor cells [22].

GeNAS can be employed to tackle any binary or multi-class image classification problem; however, it is advisable to customize its constrained search space according to the available computational power and the characteristics of the dataset. In detail, for datasets with high-resolution images, it is recommended to expand the range of stride sizes for both convolutional and max-pooling layers; for low-resolution images, the opposite holds. Furthermore, based on our experiments, we hypothesize that increasing the number of generations results in discovering better architectures; therefore, with more computational power and time, better neural architectures could be found. Note also that for multi-class image classification problems, the softmax function should be used instead of the sigmoid function in the last layer.

For future work in this area, multi-objective neural architecture search algorithms could be designed to maximize both accuracy and precision on the validation set; for medical applications that must run in real time, inference time should be one of the objectives as well. Furthermore, new modules such as skip connections could be added to the search space, or new and more efficient search spaces could be designed for problems in the medical imaging domain. Further work could also automate other components of the image classification process on imbalanced datasets, such as data cleaning and model selection. Finally, in addition to classification problems, new neural architecture search algorithms could be designed for other medical imaging tasks such as image segmentation.

V Conclusions

We proposed an efficient algorithm termed Genetic Neural Architecture Search (GeNAS) for sperm abnormality detection. In addition, a novel fitness (objective) function named GeNAS weighting factor (GeNAS-WF) was introduced to evaluate the suitability of each architecture generated by GeNAS. This objective function is intended for neural architecture search algorithms that work with noisy, low-quality, and imbalanced datasets. Furthermore, GeNAS could be employed to discover convolutional neural network architectures for other image classification problems, particularly on noisy, low-quality, and imbalanced datasets, which are common in the medical imaging domain. Empirically, we showed that GeNAS can find architectures that outperform prior hand-designed architectures, image processing approaches, and random search in terms of accuracy, precision, and $F_{0.5}$ measure, with less computational power and human effort, on the acrosome, head, and vacuole labels of the MHSMA dataset. Additionally, the architectures discovered by GeNAS have far fewer parameters on the head and vacuole labels. Finally, given the lack of AutoML and neural architecture search algorithms in the field of medical imaging, we recommend further research at the intersection of these domains.


  • [1] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al. (2016) Tensorflow: a system for large-scale machine learning. In OSDI, Vol. 16, pp. 265–283. Cited by: §IV-E.
  • [2] V. Abbiramy and V. Shanthi (2010) Spermatozoa segmentation and morphological parameter analysis based detection of teratozoospermia. International Journal of Computer Applications 3 (7), pp. 19–23. Cited by: §II-D.
  • [3] M. D. Abràmoff, P. J. Magalhães, and S. J. Ram (2004) Image processing with ImageJ. Biophotonics International 11 (7), pp. 36–42. Cited by: §II-D.
  • [4] E. Alegre, M. Biehl, N. Petkov, and L. Sánchez (2008) Automatic classification of the acrosome status of boar spermatozoa using digital image processing and LVQ. Computers in Biology and Medicine 38 (4), pp. 461–468. Cited by: §II-D.
  • [5] B. Baker, O. Gupta, N. Naik, and R. Raskar (2016) Designing neural network architectures using reinforcement learning. arXiv preprint arXiv:1611.02167. Cited by: §II-A, §II-A, §II.
  • [6] B. Baker, O. Gupta, R. Raskar, and N. Naik (2017) Accelerating neural architecture search using performance prediction. arXiv preprint arXiv:1705.10823. Cited by: §II-A.
  • [7] E. A. Bayrak, P. Kırcı, and T. Ensari (2019) Comparison of machine learning methods for breast cancer diagnosis. In 2019 Scientific Meeting on Electrical-Electronics & Biomedical Engineering and Computer Science (EBBT), pp. 1–3. Cited by: §IV-H.
  • [8] A. Bijar, A. P. Benavent, M. Mikaeili, et al. (2012) Fully automatic identification and discrimination of sperm’s parts in microscopic images of stained human semen smear. Journal of Biomedical Science and Engineering 5 (07), pp. 384. Cited by: §II-D.
  • [9] N. Bondfale and D. Bhagwat (2018) Convolutional neural network for categorization of lung tissue patterns in interstitial lung diseases. In 2018 Second International Conference on Inventive Communication and Computational Technologies (ICICCT), pp. 1150–1154. Cited by: §IV-H.
  • [10] K. Boumaza, A. Loukil, and K. Aarizou (2018) Automatic human sperm concentrartion in microscopic videos. Medical Technologies Journal 2 (4), pp. 301–307. Cited by: §II-D.
  • [11] H. Cai, T. Chen, W. Zhang, Y. Yu, and J. Wang (2018) Efficient architecture search by network transformation. Cited by: §I, §II-A.
  • [12] H. Cai, J. Yang, W. Zhang, S. Han, and Y. Yu (2018) Path-level network transformation for efficient architecture search. arXiv preprint arXiv:1806.02639. Cited by: §II-A.
  • [13] V. Chang, A. Garcia, N. Hitschfeld, and S. Härtel (2017) Gold-standard for computer-assisted morphological sperm analysis. Computers in biology and medicine 83, pp. 143–150. Cited by: §II-D.
  • [14] V. Chang, J. M. Saavedra, V. Castañeda, L. Sarabia, N. Hitschfeld, and S. Härtel (2014) Gold-standard and improved framework for sperm head segmentation. Computer Methods and Programs in Biomedicine 117 (2), pp. 225–237. Cited by: §II-D.
  • [15] F. Chollet et al. (2015) Keras. Cited by: §IV-E.
  • [16] J. Deng, W. Dong, R. Socher, and L. Li (2009-06) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. External Links: Document, ISSN 1063-6919 Cited by: §I.
  • [17] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean (2019) A guide to deep learning in healthcare. Nature medicine 25 (1), pp. 24. Cited by: §IV-H.
  • [18] E. S. Gardner Jr (1985) Exponential smoothing: the state of the art. Journal of forecasting 4 (1), pp. 1–28. Cited by: §IV-F.
  • [19] F. Ghasemian, S. A. Mirroshandel, S. Monji-Azad, M. Azarnia, and Z. Zahiri (2015) An efficient method for automatic morphological abnormality detection from human sperm images. Computer methods and programs in biomedicine 122 (3), pp. 409–420. Cited by: §II-D, §IV-A, §IV-F, §IV-H.
  • [20] T. B. Haugen, J. M. Andersen, O. Witczak, H. L. Hammer, S. A. Hicks, R. J. Borgli, P. Halvorsen, and M. A. Riegler (2019) VISEM: a multimodal video dataset of human spermatozoa. Cited by: §II-D.
  • [21] X. He, K. Zhao, and X. Chu (2019) AutoML: a survey of the state-of-the-art. arXiv preprint arXiv:1908.00709. Cited by: §IV-H.
  • [22] E. Hossain and M. A. Rahaman (2018) Detection & classification of tumor cells from bone mr imagery using connected component analysis & neural network. In 2018 International Conference on Advancement in Electrical and Electronic Engineering (ICAEEE), pp. 1–4. Cited by: §IV-H.
  • [23] H. O. Ilhan and N. Aydin (2018) A novel data acquisition and analyzing approach to spermiogram tests. Biomedical Signal Processing and Control 41, pp. 129–139. Cited by: §II-D.
  • [24] A. Isidori, M. Latini, and F. Romanelli (2005) Treatment of male infertility. Contraception 72 (4), pp. 314–318. Cited by: §I.
  • [25] S. Javadi and S. A. Mirroshandel (2019) A novel deep learning method for automatic assessment of human sperm images. Computers in biology and medicine 109, pp. 182–194. Cited by: §I, §II-D, Fig. 15, §IV-A, §IV-A, §IV-B, §IV-C, §IV-F, §IV-H, Table 3.
  • [26] H. Jin, Q. Song, and X. Hu (2018) Efficient neural architecture search with network morphism. arXiv preprint arXiv:1806.10282. Cited by: §II-B, §II.
  • [27] K. Kandasamy, W. Neiswanger, J. Schneider, B. Poczos, and E. Xing (2018) Neural architecture search with bayesian optimisation and optimal transport. arXiv preprint arXiv:1802.07191. Cited by: §I, §II-B, §II.
  • [28] D. P. Kingma and J. Ba (2014) Adam: a method for stochastic optimization. arXiv preprint arXiv:1412.6980. Cited by: §IV-E.
  • [29] H. Kitano (1990) Designing neural networks using genetic algorithms with graph generation system. Complex systems 4 (4), pp. 461–476. Cited by: §II-C.
  • [30] G. Klambauer, T. Unterthiner, A. Mayr, and S. Hochreiter (2017) Self-normalizing neural networks. In Advances in neural information processing systems, pp. 971–980. Cited by: §IV-D.
  • [31] R. Kotikalapudi et al. (2017) Keras-vis. GitHub. Cited by: §IV-G.
  • [32] A. Krizhevsky, G. Hinton, et al. (2009) Learning multiple layers of features from tiny images. Technical report Citeseer. Cited by: §I.
  • [33] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105. Cited by: §I.
  • [34] A. Krizhevsky, I. Sutskever, and G. E. Hinton (2012) ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Eds.), pp. 1097–1105. External Links: Link Cited by: §IV-D.
  • [35] Y. A. LeCun, L. Bottou, G. B. Orr, and K. Müller (2012) Efficient backprop. In Neural Networks: Tricks of the Trade: Second Edition, G. Montavon, G. B. Orr, and K. Müller (Eds.), pp. 9–48. ISBN 978-3-642-35289-8. Cited by: §IV-E.
  • [36] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner (1998) Gradient-based learning applied to document recognition. In Proceedings of the IEEE, pp. 2278–2324. Cited by: §IV-D.
  • [37] J. Li, K. Tseng, H. Dong, Y. Li, M. Zhao, and M. Ding (2014) Human sperm health diagnosis with principal component analysis and k-nearest neighbor algorithm. In Medical Biometrics, 2014 International Conference on, pp. 108–113. Cited by: §II-D.
  • [38] G. J. S. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. W. M. van der Laak, B. van Ginneken, and C. I. Sánchez (2017) A survey on deep learning in medical image analysis. CoRR abs/1702.05747. Cited by: §IV-H.
  • [39] C. Liu, B. Zoph, J. Shlens, W. Hua, L. Li, L. Fei-Fei, A. Yuille, J. Huang, and K. Murphy (2017) Progressive neural architecture search. arXiv preprint arXiv:1712.00559. Cited by: §II.
  • [40] H. Liu, K. Simonyan, O. Vinyals, C. Fernando, and K. Kavukcuoglu (2017) Hierarchical representations for efficient architecture search. arXiv preprint arXiv:1711.00436. Cited by: §I, §II-C.
  • [41] L. Maree, S. Du Plessis, R. Menkveld, and G. Van der Horst (2010) Morphometric dimensions of the human sperm head depend on the staining method used. Human Reproduction 25 (6), pp. 1369–1382. Cited by: §II-D.
  • [42] C. McCallum, J. Riordon, Y. Wang, T. Kong, J. B. You, S. Sanner, A. Lagunov, T. G. Hannam, K. Jarvi, and D. Sinton (2019) Deep learning-based selection of human sperm with high DNA integrity. Communications Biology 2 (1), pp. 250. Cited by: §II-D.
  • [43] G. Meyer-Lee, H. Uppili, and A. Z. Zhao (2017) Evolving deep neural networks. CoRR abs/1703.00548. Cited by: §I, §II-C, §II.
  • [44] B. L. Miller and D. E. Goldberg (1995) Genetic algorithms, tournament selection, and the effects of noise. Complex Systems 9, pp. 193–212. Cited by: §III-E.
  • [45] G. F. Miller, P. M. Todd, and S. U. Hegde (1989) Designing neural networks using genetic algorithms. In ICGA, Vol. 89, pp. 379–384. Cited by: §I, §II-C, §II.
  • [46] N. Otsu (1979) A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics 9 (1), pp. 62–66. Cited by: §II-D.
  • [47] H. Pham, M. Y. Guan, B. Zoph, Q. V. Le, and J. Dean (2018) Efficient neural architecture search via parameter sharing. arXiv preprint arXiv:1802.03268. Cited by: §I, §II-A.
  • [48] L. Ramos, P. de Boer, E. J. Meuleman, D. D. Braat, and A. M. Wetzels (2004) Evaluation of ICSI-selected epididymal sperm samples of obstructive azoospermic males by the CKIA system. Journal of Andrology 25 (3), pp. 406–411. Cited by: §II-D.
  • [49] E. Real, A. Aggarwal, Y. Huang, and Q. V. Le (2018) Regularized evolution for image classifier architecture search. arXiv preprint arXiv:1802.01548. Cited by: §I, §II-C.
  • [50] E. Real, S. Moore, A. Selle, S. Saxena, Y. L. Suematsu, J. Tan, Q. Le, and A. Kurakin (2017) Large-scale evolution of image classifiers. arXiv preprint arXiv:1703.01041. Cited by: §I, §II-C.
  • [51] J. Riordon, C. McCallum, and D. Sinton (2019) Deep learning for the classification of human sperm. Computers in Biology and Medicine, pp. 103342. Cited by: §II-D.
  • [52] L. Sánchez, N. Petkov, and E. Alegre (2005) Statistical approach to boar semen head classification based on intracellular intensity distribution. In International Conference on Computer Analysis of Images and Patterns, pp. 88–95. Cited by: §II-D.
  • [53] L. Sanchez, N. Petkov, and E. Alegre (2006) Statistical approach to boar semen evaluation using intracellular intensity distribution of head images. Cellular and Molecular Biology 52 (6), pp. 38–43. Cited by: §II-D.
  • [54] J. D. Schaffer, D. Whitley, and L. J. Eshelman (1992) Combinations of genetic algorithms and neural networks: a survey of the state of the art. In Combinations of Genetic Algorithms and Neural Networks, 1992., COGANN-92. International Workshop on, pp. 1–37. Cited by: §II-C.
  • [55] M. Schrimpf, S. Merity, J. Bradbury, and R. Socher (2017) A flexible approach to automated RNN architecture generation. arXiv preprint arXiv:1712.07316. Cited by: §II.
  • [56] R. R. Selvaraju, A. Das, R. Vedantam, M. Cogswell, D. Parikh, and D. Batra (2016) Grad-CAM: why did you say that? arXiv preprint arXiv:1611.07450. Cited by: §IV-G.
  • [57] B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. De Freitas (2016) Taking the human out of the loop: a review of Bayesian optimization. Proceedings of the IEEE 104 (1), pp. 148–175. Cited by: §I.
  • [58] F. Shaker (2017) Human sperm head morphology dataset (HuSHeM). Mendeley Data. Cited by: §II-D.
  • [59] J. Snoek, H. Larochelle, and R. P. Adams (2012) Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems, pp. 2951–2959. Cited by: §I.
  • [60] J. Snoek, O. Rippel, K. Swersky, R. Kiros, N. Satish, N. Sundaram, M. Patwary, M. Prabhat, and R. Adams (2015) Scalable Bayesian optimization using deep neural networks. In International Conference on Machine Learning, pp. 2171–2180. Cited by: §I, §II-B.
  • [61] K. O. Stanley and R. Miikkulainen (2002) Evolving neural networks through augmenting topologies. Evolutionary computation 10 (2), pp. 99–127. Cited by: §I, §II-C, §II.
  • [62] K. Stouffs, H. Tournaye, J. Van der Elst, I. Liebaers, and W. Lissens (2008) Is there a role for the nuclear export factor 2 gene in male infertility? Fertility and Sterility 90 (5), pp. 1787–1791. Cited by: §I, §II.
  • [63] M. Suganuma, S. Shirakawa, and T. Nagao (2017) A genetic programming approach to designing convolutional neural network architectures. CoRR abs/1704.00764. Cited by: §I, §II.
  • [64] C. Szegedy, S. Ioffe, and V. Vanhoucke (2016) Inception-v4, Inception-ResNet and the impact of residual connections on learning. CoRR abs/1602.07261. Cited by: §I.
  • [65] S. Vicente-Fiel, I. Palacin, P. Santolaria, and J. Yániz (2013) A comparative study of sperm morphometric subpopulations in cattle, goat, sheep and pigs using a computer-assisted fluorescence method (CASMA-F). Animal Reproduction Science 139 (1-4), pp. 182–189. Cited by: §II-D.
  • [66] T. Wei, C. Wang, Y. Rui, and C. W. Chen (2016) Network morphism. In International Conference on Machine Learning, pp. 564–572. Cited by: §II-B.
  • [67] L. Xie and A. L. Yuille (2017) Genetic CNN. In ICCV, pp. 1388–1397. Cited by: §I, §II-C.
  • [68] J. Yániz, S. Vicente-Fiel, S. Capistrós, I. Palacín, and P. Santolaria (2012) Automatic evaluation of ram sperm morphometry. Theriogenology 77 (7), pp. 1343–1350. Cited by: §II-D.
  • [69] M. D. Zeiler and R. Fergus (2013) Visualizing and understanding convolutional networks. CoRR abs/1311.2901. Cited by: §IV-D.
  • [70] Y. Zhang (2017) Animal sperm morphology analysis system based on computer vision. In 2017 Eighth International Conference on Intelligent Control and Information Processing (ICICIP), pp. 338–341. Cited by: §II-D.
  • [71] Z. Zhong, J. Yan, W. Wu, J. Shao, and C. Liu (2018) Practical block-wise neural network architecture generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2423–2432. Cited by: §II-A.
  • [72] B. Zoph and Q. V. Le (2016) Neural architecture search with reinforcement learning. arXiv preprint arXiv:1611.01578. Cited by: §I, §II-A, §II.
  • [73] B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le (2017) Learning transferable architectures for scalable image recognition. arXiv preprint arXiv:1707.07012. Cited by: §II-A.