Probing the Effect of Selection Bias on NN Generalization with a Thought Experiment
Learned networks in the domain of visual recognition and cognition impress in part because, even though they are trained with datasets many orders of magnitude smaller than the full population of possible images, they exhibit sufficient generalization to be applicable to new and previously unseen data. Although many have examined generalization from several perspectives, we asked: if a network is trained with a biased dataset that is missing samples corresponding to some defining domain attribute, can it generalize to the full domain from which that training dataset was drawn? It is certainly true that in vision, no current training set fully captures all visual information, and this shortfall can introduce selection bias. Here, we try a novel approach in the tradition of the Thought Experiment. We run this thought experiment on a real domain of visual objects that we can fully characterize, and we examine specific gaps in the training data and their impact on performance requirements. Our thought experiment points to three conclusions: first, that generalization behavior depends on how sufficiently the particular dimensions of the domain are represented during training; second, that the utility of any generalization is completely dependent on the acceptable system error; and third, that specific visual features of objects, such as pose orientations out of the imaging plane or colours, may not be recoverable if they are not represented sufficiently in a training set. Any generalization currently observed in modern deep learning networks may be more the result of coincidental alignments than of sufficient domain coverage, and its utility needs to be confirmed with respect to a system's performance specification. Our Thought Experiment Probe approach, coupled with the resulting Bias Breakdown, can be highly informative for understanding the impact of such biases.
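To make the probe concrete, the following is a minimal illustrative sketch, not the paper's code: it constructs a fully characterized toy domain parameterized by pose angle and colour hue, withholds a slice of the pose dimension from training, and then breaks the test error down inside versus outside that gap against an acceptable-error threshold. The attribute names, the synthetic labeling rule, the gap boundaries, and the threshold value are all assumptions made for illustration only.

```python
"""
Illustrative sketch (not the paper's implementation): probe how a gap along
one domain attribute -- here, object pose angle -- affects generalization,
and break the test error down by whether samples fall inside the gap.
"""
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def sample_domain(n):
    """Toy domain we can fully characterize: pose angle (degrees) and hue."""
    pose = rng.uniform(0.0, 360.0, n)
    hue = rng.uniform(0.0, 1.0, n)
    # Assumed labeling rule: class depends jointly on pose half-plane and hue.
    label = ((np.sin(np.radians(pose)) > 0) ^ (hue > 0.5)).astype(int)
    features = np.column_stack([np.sin(np.radians(pose)),
                                np.cos(np.radians(pose)),
                                hue])
    return features, label, pose

GAP = (90.0, 180.0)        # withheld slice of the pose dimension (assumed)
ACCEPTABLE_ERROR = 0.05    # hypothetical system performance specification

# Biased training set: drop every sample whose pose falls inside the gap.
X_train, y_train, pose_train = sample_domain(20_000)
keep = ~((pose_train >= GAP[0]) & (pose_train < GAP[1]))
X_train, y_train = X_train[keep], y_train[keep]

# Unbiased test set drawn from the full domain.
X_test, y_test, pose_test = sample_domain(20_000)
in_gap = (pose_test >= GAP[0]) & (pose_test < GAP[1])

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

# "Bias breakdown": error outside vs. inside the withheld pose range,
# compared against the acceptable system error.
err_out = np.mean(pred[~in_gap] != y_test[~in_gap])
err_in = np.mean(pred[in_gap] != y_test[in_gap])
print(f"error outside gap: {err_out:.3f}  (acceptable: {ACCEPTABLE_ERROR})")
print(f"error inside gap:  {err_in:.3f}  (acceptable: {ACCEPTABLE_ERROR})")
```

Whether the error inside the gap stays within the acceptable threshold is exactly what such a probe measures; the point of the sketch is the measurement procedure, not any particular outcome.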