1 Introduction
In this study, we focus on Alzheimer’s disease (AD) as outcome of interest, which is a progressive neurodegenerative disease, appears lateonset and sporadic in most cases, and is the main cause of dementia in the elderly. As the cognitive symptoms emerge years after the appearance of brain atrophy and exhibit close correlation with the structural changes, brain magnetic resonance imaging (MRI) scans provide a direct way to obtain informative quantitative traits, and fast automated approaches are necessary for largescale studies. AD has a high estimated heritability of 74%
gatz1997 and a prevalence of 4.4% in Europe lobo2000. However, the biological pathways underlying AD have not been wellunderstood and there is yet no known cure. Hence, the identification of AD markers for early detection and as targets for treatment is important.For the detection of causal genetic loci, recent sequencing efforts allow indepth analyses of rare variants in large cohorts, and kernelbased genelevel tests have been proposed for the analysis wu2011rare; lee2012optimal; listgarten2013powerful; lippert2014greater; urrutia2015
. They derive similarity scores between samples in the form of a kernel matrix which is computed on a particular genomic locus or functional unit in the genome. Then, kernelbased variancecomponent test statistics are derived that yield robust and powerful tests. Kernel functions provide a highly flexible way to model genetic variation. However, their full capabilities have not been leveraged and existing approaches still provide suboptimal performance for the analysis of sequencing data
konigorski2017comparison, where the overwhelming majority of genetic variation is extremely rare. Hence extensions to the existing methods are warranted that leverage the full power of kernels to aggregate the signal of very rare single nucleotide variants (SNVs).Our contributions in this paper are in two areas. First, we use a convolutional neural network (CNN) to derive quantitative traits from MRI scans, and evaluate if precise traits are obtained on data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI), where also traits obtained by the popular yet computationally expensive FreeSurfer software fischl2002 are available. Second, we propose novel kernels for association tests of rare genetic variants that incorporate prior biological knowledge from annotations and multiomics measures. We perform association analyses between these novel kernels computed on sequencing data and CNNderived traits to identify genetic loci associated with AD.
1.1 Related work
The association of a set of genetic markers with a quantitative trait with
observations can be tested in a linear mixed model of the form
(1) 
where is a covariate design matrix with fixed effects , are random effects of the SNVs in design matrix accounting for population stratification,
is the identity matrix,
are random effects of the SNVs of interest in the design matrix , and are error terms. Hence(2) 
where the kernel matrix describes the similarity between individuals based on the SNVs of interest. The association of the SNVs (i.e. or vs. ) can be tested using score or likelihood ratio tests. Binary or count traits can be analyzed similarly.
Popular kernelbased tests include FaSTLMMSet listgarten2013powerful; lippert2014greater, the sequence kernel association test (SKAT, wu2011rare) and optimal SKAT (SKATO, lee2012optimal), which are based on weighted linear kernels wu2011rare; listgarten2013powerful; lippert2014greater, or a linear combination of weighted linear and collapsing kernels lee2012optimal. Newer approaches urrutia2015 derive further dataadaptive combinations of linear, quadratic, IBS, and collapsing kernels. However, all these kernels provide suboptimal performance for the analysis of very rare genetic variants. Here, linear and quadratic kernels yield uninformative similarity measures (i.e., diagonal kernel matrices for singletons, which are variants with only one observed copy of the minor allele) and collapsing kernels often yield unspecific signals and aggregate noise.
2 New kernelbased tests for very rare genetic variants
To leverage the full power of kernels computing similarities in highdimensional Hilbert space, whereto genetic variants are mapped through a potentially infinitedimensional basis function , we consider the more general linear mixed model
(3) 
Here, and
are normally distributed random effects. After integrating out
, and , it follows that is normally distributed with covariance , where we have defined the kernel matrix . In this model, established tests lee2012optimal; listgarten2013powerful; lippert2014greater can be used to test the association between sets of SNVs and the phenotype, see Figure 1 for an illustration.2.1 Examples of new kernels
Let be the matrix of the SNVs of interest. We define a class of kernel matrices as
(4) 
where different instances are obtained by setting the weight and similarity matrices , to the identity, to the matrices outlined below, or any combination of these. See Appendix A for details.
Incorporate annotations
Set where is the numeric matrix encoding characteristics of the SNVs, such as the minor allele frequency (MAF), genomic position, or functional annotations from PolyPhen2 adzhubei2010, RegulomeDB boyle2012, or others. Set the elements of to (i) describe the similarity of SNVs and in terms of genomic closeness, or (ii) indicate whether SNVs and have a (or the same) functional annotation.
Incorporate information from available omics data
Set where is the matrix (i) containing log pvalues of association tests of the SNVs with omics data e.g. gene expression levels of genes or (ii) indicating for each of these pvalues if they are , where is prespecified constant. Set the elements of to be indicators whether SNVs and both have pvalue .
3 Application: analysis of ADNI study
In the application, we analyzed wholegenomesequencing data, gene expression measures, MRI data as well as AD biomarkers in
participants from ADNI, which is a longitudinal study to detect biomarkers and risk factors for AD
weiner2010; weiner2012.In a first step, we designed a 3dimensional CNN comprising seven convolutional layers followed by a max pooling layer and a final fullyconnected layer to predict the volume of the 3
ventricle from the MRI scans (see Figure 2 for an illustration and see Appendix B, Figure C for details). To evaluate the approach, we chose the 3 ventricle, as we found that the ventricular regions were displayed with a higher contrast and presumably easier to identify. The CNN predicted volume was then used as a quantitative trait in the following genetic association analyses, and evaluated against the predictions by the FreeSurfer software. Both models where trained on a dual Intel Xeon 6148 workstation equipped with an NVidia TitanV graphics card.In the main genetic association analysis we analyzed 17,013 (qualitycontrolled, biallelic, missingness 5%, of any MAF) SNVs in 125 genes in the 1Mbp region around the APOE gene on chromosome 19, similar to the study in nho2017, to investigate rare variants in a genomic region where several common variants have been associated with AD. We performed crosssectional association tests of these 125 genes with 9 different AD traits (peptides CSF A, ttau, ptau, and the provided brain volumes of entorhinal cortex, hippocampus, medial temporal lobe, ventricles, 3 ventricle predicted by FreeSurfer, and 3 ventricle predicted from the CNN) adjusting for the covariates age, gender, education, ethnicity, and APOE4 allele. The association tests were performed based on different combinations of the new proposed kernels (in Appendix A) and using standard SKAT and SKATO.
3.1 Results
The 125 genes contained on average 220 SNVs (, ). Of the 17,013 SNVs, 7575 were singletons, 1740 doubletons, and 12,337 SNVs had MAF . 24 participants had dementia, 338 mild cognitive impairment, 194 were cognitive normal (see Table C
for descriptive statistics).
In an evaluation of the predicted volume of the 3 ventricle, CNN and FreeSurfer predictions showed a high correlation (Pearson , see Figure C). For small/large volumes, compared to FreeSurfer, CNN slightly over/underestimated the volume, which we expect to disappear with larger training data. On the other hand, CNN was much faster (1 second versus 16 hours per scan).
In the main genetic association analyses, a first comparison showed that analyses using the CNNpredicted trait as outcome generally yielded similar and often smaller pvalues compared to the FreeSurferpredicted trait (Figures CC). Preliminary comparisons of all new kernels indicated that the three kernels reported in Table 1 yielded often the smallest pvalues in genebased tests, hence they are reported here. Tests based on the new kernel 1 yielded consistently smaller or similar pvalues for the top genes compared to SKAT and SKATO for 8 out of 9 traits (Table 1). More detailed comparisons (Figure C) indicated that while often the same genes were identified with smallest pvalue by tests based on the new kernel 1 and by SKAT or SKATO, the new kernel 1 also yielded different candidate genes that would not have been identified by SKAT or SKATO (and vice versa). The new kernels 2 and 3 yielded sometimes larger but also sometimes much smaller pvalues.
Using a Bonferroni correction (for the 125 tests) of the pvalues of the new kernelbased tests, we identified 3 candidate genes for AD with adjusted pvalues 0.007, 0.05, 0.07: PVR for CSF ttau, SIX5 for entorhinal cortex and PVRL2 for hippocampus.
Trait  SKATO  SKAT  New Kernel 1  New Kernel 2  New Kernel 3 

CSF ttau  9.1  5.8  5.2  6.4  4.0 
CSF ptau  1.5  9.3  8.5  1.2  1.9 
CSF A  4.9  9.9  4.9  2.1  1.7 
Entorhinal cortex  7.0  3.3  4.2  1.6  3.4 
Hippocampus  6.6  3.7  3.8  5.6  2.5 
Medtemporal lobe  1.1  3.7  1.4  2.3  4.6 
Ventricles  1.5  9.3  1.0  9.6  5.5 
FreeS 3 Ventricle  6.8  6.5  8.0  3.4  5.1 
CNN 3 Ventricle  1.9  5.9  5.9  1.7  2.0 
4 Discussion
The empirical analyses indicated that (i) CNNs provide a precise, fast and scalable tool to derive quantitative traits from MRI scans and that (ii) new kernels integrating domain knowledge and omics data constitute a promising approach for the analysis of very rare variants. There is previous evidence for the association of the identified genes with AD Marioni2018; kwok2018; Hao2018; Beecham2014 to support our findings, and of note, the pvalues are much smaller using the new kernels here compared to regular kernels nho2017. Limitations of the current analyses are that only few functional annotations are available for rare SNVs, and that only a basic control for population stratification was used. In the interpretation of the results regarding their biological relevance, it can be noted that the analyses were adjusted for the risk factor APOE4, so that the identified genes and SNVs represent markers with independent effects on AD. Future research can investigate kernels measuring the similarity between the bivariate allelic sequences directly, and dataadaptive optimal combinations of different kernels.
References
Acknowledgments
Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database adni.loni.usc.edu. As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wpcontent/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf. Data collection and sharing of ADNI was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH1220012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; BioClinica Inc; Biogen Idec Inc; BristolMyers Squibb Company; Eisai Inc; Elan Pharmaceuticals Inc; Eli Lilly and Company; F. HoffmannLa Roche Ltd and its affiliated company Genentech Inc; GE Healthcare; Innogenetics N.V.; IXICO Ltd; Janssen Alzheimer Immunotherapy Research & Development LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Medpace Inc; Merck & Co Inc; Meso Scale Diagnostics LLC; NeuroRx Research; Novartis Pharmaceuticals Corporation; Pfizer Inc; Piramal Imaging; Servier; Synarc Inc; and Takeda Pharmaceutical Company. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (http://www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Disease Cooperative Study at the University of California, San Diego. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California. Samples from the National Cell Repository for AD (NCRAD), which receives government support under a cooperative agreement grant (U24 AG21886) awarded by the National Institute on Aging (AIG), were used in this study. Funding for the WGS was provided by the Alzheimer’s Association and the Brin Wojcicki Foundation.
Supplementary material
A Details on new kernels
Let be the number of observations and the number of SNVs of interest. Define the kernel as in equation (4) by setting the matrices and to or the following.
Consider the weight matrices where is

a vector with entries , where
is the probability density function of the beta distribution with parameters 1 and 25, and
is the minor allele frequency of SNV . 
a vector where each entry is an indicator whether SNV has a functional annotation in, for example, the PolyPhen2 database.

a vector where each entry is a numeric encoding of the functional annotation of SNV , for example, in the PolyPhen2 database:

a vector where each entry is the log pvalue from a hypothesis test of the association between SNV and a variable providing relevant information about its biological function, e.g. where is the gene expression of the gene in which the SNV lies.

a vector where each entry is the sum of 1 and an indicator variable whether SNV is associated with a variable providing relevant information about its biological function as in the bullet point above, e.g. evaluating whether the pvalue from a hypothesis test of the association between SNV with a variable is smaller than 0.05.

the Hadamard product of the vectors in bullet points (1 and 4) or (1 and 5).

the sum of the vectors in bullet points (2 or 3) and (4 or 5).

the sum of the vectors in bullet points (2 or 3) and 6.
Consider the matrices describing the similarity of SNVs where

similarity of SNVs and in terms of genomic closeness:
where is the genomic distance between SNVs and in base pairs.

indicator whether SNVs and both have a functional annotation:

indicator whether SNVs and have the same functional annotation.

indicator whether SNVs and have pvalue specified cutoff value .
where the pvalue of SNV is from an association test with a variable that provides relevant information about its biological function, e.g. where is the gene expression of the gene in which the SNV lies.

is the product of the matrices in bullet points (1 and (2 or 3)), (1 and 4), or (4 and (2 or 3))

is the product of the matrices in bullet points 1 and (2 or 3) and 4.
B Details on convolutional neural networks
Model architecture
The model architecture is illustrated in Figure 2, and in more detail in Figure C
. We designed a CNN made of a sequence of seven convolutional layers followed by a max pooling layer and a fullyconnected layer. We used two types of convolutional layers: Regular and DownConvolution. Regular convolutional layers comprised a 3 x 3 x 3 convolutional operation with 1 x 1 x 1 strides. The downconvolutional layers comprised a 2 x 2 x 2 convolutional operation with 2 x 2 x 2 strides. Each convolutional layer was followed by a Rectified Linear Unit nonlinearity
[nair2010]. After the last convolutional layer, we used a max pooling layer with a filter size of 2 x 2 x 2. Subsequently, this layer was converted into a fullyconnected layer, followed by the output layer containing a single node with a linear activation function.
Model implementation
The MRI scans were standardized to the spatial resolution of 1 x 1 x 1 millimeters and the size of 256 x 256 x 256 voxels. Additionally, for computational efficiency, they were cropped and downsampled to 96 x 109 x 96 voxels.
The model was trained on 2100 MRI scans (from 411 subjects) for 200 epochs with the loss function set to the mean absolute error using the Adaptive Moment Estimation optimizer
[kingma2014], a learning rate of and a 3D spatial drop out regularization of 0.9.Hyperparameter tuning was carried out on a validation dataset comprising 550 scans of 129 subjects that all had MRI data but did not have genetic data available so that they could not be included in the main analysis. The final evaluation was done on the test set including the 556 subjects of the main analysis that had all MRI, genetic, and gene expression data available. The model performance on the test set is visualized in Figure C.
Computational comparison with FreeSurfer
Both models where trained on a dual 20 core Intel Xeon 6148 workstation with 768GB RAM equipped with an NVidia TitanV graphics card. CNN computations made use of GPU optimization, taking 1 second for the prediction of the volume of the third ventricle per MRI scan. FreeSurfer, which did not utilize the GPU, took 16 hours per MRI scan.
C Supplementary tables and figures
[!h]
individuals in the analyzed sample from the ADNI study. Shown are absolute frequencies for categorical variables, and mean (standard deviation) for quantitative measures.
Measures Descriptive Statistics Sample size 556 Age, years 72.9 (7.0) Gender female 250 male 306 Ethnic group hispanic/latino 10 not hispanic/latino 545 unknown 1 Education length, years 16.1 (2.8) APOE 4 allele homozygot minor allele 331 heterozygot 186 homozygot major allele 39 Cognitive status cognitive normal 194 mild cognitive impairment 338 dementia 24Graphical visualization of the 3D convolutional neural network model in Keras. Shown are input and output of the different layers, and the respective voxels and channels. For example, the input volume had 96
109 96 voxels and 1 channel. As all computations were done in one batch, the batch size was not specified (noted as "None" in the graph).
[!b]
[!b]
[!b]