Biological sex classification with structural MRI data shows increased misclassification in transgender women

11/24/2019
by   Claas Flint, et al.

Transgender individuals show brain structural alterations that differ from their biological sex as well as their perceived gender. To substantiate evidence that the brain structure of transgender individuals differs from male and female, we use a combined multivariate and univariate approach. Gray matter segments resulting from voxel-based morphometry preprocessing of N = 1753 cisgender (CG) healthy participants were used to train (N = 1402) and validate (20% hold-out, N = 351) a support vector machine (SVM) classifying biological sex. As a second validation, we classified N = 1104 patients with depression. A third validation was performed using the matched CG sample of the transgender women (TW) application sample. Subsequently, the classifier was applied to N = 25 TW. Finally, we compared brain volumes of CG-men, CG-women and TW pre/post cross-sex hormone treatment (CHT) in a univariate analysis controlling for sexual orientation, age and total brain volume. The application of our biological sex classifier to the transgender sample resulted in a significantly lower true positive rate (TPR-male = 56.0%) than in cisgender participants (TPR-male = 86.9%). The univariate analysis of the transgender application sample revealed that TW pre/post treatment show brain structural differences from CG-women and CG-men in the putamen and insula, as well as in the whole-brain analysis. Our results support the hypothesis that brain structure in TW differs from brain structure of their biological sex (male) as well as their perceived gender (female). This finding substantiates evidence that transgender individuals show specific brain structural alterations leading to a different pattern of brain structure than CG individuals.


1 Introduction

Being transgender describes the stable feeling of belonging to the opposite sex rather than the biological sex assigned at birth, while the term cisgender (CG) describes the feeling of coherence between biological sex and perceived gender. If a transgender individual suffers from distress due to incoherence between perceived gender and biological sex, the DSM-5 allows the diagnosis of gender dysphoria [AmericanPsychiatricAssociation2013]. Distinguishing between being a transgender individual and suffering from gender dysphoria is important to destigmatize transgender individuals. A diagnosis of gender dysphoria allows transgender individuals access to psychiatric treatment if distress is experienced.

Although there is an ongoing social and political debate regarding the terms and phrases used to describe gender, little is known about how a divergence between biological sex and perceived gender emerges. A popular view is that sexual brain differentiation and body development are incoherent in transgender individuals [Zhou1995]. Evidence for this comes from studies in female infants with congenital adrenal hyperplasia, who develop male playing behavior [Meyer-Bahlburg1996, Mathews2009]. Due to prenatally circulating testosterone, the brain of such female infants is structurally organized as a male brain, while their body development is female [Meyer-Bahlburg1996, Mathews2009]. Female infants with this condition often identify themselves as a member of the male gender, even if congenital adrenal hyperplasia is treated after birth. According to this view, the brain of transgender individuals would be organized incoherently to their body already at birth [Bao2011, VanGoozen1995, Zhou1995].

Previous research provides extensive information on how brain structure differs as a function of biological sex. Briefly, sex differences in CG-individuals are most notable in areas concerning emotion perception/regulation, reward, and motor control [Ruigrok2014]. While CG-men show higher gray matter volume in general, CG-women show larger volume of limbic structures. However, sexual differentiation seems less prominent in the brain compared to physical appearance [Cahill2006, McCarthy2011]. While sexual development appears to be dimorphic, the brain is responsible for many functions that are shared by males and females [Spizzirri2018]. Hence, brains cannot easily be classified into dimorphic categories, as is the case for physical appearance [Joel2015].

Multivariate and univariate analyses of brain structure in transgender individuals have been used to substantiate evidence for gender- and sex-specific brain structural alterations. However, the investigation of structural brain alterations in transgender individuals is challenging, since it is difficult to control for important confounders, such as cross-sex hormone treatment (CHT), sexual orientation or comorbid psychiatric disorders (e.g. Major Depressive Disorder (MDD)). In addition, small sample sizes limit the statistical power to detect structural brain changes.

Few ROI-based approaches have studied how brain structure of transgender individuals differs from CG. Compared to CG-men, transgender women (biological sex male, perceived gender female, TW) show structural alterations of the putamen [Luders2009], the temporo-parietal junction, the inferior frontal cortex and the insulae [Savic2011], as well as the angular gyrus and the inferior parietal lobule [Simon2013]. Elevated cortical thickness in transgender individuals compared to CG is the only result replicated in three studies [Luders2012, Zubiaurre-Elorza2014]. However, all previous results fit with the idea that structural brain changes in areas involved in body perception (e.g. insula, putamen) are associated with the feeling of incoherence between biological sex and perceived gender. The reported studies only investigated individuals before cross-sex hormone treatment (CHT). Comparisons of TW pre/post CHT with CG individuals yielded heterogeneous results [Mueller2017, Altinay2019, Seiger2016, Nguyen2018, Nguyen2018a, WhiteHughto2016, Spizzirri2018]. CHT in TW combines treatment with anti-androgens and estradiol, which are associated with region-specific structural alterations of the brain [Kranz2017]. CHT has been associated with decreases in volume and cortical thickness, e.g. in regions associated with emotional learning [Mueller2017a, Seiger2016]. However, longitudinal studies are scarce and a recent large study did not find any differences between TW pre and post CHT [Nguyen2018, Spizzirri2018].

Next to univariate analyses, multivariate approaches offer new insights into the similarities and differences between cisgender and transgender individuals [Hoekzema2015, Baldinger-Melich2019]. In two studies, pattern classification was used to investigate whether transgender adolescents could be separated from CG-adolescents by their patterns of volumetric differences. Both studies show decreased accuracy in biological sex classification in transgender individuals compared to CG individuals. Specifically, in one study, a trained biological sex classifier was considerably less accurate for transgender adolescents than for the CG sample, whether treated or not. However, classifiers trained on small samples have recently been criticized for yielding high accuracies but low external validity, especially when applied to small samples [Varoquaux2018].

Hence, in the present study, we trained and validated a biological sex classifier with large samples of cisgender controls without any psychiatric comorbidities. We then applied the classifier to a smaller sample of TW and CG-men and -women, whose data were recorded at the same time and in the same scanner. To ensure that observed misclassification is not caused or biased by psychiatric comorbidity, we performed a second validation of the classifier in an additional large validation sample with MDD patients. A third validation was performed in a matched CG sample of the TW application sample. Our hypotheses for the multivariate analysis are:

  • The classifier trained on healthy CG-participants shows significantly worse performance when applied to a sample of TW

  • The classifier trained on healthy CG-participants performs equally well in a validation sample of CG-patients suffering from major depressive disorder (MDD)

Following our multivariate approach, we used a univariate analysis in two regions of interest that have been associated with brain structural alterations in TW. We investigated local structural brain alterations in the putamen and the insula [Kranz2015, Burke2018, Spizzirri2018, Savic2011, Luders2009, Zubiaurre-Elorza2014] corrected for total intracranial volume, age and sexual orientation. Previous results regarding brain structural alterations of TW in the respective regions have been heterogeneous. Since TW differ in brain structure from CG-men and –women, we hypothesize that

  • CG-women show lower volume in comparison to CG-men in both regions of interest [Ruigrok2014].

  • TW pre and post CHT show increased volume in comparison to CG-women (perceived gender of TW)

  • TW pre and post CHT show lower volume in comparison to CG-men (biological sex of TW)

  • Since we expect CHT to lead to a further feminization of brain structure and hence reduced volume, we hypothesize that TW pre CHT show higher volume in comparison to TW post CHT.

2 Materials and Method

2.1 Procedure

To obtain a predictor for biological sex based on structural MRI brain scans, a pipeline was created that optimizes a support vector machine (SVM). This classifier was trained on a large sample of CG-individuals without any psychiatric disorder. To achieve an optimal training result, the parameters of the SVM (hyperparameters) were refined using a Bayesian method with nested 10-fold cross-validation. An independent random sample of 20%, drawn from the population in advance, served as the first validation set, to avoid the risk of overfitting during hyperparameter optimization (supplementary Figure 4). To rule out that depressive symptoms influence the performance of the predictor in our TW group, we used a second validation sample with MDD patients. Next, the classifier was applied to data from TW individuals, and to a third validation group whose data were acquired at the same time and with the same scanner as the TW sample. This control group was also included in the univariate region-of-interest analysis that followed the multivariate analyses. Two regions previously associated with changes in TW relative to CG individuals were examined: the putamen and the insula.
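The sample handling described above (hold-out, then the balancing and 10-fold split detailed in Section 2.4.1) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names, the seed and the synthetic labels are assumptions, and the class totals (1012 women, 741 men) are the ones implied by the reported counts. The fold construction is a plain shuffled split, which may differ from what was actually used.

```python
import random

def holdout_split(labels, fraction, rng):
    """Return (rest_idx, holdout_idx) with a random hold-out of the given fraction."""
    idx = list(range(len(labels)))
    rng.shuffle(idx)
    n_hold = round(len(idx) * fraction)
    return idx[n_hold:], idx[:n_hold]

def undersample(indices, labels, rng):
    """Randomly drop subjects from the larger class until both classes are equal."""
    by_class = {}
    for i in indices:
        by_class.setdefault(labels[i], []).append(i)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, n_min))
    return balanced

def k_fold(indices, k, rng):
    """Yield (train_idx, test_idx) pairs for a shuffled k-fold split."""
    idx = list(indices)
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

rng = random.Random(0)  # arbitrary seed, for reproducibility only
# Synthetic labels with the class totals implied by the text (N = 1753).
labels = ["F"] * 1012 + ["M"] * 741
rest, holdout = holdout_split(labels, 0.20, rng)   # 20% hold-out
balanced = undersample(rest, labels, rng)          # equal number of F and M
splits = list(k_fold(balanced, 10, rng))           # 10 train/test pairs
```

Because the hold-out here is drawn with an arbitrary seed, the balanced sample size will generally differ slightly from the N = 1218 reported in the text.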

2.2 Data

2.2.1 Cisgender training sample and first validation set

The data from a sample of N = 1753 CG participants without any evidence of previous psychiatric disorders served as the basis for the training. History of psychiatric disorders was ruled out using the Structured clinical interview following DSM-IV criteria [Wittchen1997]. The participants were taken from three different cohorts: the Muenster Neuroimaging Cohort (MNC, N = 666 [Dannlowski2015]), the BiDirect (BD, N = 434 [Teuber2017]) study and the FOR2107 study (N = 653 [Kircher2018]). Exclusion criteria for the MNC were presence or history of major internal or neurological disorder, dependence on or recent abuse of alcohol or drugs, hypertension, and general MRI contraindications. BD and FOR2107 have similar exclusion criteria; details are described in supplementary table 3 and elsewhere [Teismann2014, Kircher2018].

2.2.2 Second, clinical validation sample – patients suffering from major depressive disorder (MDD)

To rule out that potential differences in the classification true positive rate are due to comorbid depressive symptoms in TW, data from a clinical sample (N = 1404) of patients diagnosed with MDD were used as the second validation sample. Diagnoses were again verified with the Structured Clinical Interview according to DSM-IV criteria [Wittchen1997]. The MDD sample consisted of N = 285 participants from the MNC, N = 591 from the BD study and N = 528 from the FOR2107 study (supplementary table 3). Additional exclusion criteria were presence of bipolar disorder, schizoaffective disorders and schizophrenia, substance-related disorders, current benzodiazepine treatment (wash out of at least three half-lives before study participation), and recent electroconvulsive therapy. Nearly all patients were under psychopharmacological antidepressant treatment and/or received psychotherapy.

2.2.3 Application: transgender application sample including third validation sample

To test for a different classification of CG and TW individuals, we used an independent sample of N = 26 TW. Data for this transgender sample were collected in conjunction with a set of cisgender controls that serve as the third validation sample of N = 19 CG-women and N = 15 CG-men (Transgender study (TSS)). Data of TW and CG were recorded under equal conditions (e.g. scanner, timeframe, study protocol, investigator), ruling out possible confounding of the SVM due to scanner variability. The TW were in different treatment states, with 18 already treated with hormones (supplementary table 4). Further details can be found in the original study [Schoning2010].

2.3 Image acquisition and structural preprocessing

T1-weighted high-resolution anatomical images of the MNC and TSS were acquired at a 3T MRI (Gyroscan Intera 3T, Philips Medical Systems, the Netherlands) using a three-dimensional fast gradient echo sequence (turbo field echo), repetition time = , echo time = , flip angle = , two signal averages, inversion pre-pulse every , acquired over a field of view of 256 (feet-head) 204 (anterior-posterior) (right-left), frequency encoding in feet to head direction, phase encoding in anterior-posterior and right-left direction, reconstructed to voxels of [Dannlowski2015b, Dannlowski2015a].

The 3D T1-weighted turbo field echo images of the BD study were collected in the same scanner with repetition time = , echo time = , flip angle, 160 sagittal slices, matrix dimension , FOV = , slice thickness (reconstructed to ) resulting in a voxel size of . The FOR2107 study was conducted at two different sites [Vogelbacher2018]. In Münster, data were collected with a 3T Siemens PRISMA using 3D T1-weighted magnetization prepared rapid acquisition gradient echo (MPRAGE) with repetition time = , echo time = , inversion time = , flip angle, 192 sagittal slices, slice gap, resulting in a voxel size of . In Marburg, data were collected in a 3T Siemens Magnetom Trio Tim syngo MR B17 using a 3D T1-weighted magnetization prepared rapid acquisition gradient echo (MPRAGE) with repetition time = , echo time = , inversion time = , flip angle, 176 sagittal slices, slice lap, resulting in a voxel size of . The structural images were preprocessed using the CAT12-toolbox [Gaser] (version r1184) in all four cohorts (MNC, FOR2107, BiDirect, TSS) following published protocols. Briefly, images were bias-corrected, tissue classified and normalized to MNI-space [Tzourio-Mazoyer2002]. For the univariate analysis, images were additionally smoothed with a Gaussian kernel of full width half maximum (FWHM). Absolute threshold masking with a threshold value of 0.1 was used for all univariate second-level analyses (http://www.neuro.uni-jena.de/cat12/CAT12-Manual.pdf). We carefully checked the sample for poor image quality detected by visual inspection and with the check homogeneity using covariance function implemented in CAT12.
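For readers unfamiliar with the two preprocessing parameters mentioned above, the following sketch shows the standard relation between a Gaussian kernel's FWHM and its standard deviation, and a simplified per-image version of absolute threshold masking at 0.1. The kernel width of 8 mm and the voxel values are hypothetical placeholders (the width is not recoverable from the text), and the helper names are illustrative only.

```python
import math

def fwhm_to_sigma(fwhm_mm):
    """Convert a Gaussian kernel's full width at half maximum to its standard deviation."""
    return fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def absolute_threshold_mask(gray_matter_values, threshold=0.1):
    """Flag voxels whose gray matter value exceeds the absolute threshold."""
    return [value > threshold for value in gray_matter_values]

sigma = fwhm_to_sigma(8.0)  # 8 mm is a hypothetical kernel width, not taken from the text
mask = absolute_threshold_mask([0.05, 0.10, 0.35])  # made-up voxel values
```

In an actual second-level analysis the mask is applied across all images of the sample; the single-image version here only illustrates the thresholding rule.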

2.4 Analyses

2.4.1 Multivariate analysis

Individualized prediction of the biological sex was assessed with a support vector classifier (SVC), implemented in the Scikit-learn toolbox [Pedregosa2012]. CAT12 whole-brain gray matter images were used as classifier input. Gray matter images were resliced to a coarser voxel size to reduce dimensionality while preserving maximal localized morphometric differences. The training process was strictly separated from the evaluation by selecting a random validation set of 20% (N = 351, female = 219, male = 132), which was not used during classifier training and testing. The remaining data set of N = 1402 subjects was balanced for sex with a random undersampling procedure (N = 1218, female = 609, male = 609) and used in a 10-fold split procedure, resulting in balanced training sets of 1096 subjects in each fold. A principal component analysis was performed next to further reduce the dimensionality of the data. The maximum number of principal components is limited to 1096, the number of subjects resulting from the 10-fold split. We carried out a Bayesian hyperparameter optimization for the SVC (Scikit-Optimize [Head2018]), nested in the 10-fold cross-validation. The parameter search included the choice of kernel (radial basis function (rbf) or linear), the C parameter (continuous log scale), which influences penalties for misclassification, and the γ parameter (continuous log scale, upper bound 10), which influences the curvature of the decision boundary. In this iterative Bayesian approach, a total of 100 parameter combinations were evaluated. Quality and classifier performance are reported by area under the ROC curve (AUC). The classifier resulting from the best combination of hyperparameters was finally evaluated using our first validation set, the 20% drawn in advance from the original sample. To exclude potential effects of comorbid depression, this step was repeated with the sample of MDD subjects as a second validation sample (Figure 1).
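A PCA-plus-SVC pipeline with a nested hyperparameter search of the kind described above might look roughly like the sketch below. It is not the authors' implementation: the Bayesian optimization via Scikit-Optimize is replaced here by scikit-learn's `RandomizedSearchCV` with far fewer draws, so that the example depends only on NumPy, SciPy and scikit-learn. The synthetic data, the number of components, the fold counts and all search bounds except the upper γ limit of 10 are assumptions.

```python
import numpy as np
from scipy.stats import loguniform
from sklearn.decomposition import PCA
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

# Synthetic stand-in for vectorized gray matter images: 120 "subjects", 50 "voxels".
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 60)            # balanced classes, as after undersampling
X = rng.normal(size=(120, 50))
X[y == 1, :5] += 2.0                 # artificial group difference in five features

pipeline = Pipeline([
    ("pca", PCA(n_components=10)),   # component count is an assumption
    ("svc", SVC()),
])

# Search space: kernel choice plus log-scaled C and gamma.
# All bounds except gamma's upper limit of 10 are placeholders.
search_space = {
    "svc__kernel": ["rbf", "linear"],
    "svc__C": loguniform(1e-2, 1e2),
    "svc__gamma": loguniform(1e-4, 1e1),
}

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

# Random search stands in for the Bayesian optimization used in the study.
search = RandomizedSearchCV(
    pipeline, search_space, n_iter=10, scoring="roc_auc",
    cv=inner_cv, random_state=0,
)

# Nested CV: the search is refit inside every outer training fold.
outer_auc = cross_val_score(search, X, y, cv=outer_cv, scoring="roc_auc")

# Final model selected on all (synthetic) training data.
search.fit(X, y)
best_params = search.best_params_
```

Keeping PCA inside the pipeline means the components are re-estimated on each training fold, so no information from the test fold leaks into the dimensionality reduction.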

Figure 1: Application of the trained classifier for biological sex prediction. Abbreviations: CG - cisgender TW - transgender women MDD - major depression disorder

The final trained and validated classifier was then applied to the application sample with transgender individuals. To test whether classification results differ between CG-men and TW (same biological sex), we used the true positive rate (TPR), since balanced accuracy is not applicable to scenarios with only one group. Fisher’s exact test was used to determine whether the TPR differs statistically between samples. Interpretation of the TPR is based on the hypothesis that TW belong to the category of male biological sex.
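The two performance measures can be illustrated with the counts reported in Table 1. The sketch below is a plain-Python illustration with hypothetical helper names; it shows why the TPR is used for the TW group, where only one true class (male) is present and a mean over classes is not available.

```python
def true_positive_rate(n_correct, n_total):
    """Share of subjects of one class that the classifier labels correctly."""
    return n_correct / n_total

def balanced_accuracy(per_class_counts):
    """Mean of the per-class TPRs; needs at least two classes to be informative."""
    rates = [true_positive_rate(c, t) for c, t in per_class_counts]
    return sum(rates) / len(rates)

# Counts (correct, total) from Table 1.
tpr_cg_men = true_positive_rate(14, 15)
tpr_cg_women = true_positive_rate(18, 19)
tpr_tw = true_positive_rate(14, 25)          # TW scored against the label "male"
bacc_cg = balanced_accuracy([(14, 15), (18, 19)])
```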

2.4.2 Univariate analysis

The TSS sample (TW group and matched CG controls, supplementary table 4) was used in the univariate analysis. Statistical parametric mapping (SPM12, Wellcome Trust Centre for Neuroimaging, London, http://www.fil.ion.ucl.ac.uk/spm/) was used for univariate gray matter analysis. The putamen and insula were defined as a priori regions of interest (ROIs) using the AAL atlas [Tzourio-Mazoyer2002] implemented in the Wake Forest University Pickatlas (http://fmri.wfubmc.edu/software/PickAtlas). We investigated the relationship between group (CG-men, CG-women, TW-pre and TW-post CHT) and gray matter volume with an ANCOVA, with age, total intracranial volume and sexual orientation as nuisance regressors in all analyses. Sexual orientation was indicated by the participants as a continuous variable (0 indicating homosexuality, 50 indicating bisexuality and 100 indicating heterosexuality). The terminology was chosen according to the natal biological sex of TW, i.e. homosexuality indicated sexual interest in men. We calculated a priori defined t-contrasts according to our hypotheses: CG-men > CG-women, CG-men > TW-pre, TW-pre > CG-women, CG-men > TW-post, TW-post > CG-women and TW-pre > TW-post. An additional whole-brain analysis further explored possible regions with volume differences between the groups. To determine the statistical significance of putative clusters in each of the two bilateral ROIs (insula, putamen) and in the whole-brain analysis, the non-parametric approach of Threshold-Free Cluster Enhancement was used, as implemented in the TFCE toolbox (http://dbm.neuro.uni-jena.de/tfce, version 167). Rigorous alpha correction was applied using a family-wise-error-corrected threshold obtained from 5000 permutations per test.
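The idea behind permutation-based family-wise error correction can be sketched with a much simpler statistic than TFCE. The example below is a simplified illustration only: it uses the raw mean difference per voxel, ignores the nuisance regressors, and compares each observed value against the permutation distribution of the maximum statistic across voxels. The data are made up, the function names are hypothetical, and 500 permutations are used instead of 5000 to keep it fast.

```python
import random

def mean(values):
    return sum(values) / len(values)

def group_differences(data, labels):
    """Mean difference (group 1 minus group 0) for every voxel."""
    diffs = []
    for voxel in data:
        g1 = [v for v, l in zip(voxel, labels) if l == 1]
        g0 = [v for v, l in zip(voxel, labels) if l == 0]
        diffs.append(mean(g1) - mean(g0))
    return diffs

def fwe_permutation_pvalues(data, labels, n_perm, rng):
    """One-sided FWE-corrected p-values from the permutation distribution of the maximum statistic."""
    observed = group_differences(data, labels)
    exceed = [0] * len(observed)
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        max_stat = max(group_differences(data, shuffled))
        for i, obs in enumerate(observed):
            if max_stat >= obs:
                exceed[i] += 1
    # The observed labelling counts as one permutation, so p is never zero.
    return [(1 + e) / (1 + n_perm) for e in exceed]

rng = random.Random(0)
labels = [1] * 8 + [0] * 8
data = [
    [10.0 + rng.random() for _ in range(8)] + [rng.random() for _ in range(8)],  # clear effect
    [rng.random() for _ in range(16)],                                            # no effect
]
p_fwe = fwe_permutation_pvalues(data, labels, n_perm=500, rng=rng)
```

Because every voxel is compared against the same maximum-statistic distribution, the chance of any false positive across all voxels is controlled at the chosen alpha level.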

3 Results

3.1 Multivariate analysis

3.1.1 Cisgender training and first validation sample

The training of the classifier led to two results. The first was a hyperparameter set, determined with the Bayesian optimization method: an rbf kernel, together with the corresponding C and γ values, was estimated as the optimal configuration of the SVM for the present problem. Based on the estimated hyperparameters, the second result was the classification outcome of the validation set, which provided a performance indication for the trained classifier. Performance on the validation set was summarized by the balanced accuracy and by the confusion matrix (supplementary table 6), which revealed that our classifier assigns the female biological sex more accurately than the male biological sex, as measured by the TPR. These results are visualized by a ROC curve, based on the probabilities for a classification as male (supplementary figure 2a), with a calculated area under the curve (AUC) of 0.99.
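As a reminder of what the AUC measures, it can be computed as the probability that a randomly chosen positive case (here: male) receives a higher classifier score than a randomly chosen negative case. The sketch below implements this pairwise definition in plain Python; the scores are invented for illustration and the function name is hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive case outscores a negative one (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Made-up "probability of male" outputs; label 1 = male, 0 = female.
auc_perfect = roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
auc_one_swap = roc_auc([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0])
```

An AUC of 0.99 therefore means that in about 99 of 100 male-female pairs the male participant receives the higher probability of being male.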

3.1.2 MDD second validation sample

To rule out that MDD comorbidity had any influence on the classifier, we used a second validation set consisting of 1404 MDD subjects (853 CG-women, 551 CG-men). The balanced accuracy and the TPR for CG-men in this sample are reported in supplementary table 7. The results of the classifier, the corresponding ROC curve (supplementary figure 2d), and the AUC of 0.99 are similar to the results of the first validation set. Fisher’s exact test revealed no significant differences between the distribution of results of the first and second validation sample (supplementary table 8).

3.1.3 Transgender application sample and cisgender third validation sample

The balanced accuracy for the third validation sample (CG part of the transgender sample) was 94.04%, i.e. the mean of the two TPRs in Table 1. The TPR was 93.33% for CG-men and 94.74% for CG-women. However, the TPR for the TW was remarkably low at 56.00% (Table 1, supplementary table 6); see the visualization by ROC curves (supplementary figure 2b, c). The corresponding AUC differed as a function of group between 0.99 (CG-men) and 0.95 (TW). This difference in TPR was significant, as Fisher’s exact test showed a statistically significant difference between the TPR of CG-men and that of TW with hormone treatment (Table 1). The output probabilities of the classifier are represented descriptively in figure 2 as a box plot.

Group                  N    TPR in % (N correct/total)   Fisher’s exact test against CG-men
CG-women               19   94.74 (18/19)
CG-men                 15   93.33 (14/15)                -
TW                     25   56.00 (14/25)                ***
TW (treatment naive)   9    77.77 (7/9)
TW (post CHT)          18   50.00 (9/18)                 ***

Table 1: Classification results in the application sample. Classification results in percentage of true positive rate identified biological sex.
Abbreviations: TPR - true positive rate (sensitivity) CG - cisgender TW - transgender women CHT - cross-sex-hormone treatment
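For illustration, a two-sided Fisher's exact test on a 2x2 table of correct and incorrect classifications can be computed directly from the hypergeometric distribution. The sketch below applies it to the CG-men and TW rows of Table 1 using only the standard library; the function name is hypothetical. The source does not state exactly how the significance markers in Table 1 were derived, so the value computed here is not claimed to reproduce them.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with fixed margins.
        return comb(col1, k) * comb(total - col1, row1 - k) / comb(total, row1)

    p_observed = prob(a)
    k_min = max(0, row1 - (total - col1))
    k_max = min(row1, col1)
    # Sum all tables that are at most as probable as the observed one.
    return sum(
        prob(k) for k in range(k_min, k_max + 1)
        if prob(k) <= p_observed * (1 + 1e-9)
    )

# Rows: CG-men, TW; columns: classified as male, classified as female (Table 1).
p_value = fisher_exact_two_sided(14, 1, 14, 11)
```

The small relative tolerance in the comparison guards against floating-point ties between equally probable tables.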

Figure 2: Boxplot for the predicted probabilities of male sex based on the application sample and the third validation sample, including transgender and cisgender individuals. Groups shown: CG women, CG men, TW; axis: probability of being classified as male (0.0 to 1.0). Abbreviations: CG - cisgender TW - transgender women

3.2 Univariate analysis

The region of interest analysis is summarized in table 2 and figure 3 (see coordinates and detailed statistics there). Briefly, using rigorous alpha correction, our analysis revealed no differences between TW-post CHT and CG-women in the bilateral putamen. In the right insula, TW-post CHT showed lower volume than CG-women (table 2). TW-post CHT and CG-women both showed lower volume of the insula and putamen compared to CG-men. In contrast, TW-pre CHT showed larger volume in both ROIs compared to CG-women. Interestingly, TW-pre CHT also showed higher volume in the putamen compared to CG-men. TW-post CHT showed lower volume than TW-pre CHT in both regions of interest. CG-men showed larger volume in both regions of interest compared to CG-women. Detailed results of our exploratory whole-brain analysis can be found in supplementary table 9. The analysis revealed higher volumes in TW-pre CHT compared to CG-women and TW-post CHT in areas such as the precuneus and medial cingulate cortex, while TW-post CHT showed higher volumes compared to CG-women in the precuneus and lingual gyrus, but also lower volume compared to CG-women in the postcentral gyrus, as well as lower volumes compared to CG-men in the precentral and inferior frontal gyrus.


                                                                       MNI-space
compared groups      region of interest  side  TFCE    p-FWE  k      x    y    z
TW-pre > TW-post     insula              L     91.50   .012   76     -38  -3   -12
TW-pre > TW-post     insula              R     54.96   .033   23     32   10   -16
TW-pre > TW-post     putamen             L     466.55  <.001  2005   -21  16   8
TW-pre > TW-post     putamen             R     395.31  <.001  1409   27   -8   15
TW-pre > CG-women    insula              L     63.21   <.001  1926   -39  -3   -12
TW-pre > CG-women    insula              R     52.58   <.001  2299   34   15   -10
TW-pre > CG-women    putamen             L     274.31  <.001  2381   -21  10   23
TW-pre > CG-women    putamen             R     257.58  <.001  2316   26   -4   14
TW-pre > CG-men      putamen             L     203.55  <.001  892    -21  15   9
TW-pre > CG-men      putamen             R     183.13  <.001  576    28   -3   15
TW-post < CG-men     insula              L     38.96   .005   303    -42  14   -6
TW-post < CG-men     insula              L     30.99   .010   124    -42  -8   4
TW-post < CG-men     insula              R     21.37   .001   131    30   -18  20
TW-post < CG-men     putamen             L     100.64  .001   1050   -14  9    -2
TW-post < CG-men     putamen             R     70.60   .001   1429   26   4    -8
TW-post < CG-women   insula              R     114.58  .021   99     34   -15  9
CG-men > CG-women    insula              L     49.7    <.001  1199   -44  14   -8
CG-men > CG-women    insula              L     13.07   .004   48     -44  -14  8
CG-men > CG-women    insula              R     109.23  <.001  1789   39   16   3
CG-men > CG-women    putamen             L     81.13   <.001  1972   26   6    -4
CG-men > CG-women    putamen             R     100.11  <.001  1429   26   4    -8

Table 2: Results of the univariate gray matter region of interest analysis of the insula and putamen.
Note. Table reports respective statistics of significant clusters of the group comparisons between transgender and cisgender individuals. Clusters resulted from group comparisons corrected for total intracranial volume, age and sexual orientation. For reasons of brevity, no results below a threshold of k = 22 voxels have been reported.

Abbreviations: TW - Transgender Women CG - cisgender pre/post - before/after hormone treatment L/R - left/right k - cluster size TFCE - Threshold-Free-Cluster-Enhancement with subsequent Family-Wise-Error-Correction.

Figure 3: Significant results of the univariate gray matter analysis. Color-bar represents t-values of the extracted clusters. Image shows the cluster at the respective peak voxel as reported in table 2. A. Alterations of the insula between groups (cisgender men, cisgender women and transgender women before vs. after hormone treatment) B. Alterations of the putamen between groups (cisgender men, cisgender women and transgender women before vs. after hormone treatment)

4 Discussion

In the present study, we developed an SVM using hyperparameter optimization, resulting in an accurate classification of biological sex based on structural MRI images. The classifier, trained on a large training set of healthy CG individuals, performed equally well in three independent validation samples comprising healthy CG individuals and CG participants suffering from MDD. When applying the same classifier to structural MRI data of TW, the SVM shows a much lower TPR, resulting in significantly more misclassifications of the biological sex of TW (male) in favor of their perceived gender (female). Moreover, the descriptive statistics of classification probabilities regarding TW (Figure 2) indicate a pattern of prediction uncertainty that is not seen in CG. Hence, our results shed light on two important aspects in biological psychiatry of transgender individuals: 1) the impact of hormonal treatment on brain structure, and 2) the separation of psychological distress (i.e. depression), hormonal treatment and trait characteristics of being a transgender individual. Our results replicate the finding that biological sex is more often misclassified in transgender individuals, previously described by Hoekzema and colleagues (2015) [Hoekzema2015]. This might encourage further investigations into the cause of increased misclassifications in TW. Most notably and in contrast to previous studies, we could rule out that our findings are biased by comorbid depression. Given that the results of the first validation sample of healthy CG participants were replicated in a large clinical sample of CG psychiatric patients suffering from major depression, the classifier appears to be reliable and robust to noise even from psychiatric disorders such as MDD, which have been associated with structural brain changes [Redlich2018, Zaremba2018]. Our biological sex classifier shows a higher external validity than other biological sex classifiers. First, it has been tested on controls and patients with MDD, with high and very similar accuracy. Second, the SVM has been trained on large samples that have been collected at different sites. Hence, our SVM can be regarded as more generalizable while preserving performance and accuracy, indicating its robustness to noise. In the present work, we focused on the first application of this SVM to TW. We observed that our SVM was less accurate in TW than in healthy CG controls. The explorative analysis revealed that this inaccuracy was particularly pronounced in TW who had received hormonal treatment.
Although our TW pre CHT sample size was small, we aimed to differentiate structural brain alterations between TW pre and TW post CHT as well as in comparison to CG-women and CG-men. TW showed brain structural alterations dependent on their treatment state. Volumes of the insula and putamen were larger in TW pre CHT than in CG-women, while TW post CHT showed lower volumes of the right insula compared to CG-women. Our whole-brain analysis also revealed that TW pre CHT showed large clusters with higher gray matter volume than CG-women, while TW post CHT showed more pronounced brain structural alterations compared to CG-women. While TW post CHT showed higher volume in a few brain areas compared to CG-women, they showed lower volume in the postcentral gyrus. In comparison to CG-men, TW pre CHT showed larger volumes of the putamen, while TW post CHT showed lower volumes of both insula and putamen. Accordingly, the whole-brain analysis revealed that TW pre CHT showed higher volumes in comparison to CG-men in several large clusters, while TW post CHT showed lower volumes in comparison to CG-men. Thus, TW independent of treatment state show brain structural alterations in our regions of interest and across the brain in comparison to both CG-men and CG-women. Detailed analysis of TW pre compared to TW post CHT revealed a less pronounced pattern of structural brain alterations in TW post CHT compared to CG-women. Comparing TW pre with TW post CHT revealed lower volume in TW post CHT in both regions of interest, as well as in the whole-brain analysis. This implies that CHT induces a further feminization of brain structure in TW. This result fits with previous longitudinal studies that have shown reductions of cortical thickness in TW from pre to post CHT [Zubiaurre-Elorza2014]. Structural and functional alterations of the insula have consistently been associated with transgender compared to CG individuals [Manzouri2019, Kranz2015, Burke2018, Spizzirri2018, Savic2011].
The insula is associated with body and self-perception. Behaviorally, TW perceive an incoherence between their biological sex and perceived gender that is accompanied by altered insula activity in response to bodily sensations [Case2017].

Brain structural alterations of the putamen have been associated with TW across multiple studies and independent of treatment state (pre, post CHT) [Savic2011, Luders2009, Mueller2017]. We examined the putamen volume across different treatment states. Our study reveals that TW-pre show a higher volume of the putamen compared to CG-men and CG-women, while TW-post show lower volume of the putamen compared to CG-men, but not compared to CG-women. However, it remains unknown how CHT influences these structural and functional brain alterations in TW. Longitudinal examinations are required to reveal region-specific structural alterations, especially in the insula and putamen, and to estimate the impact of CHT on brain structure.

Our combined univariate and multivariate approach revealed associations of CHT with lower accuracy in detecting the biological sex (male) in TW. Our results imply that the brain structure of TW (especially post CHT) resembles neither a feminine nor a masculine brain. In line with this idea, hormonal processes, brain-structural development and the development of gender identity are intertwined [Nguyen2018a]. Intrauterine hormones (e.g. testosterone) drive the development of gender identity, rather than social learning processes [Bao2005, Swaab2009]. The male physical appearance is formed in the first trimester due to the effects of testosterone, and the female body develops due to the lack of androgens in this period [Shamim2000]. While the maturation of reproductive organs is more or less limited to the first trimester, brain development continues throughout pregnancy [VanGoozenS.H.Slabbekoorn2002, Bao2011]. Hormonal influences after the first trimester do not change the biological sex, but they can change the experience of gender. Thus, hormonal influences after the first trimester might be responsible for the incoherence between biological and experienced sex. Since hormonal influences change gender perception as well as brain structure, CHT may lead to misclassifications in the TW group after treatment. Our univariate data indeed show that CHT is associated with structural brain alterations when comparing TW pre and post CHT to CG individuals. A previous study showed increased misclassification of biological sex even in untreated TW, which we could not statistically support due to the small sample size of our untreated group (N = 8). Therefore, further studies should follow up on this effect with larger samples of untreated TW to increase statistical power. The design should also be extended with a second control group (women with hormonal treatment) to clarify whether misclassification is an effect of treatment alone or of the combination of being transgender and receiving CHT.

Structural brain alterations in TW compared to CG individuals are also discussed in the context of psychiatric comorbidity and distress. In the present study, we addressed this by testing our SVM on CG patients with MDD and obtained the same accuracy as for CG individuals without MDD. Assuming that the distress experienced by MDD patients is comparable to the distress experienced by TW, we found no evidence that structural alterations from distress are responsible for the decreased accuracy of our classifier in TW. The present SVM approach provides a new tool for research in biological psychiatry. The prevalence of many psychiatric disorders is higher for one biological sex than for the other. For example, the prevalence of autism is higher for biological men than for biological women. Hence, it was hypothesized that female patients with autism might be similar in their brain structure to men. A previous study that developed a biological sex classifier using structural MRI scans and applied it to patients with autism [Ecker2017] indeed showed increased misclassifications of biological sex in female patients with autism. Therefore, biological sex misclassifications (as found in autism, but not in our depressed validation sample reported here) might point to an involvement of aberrant biological sex development in the onset of such neurodevelopmental disorders. Future studies could use our trained classifier (https://photon-ai.com/model_repo/bsc_mri) to test for misclassifications in other clinical diagnoses with a high gender imbalance in prevalence rates, such as eating disorders, substance use disorders, or anxiety disorders.

4.1 Limitations

Due to our small sample size of transgender individuals, replication of the prediction failure of our SVM in transgender individuals pre and post CHT is needed. To verify that our effect is due to hormonal treatment, larger samples and studies in transgender men (biological sex female) are needed. Future studies should further dissect the effects of gender dysphoria from those of depression, and the effects of hormonal treatment from the state of being a transgender individual. Finally, a sample of transgender men would have been desirable to investigate whether the current finding generalizes to transgender individuals with female biological sex.

5 Conclusion

In this study, we present a highly accurate biological sex classifier for CG individuals that shows significantly decreased accuracy in transgender individuals after CHT. Our results underline that the brain structure of transgender individuals shows similarities to the brain structure of both their perceived gender and their biological sex, without fully matching either. This implies that the brain structure of TW differs from that of both CG men and CG women. Based on our brain structural data, we suggest a dimensional rather than binary gender construct, which will contribute to the destigmatization of transgender individuals.

Acknowledgements

FOR2107

This work is part of the German multicenter consortium “Neurobiology of Affective Disorders. A translational perspective on brain structure and function”, funded by the German Research Foundation (Deutsche Forschungsgemeinschaft DFG; Forschungsgruppe/Research Unit FOR2107).

Principal investigators (PIs) with respective areas of responsibility in the FOR2107 consortium are: Work Package WP1, FOR2107/MACS cohort and brain imaging: Tilo Kircher (speaker FOR2107; DFG grant numbers KI 588/14-1, KI 588/14-2), Udo Dannlowski (co-speaker FOR2107; DA 1151/5-1, DA 1151/5-2), Axel Krug (KR 3822/5-1, KR 3822/7-2), Igor Nenadic (NE 2254/1-2), Carsten Konrad (KO 4291/3-1). WP2, animal phenotyping: Markus Wöhr (WO 1732/4-1, WO 1732/4-2), Rainer Schwarting (SCHW 559/14-1, SCHW 559/14-2). WP3, miRNA: Gerhard Schratt (SCHR 1136/3-1, 1136/3-2). WP4, immunology, mitochondriae: Judith Alferink (AL 1145/5-2), Carsten Culmsee (CU 43/9-1, CU 43/9-2), Holger Garn (GA 545/5-1, GA 545/7-2). WP5, genetics: Marcella Rietschel (RI 908/11-1, RI 908/11-2), Markus Nöthen (NO 246/10-1, NO 246/10-2), Stephanie Witt (WI 3439/3-1, WI 3439/3-2). WP6, multi method data analytics: Andreas Jansen (JA 1890/7-1, JA 1890/7-2), Tim Hahn (HA 7070/2-2), Bertram Müller-Myhsok (MU1315/8-2), Astrid Dempfle (DE 1614/3-1, DE 1614/3-2). CP1, biobank: Petra Pfefferle (PF 784/1-1, PF 784/1-2), Harald Renz (RE 737/20-1, 737/20-2). CP2, administration. Tilo Kircher (KI 588/15-1, KI 588/17-1), Udo Dannlowski (DA 1151/6-1), Carsten Konrad (KO 4291/4-1).

Data access and responsibility: All PIs take responsibility for the integrity of the respective study data and their components. All authors and coauthors had full access to all study data.

Acknowledgements and members by Work Package (WP): WP1: Henrike Bröhl, Katharina Brosch, Bruno Dietsche, Rozbeh Elahi, Jennifer Engelen, Sabine Fischer, Jessica Heinen, Svenja Klingel, Felicitas Meier, Tina Meller, Torsten Sauder, Simon Schmitt, Frederike Stein, Annette Tittmar, Dilara Yüksel (Dept. of Psychiatry, Marburg University). Mechthild Wallnig, Rita Werner (Core-Facility Brainimaging, Marburg University). Carmen Schade-Brittinger, Maik Hahmann (Coordinating Centre for Clinical Trials, Marburg). Michael Putzke (Psychiatric Hospital, Friedberg). Rolf Speier, Lutz Lenhard (Psychiatric Hospital, Haina). Birgit Köhnlein (Psychiatric Practice, Marburg). Peter Wulf, Jürgen Kleebach, Achim Becker (Psychiatric Hospital Hephata, Schwalmstadt-Treysa). Ruth Bär (Care facility Bischoff, Neunkirchen). Matthias Müller, Michael Franz, Siegfried Scharmann, Anja Haag, Kristina Spenner, Ulrich Ohlenschläger (Psychiatric Hospital Vitos, Marburg). Matthias Müller, Michael Franz, Bernd Kundermann (Psychiatric Hospital Vitos, Gießen). Christian Bürger, Fanni Dzvonyar, Verena Enneking, Stella Fingas, Janik Goltermann, Hannah Lemke, Susanne Meinert, Jonathan Repple, Kordula Vorspohl, Bettina Walden, Dario Zaremba (Dept. of Psychiatry, University of Münster). Harald Kugel, Jochen Bauer, Walter Heindel, Birgit Vahrenkamp (Dept. of Clinical Radiology, University of Münster). Gereon Heuft, Gudrun Schneider (Dept. of Psychosomatics and Psychotherapy, University of Münster). Thomas Reker (LWL-Hospital Münster). Gisela Bartling (IPP Münster). Ulrike Buhlmann (Dept. of Clinical Psychology, University of Münster). 
WP2: Marco Bartz, Miriam Becker, Christine Blöcher, Annuska Berz, Moria Braun, Ingmar Conell, Debora dalla Vecchia, Darius Dietrich, Ezgi Esen, Sophia Estel, Jens Hensen, Ruhkshona Kayumova, Theresa Kisko, Rebekka Obermeier, Anika Pützer, Nivethini Sangarapillai, Özge Sungur, Clara Raithel, Tobias Redecker, Vanessa Sandermann, Finnja Schramm, Linda Tempel, Natalie Vermehren, Jakob Vörckel, Stephan Weingarten, Maria Willadsen, Cüneyt Yildiz (Faculty of Psychology, Marburg University). WP4: Jana Freff, Silke Jörgens, Kathrin Schwarte (Dept. of Psychiatry, University of Münster). Susanne Michels, Goutham Ganjam, Katharina Elsässer (Faculty of Pharmacy, Marburg University). Felix Ruben Picard, Nicole Löwer, Thomas Ruppersberg (Institute of Laboratory Medicine and Pathobiochemistry, Marburg University). WP5: Helene Dukal, Christine Hohmeyer, Lennard Stütz, Viola Schwerdt, Fabian Streit, Josef Frank, Lea Sirignano (Dept. of Genetic Epidemiology, Central Institute of Mental Health, Medical Faculty Mannheim, Heidelberg University). WP6: Anastasia Benedyk, Miriam Bopp, Roman Keßler, Maximilian Lückel, Verena Schuster, Christoph Vogelbacher (Dept. of Psychiatry, Marburg University). Jens Sommer, Olaf Steinsträter (Core-Facility Brainimaging, Marburg University). Thomas W.D. Möbius (Institute of Medical Informatics and Statistics, Kiel University). CP1: Julian Glandorf, Fabian Kormann, Arif Alkan, Fatana Wedi, Lea Henning, Alena Renker, Karina Schneider, Elisabeth Folwarczny, Dana Stenzel, Kai Wenk, Felix Picard, Alexandra Fischer, Sandra Blumenau, Beate Kleb, Doris Finholdt, Elisabeth Kinder, Tamara Wüst, Elvira Przypadlo, Corinna Brehm (Comprehensive Biomaterial Bank Marburg, Marburg University).

The FOR2107 cohort project (WP1) was approved by the Ethics Committees of the Medical Faculties, University of Marburg (AZ: 07/14) and University of Münster (AZ: 2014-422-b-S).

Financial disclosures/Conflict of Interest

This work was funded by the German Research Foundation (DFG, grant FOR2107 DA1151/5-1 and DA1151/5-2 to UD; SFB-TRR58, Projects C09 and Z02 to UD) and the Interdisciplinary Center for Clinical Research (IZKF) of the medical faculty of Münster (grant Dan3/012/17 to UD). The BiDirect Study is supported by a grant of the German Ministry of Research and Education (BMBF) to the University of Muenster (01ER0816 and 01ER1506).

Biomedical financial interests or potential conflicts of interest: Tilo Kircher received unrestricted educational grants from Servier, Janssen, Recordati, Aristo, Otsuka, neuraxpharm. The other authors (Claas Flint, Katharina Förster, Sophie A. Koser, Carsten Konrad, Pienie Zwitserlood, Klaus Berger, Marco Hermesdorf, Igor Nenadic, Axel Krug, Bernhard T. Baune, Katharina Dohm, Ronny Redlich, Nils Opel, Tim Hahn, Xiaoyi Jiang, Udo Dannlowski, Dominik Grotegerd) declare no conflicts of interest.

References

Appendix A Supplements

                                  Male            Female          Significance test
Muenster Neuroimaging Cohort
  Healthy Controls
    Age                           36.4 (11.7)     35.8 (12.6)
  Major Depressive Disorder
    Age                           37.4 (11.3)     38.1 (12.4)
    BDI                           23.4 (9.5)      26.5 (10.9)
    HDRS-17                       18.5 (3.8)      19.5 (4.5)
    IQ (MWBT)                     111.5 (13.3)    110.9 (14.1)
BiDirect
  Healthy Control Group
    Age                           51.3 (8.1)      53.0 (8.0)
  Major Depressive Disorder
    Age                           48.1 (7.4)      49.6 (7.3)
    HDRS-17                       12.6 (6.8)      14.3 (6.5)
FOR2107
  Healthy Control Group
    Age                           32.6 (11.4)     32.6 (13.0)
  Major Depressive Disorder
    Age                           36.6 (13.8)     37.8 (13.4)
    HDRS-17                       8.9 (6.7)       8.5 (6.8)

Table 3: Descriptive statistics of the training and validation samples. The table reports means and standard deviations (in parentheses) of the individual cohorts used for the training of the support vector machine. The significance test was a univariate ANOVA without covariates.

                     CG men       CG women     TW pre         TW post        Significance test
Age in years         34 (8.6)     32 (6.3)     33.9 (14.1)    33.1 (31.3)
Highest Education    4.9 (0.9)    4.8 (0.8)    5.0 (0.0)      5.1 (0.5)

Table 4: Descriptive statistics of the application sample (transgender and cisgender individuals). The table reports means and standard deviations (in parentheses) of the transgender individuals and of controls from a similar measurement period used for the test of the support vector machine in TW. The significance test was a univariate ANOVA without covariates. 15 of the 29 TW were asked whether they had had a depressive episode, and 8 of these 15 TW indicated that they had. TW = transgender women (biological sex male, perceived sex female); Highest Education = educational attainment coded from 1 = special school to 6 = university degree.
Figure 4: Overview of the training, validation and application procedure of the biological sex classifier. Abbreviations: SVM - support vector machine; PCA - principal component analysis.
                    actual female    actual male
predicted female    202 (TPR)        17 (TNR)
predicted male      1 (TNR)          131 (TPR)

Accuracy
Balanced Accuracy
Precision
Recall
F1-Score            0.9357

Table 5: Results of the validation set. Classification results in absolute numbers and percentage of accurately identified biological sex. Abbreviations: TPR - true positive rate (sensitivity); TNR - true negative rate (specificity).
                    actual CG women    actual CG men    actual TW
predicted female    18 (TPR)           1 (TNR)          11 (TNR)
predicted male      1 (TNR)            14 (TPR)         14 (TPR)

The following metrics are related to the CG groups only:
Accuracy
Balanced Accuracy
Precision
Recall
F1-Score            0.9333

Table 6: Results of the application set. Classification results in absolute numbers and percentage of accurately identified biological sex. Abbreviations: TPR - true positive rate (sensitivity); TNR - true negative rate (specificity); CG - cisgender; TW - transgender women.
                    actual female    actual male
predicted female    829 (TPR)        72 (TNR)
predicted male      24 (TNR)         479 (TPR)

Accuracy
Balanced Accuracy
Precision
Recall
F1-Score            0.9206

Table 7: Results of the second validation set. Classification results in absolute numbers and percentage of accurately identified biological sex. Abbreviations: TPR - true positive rate (sensitivity); TNR - true negative rate (specificity).
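
The summary metrics named in Tables 5 to 7 can all be derived from the four cells of a confusion matrix. The following is a minimal sketch, not the authors' evaluation code. It assumes that "male" is the positive class; this is an assumption, although it reproduces the F1-scores printed in Tables 5 and 6 (0.9357 and 0.9333). The class name `BinaryCounts` and the variable names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryCounts:
    """2x2 confusion counts; 'male' is ASSUMED to be the positive class."""
    tp: int  # actual male, predicted male
    fn: int  # actual male, predicted female
    fp: int  # actual female, predicted male
    tn: int  # actual female, predicted female

    @property
    def tpr(self) -> float:
        """True positive rate (sensitivity, recall)."""
        return self.tp / (self.tp + self.fn)

    @property
    def tnr(self) -> float:
        """True negative rate (specificity)."""
        return self.tn / (self.tn + self.fp)

    @property
    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fn + self.fp + self.tn)

    @property
    def balanced_accuracy(self) -> float:
        """Mean of sensitivity and specificity; insensitive to class imbalance."""
        return (self.tpr + self.tnr) / 2

    @property
    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def f1(self) -> float:
        """Harmonic mean of precision and recall."""
        return 2 * self.tp / (2 * self.tp + self.fp + self.fn)


# Counts copied from Table 5, Table 6 (CG groups only) and Table 7.
first_validation = BinaryCounts(tp=131, fn=17, fp=1, tn=202)
application_cg = BinaryCounts(tp=14, fn=1, fp=1, tn=18)
second_validation = BinaryCounts(tp=479, fn=72, fp=24, tn=829)

# All TW have male biological sex, so only a true positive rate is defined.
tw_tpr = 14 / (14 + 11)

if __name__ == "__main__":
    samples = {
        "first validation": first_validation,
        "application (CG)": application_cg,
        "second validation": second_validation,
    }
    for name, counts in samples.items():
        print(f"{name}: TPR={counts.tpr:.3f} TNR={counts.tnr:.3f} "
              f"balanced accuracy={counts.balanced_accuracy:.3f}")
    print(f"TW: TPR-male={tw_tpr:.3f}")
```

Reporting balanced accuracy alongside plain accuracy is useful here because the validation samples contain more women than men, so a classifier biased toward "female" would otherwise look better than it is.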
            validation set 1     validation set 2     Fisher's Exact Test
            correct/incorrect    correct/incorrect    p-value
CG women    131/17               479/72               .7386
CG men      202/1                829/24               .9955

Table 8: Comparison of the distribution of classification results between the first and second validation sets, using Fisher's exact test. (CG - cisgender)
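
Fisher's exact test, as used for Table 8, compares two correct/incorrect splits by enumerating the hypergeometric distribution of one cell with all margins fixed. The sketch below is a plain standard-library implementation for illustration only. The function name `fisher_exact_2x2` is hypothetical, and because the table does not state whether a one-sided or two-sided alternative was used, all three variants are returned.

```python
from math import comb


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> dict:
    """Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    The top-left cell follows a hypergeometric distribution when all
    row and column totals are held fixed. Returns one-sided and
    two-sided p-values.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x under the null.
        return comb(row1, x) * comb(row2, col1 - x) / total

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    probs = {x: prob(x) for x in range(lowest, highest + 1)}
    p_observed = probs[a]

    return {
        "less": sum(p for x, p in probs.items() if x <= a),
        "greater": sum(p for x, p in probs.items() if x >= a),
        # Two-sided: all tables at most as likely as the observed one
        # (small tolerance guards against floating-point ties).
        "two_sided": sum(p for p in probs.values()
                         if p <= p_observed * (1 + 1e-7)),
    }


if __name__ == "__main__":
    # Rows of Table 8: (correct, incorrect) in validation set 1 and set 2.
    table_8_rows = {
        "131/17 vs 479/72": (131, 17, 479, 72),
        "202/1 vs 829/24": (202, 1, 829, 24),
    }
    for label, cells in table_8_rows.items():
        print(label, fisher_exact_2x2(*cells))
```

Exact integer binomial coefficients are used throughout, so the only rounding happens in the final division for each probability.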

[Figure panels: ROC curves with x-axis False Positive Rate and y-axis True Positive Rate (both 0.0 to 1.0), each shown against a baseline. (a) First validation sample: CG men, AUC = 0.99. (b) Third validation sample: CG men, AUC = 0.99. (c) Application sample: TW (biological sex male), AUC = 0.95. (d) Second validation sample: MDD CG men, AUC = 0.99.]
Figure 5: Receiver operating characteristic (ROC) curves for the classification as biological sex male, for cisgender men in the first, second and third validation samples as well as for transgender women of our application sample.
CG = cisgender
MDD = Major Depressive Disorder
TW = transgender women
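
An ROC curve such as those in Figure 5 is obtained by sweeping a decision threshold over the classifier's continuous output and recording the false and true positive rates at each step. The area under the curve (AUC) summarizes it. The sketch below illustrates the general technique only. The labels and scores are invented for demonstration and are not taken from the study, and the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs from sweeping a threshold over the scores.

    labels: 1 for the positive class (here assumed: male), 0 otherwise.
    scores: higher values mean 'more likely positive'.
    """
    pairs = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        # Handle tied scores in one step so the curve does not depend
        # on the arbitrary order of ties.
        j = i
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            tp += pairs[j][1]
            fp += 1 - pairs[j][1]
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def auc(points):
    """Area under the ROC curve via the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Hypothetical decision values, for illustration only.
    demo_labels = [1, 1, 0, 1, 0, 0]
    demo_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    curve = roc_points(demo_labels, demo_scores)
    print(curve)
    print(f"AUC = {auc(curve):.3f}")
```

The AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which is why a value of 0.5 corresponds to the baseline.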
  Region of interest                             TFCE      p-FWE   k        x y z (MNI space)
TW-pre > TW-post
  R medial cingulate cortex and caudate nucleus  1643.35   .005    15117    27 -9 16
  L caudate nucleus                              1514.98   .005    13228    -22 18 12
  L precentral and middle frontal gyrus          766.08    .029    470      -36 4 54
  L precuneus                                    743.23    .029    651      -14 -56 52
  L postcentral gyrus                            668.59    .040    410      -63 -14 39
  R cerebellum                                   632.84    .045    505      48 -54 -27
  R cerebellum                                   611.18    .047    258      40 -38 -46
TW-pre > CG-women
  L precuneus, R medial cingulate cortex,        982.71    <.001   108742   22 -3 15
    L + R lingual gyrus
  L cerebellum                                   328.77    .003    4657     -30 -33 -50
  R precentral, frontal inferior gyrus           127.46    .041    360      62 16 27
TW-post > CG-women
  R calcarine/lingual gyrus, precuneus           1312.08   <.001   3875     4 -50 3
  L cuneus, superior occipital gyrus             605.11    .023    498      -10 -96 36
TW-post < CG-women
  R postcentral gyrus                            1057.71   .026    562      45 -21 32
TW-pre > CG-men
  L caudate nucleus, putamen, hippocampus        745.18    .009    3018     -21 18 9
  R caudate nucleus, putamen                     713.97    .010    2336     28 -4 16
  R precuneus, mid cingulum                      673.63    .013    3185     8 -28 39
  L pre-, postcentral                            654.18    .015    822      -69 -15 -32
  R hippocampus, parahippocampus                 567.93    .028    713      20 -8 -30
  R calcarine, lingual gyrus                     528.98    .002    416      26 -75 3
TW-post < CG-men
  L middle temporal lobe, cerebellum,            939.61    <.001   111736   32 -56 -38
    R middle temporal lobe, cerebellum
  R precentral and frontal inferior gyrus        260.12    .016    628      57 6 12
Table 9: Results of the whole-brain analysis. For brevity, only significant clusters are reported; we did not calculate a contrast comparing cisgender men and women. The reported significant clusters resulted from group comparisons within a full factorial model corrected for total intracranial volume, age and sexual orientation.

Abbreviations: TW - transgender women; CG - cisgender; pre/post - before/after hormone treatment; L/R - left/right; k - cluster size; TFCE - Threshold-Free Cluster Enhancement with subsequent family-wise error (FWE) correction.