Anatomical landmark localization is a challenging task that arises in many medical image analysis problems. One particular realm where the localization of landmarks is of high importance is the analysis of knee plain radiographs at different stages of osteoarthritis (OA), the most common joint disorder and the highest disability factor in the world.
In the knee OA research field, as in other domains, a typical landmark localization pipeline consists of two sub-tasks: region of interest (ROI) localization and the landmark localization itself. In knee radiographs, the former is typically applied in the analysis of whole knee images [3, 4, 28, 36, 38], while the latter is used for bone shape and texture analyses [6, 19, 34]. Furthermore, Tiulpin et al. also used landmark localization for image standardization applied after the ROI localization step [36, 37].
Manual annotation of knee landmarks is not trivial without knowledge of knee anatomy, and it becomes even more challenging as the severity of OA increases. In particular, annotating fine-grained bone edges and tibial spines becomes intractable and time consuming. In Fig. 2, we show examples of landmark annotations for each stage of OA severity graded according to the gold-standard Kellgren-Lawrence system (grading from to ). It can be seen from this figure that as the disease progresses, bone spurs (osteophytes) and general bone deformity affect the appearance of the image. Other factors, such as the X-ray beam angle, are also known to have an impact on the image appearance.
In this paper, we propose a novel Deep Learning-based framework for localization of anatomical landmarks in knee plain radiographs and validate its generalization performance. First, we train a model to localize ROIs in a bilateral radiograph using low-cost labels, and subsequently train a model on the localized ROIs to predict the location of anatomical landmarks in the femur and tibia. Here, we utilize transfer learning and use the model weights from the first step of our pipeline to initialize the second-stage model. The proposed approach is schematically illustrated in Fig. 1.
Our method is based on the hourglass convolutional network that localizes the landmarks in a weakly-supervised manner and subsequently uses the soft-argmax layer to directly estimate the location of every landmark point. To summarize, the contributions of this study are the following:
We leverage recent advances in landmark detection using hourglass networks and combine the best design choices in our method.
For the first time, we propose to use the MixUp data augmentation principle for anatomical landmark localization and perform a thorough ablation study on knee radiographs.
We demonstrate an effective strategy of enhancing the performance of our landmark localization method by pre-training it on low-budget landmark annotations.
We evaluate our method on two independent datasets and demonstrate better generalization ability of the proposed approach compared to the current state-of-the-art baseline.
The pre-trained models, source code and the annotations performed for the Osteoarthritis Initiative (OAI) dataset are publicly available at http://will.be.placed.after.review.
2 Related Work
In the literature, there exist only a few studies specifically focused on localization of landmarks in plain knee radiographs. Specifically, the current state-of-the-art was proposed by Lindner et al. [24, 25], and it is based on a combination of random forest regression voting (RFRV) with constrained local models (CLM) fitting.
There are several methods focusing solely on the ROI localization. Tiulpin et al. proposed a novel anatomical proposal method to localize the knee joint area. Antony et al. used fully convolutional networks for the same problem. Recently, Chen et al. proposed to use object detection methods to measure the knee OA severity.
The proposed approach is related to the regression-based methods for keypoint localization. We utilize an hourglass network, which is an encoder-decoder model initially introduced for human pose estimation, and address both ROI and landmark localization tasks. Several other studies in the medical imaging domain also leveraged a similar approach by applying U-Net to the landmark localization problem [12, 31]. However, encoder-decoder networks are computationally heavy during the training phase since they regress a tensor of high-resolution heatmaps, which is challenging for medical images that are typically large. Notably, decreasing the image resolution could negatively impact the accuracy of landmark localization. In addition, most of the existing approaches use a refinement step, which makes the computational burden even harder to cope with. Nevertheless, hourglass CNNs are widely used in human pose estimation, where the resolution can be lowered and precise ground truth is absent.
More similar to our approach, Honari et al. recently leveraged deep learning and applied a soft-argmax layer to feature maps at the full image resolution to improve landmark localization performance, leading to remarkable results. However, such a strategy is computationally heavy for medical images due to their high resolution. In contrast, we first moderately reduce the image resolution by embedding it into a feature space, utilize an hourglass module to process the obtained feature maps at all scales, and eventually apply the soft-argmax operator. This makes the proposed configuration more applicable to high-resolution images and allows us to obtain sub-pixel accurate landmark coordinates.
3.1 Network architecture
Our model comprises several architectural components of modern hourglass-like encoder-decoder models for landmark localization. In particular, we utilize the hierarchical multi-scale parallel (HMP) residual block, which improves the gradient flow compared to the traditional bottleneck layer described in [17, 27]. The HMP block structure is illustrated in Fig. 3.
The architecture of the proposed model is represented in Fig. 4. In general, our model comprises three main components: entry block, hourglass block, and output block. The whole network is parameterized by two hyperparameters, width and depth, where the latter is related to the number of max-pooling steps in the hourglass block. In our experiments we found the width of and the depth of to be optimal to maintain both high accuracy and speed of computations.
Similar to the original hourglass model, we apply a convolution with stride and zero padding of and pass the results into a residual module. Further, we use a max-pooling and utilize three residual modules before the hourglass block. This block allows us to simultaneously downscale the image times and obtain representative feature embeddings suitable for the multi-scale processing performed in the hourglass block.
This block starts with a max-pooling and recursively repeats the dual-path structure times, as can be seen in Fig. 4. In particular, each level of the hourglass block starts with a max-pooling followed by HMP residual blocks. At the next stage, the representations from the current level are passed to the next hourglass level and also passed forward to be summed with the up-sampled outputs of the hourglass level . Since the spatial resolution of the feature maps at level and is different, nearest-neighbours up-sampling is used. At level , we simply feed the representations into the HMP block instead of the next hourglass level because the depth limit of the hourglass has been reached.
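The recursive dual-path structure described above can be illustrated with a small standard-library sketch. It is only one plausible reading of the text, not the authors' implementation: the HMP residual block is replaced by an identity placeholder (`block`), feature maps are plain 2-D lists with a single channel, and the function names are hypothetical.

```python
def max_pool2(fmap):
    """2x2 max-pooling with stride 2 on a 2-D list (even side lengths assumed)."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]) - 1, 2)]
        for i in range(0, len(fmap) - 1, 2)
    ]


def upsample_nearest2(fmap):
    """Nearest-neighbour up-sampling by a factor of 2."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def block(fmap):
    """Identity placeholder standing in for an HMP residual block (assumption)."""
    return [list(row) for row in fmap]


def add(a, b):
    """Element-wise sum of two feature maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def hourglass_level(fmap, depth):
    """One hourglass level: pool, process, recurse, up-sample and sum."""
    x = block(max_pool2(fmap))  # each level starts with max-pooling
    if depth > 1:
        # Lower-resolution path: next level, then nearest-neighbour up-sampling.
        inner = upsample_nearest2(hourglass_level(x, depth - 1))
    else:
        # Depth limit reached: use a residual block instead of another level.
        inner = block(x)
    return add(x, inner)  # sum with the representations passed forward
```

With the identity placeholder the output carries no learned information; the sketch only shows how resolutions are halved on the way down and restored before each summation.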
The final block of the model uses the representations coming from the hourglass module and sequentially applies two blocks of dropout () and convolution; soft-argmax is then utilized to regress the coordinates of each landmark point.
Since soft-argmax is an important component of our model, we review its formulation in this paragraph. This operator can be defined as a sequence of two steps, where the first one calculates the spatial softmax for pixel :
At the next stage, the obtained spatial softmax is multiplied by the expected value of landmark coordinate at every pixel:
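As an illustration of the two steps, the following is a minimal standard-library sketch of soft-argmax on a single heatmap. It assumes that coordinates are normalized to [0, 1] and that a scalar `beta` scales the heatmap before the softmax; both the names and these conventions are assumptions of the sketch, not details confirmed by the text.

```python
import math


def soft_argmax(heatmap, beta=1.0):
    """Return (x, y) in [0, 1] as the softmax-weighted mean pixel coordinate.

    `heatmap` is a 2-D list with at least 2 rows and 2 columns.
    """
    height, width = len(heatmap), len(heatmap[0])
    # Step 1: spatial softmax (the maximum is subtracted for numerical stability).
    peak = max(max(row) for row in heatmap)
    weights = [[math.exp(beta * (v - peak)) for v in row] for row in heatmap]
    total = sum(sum(row) for row in weights)
    # Step 2: expected coordinate under the softmax distribution.
    x = y = 0.0
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            p = w / total
            x += p * (j / (width - 1))
            y += p * (i / (height - 1))
    return x, y
```

Because the result is a weighted average over all pixels, it is differentiable with respect to the heatmap and is not restricted to integer pixel positions.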
3.2 Loss function
We assessed various loss functions for training our model and finalized our choice at wing loss  that is closely related to loss. However, in the case of wing loss, the errors in a small vicinity of – are better amplified due to the logarithmic nature of the function:
where – is a ground truth, – prediction, (, ) – range of non-linear part of the loss, – constant smoothly linking the linear and non-linear parts.
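For reference, a minimal sketch of the wing loss for a single coordinate is given below, following the formulation of Feng et al. The parameter names `w` and `eps` and their default values are illustrative assumptions and are not necessarily the values used in this work.

```python
import math


def wing_loss(y, y_hat, w=10.0, eps=2.0):
    """Wing loss for one coordinate (w and eps values are illustrative)."""
    diff = abs(y - y_hat)
    # Constant that links the logarithmic and linear parts continuously.
    c = w - w * math.log(1.0 + w / eps)
    if diff < w:
        # Non-linear part: amplifies small and medium errors.
        return w * math.log(1.0 + diff / eps)
    # Linear part: behaves like the absolute error for large deviations.
    return diff - c
```

The logarithmic branch has a steeper slope near zero than the absolute error, which is what amplifies small localization errors during training.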
3.3 Training techniques
We use a MixUp technique  to improve the performance of our method. In particular, MixUp mixes the data inputs and , the corresponding keypoint arrays and :
thereby augmenting the dataset with new interpolated examples. Our implementation of MixUp does not differ from the one proposed in the original work (https://github.com/facebookresearch/mixup-cifar10), and we do not compute the mixed targets . Instead, we optimize the following loss function calculated mini-batch-wise:
where and are the outputs of the network for and , respectively. Here, the points for every point are generated by a simple mini-batch shuffling.
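A minimal sketch of this procedure, in the style of the referenced mixup-cifar10 code, is shown below. It assumes that the mixing coefficient is drawn from a Beta(alpha, alpha) distribution and that the prediction for each mixed input is compared against both original targets; the function names are hypothetical and the inputs are flat lists for brevity.

```python
import random


def mixup_batch(inputs, alpha, rng):
    """Mix each input with a partner chosen by shuffling the mini-batch."""
    lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
    perm = list(range(len(inputs)))
    rng.shuffle(perm)  # partner indices from a simple mini-batch shuffle
    mixed = [
        [lam * a + (1.0 - lam) * b for a, b in zip(inputs[i], inputs[j])]
        for i, j in enumerate(perm)
    ]
    return mixed, perm, lam


def mixup_loss(loss_fn, outputs, targets, perm, lam):
    """Weighted loss against both target sets; mixed targets are never built."""
    total = 0.0
    for i, j in enumerate(perm):
        total += lam * loss_fn(outputs[i], targets[i])
        total += (1.0 - lam) * loss_fn(outputs[i], targets[j])
    return total / len(outputs)
```

In a training loop, `outputs` would be the network predictions for `mixed`, and `loss_fn` could be any per-sample landmark loss.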
Medical images can vary in appearance due to different data acquisition settings or patient-related anatomical features. To tackle the issue of limited data, we applied data augmentation. We used geometric and textural augmentations similar to those used for the face landmark detection problem. The former included all classes of homographic transformations, while the latter included gamma correction, salt-and-pepper noise, blur (both median and Gaussian) and the addition of Gaussian noise. Interestingly, homographic transformations were shown to be effective in improving, for example, self-supervised learning [23, 26]; however, only a narrower class of transformations (affine) has been applied to landmark localization in faces.
Transfer learning from low-budget annotations.
As shown in Fig. 1, the problem of localizing the landmarks comprises two stages: identification of the ROI and the actual landmark localization. We previously mentioned the two classes of labels that are needed to train such a pipeline: low-cost ( points / image) and high-cost labels ( points). The low-cost labels can be noisy / inaccurate and are quick to produce, while the high-cost labels require the expert knowledge. In this work, we first train the ROI localization model ( landmark per leg) on the low-cost labels – knee joint centers (see Fig. 1) and then re-use the pre-trained weights from this stage to train the landmark localization model ( landmarks per knee joint).
For all the following datasets, we applied the same annotation process. First, we ran the BoneFinder tool on all the images in all the datasets (see Sec. 4.2). At the second stage, for every image, a person experienced in knee anatomy and OA manually refined all the landmark points. In Fig. 1, we highlight the numbering of the landmarks that we use in this paper. Specifically, we marked the corner landmarks in tibia from to and in femur from to (lateral to medial). To perform the annotations, we used the VGG image annotation tool.
We trained our model and performed model selection using the images from the Osteoarthritis Initiative (OAI) dataset (https://oai.epi-ucsf.org/datarelease/). Roughly knee joint images per KL grade were sampled to be included into the dataset. The final dataset size comprised knee joints in total. In the case of the ROI localization, we used the half of the image that corresponded to each knee.
These data were collected at our hospital, and thus they come from a completely different population than OAI (from the USA). The dataset includes the images from subjects, and KL grade-wise the data have the following distribution: 4 knees with KL , knees with KL , knees with KL , knees with KL 3 and knees with KL . From this dataset, we excluded knee due to an implant, thereby using knees for testing of our model.
This dataset was also acquired from our hospital and included originally subjects. Out of these, 5 knee joints were excluded, thereby making a dataset of knees ( implants and due to error during the annotation process). With respect to OA severity, these data had cases with KL , with KL , with KL , with KL and with KL . This dataset was also used solely for testing of our model.
4.2 Baseline methods
We used several baseline methods at the model selection phase and one strong pre-trained baseline method at the test phase. In particular, we used Active Appearance Models and Constrained Local Models with both Image Gradient Orientations (IGO) and Local Binary Patterns (LBP) features. Our implementation is based on the available methods with default hyperparameters from the Menpo library.
At the test phase, we used the pre-trained RFRV-CLM method implemented in the BoneFinder tool. Here, the RFRV-CLM model was trained on images from the OAI dataset. However, we did not have access to the training data to assess which samples were used for training this method; therefore, we used this tool only for testing on datasets A and B.
4.3 Implementation Details
All our ablation experiments were conducted on the same -fold patient-wise cross-validation split stratified by a KL grade to ensure equal distribution of different stages of OA severity. Both ROI and landmark localization models were trained using the same split.
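A patient-wise split that is stratified by KL grade can be sketched as follows. This is an illustrative standard-library version, not the authors' code: it assumes that each patient is labelled by the highest KL grade among their knees and deals patients to folds in round-robin order within each grade, so that both knees of a patient always land in the same fold. The function name is hypothetical.

```python
from collections import defaultdict


def patient_wise_folds(records, n_folds):
    """Assign each patient to a fold; `records` holds (patient_id, kl_grade)."""
    # One stratum label per patient: the highest KL grade seen (assumption).
    grade = {}
    for pid, kl in records:
        grade[pid] = max(kl, grade.get(pid, kl))
    by_grade = defaultdict(list)
    for pid in sorted(grade):
        by_grade[grade[pid]].append(pid)
    # Deal patients round-robin so every fold sees every grade about equally.
    folds = {}
    counter = 0
    for kl in sorted(by_grade):
        for pid in by_grade[kl]:
            folds[pid] = counter % n_folds
            counter += 1
    return folds
```

Since folds are assigned per patient rather than per knee, no patient contributes images to both the training and validation parts of a split.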
During the training, we used exactly the same hyperparameters for all the experiments. In particular, we used and for our network. The learning rate and the batch size were fixed to and , respectively. In some of our experiments where the weight decay was used, we set it to . All the models were trained with the Adam optimizer. The pixel spacing for ROI localization was set to mm and for the landmark localization to mm. We used bi-linear interpolation for image resizing.
All the ablation experiments were conducted solely on landmark localization task and eventually, after selecting the best configuration, we used it for training the ROI localization model due to the similarity of the tasks. We used the ground truth annotations to crop the mm ROIs around the tibial center (landmark in Fig. 1) to create the data for model selection and training the landmark localization model. In our experiments, we flipped all the left ROI images to look like the right ones, however this strategy was not applied for the ROI localization task.
When performing the fine-tuning of landmark localization model using the pre-trained weights of the ROI localization model, we simply initialized all the layers of the former with the weights of the latter one. We note here that the last layer was initialized randomly and we did not freeze the pre-trained part for simplicity.
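The initialization scheme can be sketched with plain dictionaries standing in for model parameter tables. The layer names and the `output.` prefix are hypothetical; the sketch copies every pre-trained entry whose name and size match and leaves the final layer at its random initialization.

```python
def init_from_pretrained(target_state, source_state, skip_prefixes=("output.",)):
    """Copy matching pre-trained parameters; skipped layers keep their values."""
    merged = dict(target_state)
    copied = []
    for name, value in source_state.items():
        if name.startswith(tuple(skip_prefixes)):
            continue  # last layer: keep the random initialization
        if name in merged and len(merged[name]) == len(value):
            merged[name] = list(value)
            copied.append(name)
    return merged, copied
```

The final layer has to be skipped in any case because the two models predict a different number of landmarks, so the corresponding weights differ in size. All parameters remain trainable afterwards, which corresponds to not freezing the pre-trained part.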
Evaluation and Metrics
To assess the results of our method, we used multiple metrics and evaluation strategies. Firstly, we performed the ablation experiments and used the landmarks for evaluation of the results (see Fig. 1). At the test time, when comparing the performance of the full system, we used an extended set of landmarks for evaluation – . The intuition here is to compare the landmark methods on those landmark points that are the most crucial in applications (tibial corners for landmark localization as well as tibial and femoral centers for the ROI localization). Besides, we excluded all the knees with implants from the evaluation.
As the main metric for comparing the landmark localization methods, we used the Percentage of Correct Keypoints (PCK). This metric shows the percentage of points that fall within a neighborhood of a ground truth landmark having the radius (recall at different precision thresholds). In our experiments, we used of mm, mm, mm and mm for quantitative comparison.
Finally, we also assessed the number of outliers in the landmark localization task. An outlier was defined as a landmark that does not fall within the mm neighbourhood of the ground truth landmark. This value was computed for all the landmark points, in contrast to PCK.
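Both metrics can be computed from predicted and ground truth pixel coordinates and the pixel spacing. The sketch below is illustrative: the function names are hypothetical, and treating a point that lies exactly on the radius as correct is an assumption.

```python
import math


def pck(pred, truth, radius_mm, spacing_mm):
    """Percentage of predictions within `radius_mm` of the ground truth."""
    hits = 0
    for (px, py), (tx, ty) in zip(pred, truth):
        # Convert the pixel distance to millimetres using the pixel spacing.
        if math.hypot(px - tx, py - ty) * spacing_mm <= radius_mm:
            hits += 1
    return 100.0 * hits / len(truth)


def outlier_rate(pred, truth, radius_mm, spacing_mm):
    """Percentage of predictions farther than `radius_mm` from the truth."""
    return 100.0 - pck(pred, truth, radius_mm, spacing_mm)
```

Evaluating `pck` at several radii yields one recall value per precision threshold, which is how the columns of a PCK table are filled.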
4.4 Ablation Study
In the initial experiments with our model we assessed different loss functions (see Tab. 1). In particular, we used , , wing and elastic loss (sum of and losses). Besides, we also utilized a recently introduced general adaptive robust loss with the default hyperparameters. Our experiments showed that wing loss with the default hyperparameters as in the original paper ( and ) produces the best results.
Effect of Multi-scale Residual Blocks.
The experiments on loss functions were conducted using the HMP block. However, it is worth assessing the added value of this block compared to the bottleneck residual block. Tab. 1 demonstrates that the bottleneck residual block ("Wing + regular res. block" in the table) fell behind HMP ("Wing loss") in terms of PCK.
MixUp vs. Weight Decay
After observing that the wing loss and HMP block yield the best default configuration, we experimented with various forms of regularization. In this series of experiments, we used our default configuration and applied MixUp with different . Our experiments showed that using MixUp with the default configuration and weight decay degrades the performance (Tab. 1). However, MixUp itself is also a powerful regularizer; therefore, we conducted the experiments without weight decay (marked as no wd in Tab. 1). Interestingly, setting weight decay to increases the performance of our model with any . To assess the strength of regularization, we also conducted an experiment with (best) and without dropout. We observed that having dropout helps MixUp.
CutOut vs. Target Jitter
Besides MixUp, we tested two other data augmentation techniques: cutout and noise addition to the ground truth annotations during the training (uniform distribution, pixel). We observed that the latter did not improve the results of our configuration with MixUp; however, the former halved the number of outliers while yielding nearly the same localization performance. This configuration had a cutout of of the image. These results are also presented in Tab. 1.
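The two augmentations can be sketched as follows, using 2-D lists as images. Interpreting the cutout size as a fraction of each image side and filling the patch with zeros are assumptions of this sketch, as are the function names.

```python
import random


def cutout(image, fraction, rng):
    """Zero out one random rectangle covering `fraction` of each image side."""
    height, width = len(image), len(image[0])
    side_h = max(1, int(round(height * fraction)))
    side_w = max(1, int(round(width * fraction)))
    top = rng.randrange(height - side_h + 1)
    left = rng.randrange(width - side_w + 1)
    out = [list(row) for row in image]  # leave the input image untouched
    for i in range(top, top + side_h):
        for j in range(left, left + side_w):
            out[i][j] = 0
    return out


def jitter_targets(points, max_shift, rng):
    """Add uniform noise in [-max_shift, max_shift] to every coordinate."""
    return [
        (x + rng.uniform(-max_shift, max_shift),
         y + rng.uniform(-max_shift, max_shift))
        for x, y in points
    ]
```

Cutout changes only the image and keeps the landmark targets fixed, whereas target jitter perturbs only the annotations.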
Transfer Learning from Low-cost Labels.
At the final stage of our experiments, we used the best configuration that included the wing loss, MixUp with , weight decay of and cutout to train the ROI localization model. Essentially, both of these methods are landmark localization approaches, therefore, in our cross-validation experiments, we also assessed the performance of ROI localization using PCK. In our experiments, we found that pre-training of the landmark localization model on the ROI localization task significantly increases the performance of the former (see the last row of Tab. 1). The performance of both these models on cross-validation is presented in Fig. 5. Quantitatively, ROI localization model yielded PCK of , , , at mm, mm, mm and mm thresholds, respectively and had outliers.
|Setting||1 mm||1.5 mm||2 mm||2.5 mm||% out|
|AAM (IGO )|
|AAM (LBP )|
|CLM (IGO )|
|CLM (LBP )|
|Robust loss |
|Wing loss |
|Wing + regular res. block|
|Wing + mixup|
|Wing + mixup|
|Wing + mixup|
|Wing + mixup|
|Wing + mixup (no wd)|
|Wing + mixup (no wd)|
|Wing + mixup (no wd)|
|Wing + mixup (no wd)|
|Wing + mixup (no wd, no dropout)|
|Wing + mixup + jitter (no wd)|
|Wing + mixup + cutout 5% (no wd)|
|Wing + mixup + cutout 10% (no wd)|
|Wing + mixup + cutout 25% (no wd)|
|Wing + mixup + cutout 10% (no wd, finetune)|
Results of the model selection for high-cost annotations on the OAI dataset. The values of PCK/recall (%) at different precision are shown as average and standard deviation for the landmarks, , , , while the amount of outliers is calculated for all the landmarks. The comparison is done at mm image resolution (pixel spacing). Best results are highlighted in bold.
4.5 Test datasets
Testing on the full datasets
Testing of our model was conducted on datasets A and B. We provide the quantitative results in Tab. 2. In this table, we present two versions of our pipeline. The first is a single-stage pipeline, where landmark localization follows directly after the ROI localization step. The second is a two-stage pipeline that includes ROI localization as a first step, initial inference of the landmark points as a second step, and re-centering of the ROI to the predicted tibial center followed by a second pass of the landmark localization model as a third step.
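The difference between the two pipeline versions can be expressed as a small control-flow sketch. The models are passed in as hypothetical callables (`roi_model` returns a center point, `landmark_model` returns landmark coordinates for a crop), and all names and signatures are assumptions made for illustration.

```python
def crop_around(center, size):
    """Return a square box (left, top, size) centered on `center`."""
    cx, cy = center
    half = size / 2.0
    return (cx - half, cy - half, size)


def localize(image, roi_model, landmark_model, tibial_index, roi_size, passes=2):
    """Run ROI localization, then `passes` rounds of landmark localization."""
    center = roi_model(image)  # step 1: ROI localization
    landmarks = None
    for _ in range(passes):
        box = crop_around(center, roi_size)
        landmarks = landmark_model(image, box)  # landmark inference on the crop
        center = landmarks[tibial_index]  # re-center on the predicted tibial center
    return landmarks
```

Calling `localize` with `passes=1` corresponds to the single-stage version, and `passes=2` adds the re-centering step followed by the second landmark pass.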
|1 mm||1.5 mm||2 mm||2.5 mm|
Testing with Respect to the presence of Radiographic Osteoarthritis
To better understand the behaviour of our model on the test datasets, we investigated the performance of our 2-stage pipeline and BoneFinder for cases having KL and KL , respectively. These results are presented in Fig. 6. Our method performs on par with BoneFinder for Dataset A and even exceeds its localization performance for precision thresholds above mm for radiographic OA. In Dataset B, on average, our method performs better than BoneFinder when both methods are benchmarked for both non-OA and OA cases. To provide better insights into the performance of our method for different stages of OA severity, we show examples of landmark localization done by our method, BoneFinder and manually (Fig. 7).
In this paper, we addressed the problem of anatomical landmark localization in knee radiographs. We proposed a new method that combines the latest advances in facial landmark localization and pose estimation, which allowed us to accurately localize the landmarks on unseen data.
Compared to the current state-of-the-art [24, 25], our method generalized better to the unseen test datasets that had completely different acquisition settings and patient populations. Consequently, these results suggest that our new method may be easily applicable to various tasks in clinical and research settings.
Our study still has some limitations. Firstly, the comparison with BoneFinder should ideally be conducted when it is trained on the same mm resolution data with the same KL grade-wise stratification, or at full image resolution. However, we did not have access to the training code of BoneFinder, which leaves a more systematic comparison to future studies. Another limitation of this study is the ground truth annotation process. Specifically, we used BoneFinder to pre-annotate the landmarks for all the images in both train and test sets. In theory, this might give an advantage to BoneFinder compared to our method. On the other hand, all the landmarks were still manually refined, which should decrease this advantage.
The core methodological novelties of the study were in adapting MixUp, the soft-argmax layer and transfer learning from low-cost annotations for training our model. We think that the latter has applications in other, even non-medical domains, such as human pose estimation and facial landmark localization. It was shown that, compared to RFRV-CLM, Deep Learning methods scale with the amount of training data, and therefore we also expect our method to yield even better results when it is trained on larger datasets. Besides, we also expect semi-supervised learning to help in this task.
To summarize, we developed a robust method for anatomical landmark localization that has the potential to scale with the amount of training data and to be applied in other domains. Our source code and the annotations made for the OAI dataset will be made publicly available.
The OAI is a public-private partnership comprised of five contracts (N01-AR-2-2258; N01-AR-2-2259; N01-AR-2-2260; N01-AR-2-2261; N01-AR-2-2262) funded by the National Institutes of Health, a branch of the Department of Health and Human Services, and conducted by the OAI Study Investigators. Private funding partners include Merck Research Laboratories; Novartis Pharmaceuticals Corporation, GlaxoSmithKline; and Pfizer, Inc. Private sector funding for the OAI is managed by the Foundation for the National Institutes of Health.
Development and maintenance of VGG Image Annotator (VIA) is supported by EPSRC programme grant Seebibyte: Visual Search for the Era of Big Data (EP/M013774/1).
-  J. Alabort-i Medina, E. Antonakos, J. Booth, P. Snape, and S. Zafeiriou. Menpo: A comprehensive platform for parametric image alignment and visual deformable models. In Proceedings of the 22nd ACM international conference on Multimedia, pages 679–682. ACM, 2014.
-  K. D. Allen and Y. M. Golightly. Epidemiology of osteoarthritis: state of the evidence. Current opinion in rheumatology, 27(3):276, 2015.
-  J. Antony, K. McGuinness, K. Moran, and N. E. O’Connor. Automatic detection of knee joints and quantification of knee osteoarthritis severity using convolutional neural networks. In , pages 376–390. Springer, 2017.
-  J. Antony, K. McGuinness, N. E. O’Connor, and K. Moran. Quantifying radiographic knee osteoarthritis severity using deep convolutional neural networks. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 1195–1200. IEEE, 2016.
-  J. T. Barron. A general and adaptive robust loss function. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4331–4339, 2019.
-  A. Brahim, R. Jennane, R. Riad, T. Janvier, L. Khedher, H. Toumi, and E. Lespessailles. A decision support tool for early detection of knee osteoarthritis using x-ray imaging and machine learning: Data from the osteoarthritis initiative. Computerized Medical Imaging and Graphics, 73:11–18, 2019.
-  A. Bulat and G. Tzimiropoulos. Binarized convolutional landmark localizers for human pose estimation and face alignment with limited resources. In Proceedings of the IEEE International Conference on Computer Vision, pages 3706–3714, 2017.
-  O. Chapelle and M. Wu. Gradient descent optimization of smoothed information retrieval metrics. Information retrieval, 13(3):216–235, 2010.
-  P. Chen, L. Gao, X. Shi, K. Allen, and L. Yang. Fully automatic knee osteoarthritis severity grading using deep neural networks with a novel ordinal loss. Computerized Medical Imaging and Graphics, 2019.
-  T. F. Cootes, G. J. Edwards, and C. J. Taylor. Active appearance models. IEEE Transactions on Pattern Analysis & Machine Intelligence, 23(6):681–685, 2001.
-  D. Cristinacce and T. F. Cootes. Feature detection and tracking with constrained local models. In Bmvc, page 3. Citeseer, 2006.
-  A. K. Davison, C. Lindner, D. C. Perry, W. Luo, T. F. Cootes, et al. Landmark localisation in radiographs using weighted heatmap displacement voting. In International Workshop on Computational Methods and Clinical Applications in Musculoskeletal Imaging, pages 73–85. Springer, 2018.
-  T. DeVries and G. W. Taylor. Improved regularization of convolutional neural networks with cutout. arXiv preprint arXiv:1708.04552, 2017.
-  A. Dutta and A. Zisserman. The VIA annotation software for images, audio and video. arXiv preprint arXiv:1904.10699, 2019.
-  Z.-H. Feng, J. Kittler, M. Awais, P. Huber, and X.-J. Wu. Wing loss for robust facial landmark localisation with convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2235–2245, 2018.
-  Z.-H. Feng, J. Kittler, and X.-J. Wu. Mining hard augmented samples for robust facial landmark localization with cnns. IEEE Signal Processing Letters, 26(3):450–454, 2019.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.
-  S. Honari, P. Molchanov, S. Tyree, P. Vincent, C. Pal, and J. Kautz. Improving landmark localization with semi-supervised learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1546–1555, 2018.
-  T. Janvier, H. Toumi, K. Harrar, E. Lespessailles, and R. Jennane. Roi impact on the characterization of knee osteoarthritis using fractal analysis. In 2015 International Conference on Image Processing Theory, Tools and Applications (IPTA), pages 304–308. IEEE, 2015.
-  J. Kellgren and J. Lawrence. Radiological assessment of osteo-arthrosis. Annals of the rheumatic diseases, 16(4):494, 1957.
-  D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
-  M. Kothari, A. Guermazi, G. von Ingersleben, Y. Miaux, M. Sieffert, J. E. Block, R. Stevens, and C. G. Peterfy. Fixed-flexion radiography of the knee provides reproducible joint space width measurements in osteoarthritis. European radiology, 14(9):1568–1573, 2004.
-  Z. Laskar, I. Melekhov, H. R. Tavakoli, J. Ylioinas, and J. Kannala. Geometric image correspondence verification by dense pixel matching. arXiv preprint arXiv:1904.06882, 2019.
-  C. Lindner, P. A. Bromiley, M. C. Ionita, and T. F. Cootes. Robust and accurate shape model matching using random forest regression-voting. IEEE transactions on pattern analysis and machine intelligence, 37(9):1862–1874, 2014.
-  C. Lindner, S. Thiagarajah, J. M. Wilkinson, G. A. Wallis, T. F. Cootes, arcOGEN Consortium, et al. Accurate bone segmentation in 2d radiographs using fully automatic shape model matching based on regression-voting. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 181–189. Springer, 2013.
-  I. Melekhov, A. Tiulpin, T. Sattler, M. Pollefeys, E. Rahtu, and J. Kannala. Dgc-net: Dense geometric correspondence network. In 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1034–1042. IEEE, 2019.
-  A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. In European conference on computer vision, pages 483–499. Springer, 2016.
-  B. Norman, V. Pedoia, A. Noworolski, T. M. Link, and S. Majumdar. Applying densely connected convolutional neural networks for staging osteoarthritis severity from plain radiographs. Journal of digital imaging, 32(3):471–477, 2019.
-  T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis & Machine Intelligence, 24(7):971–987, 2002.
-  A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer. Automatic differentiation in pytorch. In NIPS Workshop Autodiff, December 2017.
-  C. Payer, D. Štern, H. Bischof, and M. Urschler. Integrating spatial configuration into heatmap regression based cnns for landmark localization. Medical Image Analysis, 54:207–219, 2019.
-  J. Podlipská, A. Guermazi, P. Lehenkari, J. Niinimäki, F. W. Roemer, J. P. Arokoski, P. Kaukinen, E. Liukkonen, E. Lammentausta, M. T. Nieminen, et al. Comparison of diagnostic performance of semi-quantitative knee ultrasound and knee radiography with mri: Oulu knee osteoarthritis study. Scientific reports, 6:22365, 2016.
-  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical image computing and computer-assisted intervention, pages 234–241. Springer, 2015.
-  J. Thomson, T. O’Neill, D. Felson, and T. Cootes. Automated shape and texture analysis for detection of osteoarthritis from radiographs of the knee. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 127–134. Springer, 2015.
-  A. Tiulpin. Solt: Streaming over lightweight transformations. https://github.com/MIPT-Oulu/solt, 2019.
-  A. Tiulpin, S. Klein, S. Bierma-Zeinstra, J. Thevenot, E. Rahtu, J. van Meurs, E. H. Oei, and S. Saarakkala. Multimodal machine learning-based knee osteoarthritis progression prediction from plain radiographs and clinical data. arXiv preprint arXiv:1904.06236, 2019.
-  A. Tiulpin and S. Saarakkala. Automatic grading of individual knee osteoarthritis features in plain radiographs using deep convolutional neural networks. arXiv preprint arXiv:1907.08020, 2019.
-  A. Tiulpin, J. Thevenot, E. Rahtu, P. Lehenkari, and S. Saarakkala. Automatic knee osteoarthritis diagnosis from plain radiographs: A deep learning-based approach. Scientific reports, 8(1):1727, 2018.
-  A. Tiulpin, J. Thevenot, E. Rahtu, and S. Saarakkala. A novel method for automatic localization of joint area on knee plain radiographs. In Scandinavian Conference on Image Analysis, pages 290–301. Springer, 2017.
-  G. Tzimiropoulos, S. Zafeiriou, and M. Pantic. Subspace learning from image gradient orientations. IEEE transactions on pattern analysis and machine intelligence, 34(12):2454–2466, 2012.
-  Y. Wu and Q. Ji. Facial landmark detection: A literature survey. International Journal of Computer Vision, 127(2):115–142, 2019.
-  H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz. mixup: Beyond empirical risk minimization. arXiv preprint arXiv:1710.09412, 2017.