Introduction
Grouping a set of objects in an unsupervised manner such that objects in the same group (called a cluster) are more similar to each other than to those in other groups, i.e., object clustering, has attracted considerable attention in both academic and industrial communities over the past decades. Most current object clustering works [1, 29, 27, 28, 24, 4] aim at recognizing "similar behavior" based on visual information captured by a visual camera (e.g., an RGB or depth camera) or represented by different description methods (e.g., SURF, LBP or deep features). These methods have been successfully applied in statistics, computer vision, biology and psychology [23, 10, 11, 18, 20]. However, most existing object clustering works ignore an important sensing modality, i.e., tactile information (e.g., hardness, force and temperature), which can compensate for visual information in many practical manipulation tasks [16, 26]. For example, when a robot grasps an apple, the apple may become visually unobservable due to occlusion by the robot hand, while its tactile information can still be easily obtained. Some objects that look alike can hardly be distinguished using visual information alone (e.g., ripe versus unripe fruits), yet they can be easily distinguished by tactile properties (e.g., hardness). Moreover, some objects cannot be well distinguished by either visual or tactile information alone. For instance, it is hard to differentiate three visually similar bottles, two of which are empty while the remaining one is full of water. Hence, since the two modalities complement each other, it is beneficial to perform object clustering by fusing visual and tactile information.
To integrate visual with tactile information, a naive solution is to treat the visual and tactile data as separate views and directly apply existing multi-view clustering methods to the visual-tactile object clustering task. However, the gap between the visual and tactile modalities is very large [15]. On the one hand, the devices used to collect tactile and visual data are different: a tactile sensor obtains data through constant physical contact, while a visual sensor can simultaneously capture multiple features of an object at a distance. On the other hand, the format, frequency and receptive field of the two modalities differ, since a visual sensor usually perceives color, global shape and rough texture, while a tactile sensor usually acquires detailed texture, hardness and temperature. Therefore, the focus of this work is to establish a novel visual-tactile fusion object clustering model that can tackle the intrinsic gap between visual and tactile data.
To address the challenges mentioned above, we propose a deep AutoEncoder-like Non-negative Matrix Factorization (NMF) framework for visual-tactile fusion object clustering. More specifically, deep NMF constrained by an under-complete AutoEncoder-like structure is adopted to learn hierarchical semantics while preserving the local data structure of the visual and tactile data in a layer-wise manner. Then, we introduce a graph regularizer that pulls similar points within each modality closer together. Furthermore, as a non-trivial contribution, we carefully design a sparse consensus regularizer to tackle the intrinsic gap between visual and tactile data. This consensus constraint links the individual representation of each modality with a final consensus representation, thereby aligning the two modalities. It thus acts as a modality-level constraint that supervises the generation of a common subspace in which the mutual information between visual and tactile data is maximized. To optimize the proposed framework, we present an efficient alternating minimization strategy. Finally, we conduct extensive experiments on public datasets to evaluate the effectiveness of our framework, which outperforms the state-of-the-art methods. Our contributions are summarized as follows:

We propose a deep AutoEncoder-like Non-negative Matrix Factorization framework for visual-tactile fusion object clustering. To the best of our knowledge, this is a pioneering work that incorporates the visual modality with the tactile modality in the object clustering task.

We develop an under-complete AutoEncoder-like structure to jointly learn hierarchical semantics and preserve the local data structure. Meanwhile, we design a sparse consensus regularizer to seek a common subspace in which the gap between the visual and tactile modalities is mitigated and their mutual information is maximized.

To solve the proposed framework, we provide an efficient solution based on the alternating direction minimization method. Extensive experimental results verify the effectiveness of the proposed framework.
Related Work
The work in this paper relates to two tasks, visual-tactile sensing and multi-view clustering, so we review the related work on both in this section.
Visual-Tactile Sensing
Vision and touch are the most important sensing modalities for both robots and humans, and they are widely applied in robotic tasks [6, 16, 26, 3]. Generally, visual-tactile sensing can be divided into three main categories: object recognition, 3D reconstruction and cross-modal matching.
Among these fields, Liu et al. propose a visual-tactile fusion framework to recognize household objects based on a kernel sparse coding method [16]. Yuan and Luo et al. propose a deep learning framework for clothing material perception by fusing visual and tactile information [25]. Ilonen et al. reconstruct 3D models of unknown symmetric objects by fusing visual and tactile information [6]. Wang et al. perceive accurate 3D object shape with a monocular camera and a high-resolution tactile sensor [22]. Yuan et al. propose a multi-input network to connect the visual and tactile properties of fabrics [26]. Li et al. introduce a prediction model based on a conditional generative adversarial network to connect visual and tactile measurements [13]. Although these models have been successfully applied to supervised learning in visual-tactile sensing, their application to object clustering remains insufficiently explored.
Multi-View Clustering
Multi-view clustering has shown remarkable success in many real-world applications. Based on standard spectral clustering [19], co-training [7] and co-regularization [8] are performed to enforce consistency across different views. Based on the subspace clustering strategy, Cao et al. and Zhang et al. capture complementary information from different views in the form of subspace representations [1, 27]. Within the framework of non-negative matrix factorization and its variants [21], Li et al. propose consensus clustering and semi-supervised clustering methods based on Semi-NMF [12], and Zhao et al. propose a deep Semi-NMF method for multi-view clustering [29].
The Proposed Method
NMF Revisit
NMF and its variants [9, 14] have previously been shown to be promising for multi-view clustering. The objective of NMF can be defined as:
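$$\min_{Z,\,H} \; \|X - ZH\|_F^2 \quad \text{s.t.}\ Z \ge 0,\ H \ge 0, \qquad (1)$$
where $X \in \mathbb{R}^{d \times n}$ is the input feature matrix, $Z \in \mathbb{R}^{d \times k}$ is the basis matrix and $H \in \mathbb{R}^{k \times n}$ is the compact representation. The final clustering result can be obtained by performing standard spectral clustering [19] on $H$. However, in real-world applications, a single-layer NMF is not sufficient to learn the intrinsic data structure because of complex data structures and data noise. Zhao et al. show that a deep NMF model achieves appealing performance in data representation [29]. The deep NMF can be formulated as:
$$\min_{\{Z_i\},\, H_m} \; \|X - Z_1 Z_2 \cdots Z_m H_m\|_F^2 \quad \text{s.t.}\ Z_i \ge 0,\ H_m \ge 0,\ i = 1, \dots, m, \qquad (2)$$
where $Z_i$ and $H_i$ represent the basis matrix and the representation of the $i$th layer, respectively, with $H_{i-1} \approx Z_i H_i$ ($H_0 = X$) and $m$ the number of layers. Inspired by this idea, we incorporate the deep NMF architecture into our visual-tactile object clustering framework.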
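As a minimal sketch of how an objective like Eq. (1) is commonly minimized, the NumPy code below applies the Lee-Seung multiplicative updates [9] to $X \approx ZH$. The function name `nmf`, the random initialization and the fixed iteration count are illustrative assumptions, not the settings used in this paper.

```python
import numpy as np


def nmf(X, k, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for min ||X - Z H||_F^2 s.t. Z, H >= 0."""
    rng = np.random.default_rng(seed)
    d, n = X.shape
    Z = rng.random((d, k))  # basis matrix, d x k
    H = rng.random((k, n))  # compact representation, k x n
    losses = []
    for _ in range(n_iter):
        # Each factor is multiplied by a non-negative ratio, so non-negativity is preserved;
        # eps guards against division by zero.
        H *= (Z.T @ X) / (Z.T @ Z @ H + eps)
        Z *= (X @ H.T) / (Z @ H @ H.T + eps)
        losses.append(np.linalg.norm(X - Z @ H, "fro") ** 2)
    return Z, H, losses
```

For a non-negative `X`, calling `nmf(X, k=5)` returns the two factors together with a loss history that should be non-increasing, which is the property the multiplicative updates are designed to guarantee.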
The Proposed Framework
In the visual-tactile fusion object clustering setting, we use $\mathcal{X} = \{X^{(1)}, \dots, X^{(V)}\}$ as the input data, where $V$ is the number of modalities ($V = 2$ for the visual-tactile clustering task in this work) and $X^{(v)} \in \mathbb{R}^{d_v \times n}$ denotes the feature matrix of the $v$th modality, with feature dimension $d_v$ and $n$ data samples. We then propose our deep visual-tactile fusion object clustering model as follows:
(3)
where $m$ is the number of layers, two regularization parameters balance the regularization terms, and $H_m^{(v)}$ represents the high-level hierarchical semantics of the $v$th modality.
Moreover, the first and second terms denote NMF constrained by an under-complete AutoEncoder-like structure, which is designed to learn hierarchical semantics while preserving the local structure of the input visual and tactile data. The first term is an under-complete decoder process that keeps the dimension of $H_m^{(v)}$ lower than that of $X^{(v)}$ and thereby forces NMF to learn a more salient feature representation of $X^{(v)}$. The second term is an encoder process that implicitly maintains the local data structure by recovering $H_m^{(v)}$ from $X^{(v)}$. Furthermore, we make the following remarks on the regularization terms.
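To make the two AutoEncoder-like terms concrete, the sketch below evaluates them under an assumed form, since the exact Eq. (3) is not reproduced here: a decoder loss $\|X - Z_1 \cdots Z_m H_m\|_F^2$ that reconstructs the data from the top-layer representation, and an encoder loss $\|H_m - Z_m^\top \cdots Z_1^\top X\|_F^2$ that recovers the representation from the data. The helper `ae_like_losses` is hypothetical.

```python
from functools import reduce

import numpy as np


def ae_like_losses(X, Zs, Hm):
    """Decoder and encoder terms of an AutoEncoder-like deep NMF (assumed form).

    decoder: ||X - Z_1 Z_2 ... Z_m H_m||_F^2   (reconstruct X from H_m)
    encoder: ||H_m - Z_m^T ... Z_1^T X||_F^2   (recover H_m from X)
    """
    Z_prod = reduce(np.matmul, Zs)  # Z_1 Z_2 ... Z_m, shape d x p_m
    decoder = np.linalg.norm(X - Z_prod @ Hm, "fro") ** 2
    # (Z_1 ... Z_m)^T equals Z_m^T ... Z_1^T, so one transpose of the product suffices.
    encoder = np.linalg.norm(Hm - Z_prod.T @ X, "fro") ** 2
    return decoder, encoder
```

Both terms vanish when the stacked basis has orthonormal columns and the data lies exactly in its span; in general the encoder term penalizes representations that cannot be recovered by projecting the data back through the same bases.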
Remark 1
The graph regularization in the third term is designed to pull nearby points within each modality closer together. $L^{(v)}$ denotes the graph Laplacian matrix of the $v$th modality, constructed in a $k$-nearest-neighbor manner. Applying eigendecomposition to $L^{(v)}$, i.e., $L^{(v)} = E^{(v)} \Lambda^{(v)} E^{(v)\top}$, we obtain $\operatorname{tr}\big(H_m^{(v)} L^{(v)} H_m^{(v)\top}\big) = \|H_m^{(v)} Q^{(v)}\|_F^2$, where $Q^{(v)} = E^{(v)} (\Lambda^{(v)})^{1/2}$ and $H_m^{(v)}$ is the top-layer representation of the $v$th modality. However, the collection of tactile or visual data is easily contaminated by environmental changes, which introduces noise and outliers into the source data, and the Frobenius norm is sensitive to noise and outliers. We thus replace the Frobenius norm with the $\ell_{2,1}$ norm, which can jointly suppress outliers and uncover more shared representation across nearby points within each modality.
Remark 2
The last term is the consensus regularization, which is designed to tackle the intrinsic gap between visual and tactile data. It directly measures the similarity between the consensus representation and each modality-specific representation, using a mapping matrix that best aligns the modality-specific representation to the consensus one. After this alignment, a sparsity-inducing norm efficiently measures the dissimilarity between the two. Therefore, this term acts as a modality-level constraint that learns a projection of each modality into the common subspace of the consensus representation. In this subspace, the mutual information across modalities is maximized, which ultimately benefits object clustering.
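The graph machinery of Remark 1 can be sketched as follows under stated assumptions: a symmetric 0/1 $k$-nearest-neighbor graph with unnormalized Laplacian $L = D - W$ (the paper does not specify the edge weighting), the identity $\operatorname{tr}(H L H^\top) = \|H E \Lambda^{1/2}\|_F^2$ obtained from $L = E \Lambda E^\top$, and an $\ell_{2,1}$ norm taken over the rows of $H E \Lambda^{1/2}$ (which axis carries the norm is also an assumption). All function names are hypothetical.

```python
import numpy as np


def knn_laplacian(F, k=5):
    """Unnormalized Laplacian L = D - W of a symmetric 0/1 k-NN graph over the columns of F (d x n)."""
    n = F.shape[1]
    sq = np.sum(F ** 2, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * F.T @ F  # pairwise squared distances
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    W = np.zeros((n, n))
    nn = np.argsort(dist, axis=1)[:, :k]  # indices of the k nearest neighbors per sample
    W[np.arange(n)[:, None], nn] = 1.0
    W = np.maximum(W, W.T)  # symmetrize the graph
    return np.diag(W.sum(axis=1)) - W


def l21_norm(M):
    """Sum of the l2 norms of the rows of M."""
    return np.sum(np.linalg.norm(M, axis=1))


def graph_terms(H, L):
    """Return tr(H L H^T), ||H Q||_F^2 and ||H Q||_{2,1}, with Q = E Lambda^{1/2}."""
    lam, E = np.linalg.eigh(L)
    lam = np.clip(lam, 0.0, None)  # L is PSD; clip tiny negative round-off
    Q = E * np.sqrt(lam)  # scales column j of E by sqrt(lam_j)
    HQ = H @ Q
    return np.trace(H @ L @ H.T), np.linalg.norm(HQ, "fro") ** 2, l21_norm(HQ)
```

The first two returned values agree up to round-off, which is the eigendecomposition identity; the third is the robust $\ell_{2,1}$ surrogate that replaces the squared Frobenius norm.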
Then the objective function Eq. (3) is further reformulated as:
(4)  
Optimization
To efficiently solve the optimization problem in Eq. (4), we propose a solution based on the alternating direction minimization algorithm. To reduce training time, we pre-train each layer to approximate the factor matrices $Z_i^{(v)}$ and $H_i^{(v)}$. In the pre-training process, we first decompose the input data matrix by minimizing $\|X^{(v)} - Z_1^{(v)} H_1^{(v)}\|_F^2$, where $Z_1^{(v)} \in \mathbb{R}^{d_v \times p_1}$ and $H_1^{(v)} \in \mathbb{R}^{p_1 \times n}$. We then decompose $H_1^{(v)} \approx Z_2^{(v)} H_2^{(v)}$, where $Z_2^{(v)} \in \mathbb{R}^{p_1 \times p_2}$ and $H_2^{(v)} \in \mathbb{R}^{p_2 \times n}$; here $p_1$ and $p_2$ are the dimensions of layers 1 and 2 (the sizes of layers 1 to $m$ are denoted as $[p_1, \dots, p_m]$ in this paper). This process is repeated until all layers have been pre-trained. Each layer is then fine-tuned by alternating minimization of the proposed framework in Eq. (4). The update rule for each variable is as follows.
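The greedy layer-wise pre-training described above can be sketched as follows. Each layer is an ordinary NMF solved with Lee-Seung multiplicative updates [9], and the representation of one layer becomes the input of the next. The functions `nmf` and `pretrain` are hypothetical names, and the fixed iteration count stands in for whatever convergence test is used in practice.

```python
import numpy as np


def nmf(X, k, n_iter=300, eps=1e-10, rng=None):
    """Single-layer NMF X ~ Z H with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(0) if rng is None else rng
    Z = rng.random((X.shape[0], k))
    H = rng.random((k, X.shape[1]))
    for _ in range(n_iter):
        H *= (Z.T @ X) / (Z.T @ Z @ H + eps)
        Z *= (X @ H.T) / (Z @ H @ H.T + eps)
    return Z, H


def pretrain(X, layer_sizes, seed=0):
    """Greedy layer-wise pre-training: X ~ Z_1 H_1, H_1 ~ Z_2 H_2, ..., H_{m-1} ~ Z_m H_m."""
    rng = np.random.default_rng(seed)
    Zs, Hs, target = [], [], X
    for p in layer_sizes:  # layer sizes [p_1, ..., p_m]
        Z, H = nmf(target, p, rng=rng)
        Zs.append(Z)
        Hs.append(H)
        target = H  # the next layer factorizes the current representation
    return Zs, Hs
```

For one modality with a $d_v \times n$ feature matrix, `pretrain(X, [p1, p2, p3])` yields bases of shapes $d_v \times p_1$, $p_1 \times p_2$ and $p_2 \times p_3$, which then serve as the initialization for fine-tuning.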
Update rule for :
With the other variables fixed, we obtain the following Lagrangian objective function:
(5)  
where , and is set as when . Setting the derivative to zero and applying the Karush-Kuhn-Tucker (KKT) conditions, we have:
(6)  
Eq. (6) is a fixed-point equation that the solution must satisfy at convergence. We thus obtain the update rule:
(7) 
where $\odot$ denotes the element-wise product.
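The multiplicative form of such update rules can be illustrated on the reconstruction term alone. Assuming a middle-layer basis $Z_i$ sandwiched as $X \approx \Phi Z_i \Psi$ with non-negative $\Phi = Z_1 \cdots Z_{i-1}$ (the identity when $i = 1$) and $\Psi = Z_{i+1} \cdots Z_m H_m$, in the style of deep matrix factorization [29], the step below divides the negative part of the gradient by its positive part and applies the ratio element-wise. This sketch omits the encoder and regularization terms of Eq. (4), so it is not the paper's exact Eq. (7); `update_Zi` is a hypothetical name.

```python
import numpy as np


def update_Zi(X, Phi, Zi, Psi, eps=1e-10):
    """One multiplicative step for min ||X - Phi Zi Psi||_F^2 over Zi >= 0.

    The gradient is 2 (Phi^T Phi Zi Psi Psi^T - Phi^T X Psi^T); with non-negative
    X, Phi and Psi, the ratio of its two parts keeps Zi non-negative.
    """
    numer = Phi.T @ X @ Psi.T
    denom = Phi.T @ Phi @ Zi @ Psi @ Psi.T + eps  # eps avoids division by zero
    return Zi * (numer / denom)  # element-wise (Hadamard) update
```

Repeating this step does not increase the reconstruction loss, which is why such fixed-point updates are usually iterated until the relative change of the objective falls below a tolerance.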
Update rule for :
Following a proof similar to that in [29], we formulate the update rule as follows:
(8) 
Update rule for and :
Solving for these variables is challenging since their explicit solutions are hard to obtain directly. We thus introduce two auxiliary variables to transform the optimization problem in Eq. (4) and obtain the following objective function:
(9)  
After converting Eq. (9) to an augmented Lagrangian function, we obtain the following expression:
(10)  
where the Lagrangian multipliers are initialized as zero matrices, the penalty parameters weight the augmented terms, and a slack variable is introduced to satisfy the non-negativity constraint. We then employ the alternating direction method of multipliers (ADMM) to solve this problem, with the following update rules.
Update rule for : With the other variables fixed, we obtain the following Lagrangian objective function:
(11)  
Setting the derivative with respect to this variable to zero, we obtain:
(12)  
Since Eq. (12) is a standard Sylvester equation, it can be solved efficiently by the Bartels-Stewart algorithm.
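For illustration, SciPy's `scipy.linalg.solve_sylvester(a, b, q)` solves $AX + XB = Q$ using the Bartels-Stewart algorithm. Since the exact Eq. (12) is not reproduced here, the sketch uses a hypothetical subproblem of the same type: minimizing $\|X - ZH\|_F^2 + \alpha \operatorname{tr}(H L H^\top)$ over an unconstrained $H$, whose zero-gradient condition $Z^\top Z H + \alpha H L = Z^\top X$ is a Sylvester equation. The function name and the scalar `alpha` are assumptions.

```python
import numpy as np
from scipy.linalg import solve_sylvester


def smooth_factor_update(X, Z, L, alpha):
    """Hypothetical subproblem: argmin_H ||X - Z H||_F^2 + alpha * tr(H L H^T), H unconstrained.

    The zero-gradient condition Z^T Z H + H (alpha L) = Z^T X has the Sylvester form
    A H + H B = Q, which solve_sylvester handles with the Bartels-Stewart algorithm.
    A unique solution exists when A and -B share no eigenvalue, e.g. when Z has full
    column rank (A positive definite) and L is a graph Laplacian (B positive semidefinite).
    """
    return solve_sylvester(Z.T @ Z, alpha * L, Z.T @ X)
```

Bartels-Stewart reduces both coefficient matrices to Schur form and back-substitutes, so its cost is cubic in the matrix sizes rather than in their product, which is what makes it preferable to vectorizing the equation into one large linear system.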
Update rule for and : Keeping all other variables fixed, we obtain the following Lagrangian objective function for the first of these variables:
(13)  
Setting the derivative to zero, we obtain the following update rule:
(14) 
where $(\cdot)^{\dagger}$ denotes the Moore-Penrose pseudoinverse.
Similarly, the other variable can be updated with the following rule:
(15) 
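When a subproblem reduces to a least-squares fit of a mapping matrix, the Moore-Penrose pseudoinverse gives the minimizer directly. The sketch below assumes a Frobenius-norm alignment $\min_P \|H^* - P H^{(v)}\|_F^2$ between a consensus representation $H^*$ and a modality representation $H^{(v)}$, whose solution is $P = H^* (H^{(v)})^{\dagger}$; the actual Eq. (14) may contain additional terms, and `align_mapping` is a hypothetical helper.

```python
import numpy as np


def align_mapping(H_star, Hv):
    """Least-squares mapping P minimizing ||H_star - P Hv||_F via the Moore-Penrose pseudoinverse.

    H_star: (k, n) consensus representation; Hv: (p, n) modality representation; P: (k, p).
    When Hv has full row rank, pinv(Hv) equals Hv^T (Hv Hv^T)^{-1}.
    """
    return H_star @ np.linalg.pinv(Hv)
```

Using the pseudoinverse rather than an explicit inverse keeps the update well defined even when the modality representation is rank-deficient.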
Update rule for and : These two variables are solved in a similar way to the previous subproblems, which yields the following update rules. The first is updated as follows:
(16) 
where $D$ is a diagonal matrix whose $i$th diagonal element is $1/(2\|\mathbf{g}^i\|_2)$, $\mathbf{g}^i$ is the $i$th row of the corresponding matrix, and $I$ is the identity matrix.
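A diagonal matrix of this kind is the usual iterative-reweighting device for the $\ell_{2,1}$ norm: with $D = \operatorname{diag}(1/(2\|\mathbf{g}^i\|_2))$ held fixed, the gradient of $\|G\|_{2,1}$ is written as $2DG$, and $D$ is then refreshed from the new $G$. As an assumed toy subproblem (not the paper's Eq. (16)), the sketch minimizes $\frac{\mu}{2}\|G - A\|_F^2 + \lambda\|G\|_{2,1}$ this way and compares the result with the closed-form row-wise shrinkage. Function names are hypothetical, and a small floor on the row norms avoids division by zero.

```python
import numpy as np


def l21_prox_irls(A, lam, mu=1.0, n_iter=100, eps=1e-12):
    """Minimize (mu/2)||G - A||_F^2 + lam * ||G||_{2,1} by iterative reweighting."""
    G = A.copy()
    for _ in range(n_iter):
        # D = diag(1 / (2 ||g^i||_2)) from the current rows; eps avoids division by zero.
        d = 1.0 / (2.0 * np.maximum(np.linalg.norm(G, axis=1), eps))
        # With D fixed, the gradient of ||G||_{2,1} is 2 D G, so the zero-gradient condition
        # mu (G - A) + 2 lam D G = 0 gives G = mu (mu I + 2 lam D)^{-1} A, row by row.
        G = (mu / (mu + 2.0 * lam * d))[:, None] * A
    return G


def l21_prox_exact(A, lam, mu=1.0):
    """Closed-form row-wise shrinkage for the same problem, used as a reference."""
    norms = np.linalg.norm(A, axis=1, keepdims=True)
    return np.maximum(0.0, 1.0 - lam / (mu * np.maximum(norms, 1e-12))) * A
```

Each row is shrunk toward zero by an amount that depends only on its own norm, which is how the $\ell_{2,1}$ norm suppresses outlying rows while leaving the direction of the remaining rows unchanged.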
The second is updated as follows:
(17) 
We have now obtained all the update rules; the overall optimization procedure of the proposed framework is summarized in Algorithm 1.
After obtaining the optimized consensus representation, we obtain the final clustering result by performing standard spectral clustering on it.
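Standard spectral clustering [19] on a learned representation can be sketched as follows, treating each column as a sample. The Gaussian affinity, its bandwidth `sigma` and the simple k-means with farthest-point initialization are assumptions made to keep the example self-contained; library implementations such as scikit-learn's `SpectralClustering` differ in these details. `spectral_clustering` is a hypothetical name.

```python
import numpy as np


def spectral_clustering(H, n_clusters, sigma=1.0, n_iter=100):
    """Ng-Jordan-Weiss spectral clustering of the columns of H (one column per sample)."""
    Y = H.T  # samples as rows
    sq = np.sum(Y ** 2, axis=1)
    dist = np.maximum(sq[:, None] + sq[None, :] - 2.0 * Y @ Y.T, 0.0)
    A = np.exp(-dist / (2.0 * sigma ** 2))  # Gaussian affinity (assumed)
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))  # assumes no isolated sample
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    U = vecs[:, -n_clusters:]  # eigenvectors of the k largest eigenvalues
    U = U / np.linalg.norm(U, axis=1, keepdims=True)  # project rows onto the unit sphere

    # k-means on the rows of U, seeded by farthest-point initialization.
    centers = [U[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([
            U[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels
```

In practice the bandwidth is usually tuned to the scale of the representation, and k-means is restarted several times; both are omitted here for brevity.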
Time Complexity
Our proposed model consists of two stages, i.e., the pre-training stage and the fine-tuning stage. To simplify the analysis, we assume that all layers have the same number of hidden units. The computational complexity of the pre-training stage depends on the number of modalities, the number of layers, the layer size, the feature dimension, the number of samples and the number of iterations required for convergence in pre-training. The complexity of the fine-tuning stage depends on the same quantities together with the number of fine-tuning iterations. The total time complexity is the sum of the complexities of the two stages.
Experiments
In this section, we evaluate the performance of our proposed model via several empirical comparisons. We first introduce the datasets and experimental results, followed by further analyses of our model.
Experimental Setting
Extensive experiments are conducted on two visual-tactile fusion datasets and one benchmark dataset to evaluate our proposed model: 1) PHAC-2 dataset (http://people.eecs.berkeley.edu/~yg/icra2016): it contains color images and tactile signals of household objects. In this paper, we use all images and the first 8 tactile signals, and 4096-D visual and 2048-D tactile features are extracted in a similar way to [5]. 2) GelFabric dataset (http://people.csail.mit.edu/yuan_wz/fabricperception.htm): it contains color images and tactile images of various kinds of fabrics; more details can be found in [26]. In this paper, we use the pre-trained VGG-19 network to extract 4096-D features for both tactile and visual images. 3) Yale dataset (http://vision.ucsd.edu/content/yalefacedatabase): it is used to evaluate the proposed framework when the input data has more than two modalities, and it contains face images of multiple subjects. Similar to [29], three kinds of features (i.e., 3304-D LBP, 4096-D intensity and 6750-D Gabor) are extracted as different views.
Table 1: Object clustering results on the PHAC-2 dataset (mean ± standard deviation).
Method      ACC           NMI           AR            F-score       Precision     Recall
Vision      35.14 ± 1.89  64.73 ± 1.35  10.81 ± 2.68  13.18 ± 2.49  8.35 ± 0.24   13.24 ± 1.48
Touch       26.25 ± 1.03  55.97 ± 0.79  7.52 ± 0.80   9.34 ± 0.72   7.91 ± 0.85   11.48 ± 0.63
ConcatFea   46.93 ± 1.28  68.06 ± 0.39  25.35 ± 0.25  26.66 ± 1.26  25.10 ± 0.98  27.94 ± 1.01
ConcatPCA   47.19 ± 0.81  68.01 ± 0.33  26.13 ± 0.69  27.41 ± 0.67  26.06 ± 0.78  28.92 ± 0.59
CoReg       50.98 ± 0.20  61.05 ± 0.51  15.31 ± 0.63  16.81 ± 0.62  15.75 ± 0.58  18.04 ± 0.66
CoTraining  52.30 ± 1.70  72.36 ± 1.30  32.37 ± 1.90  32.30 ± 3.00  33.52 ± 2.90  36.74 ± 2.90
MinD        47.98 ± 2.77  67.85 ± 3.50  25.14 ± 5.20  26.47 ± 5.10  24.51 ± 5.00  28.80 ± 4.60
MultiNMF    51.98 ± 0.82  70.81 ± 0.32  30.12 ± 0.94  32.13 ± 0.92  30.67 ± 0.93  33.74 ± 1.00
DiMSC       36.99 ± 1.17  65.69 ± 0.77  18.21 ± 0.97  17.86 ± 0.92  15.63 ± 1.10  19.02 ± 0.70
DMF-MVC     55.02 ± 0.96  72.96 ± 0.31  34.39 ± 0.55  35.53 ± 0.53  33.86 ± 0.66  37.83 ± 0.53
GLMSC       37.50 ± 3.34  61.97 ± 1.84  16.37 ± 2.87  17.83 ± 2.81  16.97 ± 2.77  18.79 ± 2.86
Ours        59.17 ± 1.40  75.27 ± 0.54  38.97 ± 1.13  40.03 ± 1.11  38.12 ± 1.29  42.15 ± 0.96
Table 2: Object clustering results on the GelFabric dataset (mean ± standard deviation).
Method      ACC           NMI           AR            F-score       Precision     Recall
Vision      35.46 ± 1.08  65.91 ± 0.70  17.30 ± 1.26  17.96 ± 1.25  16.87 ± 1.17  19.21 ± 1.34
Touch       33.92 ± 1.05  65.00 ± 0.52  15.71 ± 0.92  16.39 ± 0.91  15.42 ± 0.85  17.48 ± 1.00
ConcatFea   36.56 ± 0.82  66.95 ± 0.27  18.53 ± 0.58  19.19 ± 0.58  18.02 ± 0.48  20.53 ± 0.77
ConcatPCA   37.15 ± 1.20  67.28 ± 0.61  19.13 ± 1.35  19.78 ± 1.34  18.57 ± 1.18  21.15 ± 1.55
CoReg       45.80 ± 1.28  55.33 ± 0.47  36.09 ± 0.68  36.54 ± 0.70  33.39 ± 0.88  39.63 ± 0.78
CoTraining  37.85 ± 0.78  45.85 ± 0.78  35.14 ± 1.70  35.59 ± 1.74  32.43 ± 2.00  39.27 ± 1.62
MinD        43.13 ± 2.49  45.92 ± 0.98  34.94 ± 2.30  35.39 ± 2.28  32.47 ± 2.21  38.73 ± 2.30
MultiNMF    52.01 ± 0.99  75.30 ± 0.36  34.69 ± 0.17  35.18 ± 0.95  33.27 ± 1.17  37.08 ± 0.72
DiMSC       37.73 ± 0.77  66.97 ± 0.47  18.35 ± 0.77  18.03 ± 0.76  17.08 ± 0.85  20.11 ± 0.62
DMF-MVC     53.03 ± 0.82  76.60 ± 0.36  36.50 ± 0.98  36.61 ± 0.76  34.71 ± 0.87  39.02 ± 0.92
GLMSC       55.92 ± 1.49  78.35 ± 0.28  39.70 ± 0.52  40.19 ± 0.51  37.56 ± 0.29  43.22 ± 0.81
Ours        62.19 ± 0.55  80.73 ± 0.24  45.86 ± 0.65  46.25 ± 1.02  44.13 ± 0.93  49.49 ± 0.66
Table 3: Clustering results on the Yale dataset (mean ± standard deviation).
Method      ACC           NMI           AR            F-score       Precision     Recall
BestSV      61.60 ± 3.00  65.40 ± 0.90  44.00 ± 1.10  47.50 ± 1.10  45.70 ± 1.10  49.50 ± 1.00
ConcatFea   54.40 ± 3.80  64.10 ± 0.60  39.20 ± 0.90  43.10 ± 0.80  41.50 ± 0.70  44.80 ± 0.80
ConcatPCA   57.80 ± 3.80  66.50 ± 3.70  39.60 ± 1.10  43.40 ± 1.10  41.90 ± 1.20  45.00 ± 0.90
CoReg       56.40 ± 0.20  64.80 ± 0.20  43.60 ± 0.20  46.60 ± 0.00  45.50 ± 0.40  49.10 ± 0.30
CoTraining  63.00 ± 0.10  67.20 ± 0.60  45.20 ± 1.00  48.70 ± 0.09  47.00 ± 1.00  50.50 ± 1.62
MinD        61.50 ± 4.30  64.50 ± 0.50  43.30 ± 0.60  47.00 ± 0.60  44.60 ± 0.50  49.60 ± 0.60
MultiNMF    67.30 ± 0.10  69.00 ± 0.10  49.50 ± 0.10  52.70 ± 0.00  51.20 ± 0.03  54.30 ± 0.02
DiMSC       70.90 ± 0.30  72.70 ± 1.00  53.50 ± 0.10  56.40 ± 0.20  54.30 ± 0.10  58.60 ± 0.30
DMF-MVC     74.50 ± 1.10  78.20 ± 1.00  57.90 ± 0.20  60.10 ± 0.20  59.80 ± 0.10  61.30 ± 0.20
GLMSC       75.45 ± 3.86  78.43 ± 2.93  54.00 ± 0.50  57.09 ± 0.95  51.81 ± 2.23  63.76 ± 3.60
Ours        80.73 ± 0.63  82.09 ± 0.94  64.51 ± 0.69  63.35 ± 0.66  62.25 ± 0.73  65.09 ± 1.17
Comparison Models and Evaluation
We compare our proposed framework with 7 multi-view baselines and 4 related single-view baselines. Related single-view clustering competitors: Vision (Touch) performs standard spectral clustering [19] on the visual (tactile) features; ConcatFea concatenates all features and then performs standard spectral clustering; ConcatPCA concatenates all features, applies PCA to project them into a low-dimensional subspace, and then performs standard spectral clustering on the projected features. Multi-view clustering competitors: CoReg [8] enforces consistency between different views by co-regularizing the clustering hypotheses; CoTraining [7] assumes that the true underlying clustering assigns a point to the same cluster irrespective of the view; MinD [2] creates a bipartite graph based on the "minimizing-disagreement" idea; MultiNMF [17] uses non-negative matrix factorization to seek a common latent subspace for multi-view data; DiMSC [1] uses a diversity term to explore the complementary information of multi-view data; DMF-MVC [29] proposes a deep non-negative matrix factorization framework to capture the mutual information of multi-view data; GLMSC [27] simultaneously seeks the underlying representation and explores the complementary information of multi-view data.
Similar to [1, 29], six metrics, i.e., accuracy (ACC), normalized mutual information (NMI), Precision, F-score, Recall and adjusted Rand index (AR), are adopted to evaluate clustering performance. For all metrics, a higher value indicates better performance. We run all algorithms multiple times and report the mean values along with the standard deviations. Table 1 and Table 2 show the object clustering results on the PHAC-2 and GelFabric datasets, respectively, and Table 3 shows the results on the Yale dataset, where BestSV performs standard spectral clustering on the features of each view and reports the best performance. To avoid overfitting, the maximum number of iterations is set to 150 for all experiments.
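ACC requires matching predicted clusters to ground-truth classes before counting correct assignments; a common approach uses the Hungarian algorithm, available as `scipy.optimize.linear_sum_assignment`. The sketch below computes ACC this way and NMI normalized by the arithmetic mean of the two entropies. The normalization actually used in the reported experiments is not stated, so this choice is an assumption, and both function names are hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def clustering_accuracy(y_true, y_pred):
    """ACC: fraction of samples correctly labeled under the best one-to-one cluster-to-class matching."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    t, p = t.ravel(), p.ravel()
    counts = np.zeros((p.max() + 1, t.max() + 1), dtype=np.int64)
    np.add.at(counts, (p, t), 1)  # contingency table: rows = clusters, cols = classes
    rows, cols = linear_sum_assignment(-counts)  # Hungarian algorithm, maximizing matches
    return counts[rows, cols].sum() / t.size


def _entropy(prob):
    prob = prob[prob > 0]
    return -np.sum(prob * np.log(prob))


def nmi(y_true, y_pred):
    """NMI = I(T; P) / mean(H(T), H(P)) (arithmetic-mean normalization)."""
    _, t = np.unique(np.asarray(y_true), return_inverse=True)
    _, p = np.unique(np.asarray(y_pred), return_inverse=True)
    t, p = t.ravel(), p.ravel()
    joint = np.zeros((p.max() + 1, t.max() + 1))
    np.add.at(joint, (p, t), 1.0)
    joint /= t.size  # joint distribution of (cluster, class)
    pp, pt = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = np.sum(joint[nz] * np.log(joint[nz] / np.outer(pp, pt)[nz]))
    denom = 0.5 * (_entropy(pp) + _entropy(pt))
    return mi / denom if denom > 0 else 1.0
```

Both metrics are invariant to relabeling of the clusters, so a prediction that merely permutes the class names scores 1.0 on each.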
From these results, we make the following observations. Our framework achieves the best performance on all six metrics on all three datasets, which demonstrates its effectiveness in the object clustering task. Specifically, the results in Table 1 and Table 2 show the importance of fusing visual and tactile information when compared with models that use visual (or tactile) information alone. They also show that our framework exploits visual and tactile information more effectively than the state-of-the-art methods. The results in Table 3 further show that our framework is not limited to the two-modality (i.e., visual-tactile fusion) case and can be applied to other applications with more than two modalities.
Ablation Study and Convergence Analysis
In this subsection, we analyze the proposed framework from three perspectives. First, we analyze the effectiveness of the proposed AutoEncoder-like structure, graph regularization and consensus regularization. Then, we analyze the parameter settings, followed by the convergence analysis.
Effectiveness of AutoEncoder-like Structure, Graph Regularization and Consensus Regularization: Figure 2 presents the effectiveness of each component, from which we draw the following conclusions. Overall, "Ours" achieves the best performance, revealing that all the regularizers and the AutoEncoder-like structure proposed in this paper contribute to learning the rich information in multi-modal data, which further boosts clustering performance. Specifically, "AE" outperforms "None", indicating that the proposed AutoEncoder-like structure, which takes local data structure preservation into account, yields better representations of the source data. "GR" outperforms "None", revealing the effectiveness of the graph regularization, which pulls nearby points together and suppresses outliers within each modality. "CR" outperforms "None", revealing that the proposed consensus regularization can bridge the gap between visual and tactile data and ultimately boost clustering performance.
Parameter Analysis: To explore the effect of our parameters, i.e., the two regularization parameters and the layer size, we use the PHAC-2 dataset in this subsection. Specifically, Figure 3 shows the ACC and NMI results w.r.t. the first regularization parameter under different layer sizes. Under all three layer sizes, the framework performs best in both ACC and NMI at the same value of this parameter, which we therefore use as the default in this paper. Figure 4 explores the sensitivity of the framework w.r.t. the second regularization parameter under different layer sizes, with the first parameter fixed at its default value; again, the value that performs best in both ACC and NMI is used as the default. Figure 3 and Figure 4 also show the influence of the layer size on performance: one layer-size setting (the red curves) consistently leads to the best performance. When the layer size is small, the framework is insufficient to learn the rich information behind the input data, and when the layer size is too large, it may introduce undesirable noise. This is a possible reason why the red curves perform better than the blue and green curves.
Convergence Analysis: Although we have not proved that the proposed framework converges theoretically, we demonstrate its convergence empirically in Figure 5, which plots the objective value and ACC under the default parameter and layer-size settings. The objective value decreases gradually until it converges. ACC exhibits two stages: in the first stage, ACC increases rapidly; in the second stage, it grows slowly and fluctuates slightly until reaching the best performance.
Conclusion
In this paper, we propose a deep AutoEncoder-like NMF framework for visual-tactile fusion object clustering. By constraining the deep NMF architecture with an under-complete AutoEncoder-like structure, our framework jointly learns the hierarchical semantics of visual-tactile data and maintains the local structure of the source data. For each modality, a graph regularization is adopted to pull nearby points together and suppress outliers. To create a common subspace in which the gap between visual and tactile data is bridged and the mutual information between them is maximized, a sparse consensus regularization is developed. Extensive experimental results on two visual-tactile fusion datasets and one benchmark dataset confirm the effectiveness of our framework compared with existing state-of-the-art works.
References
[1] (2015) Diversity-induced multi-view subspace clustering. In CVPR, pp. 586–594.
[2] (2005) Spectral clustering with two views. In ICML Workshop, pp. 20–27.
[3] (2019) Semantic-transferable weakly-supervised endoscopic lesions segmentation. In ICCV, pp. 2304–2310.
[4] (2020) Lifelong spectral clustering. In AAAI.
[5] (2016) Deep learning for tactile understanding from visual and haptic data. In ICRA, pp. 536–543.
[6] (2014) Three-dimensional object reconstruction of symmetric objects by fusing visual and tactile sensing. IJRR 33 (2), pp. 321–341.
[7] (2011) A co-training approach for multi-view spectral clustering. In ICML, pp. 393–400.
[8] (2011) Co-regularized multi-view spectral clustering. In NeurIPS, pp. 1413–1421.
[9] (2001) Algorithms for non-negative matrix factorization. In NeurIPS, pp. 556–562.
[10] (2017) Sparse subspace clustering by learning approximation ℓ0 codes. In AAAI.
[11] (2017) Projective low-rank subspace clustering via learning deep encoder. In IJCAI.
[12] (2007) Solving consensus and semi-supervised clustering problems using nonnegative matrix factorization. In ICDM, pp. 577–582.
[13] (2019) Connecting touch and vision via cross-modal prediction. In CVPR, pp. 10609–10618.
[14] (2011) Constrained nonnegative matrix factorization for image representation. TPAMI 34 (7), pp. 1299–1311.
[15] (2018) Robotic tactile perception and understanding: a sparse coding method. Springer.
[16] (2016) Visual–tactile fusion for object recognition. TASE 14 (2), pp. 996–1008.
[17] (2013) Multi-view clustering via joint nonnegative matrix factorization. In ICDM, pp. 252–260.
[18] (2018) Multimodal joint clustering with application for unsupervised attribute discovery. TIP 27 (9), pp. 4345–4356.
[19] (2002) On spectral clustering: analysis and an algorithm. In NeurIPS, pp. 849–856.
[20] (2019) Representative task self-selection for flexible clustered lifelong learning. arXiv.
[21] (2014) A deep semi-NMF model for learning hidden representations. In ICML, pp. 1692–1700.
[22] (2018) 3D shape perception from monocular vision, touch, and shape priors. In IROS, pp. 1606–1613.
[23] (2013) Constrained clustering and its application to face clustering in videos. In CVPR, pp. 3507–3514.
[24] (2019) Deep spectral clustering using dual autoencoder network. In CVPR, pp. 4066–4075.
[25] (2018) Active clothing material perception using tactile sensing and deep learning. In ICRA, pp. 1–8.
[26] (2017) Connecting look and feel: associating the visual and tactile properties of physical materials. In CVPR, pp. 5580–5588.
[27] (2018) Generalized latent multi-view subspace clustering. TPAMI.
[28] (2018) Binary multi-view clustering. TPAMI 41 (7), pp. 1774–1782.
[29] (2017) Multi-view clustering via deep matrix factorization. In AAAI, pp. 11108–1113.