Multi-tensor Completion for Estimating Missing Values in Video Data

by Chao Li, et al.

Many tensor-based data completion methods aim to solve image and video in-painting problems. However, existing methods were developed only for a single dataset. In most real applications, more than one dataset is usually available to reflect one phenomenon, and the datasets are mutually related in some sense. This raises the question of whether such a relationship can improve the performance of data completion. In this paper, we propose a novel and efficient method that exploits the relationship among datasets for multi-video data completion. Numerical results show that the proposed method significantly improves the performance of video in-painting, particularly when the missing percentage is very high.






I Introduction

In computer vision and graphics, many methods have been developed for image and video in-painting. Since not only RGB images but also videos can be considered as multi-mode arrays, increasing attention has been paid to tensor-based algorithms [1][2][3]. However, the majority of methods cannot achieve satisfactory performance when too much data is missing (e.g. when more than 95% of the pixels are missing), because the remaining observations are not sufficient to predict the missing values well. Fortunately, in many real scenarios, more than one device is used for observation, and the resulting datasets are mutually related. It is therefore natural to expect that better image and video in-painting can be achieved by exploiting the relationship among multiple datasets.

In this paper, we propose a novel method for multi-tensor completion. We individually apply a low-rank approximation to every unfolding matrix along each mode of the data, and fold the estimates together as the prediction of the missing values. More details are discussed in Section II. Related work is introduced in Section III, and Section IV provides the results of numerical experiments.

II Objective and algorithm

Suppose that there are several tensors with missing values, where the observed entries of each tensor are indicated by a binary tensor (0: unobserved, 1: observed). To complete the datasets, the estimate of each individual tensor can be obtained by optimizing the conventional objective [1][3]


where the norm in the objective is the nuclear norm of a tensor, and the projection operator selects the observed elements, i.e. those whose corresponding entries in the binary tensor are equal to 1. In the multi-tensor case, and by the definition of the nuclear norm of a tensor in [4], we have


where each unfolding is taken along one mode of the corresponding tensor, the sum runs over all modes of that tensor, and the terms are combined with per-mode weights. To find the optimum of Eq. 2 efficiently, we use a low-rank factorization with Frobenius norm regularization to approximate each unfolding of the tensors. Specifically, we have an equivalent form of Eq. 2 as


where the regularization weight is a tuning parameter, and each unfolding is approximated by the product of its factor matrices. It is well known that Frobenius norm regularization on the factor matrices results in a low-rank approximation of their product [5]. Thus Eq. 3 can be considered an equivalent form of Eq. 2.
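The link between Frobenius norm regularization and low rank rests on a standard fact: the nuclear norm of a matrix equals the minimum, over all factorizations of that matrix into a product of two factors, of half the sum of the squared Frobenius norms of the factors, and the minimum is attained by the "balanced" factors obtained from the SVD. The following minimal sketch illustrates mode unfolding, a weighted sum of the nuclear norms of the unfoldings, and a numerical check of that identity. The function names, the mode ordering used by `unfold`, and the default of uniform weights summing to one are assumptions made for illustration, not notation taken from the paper.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-`mode` unfolding: that axis becomes the rows, all others the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def tensor_nuclear_norm(tensor, weights=None):
    """Weighted sum of the nuclear norms of all mode unfoldings."""
    if weights is None:
        # Assumption: uniform, non-negative weights that sum to one.
        weights = np.full(tensor.ndim, 1.0 / tensor.ndim)
    return sum(w * np.linalg.norm(unfold(tensor, k), ord="nuc")
               for k, w in enumerate(weights))

def balanced_factors(matrix):
    """Factors A, B with A @ B == matrix that attain the nuclear-norm identity."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    root = np.sqrt(s)
    return u * root, root[:, None] * vt

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 5, 6))
m = unfold(x, 0)
a, b = balanced_factors(m)
# Half the summed squared Frobenius norms of the balanced factors ...
half_sum = 0.5 * (np.linalg.norm(a) ** 2 + np.linalg.norm(b) ** 2)
# ... matches the nuclear norm of the unfolding up to rounding error.
print(np.isclose(half_sum, np.linalg.norm(m, ord="nuc")))
```

In a regularized factorization the penalty on the factors therefore acts like a nuclear norm penalty on their product, provided the number of columns of the factors is at least the rank of the solution.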

For related tensors, it is natural to expect that the datasets share information along some modes. We therefore suppose that, for some pairs of tensors or modes, the corresponding factor matrices are equal. This assumption plays the key role in improving the completion performance. To find a local optimum of Eq. 3, the variables can be updated alternately by


where the folding operator transforms a matrix into a tensor, i.e. it is the inverse of the unfolding operator, and the updates also involve an identity matrix. In the algorithm, Eq. 4-7 are applied alternately until convergence. For initialization, both random initialization and initialization by Singular Value Decomposition (SVD) are recommended, and the missing elements of each tensor can be initialized to zero. Unlike traditional nuclear-norm-based methods, our method is non-convex, but it provides stable and good performance in practice.
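The alternating scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not a reproduction of Eq. 4-7: it assumes ridge-regularized least-squares updates for the two factors of each unfolding, uniform mode weights, random initialization of the right factors, and a single mode along which all tensors share one left factor (implemented by stacking the unfoldings side by side). All function and parameter names (`complete_jointly`, `left_update`, `right_update`, `rank`, `lam`) are hypothetical.

```python
import numpy as np

def unfold(t, mode):
    return np.moveaxis(t, mode, 0).reshape(t.shape[mode], -1)

def fold(m, mode, shape):
    """Inverse of `unfold` for a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(m.reshape([shape[mode]] + rest), 0, mode)

def left_update(X, V, lam):
    """argmin_U ||X - U V||_F^2 + lam ||U||_F^2 (closed form)."""
    r = V.shape[0]
    return np.linalg.solve(V @ V.T + lam * np.eye(r), V @ X.T).T

def right_update(X, U, lam):
    """argmin_V ||X - U V||_F^2 + lam ||V||_F^2 (closed form)."""
    r = U.shape[1]
    return np.linalg.solve(U.T @ U + lam * np.eye(r), U.T @ X)

def complete_jointly(tensors, masks, shared_mode, rank=3, lam=1e-2,
                     n_iter=100, seed=0):
    """Jointly complete tensors that share a left factor along `shared_mode`.

    Assumes every tensor has the same size along `shared_mode` and that
    masks are boolean arrays with True marking observed entries.
    """
    rng = np.random.default_rng(seed)
    # Missing entries start at zero, as suggested in the text.
    X = [np.where(m, t, 0.0) for t, m in zip(tensors, masks)]
    # Assumption: random initialization of the right factors.
    V = [[rng.standard_normal((rank, x.size // x.shape[k]))
          for k in range(x.ndim)] for x in X]
    for _ in range(n_iter):
        # One left factor for the shared mode, fitted on the stacked unfoldings.
        X_cat = np.hstack([unfold(x, shared_mode) for x in X])
        V_cat = np.hstack([v[shared_mode] for v in V])
        U_shared = left_update(X_cat, V_cat, lam)
        for n, x in enumerate(X):
            estimate = np.zeros_like(x)
            for k in range(x.ndim):
                Xk = unfold(x, k)
                U = U_shared if k == shared_mode else left_update(Xk, V[n][k], lam)
                V[n][k] = right_update(Xk, U, lam)
                # Fold each low-rank estimate back and average over modes.
                estimate += fold(U @ V[n][k], k, x.shape) / x.ndim
            # Keep observed entries, fill only the missing ones.
            X[n] = np.where(masks[n], tensors[n], estimate)
    return X
```

Stacking the unfoldings is what lets a sparsely observed tensor borrow information from the others along the shared mode; the remaining modes are fitted per tensor, so the tensors may differ in size along them.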

III Related work

Many tensor-based completion methods have been developed [1][2][3][6][7]: [2] and [7] predict missing values by tensor decomposition, while [1][3][6] minimize the nuclear norm of a tensor directly. However, all of them focus on a single tensor. It is worth noting that TMac, proposed in [6], uses an optimization method similar to ours. The difference is that we use Frobenius norm regularization on the factor matrices to control the rank of the tensor unfoldings, whereas TMac only adjusts the number of columns of the latent matrices to control the nuclear norm.

In many different fields, multi-data analysis or group analysis has also been studied in depth [8][9][10]. However, the majority of these methods focus on extracting common and distinctive components. In this paper, we are not concerned with the latent features of each dataset, so the complexity of the algorithm can be reduced compared with tensor-decomposition-based methods such as [2][7]. A multi-matrix completion method, Convex Collective Matrix Factorization (cCMF), was proposed in [10]. The method proposed in this paper can be considered an extension of cCMF to tensor problems.

IV Experiments

In this section, we use two experiments to evaluate the performance of the proposed method. In the first experiment, videos from IXMAS were used to jointly predict missing values. IXMAS consists of videos of 13 daily-life motions (e.g. sit down, check watch, kick) performed by 11 actors; the actors freely choose their position and orientation, and each action is recorded simultaneously by 5 cameras. In the experiment, we chose 4 videos of two actors performing a similar action (sit down), with two views for each actor. Each video can be modelled as a 3-mode tensor (Pixel × Channel × Time). Since the two actors performed a similar action and each action was recorded by two cameras, videos recorded by the same camera are assumed to share pixel and channel information, and videos of the same actor are assumed to share time-mode information. Under this assumption, we apply the proposed method to video in-painting.
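As an illustration of the Pixel × Channel × Time layout, the sketch below reshapes a stack of frames into such a tensor. The input layout (time, height, width, channel) and the function name `video_to_tensor` are assumptions; the paper does not state how the frames were stored.

```python
import numpy as np

def video_to_tensor(frames):
    """Reshape (time, height, width, channel) frames to (pixel, channel, time)."""
    t, h, w, c = frames.shape
    # Flatten the two spatial axes into one pixel axis, then move time last.
    return frames.reshape(t, h * w, c).transpose(1, 2, 0)

# Toy example: 2 frames of 3 x 4 RGB pixels.
frames = np.arange(2 * 3 * 4 * 3).reshape(2, 3, 4, 3)
video = video_to_tensor(frames)
print(video.shape)
```

With this layout, two videos from the same camera have pixel and channel modes of equal size, and two videos of equal length have time modes of equal size, which is what sharing information along a mode requires.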

Methods 95% 50% 10%
Data 1 Data 2 Data 1 Data 2 Data 1 Data 2
Proposed method 0.085 0.270 0.018 0.115 0.006 0.082
FaLRTC 0.016 0.129 0.006 0.078
HardC 0.079
CPWOPT - - - -
TABLE I: RSE of missing-value prediction for multi-video in-painting at different missing percentages over 10 runs
(b) RedSkirt
Fig. 1: Average running time of algorithms in two experiments
Fig. 2: Comparison of algorithms in the first frame of IXMAS

In the second experiment, we use multi-view videos (Height × Width × Channel × Time) of a human action (body movements of an Asian woman) which were simultaneously recorded by 3 cameras. Since the three videos were recorded at the same time from different positions, it is easy to infer that they share information along the time mode. Further, we resized each video so that the videos no longer share common pixel information; tensor concatenation therefore cannot be used for in-painting on this data. Such a case commonly arises when different types of cameras are used in real applications. Tab. I shows the Relative Square Error (RSE) of missing-value prediction for different missing percentages, and Fig. 1 compares the average running time in the two experiments.
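The evaluation protocol can be sketched as below. The text does not spell out the RSE formula, so the sketch assumes a common definition, the Frobenius norm of the error divided by the Frobenius norm of the ground truth, computed over all entries; the uniformly random mask and both function names are likewise assumptions.

```python
import numpy as np

def random_mask(shape, missing_fraction, seed=0):
    """Boolean mask, True = observed; entries are dropped uniformly at random."""
    rng = np.random.default_rng(seed)
    return rng.random(shape) >= missing_fraction

def rse(estimate, truth):
    """Assumed definition: ||estimate - truth||_F / ||truth||_F."""
    return np.linalg.norm(estimate - truth) / np.linalg.norm(truth)

truth = np.ones((10, 10, 3, 8))
mask = random_mask(truth.shape, missing_fraction=0.95)
zero_filled = np.where(mask, truth, 0.0)  # baseline: leave missing entries at 0
print(rse(zero_filled, truth))
```

Under this definition an estimate that recovers nothing scores close to 1 at high missing percentages, which gives a reference point for reading the values in Tab. I.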

For comparison, FaLRTC [1], HardC [3], TMac [6], and CPWOPT [2] were individually applied to each video. Tab. I shows that the proposed method outperforms the other methods, particularly when the missing percentage is high. The reason is that, at a high missing percentage, the observations remaining in each individual video carry too little information for completion, whereas the proposed method can exploit the relationship among all the datasets, which provides additional information to each dataset for missing-value prediction. Fig. 1 shows that the running time of the proposed method is comparable to that of TMac, HardC, and FaLRTC, while the tensor-decomposition-based algorithm CPWOPT, which is not shown in Fig. 1, is much slower than the other four methods. Fig. 2 shows the completion results of all the methods for the first frame of IXMAS, and also indicates that the proposed method achieves better video in-painting performance.

V Conclusion

In this paper, we developed a novel completion method for multiple tensors. Compared with traditional approaches, the proposed method can exploit common information shared among datasets. Numerical results demonstrate that the shared information improves the performance of multi-video in-painting, particularly when the missing percentage is high.


This paper was supported by the China Scholarship Council.