1 Introduction
Terahertz pulsed imaging (TPI) systems hold great potential in many applications such as medical diagnosis of human tissue, the detection and chemical mapping of illicit drugs and explosives, and pharmaceutical tablet inspection Ychen_tablet1 ; ychen_tablet3 ; Muller2012690
. In a typical TPI measurement, the THz waveform for each pixel is recorded as a function of optical time delay. This provides a threedimensional (3D) data set where two axes describe an object’s spatial content, and the third one represents the (time) depth information. In TPI, the transient electric field, rather than the radiation intensity, is measured. Thus, by applying the one dimensional Fourier transform along the time axis we can get the magnitude and phase information of THz spectral data for each pixel. Despite these advantages, most existing TPI systems suffer from slow scanning speed and high implementation cost due to pixelbypixel raster scan mechanism. One approach to speed up the acquisition process is to reduce the number of samples (measurements) and yet preserve the reconstruction quality. Recently, compressive sampling (CS)
CS_Donoho ; CS_Candes1580791 has been widely used for this purpose. This theory essentially works based on random projections of input signals and sparse representation CS_intro during reconstruction. It holds great potential for sampling rates reduction, imaging time, power consumption and computational complexity. However, the main challenge in utilizing CS is implementation costs and specifically designing the sampling operator which requires costly and sophisticated hardware modules. Most recent researches on CSTHz has been focused to address this issue. For instance, in Hao_spin the authors propose a spining disk as a plausible sampling operator for highspeed compressive acquisition. In block_CS2 , a blockbased CS method is proposed, and in Chan2008a single pixel camera based on Bernoulli random matrix is presented. Recently, an improved version of
block_CS2 , called adaptive CS, is reported adaptiveCS . In this method, additional measurement points are adaptively added at the regions prone to degradation with the aim of improving reconstruction quality.In this paper, we consider a very simple acquisition scenario, i.e. incomplete (subsampled) data, which can be easily realized without too much of hardware burden. Instead, we aim to utilize advanced reconstruction techniques to retrieve the original data samples with least possible quality degradation. In the sequel, more details are explained.
It should be noted that the spatial and temporal (or spectal) features in 3D TPI data have intrinsic geometrical structures. If these structures are modelled properly, it may lead to new TPI imaging systems with faster acquisition speed and more accurate data analysis. This aspect is an open issue and has not been specifically studied in the THz literature. Over the past few years, there have been increased interests in the study of sparse representation, in which a signal is characterized by a few nonzero coefficients in a certain transform domain. Up until now, sparse modelling has found applications in many image processingrelated tasks such as acquistion, denoising, inpainting and superresolution
inpaint1 ; inpaint2 ; inpaint3 ; superres1 ; superres2 . Most of existing works are limited to a deterministic sparsifying transform, such as the Fourier, the wavelet and the curvelet etc. More recently, it has been shown that a signal can be represented with fewer coefficients over a learned dictionary from a number of training samples (see KSVD and references therein).In this paper, we aim to investigate dictionary learning from incomplete and noisy 3D TPI data set. In particular, the missing data are first estimated from random subsets using Bicubic interpolation. Incomplete TPI data set can occur in two different scenarios, in which both require retrieving the missing samples;
i) result of data transfer from a sender to receiver, and ii) subsampled THz data: a simplified case of CS where merely some samples are missing (this obviously requires much simpler sampling operator than original CS). As a novel tool to address the sample recovery problem, 3D dictionary learning is used here. Spatialtemporal dictionary learning is applied to the 3D data set by exploiting the joint sparse model. Finally, denoising is performed based on the learned dictionary through convex optimization. Experimental results show that even with 5% to 20% subsampled data, one can still get reliable spatial, structural and spectral information of the object. To the best of our knowledge, this is the first work demonstrating the advantages of 3D dictionary learning for THz data. These results can be exploited to speed up TPI measurement process substantially by reducing the quantity of data.This paper is organized as follows. In the next section we describe the proposed method including mathematical expression of the model, dictionary learning and joint sparse recovery. Section 3 is devoted to demonstrating its applicability to the experimental results. Finally, the conclusion is drawn in section 4.
2 Proposed method
2.1 Problem formulation
The first step for recovering the original data from incomplete (subsampled) observations is inpainting. This can be simply achieved using a preliminary and fast approach such as Bicubic interpolation block_CS2 . The main drawback is that the results of such techniques are of low quality especially in noisy scenarios. Other inpainting methods which rely on fixed transforms (e.g. 3D wavelet) 3Dwavelet also do not lead to acceptable results as we will see later in our experimental results. A recent family of methods which try to build up an adaptive dictionary directly over the incomplete data has shown improved results for 2D natural images when the observation rate is above elad_sparse_book . Efficient extension of these techniques for THz data, especially in severely subsampled noisy data () is our objective. To take advantages of existing techniques for both incomplete and noisy data we propose a new model shown in Fig. 1. If we denote the original clean THz datacube by and the incomplete noisy datacube by (of the same size), the estimated lowquality noisy datacube can be represented by (of the same size). Our aim is to denoise using a dictionary which is specifically designed to exploit both spatial and temporal correlations existing within THz data.
2.2 Dictionary learning
Since the THz data is huge, one cannot learn a dictionary directly over the entire datacube. More importantly, 3D nature of THz data should be taken into account when constituting the training samples. Assume that is the
th training vector which is obtained through partitioning
into small 3D blocks of size , where, and are the block sizes in spatial and temporal dimensions, respectively. These blocks may have spatial and temporal overlaps and are always converted to the columnvector of length . Fig. 2 illustrates the geometrical view of the proposed block extraction. In addition to capturing spatiotemporal structure in the proposed scheme, we have the flexibility to define the contributions of spatial and temporal dimensions by adjusting and , respectively.The next step is to find the dictionary and sparse coefficients, denoted by , from the training set . In spite of most traditional dictionary learning methods in which all training vectors are treated independently, we devote a different approach. In order to effectively exploit the 3D spatiotemporal structure of THz data we cumulate every training vectors into a matrix and solve the following problem:
(1) 
where denotes the th subset, is the dictionary, is the corresponding sparse coefficients matrix, and represents the decomposition error. This is a joint sparse model which simultaneously deals with the training vectors in . There are different ways to choose the training subset . For instance, one can cluster the 3D blocks and then group them based on the amount of similarity within the voxels of these blocks. For simplicity, and yet preserving the sparsity structure of neighboring blocks, we use rasterscanning with full spatial and temporal overlap from lefttoright and toptobottom as shown in Fig. 2. We have observed encouraging results using this scheme, but the performance of more sophisticated grouping techniques can be further investigated in the future. Here, we group all the blocks into subsets of size :
where is the ceiling operator. Correspondingly, ’s are defined as matrices as follows:
and the aim is to solve the following minimization problem subject to sparsity of :
(2) 
Most typical dictionary learning approaches work based on “alternating minimization” KSVD ; MOD1257971 ; olshausen.and.field.1996 . These methods mainly contain two major steps, coefficient update and dictionary update, which are alternately executed until reaching local minima for (2).
In order to jointly update the sparse coefficients we choose MMV (multiple measurement vector) sparse coding framework. In this framework, as opposed to SMV (single measurement vector), several sparse vectors are simultaneously recovered. The main assumption in MMV is that the sparse vectors admit a common sparsity pattern, i.e., the locations of nonzeros are the same for all vectors. Fig. 3 is a diagram illustrating the MMV model.
There are several algorithms in the literature to solve (2) with respect to . Most of these methods are extended from SMVbased approaches M_FOCUSS ; SOMP1 ; SOMP2 . We have found that SOMP^{1}^{1}1Available in SPAM: http://spamsdevel.gforge.inria.fr/doc/html/doc_spams002.html. (Simultaneous Orthogonal Matching Pursuit) SOMP1 ; SOMP2 works well for reconstruction of THz data and hence we utilized it for coefficients update.
Next, for the dictionary update (solving (2) with respect to ), we use a wellestablished method called KSVD KSVD . KSVD^{2}^{2}2KSVDBox: http://www.cs.technion.ac.il/~ronrubin/software.html
is a generalization of Kmeans clustering and updates the dictionary in a columnbycolumn scheme by using singular value decomposition (SVD) (full details in
KSVD ).2.3 THz data reconstruction
After finding and , we need to estimate the THz data. To do this, we define the following minimization problem:
(3) 
where and , both with length , are vectorized representations of datacubes (noisy) and (clean), respectively.^{3}^{3}3Note that and should not be confused with 3D blocks where always have subscript indices thorough the paper. Also, and , where is a huge binary matrix of size . has only one nonzero (i.e. 1) in each row. As a block extraction operator, extracts the voxels belonging to th block.
In (3), the leftmost term is the error between noisy and clean data, the middle term is related to the sparse decompostion error, and the rightmost term is the smoothness constraint with regularization parameter . Moreover, is the mean vector of the th block (i.e. average of all elements in ). Problem (3) is convex in and can be solved by zeroing its gradient, with respect to , which finally leads to:
(4) 
Although the above expression seems computationally expensive at the first glance, it does not need to be directly calculated in practice. Instead, since the inverting term in (4) is diagonal, we obtain the estimated datacube in a voxelwise fashion (similar to the strategy used in denoise_Elad ). We can show the voxelwise calculation of one voxel (taking the corresponding voxels in and into account) using the following sets of equations:
(5)  
where is the cost function to be minimized, and is the corresponding voxel in . It is seen that the last equation above well matches with (4) which is the nonpractical form of reconstructing datacube.
The pseudocode of the proposed method is given in Algorithm 1.
3 Experimental results
We present the results of applying the proposed method to two different types of THz data. The first dataset, which we call it “Tshape”, has size , and acquired across an area of 20mm22mm using a TPIscan1000 system (TeraView Ltd, Cambridge, U.K.), covering a spectral range from 0.1 to 3.5 THz. The sample used is a polythene pellet of a diameter of 25 mm. Inside the pellet there is a Tshaped plastic sheet which locates approximately 0.2 mm below the sample surface. For this dataset, the structural information such as thickness and depth are of interest. The second data is a low spatial resolution data called “Tablet data” acquired from pharmaceutical tablets. Two of such sets (called LA and TP) have size , and other seven sets have size . Extraction of spectral information such as chemical mapping is crucial for this dataset.
3.1 TShape data
Following the proposed model, we first applied Bicubic interpolation to the noisy incomplete data, yielding . We considered input additive Gaussian noise, leading to input SNR in the range dB, which is a reasonable range in practice. We then applied the proposed dictionary learning method to find the dictionary and corresponding sparse coefficients to reconstruct the entire datacube. Due to importance of structural information in Tshape data, we gave more contribution to the spatial domain than temporal domain for the dictionary learning. Hence, the experiments were conducted by setting the spatial size , and two different temporal sizes: (spatial dictionary), and (spatiotemporal dictionary). In any case we used full spatial and temporal overlaps. According to these settings, the obtained dictionaries were of size and . A sample reconstructed image with , , , , and input SNR of dB is given in Fig. 4. It is seen from this figure that the proposed method has recovered and denoised the original image from only of noisy samples with highest SNR among other methods. The proposed method was able to enhance the SNR of Bicubic interpolation results by up to dB. Other methods, i.e. softthresholding after applying 3D symlet4 wavelet transform 3Dwavelet , shows weaker performances compared to the proposed method. We also show in Fig. 5 the reconstructed SNR at the fixed observation rate of versus different input SNRs. As seen from this figure, the proposed approach outperforms other methods.
Next, performance of the proposed method against different values of and is investigated. For this purpose, SNRs of the recovered datacube for different observation rates are given in Table 1. From this table, the advantage of spatiotemporal dictionary can be observed by comparing the SNRs at , with those at . Also, the given SNRs in this table indicates that the proposed method is more effective than 3D wavelet transform. We empirically observed that gives the best performance for Tshape dataset.
observation rate  SNR (dB)  ,  
3D wavelet  Bicubic interp.  proposed method  
10.53  16.89  17.45  1, 1  
17.69  1, 10  
18.78  4, 1  
18.96  4, 10  
14.71  17.68  19.41  1, 1  
19.46  1, 10  
20.17  4, 1  
20.94  4, 10  
16.72  17.87  20.05  1, 1  
20.33  1, 10  
20.86  4, 1  
22.25  4, 10  
17.69  17.96  20.88  1, 1  
21.03  1, 10  
22.53  4, 1  
23.15  4, 10 
Table 2 gives an insight about the computational complexity of the proposed method. In this table, the elapsed time of the dictionary learning stage for different selections of and is shown. It is clearly seen that large values of can significantly reduce the computation time. This behavior supports the advantages of using joint sparse recovery which has already shown to improve the quality as well (in Table 1). It can be further seen from Table 2 that increasing also decrease the computation time, though not as dramatically as that for .
, observation  

rate  
1,1  110.3  122.5 
1, 10  5.0  7.6 
4, 1  93.3  97.6 
4, 10  4.8  5.9 
As a useful measure, we performed the thickness and depth evaluations before and after reconstruction of Tshape object. These results give structural information about the object of interest. Fig. 6 (a) geometrically illustrates these two parameters. Thickness () and depth () calculations are practically useful to identify inhomogeneities or defects in the object of interest Ychen_tablet1 . In order to find these parameters we refer to a sample temporal waveform at one spatial location shown in Fig. 6 (b). The obtained values for these two parameters are subject to appropriate scalings to be converted to millimeter (more details can be found in Ychen_tablet1
). The calculated means and standard deviations of these parameters for different observation rates are given in Table
3. The results in this table are shown for both original and reconstructed data using the proposed method. It is found from Table 3 that the accuracy of reconstruction using the proposed method is very high, as the depth and thickness are approximately equivalent to the original data.observation rate  reconstructed  original  

Thickness (mm)  
Depth (mm)  
3.2 Tablet data
The dictionary learning settings for Tablet data is different from those of TShape data. Due to very low spatial resolution of Tablet data, it is natural to give a much higher contribution to the temporal dimension during the dictionary learning. This is inline with the fact that the temporal/spectral information is more valuable than spatial information in pharmaceutical studies Ychen_tablet1 Ychen_table2_springer . Therefore, we selected and with full spatial and temporal overlaps within the blocks.
As a quantitative measure, we show in Fig. 7 the variations of reconstructed SNR of both LA and TP datasets versus different observation rates, where , , and were used for the proposed method. These values are chosen empirically and are manually tuned to achieve the best performance among other selections. We also observed that the achieved results are not sensitive to variations of and around the selected values. The dictionary columns size, i.e. , is set to due to lower spatial resolution in Tablet data compared to Tshape data. The input SNR for this experiment was set to dB. Similarly, we show the reconstructed SNR versus input SNR at observation rate. It is clearly seen that the proposed method is able to significantly improve the reconstructed SNR in both figures.
In order to observe the robustness of the proposed approach, more sets of tablet data were included in our experiments and the average performance was measured. Seven incomplete Tablet datasets at different observation rates were reconstructed by using wellknown methods. Three input SNR levels were used in this experiment. The results are given in Table 4. It is seen that the achieved results using the proposed method is better than those obtained using conventional techniques. Moreover, the average output SNRs are compatible with what obtained in the previous experiment (Figures 7 and 8).
Input SNR (dB)  observation rate ()  Output SNR (dB)  

3D wavelet  Bicubic interp.  proposed method  
10  7.91  9.49  14.71  
9.31  11.14  16.26  
12.16  13.75  18.12  
20  8.34  12.40  16.13  
10.67  14.83  17.22  
12.24  15.91  19.49  
30  9.81  14.92  18.23  
12.64  16.78  20.12  
14.31  18.63  22.70 
As mentioned before, chemical mapping is a useful tool to analyze and observe the uniformity of the tablet Ychen_tablet1 . Chemical mapping can be obtained using the terahertz spectral information. This evaluation should be performed in the frequencydomain, thus, we first take Fourier transform of the timedomain waveforms for both original and reconstructed data. Then, a reference (spectral) waveform denoted by is selected from original LA dataset at one pixel location (normally from the middle of the tablet). After that, a spectral matching should be performed to generate the chemical map. We used the cosine correlation mapping (CCM) CCM , as used in Ychen_tablet1 , for this purpose. If we define a spectral frequencydomain vector by , then the CCM can be calculated via:
(6) 
where is the number of spectral bands. Parameter is the angle between and , and therefore, is between 0 and 1. This evaluation should be performed for all pixels of the corresponding dataset. Smaller angle (larger ) means better match between reference spectra and the waveform under evaluation. The results of chemical mapping for our tablet data is shown in Fig. 9. We calculated CCM for four cases; TP with LA reference, LA with LA reference, TP with TP reference, and TP with LA reference. It is seen from Fig. 9 that both the reconstructed data have a close chemical map similar to the original ones, and that they have a smooth and uniform shape.
4 Discussion and Conclusion
In this paper, we addressed the THz data recovery from an incomplete noisy set of observations. The main advantage of starting with incomplete (subsampled) THz data is fast and inexpensive acquisition process. Inspired by recent developments in dictionary learning and sparse recovery frameworks, we proposed a spatiotemporal dictionary learning technique to find an efficient sparse domain for the THz data. The proposed method aimed at exploiting the existing temporal and spatial correlations. It is worthy here to know the difference between the timedomain THz system and visiblelight camera by noting their image formation details. In fact, even for a static object, one can still get 3D THz data, while for a visiblelight camera, the 3D data leads to a video sequence with moving scene. To be more accurate, in terms of spatial correlation, THz signal exhibits some similar correlations as that obtained by a visible camera. However, in terms of temporalcorrelation, it is significantly different from that of video sequences at visible spectrums. Note that the temporal THz data will either give about structural information like thickness and depth or provide some spectral information, e.g., the chemical component of an object, as presented in the previous section. Extensive sets of experiments were conducted to support the effectiveness and accuracy of the proposed method. The thickness and depth calculations, and chemical mapping analysis for two different types of THz data, along with the achieved SNRs, confirmed the advantages of the proposed approach. In future, we are going to extend the proposed method for the complex scenario where a joint spatiotemporal and frequency domain dictionary learning can be achieved.
5 Acknowledgement
This work was supported by the Engineering and Physical Sciences Research Council (EPSRC), UK, under project EP/I038853/1. We would also like to thank Dr. Yue Dong at University of Liverpool for providing new sets of THz data for further analysis and evaluation.
References
References
 (1) Y. C. Shen, P. Taday, Development and application of terahertz pulsed imaging for nondestructive inspection of pharmaceutical tablet, IEEE Journal of Selected Topics in Quantum Electronics 14 (2) (2008) 407 –415.
 (2) R. K. May, M. J. Evans, S. Zhong, I. Warr, L. F. Gladden, Y. Shen, J. A. Zeitler, Terahertz inline sensor for direct coating thickness measurement of individual tablets during film coating in realtime, Journal of Pharmaceutical Sciences 100 (4) (2011) 1535 –1544.
 (3) J. Muller, D. Brock, K. Knop, J. A. Zeitler, P. Kleinebudde, Prediction of dissolution time and coating thickness of sustained release formulations using Raman spectroscopy and terahertz pulsed imaging, European Journal of Pharmaceutics and Biopharmaceutics 80 (3) (2012) 690–697.
 (4) D. L. Donoho, Compressed sensing, IEEE Transactions on Information Theory 52 (4) (2006) 1289 –1306.
 (5) E. J. Candes, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information, IEEE Transactions on Information Theory 52 (2) (2006) 489 – 509.
 (6) E. J. Candes, M. B. Wakin, An introduction to compressive sampling, IEEE Signal Processing Magazine 25 (2) (2008) 21 –30.
 (7) H. Shen, L. Gan, N. Newman, Y. Dong, C. Li, Y. Huang, Y. C. Shen, Spinning disk for compressive imaging, Optics Letters 37 (1) (2012) 46–48.
 (8) C. SangHeum, L. SangHun, N.G. Chan, O. SeoungJun, S. JooHiuk, P. Hochong, A. ChangBeom, Fast terahertz reflection tomography using blockbased compressed sensing, Optics Express 19 (17) (2011) 16401–16409.
 (9) W. L. Chan, K. Charan, D. Takhar, K. F. Kelly, R. G. Baraniuk, D. M. Mittleman, A singlepixel terahertz imaging system based on compressed sensing, Applied Physics Letters 93 (12) (2008) 121105–121105–3.
 (10) K. Kim, D.G. Lee, W.G. Ham, J. Ku, S.H. Lee, C.B. Ahn, J.H. Son, H. Park, Adaptive compressed sensing for the fast terahertz reflection tomography, IEEE Transactions on Terahertz Science and Technology 3 (4) (2013) 395–401.

(11)
Z. Xu, J. Sun, Image inpainting by patch propagation using patch sparsity, IEEE Transactions on Image Processing 19 (5) (2010) 1153–1165.
 (12) M. Bertalmio, L. Vese, G. Sapiro, S. Osher, Simultaneous structure and texture image inpainting, IEEE Transactions on Image Processing 12 (8) (2003) 882–889.
 (13) T. Ruzic, A. Pizurica, Contextaware patchbased image inpainting using markov random field modeling, IEEE Transactions on Image Processing 24 (1) (2015) 444–456.
 (14) M. Elad, A. Feuer, Restoration of a single superresolution image from several blurred, noisy, and undersampled measured images, IEEE Transactions on Image Processing 6 (12) (1997) 1646–1658.
 (15) S. Villena, M. Vega, R. Molina, A. Katsaggelos, A nonstationary image prior combination in superresolution, Digital Signal Processing 32 (0) (2014) 1 – 10.
 (16) M. Aharon, M. Elad, A. Bruckstein, KSVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing 54 (11) (2006) 4311–4322.
 (17) A. Woiselle, J.L. Starck, J. Fadili, Inpainting with 3D sparse transforms, in: the SPIE Wavelets XIII, Vol. 7446, 2009, pp. 74461C–74461C.
 (18) M. Elad, Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing, Springer, 2010.
 (19) K. Engan, S. O. Aase, J. Hakon Husoy, Method of optimal directions for frame design, in: IEEE International Conference on Acoustics, Speech, and Signal Processing –ICASSP’99, 1999, pp. 2443–2446.
 (20) B. A. Olshausen, D. J. Field, Emergence of simplecell receptive field properties by learning a sparse code for natural images, Nature 381 (6583) (1996) 607–609.
 (21) S. F. Cotter, B. D. Rao, K. Engan, K. KreutzDelgado, Sparse solutions to linear inverse problems with multiple measurement vectors, IEEE Transactions on Signal Processing 53 (7) (2005) 2477 – 2488.
 (22) J. Tropp, A. Gilbert, M. Strauss, Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit, Signal Processing 86 (3) (2006) 572–588.
 (23) J. Tropp, Algorithms for simultaneous sparse approximation. Part II: Convex relaxation, Signal Processing 86 (3) (2006) 589–602.
 (24) M. Elad, M. Aharon, Image denoising via sparse and redundant representations over learned dictionaries, IEEE Transactions on Image Processing 15 (12) (2006) 3736 –3745.
 (25) R. Cogdill, R. Forcht, Y. C. Shen, P. Taday, J. Creekmore, C. Anderson, J. Drennen, Comparison of terahertz pulse imaging and nearinfrared spectroscopy for rapid, nondestructive analysis of tablet coating thickness and uniformity, Journal of Pharmaceutical Innovation 2 (2007) 29–36.
 (26) J. Schwarz, K. Staenz, Adaptive threshold for spectral matching of hyperspectral data, Canadian Journal of Remote Sensing 27 (2001) 216 –224.