SAR Tomography at the Limit: Building Height Reconstruction Using Only 3-5 TanDEM-X Bistatic Interferograms

by   Yilei Shi, et al.
Technische Universität München

Multi-baseline interferometric synthetic aperture radar (InSAR) techniques are effective approaches for retrieving the 3-D information of urban areas. In order to obtain a plausible reconstruction, it is necessary to use more than twenty interferograms. Hence, these methods are commonly not appropriate for large-scale 3-D urban mapping using TanDEM-X data where only a few acquisitions are available in average for each city. This work proposes a new SAR tomographic processing framework to work with those extremely small stacks, which integrates the non-local filtering into SAR tomography inversion. The applicability of the algorithm is demonstrated using a TanDEM-X multi-baseline stack with 5 bistatic interferograms over the whole city of Munich, Germany. Systematic comparison of our result with TanDEM-X raw digital elevation models (DEM) and airborne LiDAR data shows that the relative height accuracy of two third buildings is within two meters, which outperforms the TanDEM-X raw DEM. The promising performance of the proposed algorithm paved the first step towards high quality large-scale 3-D urban mapping.



page 1

page 3

page 7

page 9

page 10


Non-Local Compressive Sensing Based SAR Tomography

Tomographic SAR (TomoSAR) inversion of urban areas is an inherently spar...

Urban Surface Reconstruction in SAR Tomography by Graph-Cuts

SAR (Synthetic Aperture Radar) tomography reconstructs 3-D volumes from ...

Large-scale Building Height Retrieval from Single SAR Imagery based on Bounding Box Regression Networks

Building height retrieval from synthetic aperture radar (SAR) imagery is...

CBHE: Corner-based Building Height Estimation for Complex Street Scene Images

Building height estimation is important in many applications such as 3D ...

Towards an unsupervised large-scale 2D and 3D building mapping with LiDAR

A 2D and 3D building map provides invaluable information for understandi...

Procedural Urban Forestry

The placement of vegetation plays a central role in the realism of virtu...

Maximal Entropy Reduction Algorithm for SAR ADC Clock Compression

Reduction of comparison cycles leads to power savings of a successive-ap...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

I-a The TanDEM-X Mission

TanDEM-X satellite is a German civil and commercial high-resolution synthetic aperture radar (SAR) satellite which has almost identical configuration as its ’sister’ TerraSAR-X satellite. Together with TerraSAR-X, they are aiming to provide a global high-resolution digital elevation model (DEM) [1]. Both satellites use a spiral orbit constellation to fly in tight formation in order to acquire the image pair simultaneously, which significantly reduces the temporal decorrelation error and the atmospheric interference. Since its launch in 2010, TanDEM-X has been continuously providing high quality bistatic interferograms that are nearly free from deformation, atmosphere and temporal decorrelation.

I-B SAR Tomography Techniques

Tomographic synthetic aperture radar (TomoSAR) is a cutting-edge SAR interferemetric technique that is capable of reconstructing the 3-D information of scatterers and retrieving the elevation profile. Among the many multi-baseline InSAR techniques, TomoSAR is the only one that strictly reconstructs the full reflectivity along the third dimension elevation. SAR tomography and its differential form (D-TomoSAR) have been extensively developed in last two decades [2] [3] [4] [5] [6] [7] [8] [9]. They are excellent approaches for reconstructing the urban area and monitoring the deformation, especially when using high resolution data like TerraSAR-X [10] [11] or COSMO-Skymed [12]. Compare to the classic multi-baseline InSAR algorithms, compressive sensing (CS) based methods [13] [14]

can obtain extraordinary accuracy for TomoSAR reconstruction and show the super-resolution (SR) power, which is very important for urban areas, since layover is dominant.

Although TanDEM-X bistatic data has many advantages, there is only a limited number of acquisitions available for most areas. For a reliable reconstruction, SAR tomography usually requires fairly large interferometric stacks (

images), because the variance of the estimates is asymptotically related to the product of SNR and the number of acquisitions. Therefore, it is not appropriate for the micro-stacks, which have limited number of interferograms


I-C The proposed framework

As mentioned above, the accuracy of 3-D reconstruction replies on the product of SNR and the number of measurements . Since the motivation of our work is the large-scale urban mapping, the data we adopted is TanDEM-X stripmap co-registered single look slant range complex (CoSSC), whose resolution is about 3.3 m in azimuth direction and 1.8 m in range direction. The typical number of available interferograms for most areas is 3 to 5 [16]. In [17], the pixels with similar height are grouped for the joint sparsity estimation, which leads to an accurate inversion of TomoSAR using only six interferograms. Although the unprecedented result is obtained, the accurate geometric information is usually not available for most areas. Therefore, the feasible way to keep the required precision of the estimates is to increase the SNR.

Recent works [18] [19] [20] show that SNR can be dramatically increased by applying non-local filters to the TomoSAR processing for different sensors, such as airborne E-SAR, COSMO-Skymed and TerraSAR-X. In [18], different non-local filters have been adopted to improve the estimation of the covariance matrix for distributed scatterers, which leads to a better height estimation for simulated data and airborne SAR data. Ferraioli et al. [19] introduced the non-local filter and the total variation regularizer to improve the multi-baseline phase unwrapping process. In [20], it is shown that we can achieve a reasonable reconstruction using only seven interferograms and better super-resolution properties when the number of interferograms is relative low.

In this work, we extend the concept of non-local compressive sensing TomoSAR in [20] [21] [22] and propose a new framework of spaceborne multi-baseline SAR tomography with TanDEM-X bistatic micro-stacks, i.e. 3 to 5 interferograms. The framework includes non-local filtering, spectral estimation, model selection and robust height estimation. Since the different spectral estimators have different estimation accuracy and computational cost, we compared the estimation accuracy of different estimators with micro-stacks.

The demonstration of different TomoSAR inversion methods for a large-scale area has been shown in [23] [24] [11]. Only a few works on the validation of single buildings were reported in [8] [10] [25]. Therefore, the validation of the specified quality of the TomoSAR result at a larger scale, would be of considerable interests for the scientific and commercial users. We choose Munich city as a test site because of a high quality LiDAR reference available to us, and we propose a complete workflow to compare the TomoSAR point cloud [26] generated by the proposed framework, TanDEM-X DEM product, and LiDAR data.

I-D Contribution of this work

The major contributions of this work are summarized as follows;

  • We make possible a new application of bistatic SAR data for global building height reconstruction.

  • We have pointed out that for pixel-wise multi-master TomoSAR the well-known system equation is no longer valid in the multi-scatterer case.

  • We have developed a framework for tomographic stacks with only 3 - 5 interferograms. A systematic investigation on the estimation accuracy and super-resolution power for the micro-stacks have been carried out, which was never done before.

  • We use 5 TanDEM-X bistatic data to demonstrate the proposed framework. A systematic validation for large-scale TomoSAR reconstruction have been carried out and a method for comparing with other reference data is established. The results are quantitatively compared with LiDAR reference for more than 34,000 buildings.

The paper is organized as follows: In section II, the non-local TomoSAR framework is introduced; In section III, the estimation accuracy of TomoSAR with small stacks has been systematically studied; The experiments using real data, is presented in section IV; In section V, the quantitative validation is carried out; Finally, conclusions are given in section V.

Ii Non-Local TomoSAR for Multi-Master InSAR

In this section, we introduce the non-local TomoSAR framework for multi-master multi-baseline InSAR configuration. Fig. 1 illustrates multi-master multi-baseline SAR imaging. The framework consists of several steps: (1) non-local filtering; (2) spectral estimation; (3) model selection; (4) robust height estimation. Fig. 2 shows the flowchart of the non-local TomoSAR framework.

Ii-a The multi-master TomoSAR imaging model

Fig. 1: Illustration of multi-master multi-baseline SAR imaging.
Fig. 2: Workflow of non-local TomoSAR framework.

For a fixed azimuth-range position, represents the reflectivity profile along elevation . The measurement , i.e. the complex value of the corresponding azimuth-range pixel, in the SAR image is then a sample of

– the Fourier transform of

, where the elevation wavenumber is a scaled version of the sensor’s position projected on the cross-range-azimuth axis :




Note that are no baselines, but the positions of the sensor w.r.t. some origin. In case of monostatic multi-temporal data stacks, a single master is chosen with and its phase is subtracted from all other acquisitions: . This operation renders the phase spatially smooth and is a prerequisite for spatial phase unwrapping, averaging and graph (network) processing. It does not reduce information, since the phase of any (master) acquisition is random. Note that the choice of the point only defines the x-r-s coordinate system. This point need not necessarily be the master track position . However, is a mathematically convenient choice and is assumed in all conventional TomoSAR system model like in [27]. All the equations in [27], however, are actually independent of this particular choice. Only in the special case the are identical to baselines.

Here we are dealing with stacks of bistatic acquisitions, i.e. with the multi-master case. From each of these acquisitions we get a master taken at and a slave image taken at , where is the bistatic baseline (which takes the effective positions of the transmit-receive phase center into account). If we used a standard, i.e. single-master, TomoSAR inversion algorithm, we would confuse and . In the case of a single scatterer in , this misinterpretation would do no harm, because the Fourier transform of a single point has a constant magnitude and a linear phase. In order to determine the slope of the phase ramp we can take any two samples and divide their phase difference by the difference in wavenumbers (= baseline). This is no longer true for two or more scatterers. The example of two symmetric and equally strong scatterers makes this clear:


Hence, acquisitions with the same baseline are different depending on where the two sensors were located along . Every bistatic acquisition provides three pieces of information: the two magnitudes and as well as the phase difference , i.e. we must normalize the phase by the respective master in every acquisition, in order to become unaffected by deformation and atmospheric delay. Spectral estimation based conventional TomoSAR inversion algorithms, however, require complex spectral samples at several wavenumbers, phase-normalized to a single master phase. In a current parallel work by one of the authors [28] it is shown that pixel-wise TomoSAR using multi-master acquisitions is a non-convex hard to solve problem.

This is true for pixel-wise

tomographic inversion or for point scatterers. The situation becomes different, though, once we talk about averages of pixels, i.e. estimates of expectation values. Let us assume Gaussian distributed scattering with a backscatter coefficient along elevation of


Assuming further that is white, its power spectral density is stationary and is the autocorrelation function of , i.e. the Fourier transform of as a function of the baseline wavenumber :


Instead of sampling the Fourier spectrum we sample its autocorrelation function by the bistatic data stack. Since this relationship is independent of because of stationarity, it makes no difference, where the two acquisitions have been taken, only their baseline counts. In other words we can use standard TomoSAR inversion algorithms in this case.

In this paper we use nonlocal filtering to improve SNR for micro-stacks. These filters perform ensemble averages with number of looks in the order of tens to hundreds. Hence, we tend to the assumption that we work with reasonably good estimates of and can use the bistatic interferograms for TomoSAR reconstruction.

By introducing a noise , the matrix notation of TomoSAR model can be formulated as:



is vector notation of the complex-valued measurement with dimension

, and is the expectation value of reflectivity profile along elevation uniformly sampled at . is a sensing matrix with the dimension , where .

Ii-B Non-Local Procedure

Since we have only limited number of acquisitions for large-scale area, the SNR need to be dramatically increased in order to obtain the required accuracy. As shown in [20], non-local procedure is an efficient way to increase the SNR of interferograms without notable resolution distortion. The idea of patch-wise non-local means considers all the pixels in the search window, when the patch with the central pixel is similar to the patch with central pixel , the value of is selected for calculating the value of pixel . The value of pixel is estimated by using a weighted maximum likelihood estimator (WMLE).


where weights can be calculated by using patch-wise similarity mesurement [20]. Assuming that we have two expressions and , where denotes the complex-valued measurement. and are the instensity of two SAR images. is the interferometric phase. is the true value of the parameters, where is the noise-free interferometric phase, is the coherence magnitude, and is the variance. The likelihood function is adopted from [29] with following formualtion:


denotes the non-local estimator, where . represents the parameters being estimated, where is the estimated interferometric phase, stands for the coherence magnitude, and stands for the variance. is the maximum likelihood estimator and the estimated parameters can be formulated as


The patch size and search window size are set to be 7 7 and 21 21 according to experimental study, which is also reported by other works [30] [31]. Each pixel represents 2.17 m in azimuth and 1.36 m in range.

Fig. 3: Monte Carlo simulations of single scatterer with SNR in [0 30] (dB). X-axis presents in dB. Y-axis is the normalized CRLB . (a) Comparison of CRLB with different spectral estimators with five acquisitions. SVD (red solid line), CS (blue solid line), CLRB (black dash-dotted line) (b) Comparison of CRLB using SVD with three to five acquisitions. (red solid line), (blue solid line), (green solid line), CLRB (black dash-dotted line). The vertical black dash-dotted line indicates the estimation accuracy for dB. The red, blue and green markers represent , respectively.

Ii-C Spectral Estimation

After the non-local procedure, spectral estimation is applied. The most relevant spectral estimation algorithms, including singular value decomposition (SVD)

[5] [7], compressive sensing (CS) are introduced in the following.

  • SVD:

  • CS:


where is the noise covariance matrix, which is defined as:


Under the assumption that the model errors are circular Gaussian distributed with zero mean, the noise covariance matrix is formulated as and is the noise power level. is the covariance matrix of the prior, if it is assumed to be white, i.e. .

The choice of different combinations of spectral estimators depends on the required accuracy, the computational time and others. We follow the procedure proposed in [23]. It consists of three parts: (1) an efficient low-order spectral estimation; (2) the discrimination of the number of scatterers; (3) an accurate high-order spectral estimation. The elevation profile is first estimated by an efficient low-order spectral estimator in order to discriminate the number of scatterers in one resolution cell. Then, CS-based approach is adopted for the pixel which has multiple scatterers. This method decreases the amount of pixels that need the minimization, which leads to reduce the computational cost. Furthermore, the rest of pixels can be efficiently solved by randomized blockwise proximal gradient method [24].

Ii-D Model Selection

The abovementioned spectral estimators retrieve a nonparametric reflectivity profile. Since our data is in urban area, we assume only a few dominant scatterers exist along the reflectivity profile. Therefore, the number of scatterers is estimated by a model order selection algortihm as well as their elevation in one azimuth-range pixel [7]. The estimator can be expressed as follows.


where is a model complexity penalty term which avoids more complicated model overfitting the observed data. The classical penalized likelihood criteria are the Bayesian information criterion (BIC), the Akaike information criterion (AIC), and the minimum description length (MDL) principle [32].

As mentioned in [7], the criteria of model order selection has to be chosen according to the experiments for the particular situation, because it is difficult to remove the bias of the selection.

Ii-E Robust Height Estimation

To tackle the possible remaining outliers in the height estimates, the final height will be fused from the result of multiple neighbouring pixels as a post-processing. But instead of simple averaging, the height will be adjusted robustly using an

M-estimator. Instead of minimizing the sum of squared residuals in averaging, M-estimator minimizes the sum of a customized function of the residuals:


where is the elevation estimates of the th neighbouring pixel. It is shown that the close-formed solution of Eq. (16) is simply a weighted averaging of the heights of the neighbouring pixels [33]. The weighting function can be expressed as follows, if the derivative of exists.


The robust estimated height can be written as follows:


where , and is incident angle. The choice of the weighting function depends on the distribution of the heights. Without prior knowledge of the distribution, promising robust weighting functions are Tukey’s biweight or t-distributed weighting [33]

. For instance, the formulation of Tukey’s biweight loss function can be written as:


and the weighting function can be formulated as:


Iii Estimation accuracy of TomoSAR with small stacks

This section will discuss the theoretical 3-D reconstruction accuracy of a micro-stack with 3-5 interferograms. The estimation accuracy of TomoSAR has been systematically investigated. It is exhaustively shown in [15] that the elevation estimation accuracy and SR power depend asymptotically on the multiplication . In this section, we investigate the estimation accuracy of TomoSAR with the extremely small number of interferograms, which is 3 to 5.

Fig. 4: Monte Carlo simulations of double scatterer with different normalized distances: and dB. X-axis represents normalized true distance of simulated facade and ground. Y-axis is normalized estimated distance

of simulated facade and ground. The blue dot marker denotes the estimated location of facade and the error bar indicates the standard deviation of the estimates, whereas the red dot marker represents the estimated location of ground. The green dot suggests that detection rate of double scatterers is below 5% and denotes the estimated result of single scatterer. (a) Illustration (b) SVD (c) CS.

Iii-a The Lower Bound for Micro-stacks

In the case of pixel-wise TomoSAR inversion, i.e. without spatial averaging, each of our bistatic pairs contain three pieces of information, as mentioned before. If we want to reconstruct elevation profiles containing discrete scatterers we need to infer parameters, i.e. elevation, magnitude and phase for each scatterer. Hence an absolute lower bound of the micro-stack size is .

Distributed scatterers, on the other hand, are characterized by only two parameters each: elevation and backscatter coefficient. Likewise each interferogram provides only two parameters, magnitude and phase (difference). Since our goal is 3-D reconstruction based on bistatic data, we disregard motion-induced phase here. Hence, also in this case the absolute lower limit is . This limit is only a necessary condition, however not sufficient from the robustness point of view, because of ambiguities in the inversion cost functions.

For 3-D urban mapping the single and double scattering cases are the dominant ones. We investigate the cases in this paper, because these are close to the mentioned limits and are relevant for TanDEM-X.

Iii-B Crlb

It is demonstrated in [7] that the Cramer-Rao lower bound (CRLB) of the elevation estimates for single scatterer can be expressed as:


where is the standard deviation of the baseline distribution.

is the number of interferograms, and SNR is the signal-to-noise ratio.

For the double scatterers’ case, the CRLB can be written as:


where represents the CRLB on the elevation estimation of the th scatterer without the interference with the others. is the correction factor of the interference for the scatterers, which are closely located [15]. It is nearly free from and SNR, which can be written as:


where is the phase difference of the two scatterers. is the normalized distance between two scatterers (defined in next section). Since

is a random variable, the approximated formulation of

can be calculated by integrating the variances over .


A note on the baseline distribution is worth mentioning: all the Eqs. (21)-(24) are satisfied with large stacks. In the micro-stacks with only 3 - 5 acquisitions, the baseline distribution may be unfavorable, even if the baseline spread

is acceptable. For example, if two baselines were very similar, the information content would be reduced. It is therefore desirable to have the baselines possibly statistically uniformly distributed.

Fig. 5: Visual comparison of NL-TomoSAR point clouds and TanDEM-X DEM over Munich, Germany. Color code: 565 m (blue) – 596 m (red), scene size: 15 km 9 km, north = top. The voilet bounding box indicates the region of interest (ROI) over the area of European bureau of patent and the white bounding box indicates the ROI near Munich central station. (a) Point Clouds generated by NL-TomoSAR with five interferograms. (b) TanDEM-X DEM.

Iii-C Monte Carlo Simulations

In this section, we compare different spectral estimators using simulated data. Two cases were carried out. The first case considers only a single scatterer in the interest of exploring the effect of and SNR on the estimation accuracy for micro-stacks and the performance of different estimators. The second case considers double scatterers to investigate the estimation accuracy and the super-resolution power for different estimators. The inherent (Rayleigh) elevation resolution is inversely proportional to the maximal elevation aperture [15].


The normalized distance is defined as


For the first test case, only one scatterer is placed at , and the SNR is in the range between 0 and 30 dB. For each value, 100 different baseline distributions were generated. We carried out a Monte Carlo simulation for each baseline distribution with 10,000 realizations. Afterwards, the CRLB was evaluated by averaging the value of 100 different baselines. Fig. 3 (a) shows a performance comparison between SVD, and CS on simulated data with five acquisitions for a single scatterer. X-axis presents in dB. Y-axis is the normalized CRLB . As one can see, both approaches have similar estimation accuracy. They are asymptotically towards the CRLB and collapse it when is large. More interestingly is when is fixed, leaving as a variable. Fig. 3 (b) presents the estimation accuracy of SVD with . It shows that the accuracy when is the smallest. This indicates that SNR carries more weight than on the estimation accuracy when is very small.

With the 3-D reconstruction accuracy of single scatterer clearly analyzed, we switch to the double scatterers case. In the simulation, the elevation of one scatterer is fixed at 0. the normalized elevation of the other scatterer is increased from 0.1 to 1.5, in order to mimic the layover of a ground layer and a facade layer. The number of acquisitions is set to , same as the first simulation. SNR is set to be 10 dB, since the SNR of TanDEM-X bistatic data is usually higher than this value in urban area [34].

The Monte Carlo simulation result is shown in Fig. 4. The x-axis represents the true normalized elevation distance of the simulated facade and ground layers. Y-axis is the estimated normalized elevation distance of the simulated facade and ground layers. The two solid lines in the Fig. 4 (a) (b) (c) represent the true position of the building facade and the ground, respectively. The dashed lines imply the true position plus and minus the CRLB. The blue bar and dot imply the standard deviation and the mean of the estimated elevation of the facade scatterers, whereas the red ones represent those of the ground scatterers. The green dot indicates that the detection rate of double scatterers is below 5% and denotes the estimated result of the single scatterer. Fig. 4 (b) (c) show the estimated results by SVD and CS, respectively.

As one can see in Fig. 4 (b) (c), the result of SVD has larger bias and slightly bigger standard deviation than CS. Note that, comparing to SVD, CS can give the better result, not only the accuracy of the estimation but also the super-resolution power. As one can see that the SVD has scarcely no super-resolution power, which can only distinguish double scatterers tile one Rayleigh resolution . In contrast, CS can achieve until 0.6 .

Iv Practical Demonstration

Iv-a Data Description

We make use of a stack of five co-registered TanDEM-X bistatic interferograms to evaluate the proposed algorithm. The dataset is over Munich, Germany, whose slant range resolution is 1.8 m and the azimuth resolution is 3.3 m. The images were acquired from July 2016 to April 2017. The most pertinent parameters of a TanDEM-X bistatic stripe map acquisition of Munich are listed in Table I and Table II. All the preprocessing steps, like deramping, are standard that are known from bistatic forest tomography. For interested readers, please refer to [2].

Name Symbol Value
Distance from the scene center 698 km
Wavelength 3.1 cm
Incidence angle at scene center
Maximal elevation aperture 187.18 m
Number of interferograms 5
TABLE I: Parameters of Tandem-X Stripe Map Acquisition of Munich
No. Date Baseline [m] Height Ambiguity [m/cycle]
1 2016-07-25 184.40 50.30
2 2016-09-07 171.92 54.01
3 2017-02-19 32.30 286.03
4 2017-04-26 -2.78 -8710.99
5 2017-07-01 9.30 1073.03
TABLE II: Detailed Information of Tandem-X Stripe Map Acquisition for used Dataset

Iv-B Visual Comparison with TanDEM-X raw DEM

In this work, the TanDEM-X raw DEM is adopted for visual comparison with TomoSAR point clouds of the test area, which is formed by two TanDEM-X bistatic acquisitions using the Integrated TanDEM-X Processor (ITP).

Fig. 6: Visual comparison of NL-TomoSAR point clouds and TanDEM-X DEM, close-up 3-D view over the area of European bureau of patent. (a) TomoSAR point clouds. (b) TanDEM-X DEM.
Fig. 7: Visual comparison of NL-TomoSAR point clouds and TanDEM-X DEM, close-up 3-D view over the area of Munich central station. (a) TomoSAR point clouds. (b) TanDEM-X DEM.
Fig. 8: Optical images of nine test sites for quantitative comparison of NL-TomoSAR point clouds and TanDEM-X DEM. (a) Munich central station. (b) European bureau of patent. (c) Technical University of Munich. (d) Railway signal light stand. (e) Train repair garage. (f) Residential building between two bridges. (g) Munich University of Applied Sciences. (h) Residential building near Lowenbrau beer company. (i) Karstadt (shopping mall).

A top view of the reconstructed point cloud of TomoSAR is shown in Fig. 5 (a). The black regions in the figure is where the pixels are not coherent. The corresponding area of TanDEM-X raw DEM is presented in Fig. 5 (b) as a comparison. It is clear that the result of TomoSAR point cloud preserves more detailed building structures. The road layer is also better represented in the TomoSAR result as well. In Fig. 5 (b), the flat ground surface are well reconstructed. But when it comes to complex or high-rise buildings, their accuracy is compromised. For instance, the building of European bureau of the patent in the bottom right (red color) along the Isar river. A close view to this building can be seen in Fig. 6. Due to the complex building structure, as well as the multilooking processing, the TanDEM-X raw DEM merges several buildings together and exhibits lower accuracy on the height of the buildings.

As another example, Fig. 7 shows the visual comparison over the area around Munich central station. It is clear that NL-TomoSAR result can show more detailed structures, such as the bridge, the central station, and roads.

V Quantitative Validation

In this section, we have quantitatively compared the TomoSAR point clouds with TanDEM-X raw DEM, as well as a much more precise LiDAR reference. The LiDAR dataset of Munich is provided by Bavarian State Office for Survey and Geoinformation with ten centimeter accuracy [35].

Since the TomoSAR point cloud is with respect to a reference point that was chosen during the TomoSAR processing, its location is not with respect to a geo-coordinate system. We coregistrated the point cloud of TomoSAR with the DEM and the LiDAR point cloud. In addition, in order to compare point clouds with DEM, we rasterize the two point clouds. These preparing steps are briefly explained in this section.

V-a Geocoding

Since the result of TomoSAR inversion is a 3-D point cloud in the range-azimuth coordinate, the first step is to transform the result to Universal Transverse Mercator (UTM) coordinate with the Range-Doppler approach [36].

Structures Sources Top Bottom Height Absolute Height Difference
Min Max Std Mean Min Max Std Mean
Structure 1 T -3.76 2.59 1.22 -0.75 -19.30 -12.87 1.15 -15.26 14.51 0.69
L - - - 539.01 - - - 525.19 13.82 -
D 583.19 597.58 2.32 587.91 566.16 570.15 2.01 568.06 19.84 6.02
Structure 2 T 20.39 22.62 0.56 21.39 -27.85 -22.62 1.18 -25.30 46.70 0.75
L - - - 559.09 - - - 513.14 45.95 -
D 598.57 642.43 8.35 624.99 556.61 583.46 4.45 574.83 50.16 4.21
Structure 3 T 13.84 17.04 0.97 15.32 -25.52 -21.56 1.09 -23.67 38.49 0.90
L - - - 552.97 - - - 515.38 37.59 -
D 594.31 599.13 2.12 596.15 562.03 569.28 1.60 565.18 30.97 6.62
Structure 4 T -4.61 -1.90 0.74 -2.91 -13.84 -10.35 0.83 -12.05 9.14 0.67
L - - - 535.04 - - - 526.57 8.47 -
D 582.41 584.94 0.84 584.06 572.76 574.96 0.48 573.42 10.64 2.17
Structure 5 T -3.95 -0.96 0.63 -2.44 -15.94 -13.67 0.54 -14.61 12.17 0.96
L - - - 535.62 - - - 524.41 11.21 -
D 583.26 587.26 0.96 584.77 572.71 578.98 1.22 575.58 9.19 2.02
Structure 6 T 10.42 13.47 0.49 12.11 -16.05 -14.31 0.82 -15.61 27.72 0.67
L - - - 551.73 - - - 523.34 28.39 -
D 587.23 594.29 2.12 589.76 567.22 570.82 1.32 569.36 20.4 7.99
Structure 7 T 2.71 6.65 0.87 4.99 -27.57 -21.32 1.41 -24.25 29.24 0.60
L - - - 548.01 - - - 519.37 28.64 -
D 574.71 597.30 5.21 588.27 563.98 571.78 2.26 568.09 20.18 8.46
Structure 8 T 0.06 6.42 1.34 4.06 -20.96 -20.27 0.18 -20.69 24.75 0.94
L - - - 542.96 - - - 517.27 25.69 -
D 584.73 598.09 3.36 590.40 566.26 574.70 2.62 570.12 20.28 5.41
Structure 9 T -7.53 -6.73 0.16 -7.41 -23.13 -22.57 0.29 -22.85 15.44 0.89
L - - - 530.39 - - - 515.84 14.55 -
D 580.67 581.22 0.11 580.97 567.14 573.42 1.56 569.3 11.67 2.88
TABLE III: Statistics of quantitative comparison of nine test structures. First column shows the number of each structure. Second column implies the source of each result, i.e., t (tomosar), l (lidar), d (dem). Third and Fourth columns present the statistics (min, max, mean and standard deviation) of sample points at top layer and bottom layer. Fifth column demonstrates the relative height of each structure, which is calculated by using the mean value of top layer minus the mean value of bottom layer. Sixth column shows the relative height difference between TomoSAR point clouds and LiDAR data, as well as between TanDEM-X raw DEM and LiDAR data.

V-B Coregistration of different point clouds

Consequently, when the TomoSAR point cloud is transformed to a UTM coordinate, its position may differ from the ground truth since the height of the reference point is unknown. Hence, the alignment of different point clouds is necessary. The most popular 3-D point cloud registration algorithm is iterative closest points (ICP) approach [37].

The performacne of ICP depends on the initial alignment. Hence, a coarse alignment is adopted before applying ICP, which includes three steps: (1) the edge image is extracted by an edge detector, such as Sobel algorithm [38]; (2) the horizontal coregistration of two edge images is using cross-correlation of two edge images; (3) the vertical coregistration is using cross-correlation of the two height histograms. After the coarse alignment, ICP can be applied for the fine alignment [39].

V-C Object-based raster data generation

The direct comparison of TomoSAR and LiDAR points is not feasible, as the central position of two corresponding points (one TomoSAR and one LiDAR point) and the footprint of the points are differing. Consequently, for comparing both data, an object-based raster need to be generated by using geographic information system GIS data.

V-D Comparison of individual structure

In order to evaluate the estimation accuracy, nine test sites with high average SNR have been chosen for individual quantitative comparison. Fig. 8 shows the optical images of nine test sites for quantitative comparison of NL-TomoSAR point clouds and TanDEM-X DEM. They are (1) Munich central station, (2) European bureau of patent, (3) Technical University of Munich, (4) A railway signal light stand near Hirschgarten, (5) A train repair garage near Hirschgarten, (6) A residential building between two bridges (Hackerbrücke and Donnersbergebrücke), (7) Munich University of Applied Sciences, (8) A residential building near Lowenbrau beer company and (9) Karstadt (shopping mall). The summary of the results is shown in Tab. III.

From Tab. III we can see that the height differences between TomoSAR result and LiDAR data are within one meter and the height differences between TanDEM-X DEM product and LiDAR data vary from 2.5 m to 8.5 m. Similar performance is shown in standard deviation, for NL-TomoSAR is up to 1.4 m and for TanDEM-X DEM is up to 8.4 m.

V-E Average accuracy

In order to have an assessment of the overall accuracy in a city scale, we compared all the 36,499 buildings in the area with the LiDAR point cloud. 38.7% buildings are within 1 m accuracy. 62.8% are within 2 m accuracy. A detailed distribution of accuracy is listed in Tab. IV. However, the two datasets (TanDEM-X CoSSC and LiDAR) were acquired at different time. It is almost certain that changes happened during the period. Therefore, in order to obtain a more realistic assessment, we truncated the distribution of height difference at . The truncated histogram can be seen in Fig. 9. 34,054 buildings remains after the truncation. Their overall standard deviation is 1.96 m.

Percentage of buildings Estimation accuracy

within 1 m
62.8% within 2 m
93.3% within 15 m
TABLE IV: Statistics of quantitative comparison of the whole city
Fig. 9: Histogram of height differences of structures in the whole Munich area.

Vi Conclusion

A new SAR tomographic inversion framework tailored for very limited number of measurements is proposed in this paper. A systematic investigation of the estimation accuracy of TomoSAR with a micro-stacks is carried out using simulated data. Our experiments show that SVD and CS-based methods have almost identical performance on the estimation of single scatterer and the SNR plays a more important role than for the estimation accuracy, when is small. For the estimation of double scatterers, CS-based approach outperforms the other spectral estimators. Experiments using TanDEM-X bistatic data shows the relative height accuracy of 2 m can be achieved in large scale. Thus it demonstrates the proposed framework being a promising solution for high quality large-scale 3-D urban mapping.


  • [1] G. Krieger, A. Moreira, H. Fiedler, I. Hajnsek, M. Werner, M. Younis, and M. Zink, “TanDEM-X: A satellite formation for high-resolution SAR interferometry,” IEEE Trans. Geosci. Remote Sens., vol. 45, no. 1, pp. 3317-3341, Oct. 2007.
  • [2] A. Reigber, and A. Moreira, “First demonstration of airborne SAR tomography using multibaseline L-band data,” IEEE Trans. Geosci. Remote Sensing, vol. 38, no. 5, pp. 2142-2152, Sep. 2000.
  • [3] F. Gini, F. Lombardini, and M. Montanari, “Layover solution in multibaseline SAR interferometry,” IEEE Trans. Aerosp. Electron. Syst., vol. 38, no. 4, pp. 1344-1356, 2002.
  • [4] F. Lombardini, “Differential tomography: A new framework for SAR interferometry,” IEEE Trans. Geosci. Remote Sensing, vol. 43, no. 1, pp. 37-44, Jan. 2005.
  • [5] G. Fornaro, F. Lombardini, and F. Serafino, “Three-dimensional multipass SAR focusing: experiments with long-term spaceborne data,” IEEE Trans. Geosci. Remote Sensing, vol. 43, no. 4, pp. 702-714, Apr. 2005.
  • [6] G. Fornaro, D. Reale, and F. Serafino, “Four-dimensional SAR imaging for height estimation and monitoring of single and double scatterers,” IEEE Trans. Geosci. Remote Sensing, vol. 47, no. 1, pp. 224-237, Jan. 2009.
  • [7] X. X. Zhu and R. Bamler, “Very high resolution spaceborne SAR tomography in urban environment,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 12, pp. 4296-4308, Dec. 2010.
  • [8] N. Ge, F. Rodriguez Gonzalez, Y. Wang, Y. Shi, and X. X. Zhu, “Spaceborne Staring Spotlight SAR Tomography–A First Demonstration With TerraSAR-X,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 11, no. 10, pp. 702-714, Oct. 2018.
  • [9] X. X. Zhu, Y. Wang, S. Montazeri, N. Ge, “A Review of Ten-Year Advances of Multi-Baseline SAR Interferometry Using TerraSAR-X Data,” Remote Sensing, vol. 10, no. 9, pp. 1374, Aug. 2018.
  • [10] X. X. Zhu and R. Bamler, “Demonstration of super-resolution for tomographic SAR imaging in urban environment,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 8, pp. 3150-3157, Aug. 2012.
  • [11] X. X. Zhu and Y. Wang and S. Gernhardt and R. Bamler, “Tomo-GENESIS: DLR’s tomographic SAR processing system,” Proc. Joint Urban Remote Sensing Event, 2013 pp. 159-162.
  • [12] G. Fornaro, A. Pauciullo, D. Reale, and S. Verde. “Multilook SAR Tomography for 3-D Reconstruction and Monitoring of Single Structures Applied to COSMO-SKYMED Data,” IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens., vol. 7, no. 7, pp. 2776-2785, Jul. 2014.
  • [13] X. X. Zhu and R. Bamler, “Tomographic SAR inversion by norm regularization – The compressive sensing approach,” IEEE Trans. Geosci. Remote Sens., vol. 48, no. 10, pp. 3839-3846, Oct. 2010.
  • [14] A. Budillon, A. Evangelista, and G. Schirinzi, “Three-dimensional SAR focusing from multipass signals using compressive sampling,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 1, pp. 488-499, Jan. 2011.
  • [15] X. X. Zhu and R. Bamler, “Super-resolution power and robustness of compressive sensing for spectral estimation with application to spaceborne tomographic SAR,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 1, pp. 247-258, Jan. 2012.
  • [16] P. Rizzoli, M. Martone, C. Gonzalez, C. Wecklich, D. Borla Tridon, B. Bräutigam, M. Bachmann, D. Schulze, T. Fritz, M. Huber, B. Wessel, G. Krieger, M. Zink, A. Moreira, “Generation and performance assessment of the global TanDEM-X digitalelevation model,” ISPRS J. Photogram. Remote Sens., vol. 132, pp. 119-139, 2017.
  • [17] X. X. Zhu, N. Ge, M. Shahzad, “Joint sparsity in SAR tomography for urban mapping,” IEEE J. Sel. Topics Signal Process., vol. 9, no. 8, pp. 1498-1509, Dec. 2015.
  • [18] O. D’Hondt, C. López-Martínez, S. Guillaso, and O. Hellwich, “Nonlocal Filtering Applied to 3-D Reconstruction of Tomographic SAR Data,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 1, pp. 272-285, Jan. 2018.
  • [19] G. Ferraioli, C.A. Deledalle, L. Denis, and F. Tupin, “PARISAR: Patch-Based Estimation and Regularized Inversion for Multibaseline SAR Interferometry,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 3, pp. 1626-1636, Mar. 2018.
  • [20] Y. Shi, X. X. Zhu and R. Bamler, “Non-Local Compressive Sensing Based SAR Tomography,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 5, pp. 3015-3024, May 2019.
  • [21] Y. Shi, X. X. Zhu and R. Bamler, “SAR Tomography Using Non-Local Sparse Reconstruction,” IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, 2018, pp. 6091-6094.
  • [22] Y. Shi, Y. Wang, X. X. Zhu and R. Bamler, “Non-Local SAR Tomography for Large-Scale Urban Mapping,” IGARSS 2019 - 2019 IEEE International Geoscience and Remote Sensing Symposium, Yokohama, Japan, 2019, pp. 5197-5200.
  • [23] Y. Wang, X. X. Zhu, and R. Bamler, “An Efficient tomographic inversion approach for urban mapping using meter resolution SAR image stacks,” IEEE Geosci. Remote Sens. Lett., vol. 11, no. 7, pp. 1250-1254, Jul. 2014.
  • [24] Y. Shi, X. X. Zhu, W. Yin, and R. Bamler, “A fast and accurate basis pursuit denoising algorithm with application to super-resolving tomographic SAR,” IEEE Trans. Geosci. Remote Sens., vol. 56, no. 10, pp. 6148-6158, Oct. 2018.
  • [25] A. Budillon, A. C. Johnsy, and G. Schirinzi, “Extension of a Fast GLRT Algorithm to 5D SAR Tomography of Urban Areas,” Remote Sensing, vol. 9, no. 8, pp, 844, 2017.
  • [26] J. Otepka, S. Ghuffar, C. Waldhauser, R. Hochreiter, N. Pfeifer, “Georeferenced Point Clouds: A Survey of Features and Point Cloud Management,” ISPRS Int. J. Geo-Inf. vol. 2, pp. 1038-1065, 2013.
  • [27] G. Fornaro, F. Serafino, and F. Soldovieri, “Three-Dimensional Focusing With Multipass SAR Data,” IEEE Trans. Geosci. Remote Sensing, vol. 41, no. 3, pp. 507-517, Apr. 2003.
  • [28] N. Ge, R. Bamler, D. Hong and X.X. Zhu, “Single-Look Multi-Master SAR Tomography: An Introduction,” IEEE Trans. Geosci. Remote Sens., submitted, 2019.
  • [29] J. W. Goodman, “Speckle phenomena in optics: Theory and applications,” Roberts and Company Publishers, 2007.
  • [30] C. A. Deledalle, L. Denis, and F. Tupin, “NL-InSAR: Nonlocal interferogram estimation,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 4, pp. 1441-1452, Apr. 2011.
  • [31] X. X. Zhu, G. Baier, M. Lachaise, Y. Shi, F. Adam, R. Bamler, “Potential and limits of non-local means InSAR filtering for TandDEM-X high-resolution dem generation,” Remote sensing of environment, vol. 218, pp. 148-161, 2018.
  • [32] F. Lombardini and F. Gini, “Model order selection in multi-baseline interferometric radar systems,” EURASIP J. Appl. Signal Process., vol. 2005, no. 20, pp. 3206–3219, 2005.
  • [33] Y. Wang and X. X. Zhu, “Robust estimators for multipass SAR interferometry,” IEEE Trans. Geosci. Remote Sens., vol 54, no. 2, pp. 968-980, 2016.
  • [34] X. X. Zhu and R. Bamler, “Sparse tomographic SAR reconstruction from mixed TerraSAR-X/TanDEM-X data stacks,” in Proc. IEEE IGARSS, 2012, pp. 7468-7471.
  • [35] Landesamt für Digitalisierung, Breitband und Vermessung Bayern.
  • [36] M. Schwäbisch, “A fast and efficient technique for SAR interferogram geocoding,” in Proc. IEEE IGARSS, 1998., pp. 1100-1102.
  • [37] Z. Zhang, “Iterative point matching for registration of free-form curves and surfaces,” Int. J. Comput. Vis., vol. 13, no. 2, pp. 119-152, 1994.
  • [38] I. Sobel, “An isotropic 3 x 3 image gradient operator,” Presented at Stanford AI Project, 1968.
  • [39] Y. Wang, X. X. Zhu, B. Zeisl, and M. Pollefeys, “Fusing meter-resolution 4-D InSAR point clouds and optical images for semantic urban infrastructure monitoring,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 1, pp. 14-26, 2017.