1 Introduction
The onboard estimation of the pose, i.e., the relative position and attitude, of a known noncooperative spacecraft using a monocular camera is a key enabling technology for current and future on-orbit servicing and debris removal missions such as the RemoveDEBRIS mission by Surrey Space Centre [1], the Phoenix program by DARPA [2], and the Restore-L mission by NASA [3]. Knowledge of the current pose of the target spacecraft during the proximity operations of these missions enables real-time approach trajectory guidance and control. In contrast to systems based on LiDAR and stereo camera sensors, monocular navigation ensures pose estimation under low mass and power requirements, making it a natural sensor candidate for future formation-flying missions, especially those employing small satellites.
Prior demonstrations of close-range pose estimation have utilized image processing based on hand-engineered features [4, 5, 6, 7, 8, 9, 10, 11, 12] and a priori knowledge of the pose [13, 14, 15, 16]. A key strength of many of these methods is their use of the perspective transformation between the scene and the image to hypothesize and test correspondences between features detected in the 2D image and the known 3D model of the target spacecraft. However, the formulation of specific features is neither scalable to spacecraft with different structural and physical properties nor robust to the dynamic illumination conditions of space. Secondly, a priori knowledge of the pose of the target is not always available due to mission operations constraints, nor is it desirable when full autonomy is required.
Recent advancements in pose estimation techniques for terrestrial applications have relied on deep learning algorithms. Broadly, these algorithms bypass the classical image processing pipeline and instead attempt to learn the nonlinear transformation between the two-dimensional input image space and the six-dimensional output pose space in an end-to-end fashion. The deep learning-based methods either discretize the pose space and solve the resulting classification problem [17, 18, 19, 20] or directly regress the relative pose from the input image [21, 22, 23]. Render-for-CNN [18] demonstrated the use of rendered images to train a convolutional neural network for viewpoint estimation on actual camera images. The viewpoint estimation problem is cast as classifying the camera rotation parameters into fine-grained bins. PoseCNN [23] used separate branches of a convolutional neural network to predict semantic labels, object centers in the 2D image, and object rotation using direct regression. However, the approach is not accurate enough without further refinement using the iterative closest point method [24]. Classification-based approaches rely on a fine discretization of the pose space into a large number of pose labels in order to achieve reasonable pose accuracy. On the other hand, direct regression-based approaches require a careful choice of parameters to avoid unpredictable behavior while learning the transformation between the input two-dimensional pixel information and the output regression parameters describing the six-dimensional pose space. In addition, the applicability of these algorithms to space imagery is not trivial for two reasons. Firstly, unlike terrestrial applications, spaceborne navigation cameras are challenged with quickly varying illumination conditions and capture imagery with low signal-to-noise ratio and high contrast. Secondly, the massive datasets necessary to train machine learning algorithms are typically not available for spaceborne navigation.
The primary contribution of this work is the introduction of the Spacecraft Pose Network (SPN), a new deep learning-based pose estimation method. As shown in Figure 1, the SPN method uses a convolutional neural network to estimate the relative position and relative attitude in a decoupled fashion from a single grayscale image. One branch of the convolutional neural network is used to detect a 2D bounding box around the target spacecraft in the input image. The other two branches are used to estimate the relative attitude using a hybrid discrete-continuous method. The relative attitude and the 2D bounding box are then combined with the geometrical constraints of the perspective transformation to estimate the relative position using the Gauss-Newton algorithm. In contrast to current deep learning-based techniques, the relative attitude accuracy provided by the SPN method is not limited by the level of discretization of the pose space, and the method explicitly uses the geometrical knowledge of the perspective transformation in the estimation of relative position. Compared with techniques that match image features against features of a known target spacecraft model onboard, the SPN method provides a pose estimate from a single image without requiring a long initialization phase or favorable relative translational motion. Further, due to the decoupling of the relative attitude and position estimation and the use of transfer learning, the SPN method can be used to infer the pose of the target spacecraft through training on a relatively small number of synthetic images of the same spacecraft.
The secondary contribution of this work is the creation of the Spacecraft Pose Estimation Dataset (SPEED), which consists of high-fidelity imagery of close proximity operations around a tumbling spacecraft. The dataset contains 15,300 images from two sources: a purely software-based Augmented Reality (AR) source that fuses synthetic and actual space imagery, and a purely reality-based source that uses an actual camera sensor to capture images of a mock-up spacecraft under high-fidelity illumination conditions. The convolutional neural network of the SPN method is trained only on a portion of these AR images while it is tested on the remaining AR images as well as the actual camera images of SPEED. SPEED will also be made publicly available to the community to enable a fair comparison of state-of-the-art pose estimation techniques through a competition on pose estimation for spaceborne applications organized in collaboration with the European Space Agency [25].
The paper is organized as follows: Section 2 describes the framework for the SPN method; Section 3 describes the SPEED image generation; Section 4 describes the experiments conducted to validate the performance of the SPN method; and Section 5 presents conclusions from this work as well as directions for further research and development.
2 Spacecraft Pose Network
Formally, the problem statement for this work is the estimation of the attitude and position of the camera frame, C, with respect to the body frame of the target spacecraft, B. As shown in Figure 2, $\mathbf{t}_{BC}$ denotes the relative position of the origin of the target's body reference frame with respect to the origin of the camera's reference frame. Similarly, $\mathbf{q}_{BC}$ denotes the quaternion associated with the rotation that aligns the target's body reference frame with the camera's reference frame.
As shown in Figure 3, the SPN method uses three separate branches of a convolutional neural network to estimate $\mathbf{t}_{BC}$ and $\mathbf{q}_{BC}$. Branch 1 is used to detect a 2D bounding box in the image around the target spacecraft. The output of the convolutional layers corresponding to the detected 2D bounding box is used as an input to Branches 2 and 3 to obtain an estimate of the relative attitude, $\hat{\mathbf{q}}_{BC}$. Finally, the 2D bounding box and $\hat{\mathbf{q}}_{BC}$ are input to the Gauss-Newton algorithm to estimate the relative position, $\hat{\mathbf{t}}_{BC}$. The estimation of the relative position and relative attitude are discussed in detail in the following subsections.
2.1 Relative Position Estimation
The relative position estimation begins with the detection of a 2D bounding box in the image around the target spacecraft using the region proposal network [26]. The detected 2D bounding box is not only important to effectively remove the background from the image before relative attitude estimation, but it also makes the SPN method extensible to pose estimation for multiple objects or spacecraft components from the same image. The region proposal network used in the SPN method is an “off-the-shelf” object detection algorithm that takes the output of the five convolutional layers as its input. This network uses a sliding window approach on the output of the convolutional layers to produce region proposals based on predefined anchor boxes. Each anchor box is centered at the sliding window in question and is associated with a scale and an aspect ratio. Typically, 3 scales and 3 aspect ratios are used, yielding $k = 9$ anchor boxes at each sliding position. Therefore, for each sliding window location, the region proposal network outputs $2k$ probabilities (or scores) of whether the target spacecraft is present and regresses to $4k$ coordinates of the corresponding 2D bounding boxes. Finally, the 2D bounding box associated with the highest probability is provided as the output. The reader can find further implementation details in the original paper [26]. Even though the SPN method uses the region proposal network in its current implementation, it can easily be swapped with another state-of-the-art object detection algorithm [27] based on specific computational runtime, storage, and accuracy requirements.
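For illustration, the sketch below generates the 3 scales × 3 aspect ratios = 9 anchor boxes for a single sliding-window position; the base size, scales, and aspect ratios shown here are assumed values in the spirit of [26], not the settings used in the SPN implementation.

```python
import numpy as np

def anchors_at(cx, cy, base_size=16, scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Nine anchor boxes (3 scales x 3 aspect ratios) centered at one
    sliding-window position (cx, cy). All values are illustrative."""
    boxes = []
    for s in scales:
        for r in ratios:
            # Keep the anchor area fixed for a given scale while varying
            # the height-to-width ratio r.
            w = base_size * s / np.sqrt(r)
            h = base_size * s * np.sqrt(r)
            boxes.append([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])
    return np.array(boxes)  # shape (9, 4): [x_min, y_min, x_max, y_max]

# Example: anchors centered at pixel (600, 400)
print(anchors_at(600.0, 400.0))
```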
The resulting 2D bounding box estimated by the region proposal network and the relative attitude estimated by Branches 2 and 3 of the SPN method are combined with geometrical constraints to estimate the relative position using the Gauss-Newton algorithm. The SPN method uses the fact that the perspective projection of the 3D wireframe model of the target spacecraft must fit tightly within the detected 2D bounding box.
As shown in Figure 4, the 2D bounding box can be parametrized by its top-left coordinates, $(x_{\min}, y_{\min})$, and its bottom-right coordinates, $(x_{\max}, y_{\max})$, defined in the image plane. Given the estimated relative attitude parametrized as a rotation matrix, $\mathbf{R}_{BC}$, the camera focal lengths, $(f_x, f_y)$, and the camera principal point, $(c_x, c_y)$, a 3D point defined in the body frame of the target spacecraft, $\mathbf{p}_B$, can be projected into the image plane as the point $(u, v)$ using the perspective equation
$$
\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \mathbf{R}_{BC}\,\mathbf{p}_B + \mathbf{t}_{BC}, \qquad
u = f_x \frac{x}{z} + c_x, \qquad v = f_y \frac{y}{z} + c_y \qquad (1)
$$
The constraint that the perspective projection of the 3D wireframe model fits tightly within the 2D bounding box requires that the four extremal projected points be as close as possible to the four edges of the 2D bounding box. Specifically, the four extremal projected points are the “leftmost”, “rightmost”, “topmost”, and “bottommost” projected points of the wireframe model. For example, $u^{(l)}$ denotes the horizontal coordinate of the “leftmost” projected point, and it is constrained to be as close as possible to $x_{\min}$, the coordinate representing the left edge of the 2D bounding box; the coordinates $u^{(r)}$, $v^{(t)}$, and $v^{(b)}$ of the remaining extremal points are treated analogously. Mathematically, the SPN method solves the following minimization problem subject to the constraint posed by Equation 1
$$
\hat{\mathbf{t}}_{BC} = \arg\min_{\mathbf{t}_{BC}} \left[ \left(u^{(l)} - x_{\min}\right)^2 + \left(u^{(r)} - x_{\max}\right)^2 + \left(v^{(t)} - y_{\min}\right)^2 + \left(v^{(b)} - y_{\max}\right)^2 \right] \qquad (2)
$$
The minimization problem is solved using the Gauss-Newton algorithm, which requires an initial guess of $\mathbf{t}_{BC}$. The SPN method uses the detected 2D bounding box and the characteristic length of the 3D wireframe model to provide this initial guess.
Figure 5 shows that knowledge of the diagonal characteristic length of the spacecraft 3D model, $\ell_B$, and the diagonal length of the detected 2D bounding box, $\ell_{bb}$, can be used to obtain a coarse estimate of $\mathbf{t}_{BC}$. In particular, a coarse estimate of the distance to the target spacecraft from the origin of the camera frame is
$$
\bar{t} = f\,\frac{\ell_B}{\ell_{bb}} \qquad (3)
$$

where $f$ is the camera focal length expressed in pixel units.
Assuming that the origin of the target spacecraft body frame, B, lies on the ray projected from the origin of the camera frame, C, towards the center of the detected 2D bounding box, $(x_{bb}, y_{bb})$, the azimuth and elevation angles, $\alpha$ and $\beta$, can be derived using the focal lengths and the principal point as
$$
\alpha = \arctan\!\left(\frac{x_{bb} - c_x}{f_x}\right) \qquad (4)
$$

$$
\beta = \arctan\!\left(\frac{y_{bb} - c_y}{f_y}\right) \qquad (5)
$$
Finally, the coarse relative position used as the initial guess in the Gauss-Newton algorithm is
$$
\mathbf{t}_{BC}^{(0)} = \bar{t}\,\begin{bmatrix} \cos\beta\,\sin\alpha \\ \sin\beta \\ \cos\beta\,\cos\alpha \end{bmatrix} \qquad (6)
$$
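A minimal numerical sketch of this relative position estimation step is given below. It assumes a pinhole camera with focal lengths and principal point expressed in pixels, adopts the bearing-vector convention of Equation 6, and substitutes SciPy's generic least-squares solver for a hand-written Gauss-Newton loop; all function and variable names are illustrative rather than part of the SPN implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def project(points_B, R_BC, t_BC, fx, fy, cx, cy):
    """Perspective projection of body-frame 3D points into the image plane (Eq. 1)."""
    p_C = points_B @ R_BC.T + t_BC                    # body frame -> camera frame
    u = fx * p_C[:, 0] / p_C[:, 2] + cx
    v = fy * p_C[:, 1] / p_C[:, 2] + cy
    return u, v

def bbox_fit_residual(t_BC, points_B, R_BC, bbox, fx, fy, cx, cy):
    """Distance of the extremal projected points to the bounding box edges (Eq. 2)."""
    x_min, y_min, x_max, y_max = bbox
    u, v = project(points_B, R_BC, t_BC, fx, fy, cx, cy)
    return np.array([u.min() - x_min, u.max() - x_max,
                     v.min() - y_min, v.max() - y_max])

def coarse_initial_guess(bbox, ell_B, fx, fy, cx, cy):
    """Coarse relative position from the bounding box diagonal (Eqs. 3-6)."""
    x_min, y_min, x_max, y_max = bbox
    ell_bb = np.hypot(x_max - x_min, y_max - y_min)   # bbox diagonal in pixels
    dist = fx * ell_B / ell_bb                        # Eq. 3, focal length in pixels
    x_bb, y_bb = 0.5 * (x_min + x_max), 0.5 * (y_min + y_max)
    alpha = np.arctan((x_bb - cx) / fx)               # Eq. 4, azimuth
    beta = np.arctan((y_bb - cy) / fy)                # Eq. 5, elevation
    bearing = np.array([np.cos(beta) * np.sin(alpha), # Eq. 6, unit bearing vector
                        np.sin(beta),
                        np.cos(beta) * np.cos(alpha)])
    return dist * bearing

def estimate_position(points_B, R_BC, bbox, ell_B, fx, fy, cx, cy):
    """Refine the coarse guess so the projected wireframe fits the bounding box."""
    t0 = coarse_initial_guess(bbox, ell_B, fx, fy, cx, cy)
    # least_squares stands in here for the Gauss-Newton iterations of the SPN method.
    sol = least_squares(bbox_fit_residual, t0,
                        args=(points_B, R_BC, bbox, fx, fy, cx, cy))
    return sol.x
```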
2.2 Relative Attitude Determination
The layers of a convolutional neural network transform the raw pixel information of the input image into gradually more abstract representations using predefined nonlinear functions that contain unknown coefficients. The output of the network is then constrained to minimize a loss function that represents a discrepancy or error between the output of the network and the expected output for a set of training samples (supervised learning). For example, the network used in the SPN method can be represented by
$$
f(\mathbf{x}) = f_L\!\left(\mathbf{W}_L\,f_{L-1}\!\left(\cdots f_1\!\left(\mathbf{W}_1\,\mathbf{x} + \mathbf{b}_1\right)\cdots\right) + \mathbf{b}_L\right) \qquad (7)
$$
where $\mathbf{W}_i$ are the unknown weights of the $i$-th layer, $\mathbf{b}_i$ are the unknown biases of the $i$-th layer, and $f_i$ are the known nonlinear functions of the $i$-th layer. For relative attitude estimation, the network's output is expected to take values defined on a continuous non-Euclidean space, prohibiting the direct use of a typical L2 loss function. To handle this problem, other authors have proposed several loss functions [23, 28, 29] that directly regress to a relative attitude estimate; however, these have limited accuracy and require further refinement [30]. Instead, the SPN method uses two branches of fully-connected layers that share the output of the five preceding convolutional layers as their inputs. In order to be robust to the background in the images, the features output by the final convolutional layer associated with the detected 2D bounding box are selected using the RoI pooling layer [31]. Using these features as input, Branch 2 performs a classification task to find the closest predefined attitude classes that describe the input image. Branch 3 uses these features as input to perform a regression task to find the relative weights of the attitude classes identified in Branch 2. For example, Figure 6 shows a simplified example where one dimension of the relative attitude has been discretized. For this case, Branch 2 is expected to find attitude classes 4 and 5 as the two adjacent classes that best describe the input image, while Branch 3 is expected to find relative weights as a function of the angular differences between the input attitude and the corresponding attitude classes. The sizes of the five convolutional layers and both sets of the three fully connected layers have been adopted from the Zeiler and Fergus model architecture [32].
The attitude classes are parametrized as $n$ unit quaternions representing uniformly distributed random rotations in the SO(3) space. Algorithm 1 shows the subgroup algorithm [33] used to obtain these random rotations. The algorithm amounts to multiplying a uniformly distributed element from the subgroup of planar rotations with a uniformly distributed coset represented by the rotations pointing the z-axis in different directions.
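Algorithm 1 is not reproduced here; as an illustration, the sketch below samples uniform random unit quaternions using Shoemake's well-known realization of the subgroup algorithm [33], which is assumed to be equivalent in effect to the procedure used to generate the attitude classes.

```python
import numpy as np

def random_unit_quaternion(rng):
    """Draw one unit quaternion uniformly distributed over the rotation group
    (Shoemake's method [33]). Three uniform samples are mapped to a quaternion
    in (qx, qy, qz, qw) order; this ordering convention is an assumption."""
    u1, u2, u3 = rng.uniform(size=3)
    return np.array([np.sqrt(1.0 - u1) * np.sin(2.0 * np.pi * u2),
                     np.sqrt(1.0 - u1) * np.cos(2.0 * np.pi * u2),
                     np.sqrt(u1) * np.sin(2.0 * np.pi * u3),
                     np.sqrt(u1) * np.cos(2.0 * np.pi * u3)])

# Example: generate n = 1000 attitude classes.
rng = np.random.default_rng(seed=0)
attitude_classes = np.array([random_unit_quaternion(rng) for _ in range(1000)])
```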
To determine which of the predefined attitude classes are the closest to the given image, Branch 2 is tasked to output an $n \times 1$ vector, $\mathbf{p}$, which represents a probability distribution. Each entry $p_i$ is the probability that the attitude class in question is one of the $m$ closest attitude classes to the relative attitude of the given image. Note that both $n$ and $m$ are hyperparameters. Closeness is defined by the angular difference between the unit quaternion representing the attitude class, $\mathbf{q}_i$, and the unit quaternion representing the ground truth value of the relative attitude, $\mathbf{q}_{BC}$. To ensure that $\mathbf{p}$ is a valid probability distribution, the output $\mathbf{z}$ of the final layer of Branch 2 is passed through the softmax function

$$
p_i = \frac{e^{z_i}}{\sum_{j=1}^{n} e^{z_j}} \qquad (8)
$$
Then the loss function for Branch 2, $\mathcal{L}_c$, representing the difference between the branch output and the expected probability distribution, $\mathbf{p}^*$, can be written as
$$
\mathcal{L}_c = -\sum_{i=1}^{n} p_i^{*}\,\log p_i + \lambda\,\|\mathbf{w}_2\|_2^2 \qquad (9)
$$
where $\mathbf{w}_2$ represents the weights of the final three layers of Branch 2 and $\lambda$ is a scalar representing the strength of the L2 regularization. The aim of the L2 regularization is to penalize the existence of large weights and prevent overfitting to the training examples. Note that the entries of $\mathbf{p}^*$ corresponding to the $m$ closest attitude classes are set to a uniform nonzero value while the rest of the entries are set to zero.
To estimate the relative attitude for the given image, Branch 3 is tasked to output an $n \times 1$ vector, $\mathbf{r}$, containing weights for the closest attitude classes identified in Branch 2. The weights for the remaining attitude classes are output but not used during either training or inference. The weights of the $m$ closest attitude classes are passed through the softmax function such that their sum adds to unity. Let $\mathcal{I}$ be the set of indices of the $m$ largest values in $\mathbf{p}$; then the loss function for Branch 3 can be written as
$$
\mathcal{L}_r = \sum_{i \in \mathcal{I}} \left(r_i - r_i^{*}\right)^2 + \lambda\,\|\mathbf{w}_3\|_2^2 \qquad (10)
$$
where $\mathbf{w}_3$ represents the weights of the final three layers of Branch 3 and $\mathbf{r}^*$ is the ground truth vector of weights for the given image. The entries of $\mathbf{r}^*$ are set based on the angular difference between the quaternion of the attitude class in question and the ground truth quaternion of the given image. Specifically,
$$
r_i^{*} = \frac{e^{-\theta_i}}{\sum_{j \in \mathcal{I}} e^{-\theta_j}}, \qquad i \in \mathcal{I} \qquad (11)
$$
where $\theta_i$ is the angular difference between the unit quaternion representing the attitude class in question and the unit quaternion representing the ground truth of the relative attitude of the given image. Hence, the total loss function for relative attitude estimation can be written as
$$
\mathcal{L} = \mathcal{L}_c + \mathcal{L}_r \qquad (12)
$$
The classification loss, $\mathcal{L}_c$, is designed to force Branch 2's coefficients to be correlated with macroscopic changes in the relative attitude without penalizing classification predictions of attitude classes that have a small angular difference between them. In contrast, the regression loss, $\mathcal{L}_r$, is designed to force Branch 3's coefficients to be correlated with the microscopic differences between adjacent attitude classes.
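To make the interaction of the two branch losses concrete, the sketch below evaluates Equations 9 through 12 for a single training example. The uniform soft label of $1/m$ for the closest classes and the exponential form of the ground-truth weights follow the reconstruction given above and should be read as assumptions, not as the exact SPN implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attitude_loss(logits2, logits3, selected_idx, theta, lam, w2_sq, w3_sq, m=5):
    """Hedged sketch of the hybrid attitude loss for one training example.

    logits2, logits3 : raw outputs of Branches 2 and 3 (length n each)
    selected_idx     : indices of the m attitude classes selected from Branch 2
    theta            : angular distance (rad) of each selected class to the ground truth
    lam              : L2 regularization strength
    w2_sq, w3_sq     : squared L2 norms of the trainable weights of Branches 2 and 3
    """
    n = logits2.shape[0]
    # Branch 2: soft classification target, nonzero only for the m closest classes.
    p_star = np.zeros(n)
    p_star[selected_idx] = 1.0 / m                    # assumed uniform soft label
    p = softmax(logits2)
    loss_cls = -np.sum(p_star * np.log(p + 1e-12)) + lam * w2_sq      # Eq. 9

    # Branch 3: relative weights over the m selected classes only.
    r = softmax(logits3[selected_idx])
    r_star = softmax(-theta)                          # assumed form of Eq. 11
    loss_reg = np.sum((r - r_star) ** 2) + lam * w3_sq                # Eq. 10

    return loss_cls + loss_reg                        # Eq. 12
```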
During inference, the estimate of the relative attitude, $\hat{\mathbf{q}}_{BC}$, is computed as an average of the unit quaternions of the $m$ closest attitude classes, $\mathbf{q}_i$, weighted by their corresponding relative weights, $r_i$. This is akin to minimizing a weighted sum of the squared Frobenius norms of the differences between the rotation matrix representations of $\hat{\mathbf{q}}_{BC}$ and $\mathbf{q}_i$
$$
\hat{\mathbf{q}}_{BC} = \arg\min_{\mathbf{q} \in \mathbb{S}^3} \sum_{i \in \mathcal{I}} r_i \left\| \mathbf{R}(\mathbf{q}) - \mathbf{R}(\mathbf{q}_i) \right\|_F^2 \qquad (13)
$$
where $\mathbb{S}^3$ denotes the unit 3-sphere and $\mathbf{R}(\cdot)$ denotes the rotation matrix corresponding to a unit quaternion [34]. The pseudocode used to compute the weighted average is presented below as Algorithm 2.
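Algorithm 2 itself is not reproduced here; the weighted quaternion average of Equation 13 can be computed with the eigenvector method of [34], of which the following is a compact, illustrative sketch.

```python
import numpy as np

def weighted_quaternion_average(quats, weights):
    """Weighted average of unit quaternions using the eigenvector method of
    Markley et al. [34], i.e., the minimizer of Equation 13.

    quats   : (m, 4) array of unit quaternions
    weights : (m,)  array of nonnegative weights
    """
    M = np.zeros((4, 4))
    for q, w in zip(quats, weights):
        M += w * np.outer(q, q)          # accumulate weighted outer products
    # The average quaternion is the eigenvector of M associated with the
    # largest eigenvalue; eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(M)
    q_avg = eigvecs[:, -1]
    return q_avg / np.linalg.norm(q_avg)
```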
3 Spacecraft Pose Estimation Dataset (SPEED)
Training a convolutional neural network usually requires extremely large labeled image datasets such as ImageNet [35] and Places [36], which contain millions of images. Collecting and labeling such an amount of actual space imagery is extremely difficult. Therefore, this work introduces SPEED, a dataset of images that enables not only training and validation of the SPN method but also benchmarking of various state-of-the-art monocular vision-based pose estimation techniques. The SPEED images and the corresponding ground truth pose information are generated using two key complementary sources at the Space Rendezvous Laboratory (SLAB) of Stanford University [37]. The first source produced 15,000 augmented reality images based on the Optical Stimulator [17, 38] camera emulator software and actual images of the Earth captured by the Himawari-8 geostationary meteorological satellite [39]. The second source produced 300 actual camera images of a 1:1 mock-up of the Tango spacecraft using the Testbed for Rendezvous and Optical Navigation (TRON). Table 1 provides the camera model used for both of these sources.

Table 1: Camera model used for both SPEED image sources.

Parameter   Description                    Value
            Number of horizontal pixels
            Number of vertical pixels
            Horizontal focal length (m)
            Vertical focal length (m)
            Horizontal pixel length (m)
            Vertical pixel length (m)
The first source renders synthetic images of the Tango spacecraft using MATLAB and C++ language bindings of OpenGL. To create a diverse set of views of the target spacecraft, a set of relative attitudes and relative positions is selected. Unit quaternion parametrizations of uniformly random rotations in the SO(3) space are selected using Algorithm 1. Figure 10 shows the distribution of $\mathbf{q}_{BC}$ in the SPEED images, parametrized as Euler angles. The relative positions are obtained by separately selecting the relative distance and the bearing angles (defined in Equations 4 and 5). The bearing angles are uncorrelated random values selected from a multivariate normal distribution. The relative distance is randomly selected from a standard normal distribution. Any relative distance values below 3 meters and above 50 meters are rejected. Figure 7 shows the distribution of $\mathbf{t}_{BC}$ in the SPEED images.

The azimuth and elevation angles of the solar illumination are specifically chosen to match the solar illumination in the 72 actual images of the Earth captured by the Himawari-8 geostationary meteorological satellite. Figure 8 shows a montage of these images. The 72 images each provide a full-disk view of the Earth and were taken 10 minutes apart from each other over a period of 12 hours. Each of the images is converted to grayscale, cropped, and inserted as the background for half of the synthetic images of the Tango spacecraft. The location of the crop is selected at random from a uniform distribution spanning the Earth image. The size of the crop is selected to match the scale of the Earth when viewed through a camera (described in Table 1) located at an altitude of 700 km and pointed in the nadir direction. Gaussian blurring and white noise are added to all images to emulate the depth of field and shot noise, respectively. Figure 9 shows a montage of the resulting augmented reality images.

The second source of the SPEED images is the TRON facility at SLAB. It consists of a 7 degrees-of-freedom robotic arm, which positions and orients a vision-based sensor with respect to a target object or scene. Custom illumination devices simulate the Earth albedo and Sun light to high fidelity to emulate the illumination conditions present in space [40]. TRON provides images of a 1:1 mock-up model of the Tango spacecraft using a Point Grey Grasshopper 3 camera with a Xenoplan 1.9/17 mm lens, which corresponds to the camera model of Table 1 used by the first source. Calibrated motion capture cameras report the positions and attitudes of the camera and the Tango spacecraft, which are then used to calculate the “ground truth” pose of Tango with respect to the camera. Figure 11 shows a montage of the SPEED actual camera images.
The SPEED training set consists of 12,000 synthetic images from the first source, while the remaining 3,000 synthetic images and the 300 actual camera images from the second source are available as two separate test sets. The motivation for excluding the actual camera images from the training set is to evaluate the robustness and the domain adaptation capabilities of the pose estimation techniques.
4 Experiments
The proposed SPN method was trained and tested using the SPEED images. For all experiments, the region proposal network, the convolutional layers, and the fully connected layers were pre-trained on the relatively larger ImageNet dataset [35]. Following that, the region proposal network and the fully connected layers were trained using 80% of the SPEED training set while the remaining 20% was used for validation. The two sets of fully connected layers were trained jointly while the region proposal network was trained separately. Both training routines were carried out using stochastic gradient descent in batches of 16 images. The initial learning rate was set to 0.003 with an exponential decay of 0.95 every 1000 steps. For attitude determination, the hyperparameters $n$ and $m$ were set to 1000 and 5, respectively. During training, each image was resized to 224 × 224 pixels to match the input size of the Zeiler and Fergus model architecture [32].
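For reference, the learning rate schedule described above corresponds to the following rule; a staircase decay is assumed here, although a continuous exponential form is equally plausible.

```python
def learning_rate(step, initial=0.003, decay=0.95, decay_steps=1000):
    """Learning rate decayed by a factor of 0.95 every 1000 training steps."""
    return initial * decay ** (step // decay_steps)

# Example: learning rate after 5000 steps (0.003 * 0.95**5).
print(learning_rate(5000))
```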
For quantitative analysis of the performance, three separate metrics are reported. To measure the accuracy of the 2D bounding box detection as compared with the ground truth 2D bounding box, the Intersection-over-Union (IoU) metric is reported as
$$
\mathrm{IoU} = \frac{\mathrm{area}\!\left(\hat{\mathcal{B}} \cap \mathcal{B}\right)}{\mathrm{area}\!\left(\hat{\mathcal{B}} \cup \mathcal{B}\right)} \qquad (14)
$$

where $\hat{\mathcal{B}}$ and $\mathcal{B}$ denote the detected and ground truth 2D bounding boxes, respectively.
To measure the accuracy of the estimated relative position, $\hat{\mathbf{t}}_{BC}$, with respect to the ground truth relative position, $\mathbf{t}_{BC}$, the translation error for each image can be calculated as
$$
E_T = \left|\hat{\mathbf{t}}_{BC} - \mathbf{t}_{BC}\right| \qquad (15)
$$

where the absolute value is applied component-wise.
To measure the accuracy of the estimated relative attitude, $\hat{\mathbf{q}}_{BC}$, with respect to the ground truth relative attitude, $\mathbf{q}_{BC}$, the attitude error for each image can be calculated as
$$
\delta\mathbf{q} = \hat{\mathbf{q}}_{BC} \otimes \mathbf{q}_{BC}^{-1} \qquad (16)
$$

$$
E_R = 2\,\arccos\!\left(\left|\delta q_s\right|\right) \qquad (17)
$$

where $\otimes$ denotes quaternion multiplication and $\delta q_s$ is the scalar component of the error quaternion $\delta\mathbf{q}$.
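The three metrics can be computed as in the sketch below; expressing the attitude error through the quaternion inner product is one common realization of Equations 16 and 17 and is assumed here.

```python
import numpy as np

def bbox_iou(a, b):
    """Intersection-over-Union of two boxes [x_min, y_min, x_max, y_max] (Eq. 14)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def translation_error(t_est, t_true):
    """Component-wise absolute translation error in meters (Eq. 15)."""
    return np.abs(np.asarray(t_est) - np.asarray(t_true))

def attitude_error_deg(q_est, q_true):
    """Rotation angle between two unit quaternions in degrees (Eqs. 16-17)."""
    dot = abs(float(np.dot(q_est, q_true)))
    return np.degrees(2.0 * np.arccos(np.clip(dot, 0.0, 1.0)))
```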
Table 2: Mean and median performance metrics of the SPN method on the SPEED synthetic and real test sets.

Metric              SPEED synthetic test set    SPEED real test set
Mean IoU            0.8582                      0.8596
Median IoU          0.8908                      0.8642
Mean $E_T$ (m)      [0.055 0.046 0.78]          [0.036 0.015 0.189]
Median $E_T$ (m)    [0.024 0.021 0.496]         [0.029 0.013 0.191]
Mean $E_R$ (deg)    8.4254                      18.188
Median $E_R$ (deg)  7.0689                      13.208
Table 2 lists the mean and median values of the three performance metrics for the SPEED synthetic and real test sets. The levels of the relative position and attitude errors improve upon those of other pose estimation methods applied to similar imagery in previous work [5, 17]. Notably, the 2D bounding box detection is less affected by the gap between the synthetic and real test sets than the attitude estimation. The $E_T$ values for the actual camera images are lower than those of the synthetic images since the relative distances for the actual camera images are comparatively much lower than those of the synthetic images.
Figure 12 shows the qualitative results of the 2D bounding box detection on the SPEED synthetic and real testsets. The region proposal network used in the SPN method was successful in detecting the 2D bounding box in images regardless of whether the Earth is visible in the background. In general, the bounding box detection worked marginally better on actual camera images as compared to synthetic images since all actual camera images had a black background. The bounding box detection was successful even in cases with strong shadows as long as one side of the spacecraft was illuminated.
However, the bounding box detection tended to fail in cases where the target spacecraft was in eclipse or when the target spacecraft was closer than 5 meters. Figure 13 shows a few examples from both of these failure modes. At relative distances of less than 5 meters, the Tango spacecraft starts getting clipped by the image boundaries, leading to unpredictable behavior of the detection. In a docking scenario, this could potentially be resolved by performing pose estimation relative to the docking mechanism or another smaller fixture instead of the entire spacecraft. Note that the SPN method can be extended to detect multiple 2D bounding boxes corresponding to the fixtures of interest and to perform attitude estimation and position determination for each.
To further examine such trends, the three performance metrics are plotted against the relative distance. In particular, the images from the SPEED synthetic test set were grouped into batches of 100 each according to their ground truth relative distance, $\|\mathbf{t}_{BC}\|$. The mean value of each performance metric was then plotted against the mean ground truth relative distance of each batch.
Figure 14 shows the mean IoU values of the SPN method for the SPEED synthetic test set plotted against the mean relative distance. The bounding box detection has the highest accuracy at ranges of 7 to 20 meters, while there is a sharp drop-off in bounding box detection accuracy at relatively close distances of less than 5 meters. In contrast, there is a gradual drop-off in performance at relatively far distances as the target spacecraft starts occupying too few pixels in the image plane to allow an accurate 2D bounding box detection. The degradation in performance at larger separations also stems from the fact that the SPEED training set contains more images at lower relative distances.
Figure 15 shows the mean $E_R$ values of the SPN method for the SPEED synthetic test set plotted against the mean relative distance. The mean $E_R$ values are between 5 and 10 degrees for most of the relative distances. The range of relative distances where the attitude estimation has the highest accuracy is similar to that of the bounding box detection. This is expected since both the region proposal network and the two sets of fully connected layers share the output of the same convolutional layers in the SPN method. Unlike the bounding box detection, the attitude estimation has a sharp drop-off in performance at both high and low relative distances. This is due to the presence of more outliers in the attitude estimation as compared with the bounding box detection at high relative distances.
The estimated relative attitude for each image in the SPEED synthetic test set was then combined with the corresponding 2D bounding box detection to produce estimates of the relative position. Figure 16 shows the three components of the mean $E_T$ values of the SPN method plotted against the mean relative distance. Note that the z-axis is aligned with the camera boresight direction while the x-axis and y-axis (lateral directions) are aligned with the image plane axes. The errors in the lateral directions were an order of magnitude lower than those in the camera boresight direction, which implies that the bounding box detection was more successful at estimating the center of the bounding box than its size. Trends in the performance degradation of $E_T$ mirror those observed for $E_R$ and IoU.
Figure 17 and Figure 18 show some qualitative results of the pose estimated by the SPN method on the SPEED real and synthetic test sets, respectively. The probability distribution output by Branch 2 is also plotted for the respective images. In the future, this probability distribution could be used to set up a confidence metric to reject outliers, in addition to being used by a navigation filter to accumulate information from sequential images and provide a more accurate estimate at a high rate. Predictably, the peaks in the probability distribution are lower for the SPEED real test set as compared with the SPEED synthetic test set since the convolutional neural network has overfit the synthetic images in the SPEED training set to some extent. This could possibly be addressed by a stronger L2 regularization parameter and/or by augmenting the training set with images containing randomized textures of the target spacecraft, similar to work in the areas of domain adaptation [41, 42] and domain randomization [43, 44].
5 Conclusions
This work introduces the SPN method to estimate the relative pose of a target spacecraft using a single grayscale image without requiring a priori pose information. The SPN method makes novel use of a hybrid classification and regression technique to estimate the relative attitude. The SPN method leverages the geometric knowledge of the perspective transformation and advances in 2D bounding box detection to estimate the relative position using the Gauss-Newton algorithm. This work also introduces SPEED, a publicly available dataset that allows for the training and validation of monocular pose estimation techniques. The SPEED training set consists of 12,000 augmented reality images created by fusing synthetic images of a target spacecraft with actual camera images of the Earth. Apart from the 3,000 images of the Tango spacecraft in the SPEED synthetic test set, 300 actual camera images of the same spacecraft are provided in the SPEED real test set. The subsequent application of the SPN method on the SPEED synthetic test set produces degree-level mean relative attitude errors and centimeter-level mean relative position errors, exceeding the performance of conventional feature-based methods used in previous work. The pose estimation performance carried over to the actual camera images as well, albeit with slightly higher errors due to the gap between the synthetic images used during training and the actual camera images used for testing.
However, further work is required in a few directions. First, a complete assessment of how the SPN method stacks up against conventional feature-based approaches as well as more recent deep learning-based methods needs to be performed. The SPEED images and the associated performance metrics provide an excellent framework to carry out this assessment. Second, data augmentation techniques and/or stronger regularization during training are required to bridge the performance gap of the SPN method between the synthetic and real test sets. Third, the performance drop-off at relative distances where the target spacecraft is only partially visible needs to be addressed to allow for pose estimation during all stages of close proximity operations. Lastly, the SPN method needs to be embedded in flight-grade hardware to profile its computational runtime and memory usage.
6 Acknowledgments
The authors would like to thank the King Abdulaziz City for Science and Technology (KACST) Center of Excellence for research in Aeronautics & Astronautics (CEAA) at Stanford University for sponsoring this work. The authors would like to thank OHB Sweden, the German Aerospace Center (DLR), and the Technical University of Denmark (DTU) for the 3D model of the Tango spacecraft used to create the images used in this work. The authors would like to thank Nathan Stacey of the Space Rendezvous Laboratory at Stanford University for his technical contributions in generating ground truth pose information for the actual camera images of SPEED.
References
 [1] J. L. Forshaw, G. S. Aglietti, N. Navarathinam, H. Kadhem, T. Salmon, A. Pisseloup, E. Joffre, T. Chabot, I. Retat, R. Axthelm, S. Barraclough, A. Ratcliffe, C. Bernal, F. Chaumette, A. Pollini, and W. H. Steyn, “RemoveDEBRIS: An in-orbit active debris removal demonstration mission,” Acta Astronautica, Vol. 127, 2016, pp. 448–463, 10.1016/j.actaastro.2016.06.018.
 [2] B. Sullivan, D. Barnhart, L. Hill, P. Oppenheimer, B. L. Benedict, G. Van Ommering, L. Chappell, J. Ratti, and P. Will, “DARPA Phoenix Payload Orbital Delivery System (PODs): “FedEx to GEO”,” AIAA SPACE 2013 Conference and Exposition, 2013, pp. 1–14, 10.2514/6.2013-5484.
 [3] B. B. Reed, R. C. Smith, B. J. Naasz, J. F. Pellegrino, and C. E. Bacon, “The Restore-L Servicing Mission,” AIAA Space Forum, Long Beach, CA, 2016, pp. 1–8, 10.2514/6.2016-5478.
 [4] S. D’Amico, M. Benn, and J. Jorgensen, “Pose estimation of an uncooperative spacecraft from actual space imagery,” Proceedings of 5th International Conference on Spacecraft Formation Flying Missions and Technologies, No. 1, 2013, pp. 1–17.
 [5] S. Sharma, J. Ventura, and S. D’Amico, “Robust Model-Based Monocular Pose Initialization for Noncooperative Spacecraft Rendezvous,” Journal of Spacecraft and Rockets, 2018, pp. 1–16, 10.2514/1.A34124.
 [6] A. Cropp and P. Palmer, “Pose Estimation and Relative Orbit Determination of a Nearby Target Microsatellite using Passive Imagery,” 5th Cranfield Conference on Dynamics and Control of Systems and Structures in Space 2002, 2002, pp. 389–395.
 [7] C. Liu and W. Hu, “Relative pose estimation for cylinder-shaped spacecrafts using single image,” IEEE Transactions on Aerospace and Electronic Systems, Vol. 50, No. 4, 2014, pp. 3036–3056, 10.1109/TAES.2014.120757.
 [8] B. J. Naasz, J. Van Eepoel, S. Z. Queen, C. M. Southward, and J. Hannah, “Flight results from the HST SM4 Relative Navigation Sensor system,” 33rd Annual AAS Guidance and Control Conference, Breckenridge, CO, USA, 2010.
 [9] V. Capuano, G. Cuciniello, V. Pesce, R. Opromolla, S. Sarno, M. Lavagna, M. Grassi, F. Corraro, G. Capuano, P. Tabacco, F. Meta, M. L. Battagliere, and T. Alberto, “VINAG: A highly integrated system for autonomous onboard absolute and relative spacecraft navigation,” The 4S Symposium 2018, No. 1, 2018.
 [10] J. Kelsey, J. Byrne, M. Cosgrove, S. Seereeram, and R. Mehra, “Vision-based relative pose estimation for autonomous rendezvous and docking,” 2006 IEEE Aerospace Conference, 2006, 10.1109/AERO.2006.1655916.
 [11] S. Sharma and S. D’Amico, “Comparative assessment of techniques for initial pose estimation using monocular vision,” Acta Astronautica, Vol. 123, 2015, pp. 435–445, 10.1016/j.actaastro.2015.12.032.
 [12] P. Lunghi, L. Losi, V. Pesce, and M. Lavagna, “Ground testing of vision-based GNC systems by means of a new experimental facility,” 69th International Astronautical Congress (IAC), Bremen, Germany, IAF, 2018, pp. 1–15.
 [13] A. Petit, E. Marchand, and K. Kanani, “Vision-based space autonomous rendezvous: A case study,” IEEE International Conference on Intelligent Robots and Systems, 2011, pp. 619–624, 10.1109/IROS.2011.6048176.
 [14] S. Zhang and X. Cao, “Closed‐form solution of monocular vision‐based relative pose determination for RVD spacecrafts,” Aircraft Engineering and Aerospace Technology, Vol. 77, No. 3, 2005, pp. 192–198, 10.1108/00022660510597214.
 [15] M. Avilés, D. Mora, M. Canetri, and P. Colmenarejo, “A Complete IP-based Navigation Solution for the Approach and Capture of Active Debris,” 67th International Astronautical Congress, 2016, pp. 1–8.
 [16] S. Sharma and S. D’Amico, “Reduced-Dynamics Pose Estimation for Non-Cooperative Spacecraft Rendezvous using Monocular Vision,” Proceedings of the 40th Annual AAS Rocky Mountain Section Guidance and Control Conference, Breckenridge, CO, 2017, pp. 1–25.
 [17] S. Sharma, C. Beierle, and S. D’Amico, “Pose Estimation for Non-Cooperative Spacecraft Rendezvous Using Convolutional Neural Networks,” 2018 IEEE Aerospace Conference, Big Sky, USA, IEEE, 2018, pp. 1–12.

 [18] H. Su, C. R. Qi, Y. Li, and L. J. Guibas, “Render for CNN: Viewpoint estimation in images using CNNs trained with rendered 3D model views,” Proceedings of the IEEE International Conference on Computer Vision, 2016, pp. 2686–2694, 10.1109/ICCV.2015.308.
 [19] S. Tulsiani and J. Malik, “Viewpoints and keypoints,” Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2015, pp. 1510–1519.
 [20] S. Sharma, C. Beierle, and S. D’Amico, “Towards Pose Determination for Non-Cooperative Spacecraft using Convolutional Neural Networks,” Proceedings of the 1st IAA Conference on Space Situational Awareness (ICSSA), 2017, pp. 1–5.
 [21] S. Mahendran, H. Ali, and R. Vidal, “3D Pose Regression Using Convolutional Neural Networks,” Proceedings of the 2017 IEEE International Conference on Computer Vision Workshops (ICCVW), 2018, pp. 2174–2182, 10.1109/ICCVW.2017.254.
 [22] A. Kendall, M. Grimes, and R. Cipolla, “PoseNet: A convolutional network for real-time 6-DOF camera relocalization,” Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 2938–2946, 10.1109/ICCV.2015.336.
 [23] Y. Xiang, T. Schmidt, V. Narayanan, and D. Fox, “PoseCNN: A Convolutional Neural Network for 6D Object Pose Estimation in Cluttered Scenes,” 2017, 10.15607/RSS.2018.XIV.019.
 [24] P. Besl and N. McKay, “A Method for Registration of 3D Shapes,” IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 14, No. 2, 1992, pp. 239–256, 10.1109/34.121791.
 [25] European Space Agency, “Kelvins - ESA’s Advanced Concepts Competition Website,” https://kelvins.esa.int. Accessed January 4, 2019.
 [26] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” Advances in Neural Information Processing Systems, 2015, pp. 91–99, 10.1109/TPAMI.2016.2577031.
 [27] M. Kaushal, B. S. Khehra, and A. Sharma, “Soft Computing based object detection and tracking approaches: State-of-the-Art survey,” Applied Soft Computing Journal, Vol. 70, 2018, pp. 423–464, 10.1016/j.asoc.2018.05.023.
 [28] K. Hara, R. Vemulapalli, and R. Chellappa, “Designing Deep Convolutional Neural Networks for Continuous Object Orientation Estimation,” ArXiv:1702.01499, 2017.
 [29] J. Wu, Robotic Object Pose Estimation with Deep Neural Networks. PhD thesis, Massachusetts Institute of Technology, 2018.
 [30] Y. Li, G. Wang, X. Ji, Y. Xiang, and D. Fox, “DeepIM: Deep Iterative Matching for 6D Pose Estimation,” ArXiv:1804.00175, 2018.
 [31] R. Girshick, “Fast R-CNN,” ArXiv:1504.08083, 2015.
 [32] M. D. Zeiler and R. Fergus, “Visualizing and Understanding Convolutional Networks,” European Conference On Computer Vision, 2014, pp. 818–833.
 [33] K. Shoemake, “Uniform Random Rotations,” Graphics Gems III (IBM Version), pp. 124–132, Elsevier, 1992.
 [34] F. L. Markley, Y. Cheng, J. L. Crassidis, and Y. Oshman, “Averaging Quaternions,” Journal of Guidance, Control, and Dynamics, Vol. 30, 2007, pp. 1193–1197, 10.2514/1.28949.
 [35] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. FeiFei, “ImageNet Large Scale Visual Recognition Challenge,” International Journal of Computer Vision, Vol. 115, No. 3, 2015, pp. 211–252, 10.1007/s112630150816y.

 [36] B. Zhou, A. Lapedriza, J. Xiao, A. Torralba, and A. Oliva, “Learning Deep Features for Scene Recognition using Places Database,” Advances in Neural Information Processing Systems, 2014, pp. 487–495.
 [37] Space Rendezvous Laboratory, Stanford University, “SLAB - Multi-satellite systems for unrivaled space science and exploration,” https://slab.stanford.edu. Accessed January 4, 2019.
 [38] C. Beierle and S. D’Amico, “Variable Magnification Optical Stimulator for Training and Validation of Spaceborne Vision-Based Navigation,” Journal of Spacecraft and Rockets (In Print), 2018.
 [39] K. Bessho, K. Date, M. Hayashi, A. Ikeda, T. Imai, H. Inoue, Y. Kumagai, T. Miyakawa, H. Murata, T. Ohno, A. Okuyama, R. Oyama, Y. Sasaki, Y. Shimazu, K. Shimoji, Y. Sumida, M. Suzuki, H. Taniguchi, H. Tsuchiyama, D. Uesawa, H. Yokota, and R. Yoshida, “An Introduction to Himawari-8/9, Japan’s New-Generation Geostationary Meteorological Satellites,” Journal of the Meteorological Society of Japan, Vol. 94, No. 2, 2016, pp. 151–183, 10.2151/jmsj.2016-009.
 [40] S. Sharma, A. Koenig, and J. Sullivan, “Verification of Lightbox Devices for Earth Albedo Simulation,” https://damicos.people.stanford.edu/sites/g/files/sbiybj2226/f/tn2016_sharmakoenigsullivan.pdf, 2018.
 [41] J. Blitzer, M. Dredze, and F. Pereira, “Biographies, Bollywood, Boomboxes and Blenders: Domain Adaptation for Sentiment Classification,” Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics, Prague, Czech Republic, Association for Computational Linguistics, 2007, pp. 440–447, 10.1029/RS006i008p00787.
 [42] H. Daumé, “Frustratingly Easy Domain Adaptation,” ArXiv:0907.1815v1, 2009, 10.1.1.110.2062.
 [43] M. Andrychowicz, B. Baker, M. Chociej, R. Jozefowicz, B. McGrew, J. Pachocki, A. Petron, M. Plappert, G. Powell, A. Ray, J. Schneider, S. Sidor, J. Tobin, P. Welinder, L. Weng, and W. Zaremba, “Learning Dexterous In-Hand Manipulation,” ArXiv:1808.00177v2, 2018, pp. 1–27.
 [44] J. Tobin, R. Fong, A. Ray, J. Schneider, W. Zaremba, and P. Abbeel, “Domain randomization for transferring deep neural networks from simulation to the real world,” IEEE International Conference on Intelligent Robots and Systems, 2017, pp. 23–30, 10.1109/IROS.2017.8202133.