Gastric endoscopy is a well-adopted procedure that enables medical doctors to examine and diagnose the inside of the stomach. However, there still exist some challenges for doctors, such as the limited point of view and the uncertainty of endoscope poses relative to a target organ. The accurate localization of a malignant lesion within the global view of the whole stomach is crucial for gastric surgeons when deciding the operative procedure of laparoscopic gastrectomy for early gastric cancer. The location of the malignant lesion is usually identified by double-contrast barium radiography [1]. However, morphological evaluations such as the barium study sometimes make it difficult for gastric surgeons to identify flat malignant lesions. Recently, 3D computed tomography (CT) gastrography was developed for the lesion localization purpose [2]. However, 3D CT gastrography does not embed color texture information into the reconstructed 3D model. If the 3D shape of the whole stomach can be reconstructed from a standard endoscopic video, the location of the malignant lesion can be identified by the visual color information in addition to the 3D morphological information, which should be very valuable for gastric surgeons.
Previous studies have shown that 3D endoscopy systems (e.g., a stereo endoscope) have advantages over traditional 2D endoscopes in fields such as computer-aided laparoscopic surgery [3] and endoscopic surface imaging [4]. Nevertheless, those 3D systems are not widely available, and the 2D counterpart is still the mainstream.
Some existing works have proposed software solutions to reconstruct the 3D structure of a target organ (e.g., colon, liver, and larynx) together with the estimated endoscope poses from an endoscope video. The methods range from shape-from-shading (SfS) [5, 6] and visual simultaneous localization and mapping (SLAM) [7, 8, 9] to structure-from-motion (SfM) [10, 11, 12, 13, 14, 15]. Even though SfS can reconstruct an organ's surface from a single image, it requires accurate estimation of the light position, which is a difficult problem. SLAM offers a real-time solution with the reconstruction quality as a trade-off: it uses a simple feature detector and descriptor together with sequential feature matching, which leads to a limited reconstruction quality. On the other hand, SfM offers an off-line solution with higher reconstruction quality. SfM uses a more accurate feature detector and descriptor [17] to obtain higher-quality features. Moreover, SfM can exhaustively use all input images to find feature correspondences and perform global reconstruction optimization by applying bundle adjustment [18]. However, since SfM relies on the detected features, it is still challenging to reconstruct texture-less surfaces, which are common in internal organs. To tackle this challenge, some systems [13, 14] exploit a projector to add a structured light pattern to the texture-less surface. Although these systems can successfully increase the number of features, they require expensive hardware modification. Enhanced imaging colonoscopy and narrow-band imaging have also been applied to enhance the surface details for SfM [15]. The above-mentioned works only demonstrated the reconstruction results of a partial surface, which is not sufficient for many potential applications such as the 3D localization of a lesion within the whole shape of the organ.
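The cost of the exhaustive matching used by SfM, as opposed to the sequential matching used by SLAM, grows quadratically with the number of input frames; the following sketch makes the difference concrete (the frame count of 600 is an arbitrary example, not our actual sequence length):

```python
def num_pairs(n, exhaustive=True):
    """Number of image pairs examined for feature matching:
    all n*(n-1)/2 pairs when exhaustive, only n-1 consecutive
    pairs when matching sequentially."""
    return n * (n - 1) // 2 if exhaustive else n - 1

print(num_pairs(600))         # exhaustive: 179700 pairs
print(num_pairs(600, False))  # sequential: 599 pairs
```

The quadratic pair count is what makes exhaustive matching an off-line technique, but it is also what lets SfM find correspondences between temporally distant frames that revisit the same surface region.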
In this work, we aimed at reconstructing the 3D model of a whole stomach with color texture information from a standard endoscopic video using SfM. We specifically investigated the combined effect of chromo-endoscopy and color channel selection on SfM to achieve better reconstruction quality. Our study found that 3D reconstruction of the whole stomach can be achieved by using red channel images captured under chromo-endoscopy, in which indigo carmine (IC) dye is sprayed onto the stomach surface. To the best of our knowledge, this is the first paper to report a successful 3D reconstruction of a whole stomach and to visualize the color details of its mucosal surface by texture mapping generated from a standard monocular endoscope video.
We also demonstrate our custom viewer that can visualize a particular image frame’s location in the 3D model.
II Materials and Methods
In this section, we briefly describe the data collection and the 3D reconstruction method. We first explain our endoscopy hardware setup and the captured video sequences (Section II-A). Then, we explain each component of our method, starting from the input image extraction for SfM (Section II-B), followed by the SfM pipeline (Section II-C) and the mesh and texture generation (Section II-D).
II-A Data collection
This study was conducted in accordance with the Declaration of Helsinki. The Institutional Review Board at Nihon University Hospital approved the study protocol on March 8, 2018, before patient recruitment. Informed consent was obtained from all patients before they were enrolled. This study was registered with the University Hospital Medical Information Network (UMIN) Clinical Trials Registry (identification No.: UMIN000031776) on March 17, 2018. This study was also approved by the research ethics committee of Tokyo Institute of Technology, where 3D reconstruction experiments were conducted.
We captured the endoscope video using a standard endoscope system: an Olympus IMH-20 image management hub coupled with a GIF-H290 scope. To prevent compression and unwanted artifacts such as image interlacing, we used an Epiphan video grabber to capture unprocessed data from the image management hub. The video data was saved in AVI format at 30 frames per second with full HD resolution.
The videos used for 3D reconstruction were captured from three different subjects undergoing general gastrointestinal endoscopy. As shown in Figure 1 (a) and (b), each video contains two image sequences, captured without and with spraying the IC blue color dye onto the stomach surface as chromo-endoscopy, which is widely applied in endoscopy to enhance the surface visualization. The dye we used was manufactured by Daiichi Sankyo Company, Limited, Tokyo, Japan. Additionally, we captured images of a planar checkerboard pattern from multiple orientations for camera calibration.
II-B Pre-processing of the collected data
The pre-processing of the collected data is performed to estimate intrinsic camera parameters and to extract input images for SfM. This process includes camera calibration, frame extraction, and color channel separation as follows.
An endoscopy camera generally uses an ultra-wide lens to provide a large angle of view inside the stomach. As a trade-off, the ultra-wide lens introduces strong visual distortion and produces images with a convex non-rectilinear appearance, which leads to an incorrectly estimated 3D structure. Therefore, camera calibration is needed to obtain the intrinsic camera parameters such as the focal length, the projection center, and the distortion parameters. We used the previously captured planar checkerboard pattern images and a fish-eye camera model [16] for the camera calibration. The acquired intrinsic camera parameters were used to optimize the 3D reconstruction process in SfM and to correct the images' distortion.
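As an illustration of such a fish-eye model, the Kannala-Brandt projection maps the incidence angle from the optical axis to an image radius through an odd polynomial in that angle. The following numpy sketch shows the forward projection; the focal length, principal point, and distortion coefficients are illustrative placeholders, not our calibrated values:

```python
import numpy as np

def fisheye_project(X, f, cx, cy, k=(0.0, 0.0, 0.0, 0.0)):
    """Project a 3D point X = (x, y, z) in camera coordinates to pixel
    coordinates with the Kannala-Brandt fish-eye model:
    r(theta) = theta + k1*theta^3 + k2*theta^5 + k3*theta^7 + k4*theta^9."""
    x, y, z = X
    theta = np.arctan2(np.hypot(x, y), z)   # angle from the optical axis
    k1, k2, k3, k4 = k
    r = theta * (1 + k1*theta**2 + k2*theta**4 + k3*theta**6 + k4*theta**8)
    phi = np.arctan2(y, x)                  # azimuth in the image plane
    return (cx + f * r * np.cos(phi), cy + f * r * np.sin(phi))

# A point on the optical axis projects to the principal point:
u, v = fisheye_project((0.0, 0.0, 1.0), f=500.0, cx=960.0, cy=540.0)
# u == 960.0, v == 540.0
```

Calibration estimates f, (cx, cy), and the k coefficients from the checkerboard images; inverting r(theta) then allows the distortion to be undone before (or during) reconstruction.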
In the input image extraction process, we first extracted all RGB frames from each video. Then, we extracted two image sequences from each video, where the first one consists of the images captured without the IC dye (see Fig. 1(a)) and the second one consists of the images captured with the IC dye (see Fig. 1(b)). After an in-depth inspection, we found many color artifacts in the RGB images caused by color channel misalignment, as shown in Fig. 1(a) and (b). To minimize the effect of these artifacts, we decided to use single-channel images as SfM inputs. We also removed duplicate frames that have almost no movement between successive frames. We used the six channel images shown in Fig. 1(c)-(h) as SfM inputs and investigated the combined effect of chromo-endoscopy and color channel selection.
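The channel selection and duplicate-frame removal above can be sketched as follows; the mean-absolute-difference criterion and its threshold are illustrative assumptions on our part, since the exact duplicate test is an implementation detail:

```python
import numpy as np

def select_frames(frames, channel=0, diff_thresh=2.0):
    """From a list of HxWx3 uint8 RGB frames, keep only the given color
    channel (0=R, 1=G, 2=B) and drop near-duplicate frames whose mean
    absolute difference to the last kept frame is below diff_thresh."""
    kept = []
    for f in frames:
        ch = f[:, :, channel]
        if kept:
            mad = np.mean(np.abs(ch.astype(np.int16) - kept[-1].astype(np.int16)))
            if mad < diff_thresh:
                continue  # almost no movement -> treat as duplicate
        kept.append(ch)
    return kept

# Two identical frames and one different frame -> two frames kept
a = np.zeros((4, 4, 3), dtype=np.uint8)
b = a.copy()
c = np.full((4, 4, 3), 50, dtype=np.uint8)
print(len(select_frames([a, b, c])))  # 2
```

Running this once per channel of interest yields the six single-channel input sequences (R, G, B, each without and with the IC dye) evaluated in the experiments.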
II-C Stomach 3D reconstruction
The stomach 3D reconstruction follows the general flow of an SfM pipeline, assuming that the stomach has minimal movement. The algorithm starts by extracting features from the input images and matching the extracted features, followed by the endoscope pose estimation and the feature point triangulation in parallel. This step generates a sparse point cloud of the stomach based on the endoscope motion and estimates each frame's pose with respect to the others.
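The triangulation step can be illustrated with the standard linear (DLT) two-view method; this is a generic textbook sketch with synthetic cameras, not the exact solver used inside the SfM pipeline:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one 3D point from two 3x4 camera
    projection matrices P1, P2 and its 2D observations x1, x2."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)   # null space of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]           # dehomogenize

# Synthetic check: two cameras one unit apart, a point in front of both
K = np.eye(3)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
Xgt = np.array([0.2, 0.1, 5.0])
x1 = P1 @ np.append(Xgt, 1); x1 = x1[:2] / x1[2]
x2 = P2 @ np.append(Xgt, 1); x2 = x2[:2] / x2[2]
print(triangulate(P1, P2, x1, x2))  # approx. [0.2, 0.1, 5.0]
```

In the full pipeline, many such points are triangulated from matched features across all registered frames, and bundle adjustment then jointly refines the points and the camera poses.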
II-D Mesh and texture generation
Mesh and texture representation enables better visualization of the reconstructed 3D model. Our mesh generation starts by downsampling the original point cloud from the SfM result to a reduced number of 3D points and removing outlier points using statistical outlier removal to generate a smooth mesh. The outlier removal first calculates each 3D point's mean distance to its closest neighboring points. Assuming that the distribution of these distances is Gaussian, the global distance mean and the standard deviation are then computed. Any 3D point whose mean distance exceeds a threshold derived from this mean and standard deviation is removed as an outlier, leaving only the inlier 3D points. Then, the normal of each inlier 3D point is estimated based on its closest neighboring points. Each estimated normal is further refined using the related endoscope camera pose information to prevent it from pointing outward. Finally, the triangle mesh is generated from the outlier-removed 3D point cloud and its per-point estimated normals using screened Poisson surface reconstruction [19]. To add more visual detail and functionality, we then applied a texture to the generated mesh based on the cameras registered in the SfM step. For each mesh triangle, we searched for the best registered image for texturing based on the triangle-to-camera angle and distance.
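The statistical outlier removal described above can be sketched in numpy as follows; the neighborhood size k and the multiplier alpha on the standard deviation are illustrative choices, as the concrete values are implementation parameters:

```python
import numpy as np

def remove_statistical_outliers(points, k=8, alpha=1.0):
    """Remove points whose mean distance to their k nearest neighbors
    exceeds mu + alpha*sigma, where mu and sigma are the mean and standard
    deviation of all per-point mean distances (Gaussian assumption)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)   # skip the self-distance at index 0
    mu, sigma = mean_knn.mean(), mean_knn.std()
    return points[mean_knn <= mu + alpha * sigma]

# A tight cluster plus one far-away stray point: the stray point is removed
rng = np.random.default_rng(0)
cloud = rng.normal(0, 0.01, size=(50, 3))
cloud = np.vstack([cloud, [[10.0, 10.0, 10.0]]])
print(remove_statistical_outliers(cloud).shape[0])  # 50
```

The brute-force pairwise distance matrix is quadratic in the number of points; on the downsampled cloud this is acceptable, while a production implementation would use a k-d tree for the neighbor search.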
III Results and Discussion
We performed the endoscope camera calibration using the OpenCV camera calibration library. The SfM pipeline was implemented with COLMAP [20]. For filtering the point cloud, we set the downsampling and outlier-removal parameters so as to generate a smooth triangle mesh. We applied screened Poisson reconstruction [19] for triangle mesh generation. For texturing, we applied the parameterization and texturing function from Meshlab [21].
Figure 2 shows the 3D point cloud results on subject A, reconstructed using different color channels for the cases without and with the IC dye. In general, the channels with the IC dye (Fig. 2(d)-(f)) give a more complete reconstruction result than the channels without the IC dye (Fig. 2(a)-(c)). In the case without the IC dye, the green channel gives the best result, even though the model is full of holes. In the case with the IC dye, the results of all three channels show the shape of the stomach. Among the RGB channels, the red channel gives the most complete and densest result. Some holes still exist in the result using the green channel, while the blue channel is only able to reconstruct a part of the whole stomach.
Table I shows the objective evaluation of the 3D point cloud results on all three subjects. The number of 3D points is generally higher when the IC dye is present. We also notice that the average observation, which represents the per-image average number of 2D feature points that can be triangulated into 3D points, generally increases when the IC dye is present. In addition, the percentage of reconstructed images over input images is significantly increased by using the IC dye. Among all the results, the red channel with the IC dye gives the best result, where more than 95% of the images are reconstructed. When the IC dye is not present, the green channel gives the best result.
The above subjective and objective evaluations consistently show that the red channel with the IC dye gives the best result. As shown in Fig. 1(c)-(h), this is because the red channel leverages the effect of the IC dye more than the other channels. In Fig. 1(f), many textures, from which many distinctive features can be extracted, are apparent in the red channel. When the IC dye is not used, the green channel has better contrast than the other channels. The blue channel is the least preferable of the three channels both without and with the IC dye.
Figure 3 shows the results of triangle mesh and texture generation using the red channel with the IC dye. We can confirm that the generated meshes resemble the whole shape of a stomach. Moreover, the textured representation makes the generated 3D model more perceptible for viewers.
Figure 4 shows our custom viewer, which can project any selected reconstructed image onto the generated triangle mesh based on the endoscope poses estimated in SfM. This custom viewer provides the estimated location of a particular image frame, which can be used for the 3D localization of a malignant lesion. Our viewer should be very valuable for gastric surgeons in making medical decisions.
In this paper, we have presented an offline solution to reconstruct the whole shape of a stomach from a standard monocular endoscope video. To obtain better reconstruction quality using SfM, we used single-channel images without the color channel misalignment artifact. We found that chromo-endoscopy with the IC blue color dye generally gives a significant improvement in the completeness of the reconstruction result. Furthermore, we found that the red channel with the IC dye provides the most complete 3D model compared to the other channels. A custom viewer that can localize a particular image frame in the reconstructed 3D model was also presented. In future work, we plan to refine the mesh generation process for a more detailed representation, considering more effective downsampling and outlier removal approaches. To view the results in more detail, please visit our project page at the following link (http://www.ok.sc.e.titech.ac.jp/res/Stomach3D/).
-  N. Yamamichi, C. Hirano, Y. Takahashi, C. Minatsuki, C. Nakayama, R. Matsuda, T. Shimamoto, C. Takeuchi, S. Kodashima, S. Ono, Y. Tsuji, M. Fujishiro, R. Wada, T. Mitsushi, and M. Koike, “Comparative analysis of upper gastrointestinal endoscopy, double-contrast upper gastrointestinal barium X-ray radiography, and the titer of serum anti-Helicobacter pylori IgG focusing on the diagnosis of atrophic gastritis,” Gastric cancer, vol. 19, no. 2, pp. 670–675, 2016.
-  J. W. Kim, S. S. Shin, S. H. Heo, H. S. Lim, N. Y. Lim, Y. K. Park, Y. Y. Jeong, and H. K. Kang, “The role of three-dimensional multidetector CT gastrography in the preoperative imaging of stomach cancer: Emphasis on detection and localization of the tumor,” Korean Journal of Radiology, vol. 16, no. 1, pp. 80–89, 2015.
-  L. Maier-Hein, P. Mountney, A. Bartoli, H. Elhawary, D. Elson, A. Groch, A. Kolb, M. Rodrigues, J. Sorger, S. Speidel, and D. Stoyanov, “Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery,” Medical Image Analysis, vol. 17, no. 8, pp. 974–996, 2013.
-  J. Geng and J. Xie, “Review of 3-D endoscopic surface imaging techniques,” IEEE Sensors Journal, vol. 14, no. 4, pp. 945–960, 2014.
-  T. Okatani and K. Deguchi, “Shape reconstruction from an endoscope image by shape from shading technique for a point light source at the projection center,” Computer Vision and Image Understanding, vol. 66, no. 2, pp. 119–131, 1997.
-  C. H. Q. Foster and C. Tozzi, “Towards 3D reconstruction of endoscope images using shape from shading,” in Proc. of Brazilian Symposium on Computer Graphics and Image Processing, pp. 90–96, 2000.
-  O. G. Grasa, E. Bernal, S. Casado, I. Gil, and J. M. M. Montiel, “Visual SLAM for handheld monocular endoscope,” IEEE Trans. on Medical Imaging, vol. 33, no. 1, pp. 135–146, 2014.
-  N. Mahmoud, I. Cirauqui, A. Hostettler, C. Doignon, L. Soler, J. Marescaux, and J. M. M. Montiel, “ORBSLAM-based endoscope tracking and 3D reconstruction,” in Proc. of International Workshop on Computer-Assisted and Robotic Endoscopy (CARE), pp. 72–83, 2016.
-  N. Mahmoud, C. Toby, A. Hostettler, L. Soler, C. Doignon, and J. M. M. Montiel, “Live tracking and dense reconstruction for handheld monocular endoscopy,” IEEE Trans. on Medical Imaging, vol. 38, no. 1, pp. 79–88, 2019.
-  S. Mills, L. Szymanski, and R. Johnson, “Hierarchical structure from motion from endoscopic video,” in Proc. of Int. Conf. on Image and Vision Computing New Zealand (IVCNZ), pp. 102–107, 2014.
-  D. Sun, J. Liu, C. A. Linte, H. Duan, and R. A. Robb, “Surface reconstruction from tracked endoscopic video using the structure from motion approach,” in Proc. of Augmented Reality Environments for Medical Imaging and Computer-Assisted Interventions (AE-CAI), pp. 127–135, 2013.
-  K. L. Lurie, R. Angst, D. V. Zlatev, J. C. Liao, and A. K. E. Bowden, “3D reconstruction of cystoscopy videos for comprehensive bladder records,” Biomedical Optics Express, vol. 8, no. 4, pp. 2106–2123, 2017.
-  R. Furukawa, H. Morinaga, Y. Sanomura, S. Tanaka, S. Yoshida, and H. Kawasaki, “Shape acquisition and registration for 3D endoscope based on grid pattern projection,” in Proc. of European Conf. on Computer Vision (ECCV), pp. 399–415, 2016.
-  C. Schmalz, F. Forster, A. Schick, and E. Angelopoulou, “An endoscopic 3D scanner based on structured light,” Medical Image Analysis, vol. 16, no. 5, pp. 1063–1072, 2012.
-  P. F. Alcantarilla, A. Bartoli, F. Chadebecq, C. Tilmant, and V. Lepilliez, “Enhanced imaging colonoscopy facilitates dense motion-based 3D reconstruction,” in Proc. of Int. Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 7346–7349, 2013.
-  J. Kannala and S. S. Brandt, “A generic camera model and calibration method for conventional, wide-angle, and fish-eye lenses,” IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 28, no. 8, pp. 1335–1340, 2006.
-  D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
-  B. Triggs, P. F. McLauchlan, R. I. Hartley, and A. W. Fitzgibbon, “Bundle adjustment – A modern synthesis,” in Proc. of Int. Workshop on Vision Algorithms, pp. 298–372, 1999.
-  M. Kazhdan and H. Hoppe, “Screened Poisson surface reconstruction,” ACM Trans. on Graphics, vol. 32, no. 3, p. 29, 2013.
-  J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,” in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 4104–4113, 2016.
-  “Meshlab,” http://www.meshlab.net/, (Accessed on 01/10/2019).