Navigable videos for presenting scientific data on head-mounted displays

11/28/2016
by Jacqueline Chu, et al.
University of California-Davis

Immersive, stereoscopic viewing enables scientists to better analyze the spatial structures of visualized physical phenomena. However, their findings cannot be properly presented in traditional media, which lack these core attributes. Creating a presentation tool that captures this environment poses unique challenges, namely related to poor viewing accessibility. Immersive scientific renderings often require high-end equipment, which can be impractical to obtain. We address these challenges with our authoring tool and navigational interface, which is designed for affordable head-mounted displays. With the authoring tool, scientists can show salient data features as connected 360 video paths, resulting in a "choose-your-own-adventure" experience. Our navigational interface features bidirectional video playback for added viewing control when users traverse the tailor-made content. We evaluate our system's benefits by authoring case studies on several data sets and conducting a usability study on the navigational interface's design. In summary, our approach provides scientists an immersive medium to visually present their research to the intended audience--spanning from students to colleagues--on affordable virtual reality headsets.

1 Related Work

Our work encompasses interactive videos, immersive scientific visualization, and animation for storytelling. Combining these concepts, our system addresses the lack of effective presentation media for scientists to share their research on low-cost immersive, stereoscopic displays.

1.1 Interactive Video

In our approach, “interactive” refers to providing users control over their navigation of the video paths and 360 viewing. However, what constitutes an interactive video can be ambiguous [29]. Some features include non-linear playback [30] and detail-on-demand video summaries [37]. All of these features leverage different forms of interactivity to provide a flexible viewing experience. Panoramic videos are also considered interactive, since users can change their viewing direction during playback. Having recognized the benefits of immersion, prior work has facilitated the production and viewing of immersive videos [2].

In addition, our work includes the authoring of immersive videos that showcase scientific data, similar to the work of Stone et al. [40]. They visualized and produced movies of molecular dynamics simulations involving millions of 3D atoms. By incorporating omnidirectional, panoramic techniques into their rendering engine, the resulting movies can be viewed on various HMDs. Likewise, we have added panoramic projection techniques to create immersive and stereoscopic videos; more details are discussed in Section 2.1.

1.2 Immersive Scientific Visualization

Scientific visualization has been shown on a variety of immersive displays: spherical [4], large-tiled [33], fish-tank [7], CAVEs [45], and HMDs [28]. For HMDs in particular, Drouhard et al. proposed design strategies for immersive virtual environments to facilitate the adoption of VR into scientific domains [8]. They discussed how influential HMDs can be for the scientific community, with affordability being one of their key benefits. Designed for consumer-available headsets, our system facilitates knowledge sharing, especially in classroom environments where funding and space are too limited to obtain high-end displays [34] like a CAVE.

To provide a comfortable VR experience, optimization techniques have been developed to improve the viewing and interactive experience around immersive scientific data. Ebert et al. used a glyph-based volume renderer–which they preferred over isosurface or voxel-based techniques–to provide fast rendering times to support their stereoscopic viewing system [9]. Kniss et al. implemented a texture-based rendering system for terabyte-sized volume data sets on a high-IO, multi-hardware system [18]. Although it achieves only 5 to 10 frames per second, this technique provides low latency by adjusting the pipeline’s workload, either by rendering the data in multiple small portions or by subsampling to render fewer samples per frame. In recent work, Hänel et al. continuously adjusted the visual quality in favor of stable frame rates and preventing simulation sickness [15].

For our purposes, we found videos and our treatment of them to provide a unique learning and presentation experience. Exporting to this medium also avoids the side effects that optimization techniques are often associated with, such as the loss of visual quality or network dependencies for rendering. Our approach preserves the visual quality of advanced rendering techniques and offers interactivity through the size and structure of a roadmap of video paths. Since scientists use their expertise when authoring videos, the resulting navigable videos of scientific data are promising for education, as the application of VR has been shown to be useful in other learning domains [32, 35].

1.3 Animation for Storytelling

In our efforts to support effective content presentation, our system was inspired by storytelling principles. The scientific visualization community has recognized its benefits [26, 42] and has established frameworks for effective communication to the target audience. However, these storytelling guidelines and frameworks can be difficult to incorporate in practice. Gershon and Page summarized this challenge as “a story is worth a thousand pictures,” in which a single static image cannot capture all the multifaceted components of a story [11]. Fortunately, animation is an effective tool to aid storytelling visualization, but it must be applied appropriately to improve user experience and visual discourse [6].

Nowadays, most scientific tool kits include basic animation support for video export. More comprehensive systems allow changes across a variety of dimensions [3, 23] by interpolating viewpoint, color mapping, or clipping planes. Hsu et al. generated animation by automating camera paths from user-specified criteria [16], while Liao et al. leveraged a scientist’s exploration history [22]. We have found that the underlying models of scientific animations tend to be timelines, which suggests that the animation is linear [38].

However, studies have found that users prefer non-linear animations [25, 27]. Zhang et al. discussed that interactive videos enhance learner-content interactivity, which potentially improves learning effectiveness and motivation [44]. For reasons like this, our work focuses on producing and viewing non-linear stories around scientific data. Similar to Wohlfart and Hauser’s work [43], we use a node-link diagram to build non-linear animation. However, our playback interface is its own stand-alone component and is not integrated into the given renderer. Instead of allowing user manipulation of the presentation, we limit the user’s interaction to navigation of the author’s roadmap, which is composed of the fundamental and tailored characteristics of the data. Having the ability to truly explore can verify a user’s understanding [14], but an advantage of our interaction design is that users can stay focused. If we allowed users to deviate from the intended storyline, they may become distracted [39].

2 System Overview

Figure 1: An illustrative system overview of our authoring tool and navigational interface. The video output is an intermediary stage—comprised of videos and a roadmap—between the authoring and navigational components.
Figure 2: An example of using the authoring tool around a backpack data set, which was generated from evaluating nondestructive testing methods. The authoring tool’s two subcomponents, the timeline editor and the roadmap interface, are shown on the right of our volume renderer. The timeline editor shows dimensional changes involving viewpoint and transfer function (TF). Each lane has an icon, and its keyframes are color coded to match its respective dimension. Snapshots, shown with frame numbers, are placed above their respective keyframes and reflect the changes made. Options on the top-left help with the creation of a single animation sequence. The roadmap interface shows the roadmap that connects and structures the non-linear animation. Nodes with specular highlights store a keyframe, whereas gray ones do not. The thicker black line represents the current video the author is modifying. Although video playback and dimension interpolation are bidirectional, edges are displayed as directed to denote the corresponding video’s start and end. This gives authors a reference for which end of the video they are modifying when connecting the dimensional changes. The options on the right can be used to save and load roadmaps to apply the changes to other data sets. Using the export button, all the videos are generated and connected appropriately.

Our system comprises two distinct components, as shown in Figure 1. The authoring tool enables scientists to construct stories around their data in the form of a roadmap, while the navigational interface facilitates the immersive, stereoscopic viewing of the resulting content on HMDs. A typical workflow for authoring and viewing navigable videos starts with the author visualizing the data of interest. Using a renderer that has our authoring tool integrated, a scientist can utilize its subcomponents—the timeline editor and roadmap interface—to build each video. For a single animation segment, the timeline editor can interpolate multiple data dimensions for the scientist to highlight key characteristics of the data. Using the roadmap interface, the author can connect the animation segments to customize how users should experience the content. Once the content is finalized, the authoring tool exports the roadmap and a series of videos as input for the navigation component. The roadmap serves as the underlying structure of the video navigation, such that the navigational interface can play the next video by using the viewer’s current position in the roadmap.

In the remainder of this section, we detail both the authoring tool and the navigational interface. Since the scope of our work mainly targets HMDs that require smartphones, the details disclosed are in the context of mobile hardware. In this work, we used a Samsung Galaxy S6 and recommend using the navigational interface on phones with similar specifications. For more examples of the system components and resulting videos, please refer to the supplementary video.

2.1 Authoring Tool

This authoring stage can be seen as a preprocessing step to produce high-quality visuals that can be presented comfortably in VR headsets. This tool is designed to be a modular animation library which can be integrated into different types of rendering systems, such as those that render 3D meshes or non-uniform grid data. In this paper, we have visualized volumetric data that is uniformly structured on a grid.

This tool was implemented using C++, Qt, and FFMPEG for video export. Although it is designed to be renderer-agnostic, the authoring tool must be connected to the given renderer using Qt’s event framework. If it is not set up with these dependencies, the renderer must be able to export videos that match the specifications described in Section 2.2. Omnidirectional rendering must also be used to enable stereoscopic viewing, similar to that generated by our camera model, which is described in Section 2.1.1. Finally, a roadmap metadata file, which contains additional video information, will then need to be created. Once it is generated, the video output can then be used as input into our navigation interface.
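
The paper does not publish the integration API, so the following is a minimal sketch, with hypothetical names (RenderState, AuthorableRenderer, applyState, renderODSFrame), of the kind of renderer-facing interface such a modular animation library could expect; the actual tool is wired up through Qt's event framework.

```cpp
// Hypothetical integration interface, not the authors' actual API: a sketch of
// the hooks a renderer could expose so the authoring library can drive it,
// whether through Qt signals/slots or direct calls.
#include <string>

struct RenderState {                  // one sampled point of an animation
    float cameraPose[16];             // 4x4 view matrix, row-major
    int   timeStep;                   // index into a time-varying data set
    float clipPlanes[6];              // axis-aligned clipping distances
    std::string transferFunction;     // serialized TF control points
};

class AuthorableRenderer {            // what the authoring tool needs from a renderer
public:
    virtual ~AuthorableRenderer() = default;
    // Apply one interpolated state so the author can preview it interactively.
    virtual void applyState(const RenderState& state) = 0;
    // Render the current state as an omni-directional stereo frame pair
    // (left/right panoramas) at the requested resolution.
    virtual void renderODSFrame(int width, int height,
                                unsigned char* leftRGBA,
                                unsigned char* rightRGBA) = 0;
};
```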

2.1.1 Renderer

An interactive renderer allows scientists to experiment with various rendering parameters, such as color or viewpoint, and produce visuals that effectively showcase the unique data features. However, a small caveat exists when rendering stereoscopic content: The renderer must provide an image for each eye. Since scientific visualizations involve expensive rendering algorithms, we have implemented Google’s Omni-directional Stereo (ODS) camera model. ODS achieves stereoscopic viewing by producing two panoramic images—one panorama for each eye. We favored this technique as it does not require the composition of sub-images to recreate the projection effects [12].

For our volume renderer, we modified its ray casting algorithm, such that the ray directions match those that are described in Google’s ODS developer guide [13]. As suggested in the document, we used an interpupillary distance (IPD) of 6.4 cm, which was converted to match the units used by our renderer. For advanced rendering techniques, we added pre-integration to alleviate sampling artifacts and volumetric shadows to improve depth cues. We also incorporated the ability to change the clipping plane distance to prevent volume data features from being rendered uncomfortably close to the viewer.
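
As a concrete illustration of the ODS ray setup described above, the following is a minimal sketch of the tangent-ray construction from Google's ODS guide; the function name odsRay, the y-up axis convention, and the eye-sign handling are assumptions rather than the authors' exact renderer code.

```cpp
// Minimal sketch of per-pixel ray setup for omni-directional stereo (ODS).
// Axis conventions and sign handling may differ from the authors' renderer.
#include <cmath>

struct Vec3 { float x, y, z; };

// eyeSign = -1 for one eye, +1 for the other (assumed convention).
// (px, py) is the pixel in [0, width) x [0, height); ipd in world units.
void odsRay(int px, int py, int width, int height, float ipd, int eyeSign,
            Vec3& origin, Vec3& dir)
{
    const float PI = 3.14159265358979f;
    float theta = ((px + 0.5f) / width)  * 2.0f * PI - PI;   // azimuth   [-pi, pi]
    float phi   = PI / 2.0f - ((py + 0.5f) / height) * PI;   // elevation [-pi/2, pi/2]

    float r = eyeSign * ipd * 0.5f;   // the eyes sit on a circle of diameter IPD
    origin = { r * std::cos(theta), 0.0f, r * std::sin(theta) };

    // The ray direction is tangent to the viewing circle at the origin point.
    dir = { std::sin(theta) * std::cos(phi),
            std::sin(phi),
           -std::cos(theta) * std::cos(phi) };
}
```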

2.1.2 Timeline Editor

Figure 2 (top-right) shows the timeline editor, which is a keyframe-based interface for the author to create a single instance of linear animation. The timeline is made up of several independent lanes with icons to indicate the lane’s respective dimension. Each dimension can be changed over the animation sequence. The author is able to preview their animation in the renderer and make any necessary edits.

Our authoring tool currently supports interpolation over camera, transfer function (TF), clipping planes, and temporal dimensions. We briefly summarize the benefits of each dimensional change:

  • Camera: Viewpoint, or spatial, changes can help users have a better vantage point of the data set. Camera changes include rotation, fly-through, and panning. However, camera rotations–especially around the y-axis–may not be effective, since users already can view the content in any direction. If possible, we recommend keeping the camera inside the data set, which will fully immerse the viewer in the content.

  • TF: Color mapping changes can isolate particular features of the data that fall on a specific range of values. This can help the viewer focus on a certain feature, while the other characteristics are set to a lower opacity.

  • Clipping planes: Changing the positions of the XYZ planes–the planes that define an axis-aligned bounding box–clips away values that fall outside the box. In some cases, moving clipping planes can reveal the internal structures of the data. This can be particularly useful for medical data sets, as they contain many internal structures to study.

  • Temporal: These changes are applicable to time-varying data sets, which show how the data evolves over the collected time steps. Interesting data attributes may reveal themselves at certain time steps and not in others. This dimension complements storytelling well, since it shows a natural progression of the data changing.

In addition, the timeline editor includes features that ease the creation of animation, such as showing thumbnails of the data at the time of a dimensional change, color coding the keyframes to their respective dimension, and allowing the author to preview the changes in the renderer through scrubbing and playback of the timeline.
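
To make the keyframe mechanism concrete, the following is a minimal sketch of how a timeline lane might linearly interpolate one scalar dimension (for example, a clipping-plane distance) between keyframes; the Keyframe and Lane names are illustrative and not taken from the authors' implementation.

```cpp
// Minimal sketch of keyframe interpolation in one timeline lane. Each lane
// holds keyframes for a single dimension; sampling a frame linearly blends
// the two surrounding keyframes.
#include <algorithm>
#include <vector>

struct Keyframe {
    int   frame;                  // position on the timeline
    float value;                  // e.g. a clipping-plane distance or TF opacity
};

struct Lane {
    std::vector<Keyframe> keys;   // kept sorted by frame

    float sample(int frame) const {
        if (keys.empty()) return 0.0f;
        if (frame <= keys.front().frame) return keys.front().value;
        if (frame >= keys.back().frame)  return keys.back().value;
        // Find the first keyframe at or after the requested frame.
        auto hi = std::lower_bound(keys.begin(), keys.end(), frame,
            [](const Keyframe& k, int f) { return k.frame < f; });
        auto lo = hi - 1;
        float t = float(frame - lo->frame) / float(hi->frame - lo->frame);
        return lo->value + t * (hi->value - lo->value);   // linear blend
    }
};
```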

2.1.3 Roadmap Interface

The roadmap interface facilitates the creation of interactive, navigable videos. Our approach centers around navigable video paths, which abstract the filtering tasks–those required to visualize the meaningful attributes–away from the end user. These preset paths are represented in our roadmap structure, which has the underlying model and appearance of a node-link graph. In our design, each node contains a single keyframe and each edge contains an instance of animation, which can be modified in the timeline editor. An example roadmap is shown in Figure 2 (bottom-right).

To build navigable video content, the author must first build their roadmap. An author can either lay out the final roadmap structure, construct the edges systematically one by one, or use a workflow that mixes the two methods. Edges are visualized with an arrow from the source to the target node, which represent the starting and ending keyframes, respectively. Although video playback and dimension interpolation are bidirectional, the display of a directed edge gives authors a reference for which end of the video they are modifying when connecting the dimensional changes. With respect to the system, knowledge of a video's start and end allows the interface to connect the animations seamlessly.

The resulting roadmap structure can be fairly arbitrary due to the interface’s support for free-form creation. For example, the length of the edge does not represent its animation’s length. However, the links between the nodes themselves determine how keyframes are shared across adjacent nodes. These roadmap operations can be generalized to three categories:

  • Build: Whether along a single or already concatenated edge, content can be built upon in this linear fashion.

  • Branch: Video content can branch off from a mutual node. This operation presents multiple video options for the viewer to choose.

  • Merge: Video content can be merged due to an earlier branch operation. Keyframes are shared based on the connection order of the incoming edges to this mutual node.

With the programmatic support of sharing keyframes, edges can be properly connected to ensure continuous animation amongst the adjacent video segments.
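
The following is a minimal sketch of a roadmap represented as a node-link graph, with nodes holding keyframes and edges holding animation segments; the Roadmap, RoadmapNode, and RoadmapEdge names are hypothetical, but the sketch shows how build, branch, and merge all reduce to adding edges that reuse existing nodes, which is what lets adjacent videos share keyframes.

```cpp
// Hypothetical roadmap structure, not the authors' code. Nodes hold a shared
// keyframe; edges hold one animation segment (one exported video).
#include <string>
#include <vector>

struct KeyframeSet { std::string serializedState; }; // all dimensions at a node

struct RoadmapNode { KeyframeSet keyframe; };

struct RoadmapEdge {
    int source;                 // node holding the starting keyframe
    int target;                 // node holding the ending keyframe
    std::string videoBaseName;  // e.g. "roadmap_0"
};

struct Roadmap {
    std::vector<RoadmapNode> nodes;
    std::vector<RoadmapEdge> edges;

    int addNode(const KeyframeSet& k) {
        nodes.push_back({k});
        return int(nodes.size()) - 1;
    }
    // Build: extend from an existing node to a new one.
    // Branch: call addEdge twice with the same 'from' node.
    // Merge: call addEdge with an existing node as 'to'.
    void addEdge(int from, int to, const std::string& name) {
        edges.push_back({from, to, name});
    }
};
```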

2.2 Video Output

The authoring tool exports the roadmap along with several videos, where each video is associated with an edge. The videos are encoded using the H.264 codec with FFMPEG. This output is represented in a roadmap metadata file, which is used by the navigation stage to reconstruct the roadmap’s connectivity. A video output example is shown in Figure 3, whose metadata file lists the following entries:

Connectivity: 0, 1  roadmap_0
Connectivity: 1, 2  roadmap_1
Connectivity: 1, 3  roadmap_2
Connectivity: 3, 0  roadmap_3

(a) Metadata file
(b) Roadmap
Figure 3: A video output example. a) A simple metadata file that lists the video edges and their connections to end nodes. b) The corresponding roadmap from the metadata file.
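
As an illustration of how the navigation stage could reconstruct connectivity from this output, the following is a minimal parser sketch that assumes metadata lines of the form shown in Figure 3 ("Connectivity: <start>, <end> <video_base_name>"); the exact on-disk format and the loadRoadmapMetadata name are assumptions.

```cpp
// Minimal sketch of reading the roadmap metadata file illustrated in Figure 3.
#include <algorithm>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

struct EdgeRecord { int startNode, endNode; std::string videoBaseName; };

std::vector<EdgeRecord> loadRoadmapMetadata(const std::string& path)
{
    std::vector<EdgeRecord> edges;
    std::ifstream in(path);
    std::string line;
    while (std::getline(in, line)) {
        std::replace(line.begin(), line.end(), ',', ' ');  // "0, 1" -> "0  1"
        std::istringstream ss(line);
        std::string tag;                                   // "Connectivity:"
        EdgeRecord e;
        if (ss >> tag >> e.startNode >> e.endNode >> e.videoBaseName)
            edges.push_back(e);
    }
    return edges;
}
```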

Since our approach reduces the data exploration space, we wanted to support forward and backward video playback. This allows viewers to traverse the videos at their own pace and review anything they may have missed. Since video codecs do not support backward playback, we generate two videos, one per playback direction, to implement this design feature. Since stereoscopy is essential for enhanced depth and spatial cues, we must also generate two videos, one for the left eye and one for the right, to achieve parallax. As a result, we have 4E videos, where E is the number of edges in the roadmap.

As a roadmap grows and becomes more complex, the resulting memory footprint can grow significantly for a single viewing experience. We mitigate the negative effects of this trend by choosing an appropriate Group of Picture (GOP) length. This value dictates how often a keyframe, or an uncompressed frame, will be stored in the video file. In addition, it affects seeking accuracy and memory size. Decreasing the GOP length improves seeking accuracy while increasing the file’s size. Ideally, we want to have high seeking accuracy and a low memory footprint. We use 0.25 seconds for the GOP length and found that it represents a good trade-off between these two factors.

We have encoded the video files with a frame rate of 30 frames per second (FPS), which is generally recommended for our target set of HMDs [19]. We have found that this frame rate has a good balance amongst file size, I/O bandwidth, and latency. The frame rate of the playback interface is designed to stabilize around 60 FPS.

Since we are designing our system for VR headsets that require smartphones, we must be mindful of the available GPU resources, specifically the number of video decoders. We experimented with video resolution sizes that allowed four videos to be decoded for a given edge. With our small benchmarking tool, we found 720p, or 1280x720 pixels, to be the maximum resolution for 360 videos that is supported by the mobile device's hardware. Since this is a fairly low resolution, we supersampled the frames, rendering them as 4K (3840x2160 pixels) images, to counterbalance visual artifacts such as aliasing.
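
The following sketch shows one plausible export invocation consistent with these settings (H.264 via libx264, 30 FPS, a roughly 0.25-second GOP, and supersampled 4K frames downscaled to 720p); the flags are standard FFMPEG options, but the file naming and the exact encoder parameters are assumptions rather than the authors' published settings.

```cpp
// Illustrative export command only: the authors' exact encoder settings are
// not published beyond H.264, 30 FPS, a ~0.25 s GOP, and 4K-to-720p downscaling.
#include <cstdlib>
#include <string>

int main()
{
    const int fps = 30;
    const int gop = int(0.25 * fps + 0.5);    // ~0.25 s between keyframes (8 frames)

    std::string cmd =
        "ffmpeg -framerate " + std::to_string(fps) +
        " -i frames/left_fwd_%05d.png"        // 3840x2160 rendered frames (assumed naming)
        " -vf scale=1280:720"                 // downscale the supersampled frames to 720p
        " -c:v libx264 -g " + std::to_string(gop) +
        " -pix_fmt yuv420p videos/roadmap_0_left_fwd.mp4";
    return std::system(cmd.c_str());
}
```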

2.3 Navigational Interface

Our system’s navigational interface is the front-end component that presents the authored content to the user. Between the authoring and navigation stages, the roadmap structure is preserved to determine how a viewer can traverse the content. The roadmap is also presented to the viewer for reference on their progress within the authored content. In the eyes of the viewer, each of the roadmap’s edges represents a video, and each node represents a position at either the start or end of a video segment. A viewer is on an edge when viewing the video and is at an intersection when they reach either end of the video. We have designed this interface to be simple and effective when guiding the viewer through the videos. With functionality similar to a virtual tour, our interface enables viewers to explore what the author has intended them to see.

2.3.1 Head-Mounted Displays

Nowadays, many HMDs are available to general consumers, such as Google Cardboard, Samsung GearVR, Sony PlayStationVR, Oculus Rift, and HTC Vive. In contrast to specialized display systems like CAVEs, these headsets provide an affordable alternative for VR. In particular, Google Cardboard is an accessible platform, since it pairs an inexpensive viewing device with a smartphone. Naturally, the viewing quality is not as vivid as that of higher-end devices like the Oculus Rift or HTC Vive. All of these HMDs feature head tracking, stereoscopic viewing, and at least one input element. For this paper, we designed the navigational interface around the Google Cardboard platform, as it is the most affordable on the market.

Figure 4: An illustrative example of our Unity scene setup for the navigational interface. A panoramic video frame (top) is projected onto the sphere (middle) with a camera rig at its center. A plane (bottom) displays UI elements: either one of the three playback widget states or the preview mode.

2.3.2 Interface Setup

For development, we used Unity as our engine. Unity offers cross-platform support for desktop and mobile deployment, along with native VR support and compatibility with popular HMDs. As a result, we were able to port our interface to our target set of HMDs with little worry about device-specific development. Since Unity currently does not provide support for video textures on mobile platforms, we used a third-party plugin to communicate with the video decoder for rendering the frames to our specified texture.

As shown in Figure 4, we have one scene set up to load the video content. The video frames are mapped onto the sphere, with a camera rig placed at the sphere's center. The camera rig contains two cameras that are offset by a 6.4 cm IPD equivalent in Unity's world coordinates. To generate correct parallax, each camera renders for either the left or the right eye. Our user interface widgets are drawn on the UI plane in world space, rather than screen space, to leverage the 3D stereoscopic effects from our scene setup.

To meet our design specifications, we have four active videos per roadmap edge during the viewer's experience. Only two of the four videos are played at any given time to conserve GPU resources. Based on the current playing direction, the appropriate two videos are queued to play, while the other two are paused. When the viewer wants to change the playing direction or is at an intersection, we must switch and sync the pair of videos through seek operations.
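
A minimal sketch of this per-edge bookkeeping is shown below, assuming a hypothetical VideoPlayer wrapper around the third-party decoder plugin; it is not the Unity implementation, but it illustrates keeping four decoded videos per edge while playing only the pair that matches the current direction and mirroring the seek point when the direction flips.

```cpp
// Hypothetical per-edge playback bookkeeping (illustrative, not the Unity code).
#include <array>

struct VideoPlayer {                  // hypothetical wrapper around the decoder plugin
    virtual ~VideoPlayer() = default;
    virtual void play() = 0;
    virtual void pause() = 0;
    virtual void seekSeconds(double t) = 0;
    virtual double durationSeconds() const = 0;
};

struct EdgePlayback {
    // [0]=left/forward, [1]=right/forward, [2]=left/backward, [3]=right/backward
    std::array<VideoPlayer*, 4> videos{};
    bool forward = true;
    double position = 0.0;            // seconds into the forward-direction video

    void setDirection(bool playForward) {
        if (playForward == forward) return;
        forward = playForward;
        const double duration = videos[0]->durationSeconds();
        for (int i = 0; i < 4; ++i) {
            const bool isForwardPair = (i < 2);
            const bool active = (isForwardPair == forward);
            // Mirror the seek point so both directions show the same frame.
            videos[i]->seekSeconds(isForwardPair ? position : duration - position);
            if (active) videos[i]->play();
            else        videos[i]->pause();
        }
    }
};
```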

2.3.3 User Interface Design

User state         | Button action | Playback action
On an edge         | Double tap    | Switch play direction
On an edge         | Tap + hold    | Play video (release will pause video)
At an intersection | Double tap    | Switch play direction (back on edge)
At an intersection | Tap + hold    | Enter preview mode
At an intersection | Tap           | Cycle video selection
Preview mode       | Double tap    | Exit preview mode
Preview mode       | Tap + hold    | Move onto selected video

Table 1: Available interactions in the navigational interface. Depending on the current playback state, the three recognized button actions perform different playback actions.

We believe that an unobtrusive user interface will enhance the viewer's exploration experience. Influenced by a developer conference talk from Oculus Connect 2 [5], our interface utilizes a single button, for which we have defined the following actions: tap, double tap, and tap + hold. The interface has three states, determined by where the user is: on an edge, at an intersection, or in preview mode.
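
The single-button design can be summarized as a small state machine; the sketch below maps (user state, button action) pairs to playback actions following Table 1, with illustrative enum names that are not taken from the Unity implementation.

```cpp
// Illustrative dispatch of Table 1's single-button interactions.
enum class UserState { OnEdge, AtIntersection, PreviewMode };
enum class Button    { Tap, DoubleTap, TapHold };
enum class Playback  { None, SwitchDirection, PlayWhileHeld, EnterPreview,
                       CycleSelection, ExitPreview, MoveOntoSelection };

Playback dispatch(UserState state, Button action)
{
    switch (state) {
    case UserState::OnEdge:
        if (action == Button::DoubleTap) return Playback::SwitchDirection;
        if (action == Button::TapHold)   return Playback::PlayWhileHeld;   // release pauses
        break;
    case UserState::AtIntersection:
        if (action == Button::DoubleTap) return Playback::SwitchDirection; // back on edge
        if (action == Button::TapHold)   return Playback::EnterPreview;
        if (action == Button::Tap)       return Playback::CycleSelection;
        break;
    case UserState::PreviewMode:
        if (action == Button::DoubleTap) return Playback::ExitPreview;
        if (action == Button::TapHold)   return Playback::MoveOntoSelection;
        break;
    }
    return Playback::None;
}
```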

Data set                     | Voxel size                                | Total memory                   | Video paths | Memory footprint | Video length | Avg. FPS | Dimensional changes
Server room                  | 417x345x60                                | 0.032 GB                       | 5           | 0.13 GB          | 00:01:47     | 60.401   | Camera
Visible Human (Male, Female) | Male: 512x512x1877; Female: 512x512x1734  | Male: 1.83 GB; Female: 1.69 GB | 7           | 0.092 GB         | 00:02:10     | 59.711   | Camera, TF, Clipping Plane
Supernova (50 time steps)    | 867x867x867                               | 120 GB                         | 1           | 0.54 GB          | 00:01:40     | 59.943   | Temporal

Table 2: A quantitative summary of the case studies. Each data set is a floating-point volumetric field. Each field has a fixed size (Voxel size) for a single time step; only Supernova has multiple time steps. The amount of storage used by the raw data is listed under Total memory. The number of authored videos (Video paths), total file size (Memory footprint), and length (Video length) are shown next. The average frame rate (Avg. FPS) obtained in the case studies is also reported. Finally, we list the dimensional changes that were applied to the data set.

We designed a playback widget that allows the viewer to see their progress on the current video. The playback widget is structured as a circular progress bar with a play icon at its center. At the start of a video, the progress bar is fully maroon. As the user progresses forward, the bar fills with turquoise in a counter-clockwise fashion. If the user plays the video backwards, the progress bar recedes and the play icon updates its direction. At both ends of the progress bar are smaller triangles, which are highlighted when the user has reached an intersection. With a tap + hold action, the progress bar fills with yellow and only switches to the preview mode once the progress bar is full. The preview mode enlarges the roadmap, showing it to the user. The nodes and edges have visual encodings that are determined by the user's viewing history. To select a video, the user taps to cycle through the adjacent edges of their current node. Below the roadmap is a set of dots, which represents the number of video options and the current selection. A summary of the available interactions is given in Table 1.

With the current design, users have no indication of what the next path entails. In our initial implementation of the preview mode, users were only able to view and choose from the upcoming changes. This preview displayed one of four small multiples through a semi-transparent black frame, which acted as a viewing window. Each multiple displayed a video frame corresponding to one of four “time steps” placed at 25 percent increments throughout the video. To preview the video, we used alpha-blending transitions to cycle through the multiples. However, when we conducted a pilot study prior to the usability study, participants reported this feature to be too confusing. We decided to omit this preview feature in favor of showing the roadmap structure itself, which allowed users to be more aware of their global position in the authored content. After more thorough design, this preview feature should be integrated back into the system and used in tandem with the roadmap. This would reduce the guesswork involved when users decide which path to traverse.

3 Case Studies

We created a set of case studies to demonstrate an author’s thought process when creating a presentation and how the resulting visuals present unique data features for the end user to learn. These case studies include scientific data sets that vary in size and respective scientific domain. Table 2 summarizes the specifications of the data sets, along with a quantitative overview of the resulting video content. The case studies were conducted on a Samsung Galaxy S6 that has 32 GB of storage, 3 GB of RAM, and a display resolution of 2560x1440 pixels. For the following content, we used a renderer with advanced lighting features, which improve the depth perception of the data’s features [24].

3.1 Server Room

The server room data set is artificially made and captures the characteristics of air pressure fields in a room full of machines. With several rows of server machines, the room is expected to be hot, which is damaging to the computers. To better maintain the machines, the owner can use visualization to evaluate the quality of the ventilation systems that help regulate the room's temperature. To visually present the characteristics of the room's air pressure, we used a heat map to not only color the level of air pressure but also indicate the temperature at any point in the room; this shows where the ventilation could be improved.

The server room is the smallest data set of our three case studies, sized at 417x345x60 voxels and 0.032 GB. For this case study, we used a single TF, which includes a rainbow color mapping with low opacity for the categorical representation of air pressure values—red and purple map to low and high air pressure, respectively. This spectrum of warm to cool color hues maps to hot to cold temperatures. The room and machines were colored gray to provide a contrast against the colors of the air pressure values. We manipulated the camera dimension to provide a first-person view of someone walking through the room. The authored paths extend from each corner and meet at the room’s center. We also created a path between two of the corners, as that particular hallway shows unique instances of air pressure distribution.
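
For illustration, a transfer function like the one described here can be expressed as a handful of color and opacity control points over normalized air pressure; the specific control-point values and the serverRoomTF name below are assumptions for the sketch, not the authors' exact settings.

```cpp
// Illustrative rainbow TF at low opacity over normalized air pressure
// (red = low pressure / hot exhaust, purple = high pressure / cool supply air).
#include <vector>

struct TFPoint {
    float value;                    // normalized air pressure in [0, 1]
    float r, g, b, a;               // color and (low) opacity
};

std::vector<TFPoint> serverRoomTF()
{
    return {
        { 0.00f, 1.0f, 0.0f, 0.0f, 0.05f },   // red    : low pressure
        { 0.25f, 1.0f, 1.0f, 0.0f, 0.05f },   // yellow
        { 0.50f, 0.0f, 1.0f, 0.0f, 0.05f },   // green
        { 0.75f, 0.0f, 0.0f, 1.0f, 0.05f },   // blue
        { 1.00f, 0.5f, 0.0f, 1.0f, 0.05f },   // purple : high pressure
    };
}
```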

Figure 5: A preview of interesting characteristics of the room’s pressure field: blocks of medium air pressure hovering above the machines and floor vents with low air pressure.

By moving the camera throughout the room, we allowed users to compare instances of air pressure from various parts of the room. With the 360 viewing, users have control over where to examine the air flow and can determine whether the air is being emitted from floor vents or exhausted towards the ceiling. Figure 5 highlights a few interesting aspects from our presentation of the data, such as how low air pressure radiates from several floor vents. Our TF also revealed several well-defined yellow blocks, shaped like the machines below them. The color indicates air emitted from each machine's exhaust fan, which is not as powerful as the ventilation system.

3.2 Visible Human

(a) Visible male
(b) Visible female
Figure 6: Side-by-side comparison of two sliced side views of the anatomy: a) male and b) female. For instance, one navigable path goes through these lower body cavities, in which the visual features differ; the annotation shows a panoramic view from inside the cavity.

The Visible Human data set is a collection of digitized slices of two full-body cadavers: one male, the other female. This data set is provided by the U.S. National Library of Medicine’s Visible Human Project [1], an effort that has captured high-quality cross-sectional photographs for visualizing the human body. Quality data such as this has opened more opportunities for study of the human anatomy. We authored this data set to showcase the anatomical structures and provide a point of comparison between female and male bodies.

The male data set is 512x512x1877 voxels, whereas the female data set is 512x512x1734 voxels. We first created content for the male by creating a roadmap with an edge that branched out into two options. On the opposite end of the branch, we built another edge, where the video starts by fading in the male from black. For the other edges, we changed the following: moved the camera through two cavities in the head and lower chest, adjusted the clipping planes to reveal the internal structures, and fine-tuned TFs to filter out noisy values that were found in one of the explored cavities of one data set and not the other. This same roadmap was used for the female, with a few modifications to correctly apply the dimensional changes to the differing physical characteristics. The two roadmaps generated two sets of content, which were connected by post-concatenating the videos that transition from black to their respective cadavers. The metadata file was modified to reflect the concatenation of the male and female videos.

In the navigable content, we wanted to establish context by setting the camera outside each of the cadavers. As expected, there are noticeable physical differences between the female and male. By manipulating the TF to make skin values translucent and moving the slicing planes, we revealed the organs and bones. As we moved the camera towards the cavities in the head or chest, users are able to observe differences at microscopic scales. One instance is shown in Figure 6 (a, b), where the camera animates from the sliced view into the lower chest cavity. In the close-up, the male has a protrusion, whereas the female does not. Overall, the nuanced differences between the cadavers provide opportunities for the user to study.

3.3 Supernova

The supernova data set was created from the results of physical model simulations on a supernova star. The value visualized is entropy, or the rate of decline in energy. These simulations tend to be large, complex, and multi-modal, which can be difficult for scientists to quantitatively analyze, let alone users like astronomy students. When authoring these videos, we wanted to simplify the experience to the evolutionary changes of a supernova’s energy.

(a) Time step 1
(b) Time step 50
Figure 7: Panoramic views of the supernova at different time steps: a) 1 and b) 50. The gases have experienced changes in movement and energy.

In this case study, we visualized 50 time steps. The data set is sized at 867x867x867 voxels, and a single time step is 2.40 GB. We used a single TF for coloring entropy: blue is low and purple is high. By restricting ourselves to a single TF throughout the videos, we maintained a consistent visual encoding between color and energy. To fully immerse the viewer, we fixed the camera's position at the center of the supernova, near its core. Since the total memory to store all the time steps amounts to 120 GB, we generated the content in segments–10 time steps at a time. Then, we concatenated the segments for viewing as a single video.

Figure 7 (a, b) shows the supernova’s dynamic nature over the 50 time steps. We chose to show a large range of time, so users can see how the star’s gases evolved in terms of movement and energy: The gas clouds have wrapped around the core and seem to have experienced high entropy, which can be inferred by its color transition to purple. Also, the resulting memory footprint is 0.54 GB in contrast to the original data size of 120 GB, which is an impractical size for average computers to hold in memory and interactively render for visual presentation. With results like this, our system makes large data sets more accessible for presentation in a modest setting.

4 Formative Usability Study

We conducted a small usability study to assess how well-received the presentation experience is for a student audience. This study serves as a formative evaluation, which will help us ascertain the strengths and weaknesses of the interface. Since the interface allows users to dictate the playing and viewing direction of the navigable videos, it was important that we receive feedback on its current design.

Assuming the role of author, we created a presentation to highlight the data set's unique features. We simplified the viewing experience by introducing camera movement as the only data dimension that changes as users play the videos. We asked participants to answer four questions to encourage them to navigate through the room and find the answers. The task was intended to have students interact with our presentation system, so the correctness of their answers holds little significance to our study's results.

4.1 Procedure

First, we summarized the goals of our research and pre-assessed the participants' experience with the concepts involved in our system. Shortly after, we had the participants go through a tutorial that walked them through the operations needed to navigate a sample data set. For those who were not familiar with Google Cardboard, we reviewed the headset's features. The tutorial also covered how to use the preview mode, which introduced the roadmap. It explained that each edge represents a video and that at each path's end, users would be presented with options for where to go next in the roadmap.

Second, we asked the users to complete a navigation task as a simple exercise in using the navigational interface. Users were not required to answer the questions in a particular order and were not timed per answer. This task involved the same navigable videos of the server room data set described in Section 3.1. These questions were designed to be answered objectively by leveraging the 360 viewing and moving throughout the room. The task required them to answer the following questions:

  • In any two of the room's corners, is the air pressure high or low?

  • Are the exhaust vents on the top or sides of the machine?

  • Which color(s) are emitted from the floor vents?

  • Is high air pressure on the floor or on top of the machines?

A post-assessment followed once the user finished answering the questions. They were asked to rate their thoughts against a series of statements on a 5-point Likert scale. Finally, we asked if they had any feedback for improving the playback interface.

4.2 Participants

Using the university's mailing lists, we recruited 22 students who are currently involved in STEM fields, such as Computer Science, Biomedical Engineering, Physics, and Materials Science. 16 were male and 6 were female. Participants' mean age was 25. For the pre-assessment, students reported average ratings of 3.18 (σ=0.89), 3.14 (σ=1.04), and 3.18 (σ=1.05) for familiarity with scientific visualization, VR, and 360 videos, respectively. In our study, two participants could not complete the navigation task, due to the phone overheating and to high levels of cybersickness.

4.3 Environment

Participants were asked to sit at a table in a swivel chair. By sitting in a swivel chair, the user can better align their body when viewing in 360. In front of them were reference sheets about the interface widgets and data set, along with a copy of the questions. Users could freely refer to these materials at any time during the study. The VR devices used were a Google Cardboard headset and the same phone described in Section 3. A Google Chromecast streamed the phone's screen to a secondary monitor so we could troubleshoot any issues users ran into. Only audio was recorded for users' feedback, which was reviewed after the session. Users were encouraged to take as much time as they needed when exploring the content and answering the questions.

4.4 User Feedback

During the post-assessment, we received valuable user feedback on the navigational interface. Most subjects wanted the roadmap to be displayed at all times for reference on their location. Some subjects suggested new usability features. S3 suggested, “maybe if you triple tap [it] show[s] help.” This help menu would display content similar to that of Table 1. Others related the navigational interface back to their studies. S10 explained, “I just know this thing in 2D—pressure dispersion around a room. 2D would be as useful to me as 3D. If I didn’t know that already, this would be much more useful as a learning experience.” S5 stated, “I look at proteins on my computer, and so it is very annoying to look on a desktop and just drag it around with a mouse, so I was thinking how nice it is to have 3D.”

Although in a controlled setting with a fairly simple data set, participants gave fair reviews of our navigational interface. Students reported average ratings of 3.41 (σ=0.85), 3.86 (σ=0.77), and 3.67 (σ=0.86) for interface usability, presentation effectiveness, and whether they would use the interface again, respectively. However, we did observe a fairly steep learning curve when using our system. For example, we noticed users showing little aptitude when viewing 360 videos, which may have negatively influenced their experience. Specifically, some users were disoriented when the camera movements did not align with their current viewing direction. We noticed that these participants passively viewed the content and did not leverage the 360 viewing. Overall, most participants still expressed excitement and piqued interest towards the future applications that our system enables.

5 Discussion

When considering all the possible video configurations, we expected that long videos would be frustrating and uninspiring for viewers. Originally, the experience was intended to be “continuous,” such that it emulates the real-world experience of navigating through an area. However, this interactive tour can easily become a maze in which the user becomes lost or impatient. For example, S8 wanted a teleportation feature and expressed frustration when traversing a video backwards to reach an already-known area of interest. However, we observed that the participants made a strong connection between the roadmap's layout and spatial movement. More design consideration will be needed when introducing abstract dimensional changes, such as TF, that break this seemingly strong connection between spatial movement and the user's change in position on the roadmap. It is unclear whether the “convenience” of teleporting outweighs the likely jarring effect of viewing possibly large, discrete visual changes.

The participants' reception of the authored content highlights the tight interplay between the authoring tool and the navigational experience. For instance, long videos are likely to be tedious for a participant to traverse. If a path happens to be a “dead end,” the user must play the video backwards to reach another intersection for video selection. Also, since there is no system restriction on how the content can be authored, extreme cases of content–a large number of short videos or one extremely long video, to name a few–can negatively impact the playback experience. With further experiments covering a wider range of roadmap configurations, along with an evaluation involving the intended users, that is, authors and viewers, we can build a reference of best practices for creating content for immersive navigation. It is important to balance the user's experience with the ease and flexibility of authoring content–viewers should feel engaged and in control, while authors should be able to create whatever they desire to fit their presentation needs.

6 Future work

Our system's design is flexible and could be applied to other disciplines that would benefit from visualization, animation, and the 3D space that VR offers. Information visualization is traditionally rendered in 2D and often involves extremely large amounts of data, in which relationships are often aggregated or filtered due to the data's density and the lack of rendering and viewing real estate [36]. For these reasons, immersive, stereoscopic viewing may be a viable solution to these challenges, with its extra spatial dimension for laying out the data. For this class of renderers to adopt our approach, a set of dimensional changes must be specified. Such dimensions include temporal, motion, color, and filtering transitions.

6.0.1 Authoring Tool

We have found that 360 viewing has its pros and cons: Users have full control of where they are looking, but are susceptible to missing crucial aspects of the content. Although we favor an interactive environment, these drawbacks are detrimental to how well the author's intent is conveyed when viewing the tailored experiences. One feature that may help is incorporating text annotations into our system. The annotations' content can be fairly arbitrary, ranging from instructions to interesting facts about the data set. Using the authoring tool, the scientist can specify their message and where the annotation resides. The annotations will then exist on the playback side for the viewer to encounter. We can experiment with how to present these annotations–for example, as pop-ups or “signs”–to see what best aids the viewer in understanding and learning the presented content. We want the annotations to serve as guides while ensuring that the viewer still has sufficient interactivity and control over their viewing experience.

We also would like to have comprehensive previewing features to aid scientists in creating more effective, interactive narratives of their data and findings. For example, authors should be able to traverse the roadmap to preview what their tour would look like. Another example is to provide a complementary rendering window that emulates the viewer's experience, especially when the content is projected onto the sphere. It may also be possible to port a frame or animation segment directly to the physical HMD. This would allow the scientist to experience the spatial depth that stereoscopic effects provide. However, exporting the content between the desktop and HMDs in a streamlined manner is not trivial.

Another step for this work is to evaluate our authoring tool when used by scientists. By integrating our tool into their domain-specific workflows, scientists can present their findings in our new medium. During the evaluation, it is important to observe how scientists use the tool. Depending on the sample size, we may find trends in how scientists use the tool, which would dictate the constraints our system should enforce. The scientists' feedback would be insightful, allowing us to better align the software design with their needs. In practice, most scientific animations are created by skilled animators. Since scientists are unlikely to be experts with animation tools, the feedback can guide later designs, whether that involves an interface redesign or more scientist-friendly features.

6.0.2 Navigational Interface

In future work, we would like to improve the video memory footprint of each viewing experience. Since we used a third-party plugin to render video frames to a texture, we had less flexibility for memory optimizations. In the next iteration, we would like to implement our own video encoder and decoder, which we can design to fit our needs. We can employ techniques such as Facebook's pyramid encoding [20] and benefit from its data management scheme, which has been reported to reduce memory usage by up to 85 percent.

A major challenge in immersive viewing is prompting the user about what to look at. Since the 360 viewing is fully interactive, a user decides what they end up looking at, and may miss salient information. We believe that on-screen cues, such as the aforementioned annotations from the authoring stage, or audio cues can address this issue. The on-screen cues can indicate where the user should turn their head, whereas immersive audio offers another perceptual channel to prompt users where to look. Audio can be particularly useful for learning, and captions can also be used to increase accessibility. However, the bidirectional video playback presents a possible challenge, since audio and reading are naturally linear. Smaller, “discrete” audio cues would be easiest to use. More design and experimentation would be required to integrate audio into our system.

In keeping with the nature of storytelling, we would like to facilitate the sharing of this scientific content. A centralized repository can help form a community that builds and shares these immersive, visual presentations of scientific data. In addition, real-time streaming of the authored videos would enable our system to support a multi-user experience. However, we must be wary of the consequences: Network dependencies are likely to be introduced in order to enable content distribution. Although streaming can reduce the number of videos resident in the device's memory and on its GPU decoders, we would have to fine-tune the video settings to be better suited for streaming; if latency is poor, the viewing experience will be uncomfortable.

7 Conclusion

Many scientific studies are about capturing and understanding complex physical phenomena and structures. Immersive visualization offers a more perceptually effective way to examine 3D structures and spatial relationships. Capturing this immersive space, our presentation medium leverages a scientist's expertise to display the content effectively. For viewing, the navigational interface is compatible with increasingly affordable HMDs, which are accessible VR platforms for showcasing a scientist's research to their target audience. From the case studies and the formative usability study, our findings suggest that our navigable videos show promise as a presentation medium. However, the interface will need further design iterations to improve its usability, especially with the expectation that it will be used by people with varying levels of interest in scientific visualization. We believe that our work can be used in other visualization fields, such as information visualization and visual analytics, which would benefit from an immersive presentation medium.

Acknowledgements.
This research was sponsored in part by the UC Davis RISE program, US National Science Foundation via grants DRL-1323214, IIS-1528203, and IIS-1320229, and U.S. Department of Energy via grant DE-FC02-12ER26072.

References

  • [1] M. J. Ackerman. The visible human project. Proceedings of the IEEE, 86(3):504–511, 1998.
  • [2] A. Agarwala, K. C. Zheng, C. Pal, M. Agrawala, M. Cohen, B. Curless, D. Salesin, and R. Szeliski. Panoramic video textures. In ACM Transactions on Graphics (TOG), volume 24, pages 821–827, 2005.
  • [3] H. Akiba, C. Wang, and K.-L. Ma. AniViz: A template-based animation tool for volume visualization. Computer Graphics and Applications, IEEE, 30(5):61–71, 2010.
  • [4] X. Amatriain, J. Kuchera-Morin, T. Hollerer, and S. T. Pope. The AlloSphere: Immersive multimedia for scientific discovery and artistic exploration. IEEE MultiMedia, 16(2):64–75, 2009.
  • [5] K. Brady and R. Emms. Oculus Connect 2: Navigating new worlds: Designing UI and UX in VR. https://www.youtube.com/watch?v=braV_c4M8oI, September 2015.
  • [6] F. Chevalier, N. H. Riche, C. Plaisant, A. Chalbi, and C. Hurter. Animations 25 years later: New roles and opportunities. 2016.
  • [7] C. Demiralp, C. D. Jackson, D. B. Karelitz, S. Zhang, and D. H. Laidlaw. Cave and fishtank virtual-reality displays: A qualitative and quantitative comparison. IEEE transactions on visualization and computer graphics, 12(3):323–330, 2006.
  • [8] M. Drouhard, C. A. Steed, S. Hahn, T. Proffen, J. Daniel, and M. Matheson. Immersive visualization for materials science data analysis using the oculus rift. In Big Data (Big Data), 2015 IEEE International Conference on, pages 2453–2461, 2015.
  • [9] D. S. Ebert, C. D. Shaw, A. Zwa, and C. Starr. Two-handed interactive stereoscopic visualization. In Proceedings of the 7th Conference on Visualization’96, pages 205–ff. IEEE Computer Society Press, 1996.
  • [10] T. A. Funkhouser and C. H. Séquin. Adaptive display algorithm for interactive frame rates during visualization of complex virtual environments. In Proceedings of the 20th annual conference on Computer graphics and interactive techniques, pages 247–254. ACM, 1993.
  • [11] N. Gershon and W. Page. What storytelling can do for information visualization. Communications of the ACM, 44(8):31–37, 2001.
  • [12] D. Gledhill, G. Y. Tian, D. Taylor, and D. Clarke. Panoramic imaging—a review. Computers & Graphics, 27(3):435–445, 2003.
  • [13] Google. Rendering omni‐directional stereo content. https://developers.google.com/cardboard/jump/rendering-ods-content.pdf, 2015.
  • [14] S. Gratzl, A. Lex, N. Gehlenborg, N. Cosgrove, and M. Streit. From visual exploration to storytelling and back again. bioRxiv, page 049585, 2016.
  • [15] C. Hänel, B. Weyers, B. Hentschel, and T. W. Kuhlen. Visual quality adjustment for volume rendering in a head-tracked virtual environment. IEEE transactions on visualization and computer graphics, 22(4):1472–1481, 2016.
  • [16] W.-H. Hsu, Y. Zhang, and K.-L. Ma. A multi-criteria approach to camera motion design for volume data animation. Visualization and Computer Graphics, IEEE Transactions on, 19(12):2792–2801, 2013.
  • [17] D. Kanter. Graphics processing requirements for enabling immersive vr. AMD White Paper, 2015.
  • [18] J. Kniss, P. McCormick, A. McPherson, J. Ahrens, J. Painter, A. Keahey, and C. Hansen. Interactive texture-based volume rendering for large data sets. IEEE Computer Graphics and Applications, 21(4):52–61, 2001.
  • [19] N. Kraakman. The best encoding settings for your 4K 360 3D VR videos. http://www.purplepillvr.com/best-encoding-settings-resolution-for-4k-360-3d-vr-videos/, December 2015.
  • [20] E. Kuzyakov and D. Pio. Next-generation video encoding techniques for 360 video and VR. https://code.facebook.com/posts/1126354007399553/next-generation-video-encoding-techniques-for-360-video-and-vr/, January 2016.
  • [21] B. Laha, K. Sensharma, J. D. Schiffbauer, and D. A. Bowman. Effects of immersion on visual analysis of volume data. Visualization and Computer Graphics, IEEE Transactions on, 18(4):597–606, 2012.
  • [22] I. Liao, W.-H. Hsu, and K.-L. Ma. Storytelling via navigation: A novel approach to animation for scientific visualization. In Smart Graphics, pages 1–14. Springer, 2014.
  • [23] A. Limaye. Drishti: a volume exploration and presentation tool. In SPIE Optical Engineering+ Applications, pages 85060X–85060X. International Society for Optics and Photonics, 2012.
  • [24] F. Lindemann and T. Ropinski. About the influence of illumination models on image comprehension in direct volume rendering. IEEE Transactions on Visualization and Computer Graphics, 17(12):1922–1931, 2011.
  • [25] J.-L. Lugrin, M. Cavazza, D. Pizzi, T. Vogt, and E. André. Exploring the usability of immersive interactive storytelling. In Proceedings of the 17th ACM symposium on virtual reality software and technology, pages 103–110, 2010.
  • [26] K.-L. Ma, I. Liao, J. Frazier, H. Hauser, and H.-N. Kostis. Scientific storytelling using visualization. Computer Graphics and Applications, IEEE, 32(1):12–19, 2012.
  • [27] E. Marchetti and A. Valente. What happened to non-linear narrative? a pedagogical reflection. In Advanced Learning Technologies (ICALT), 2015 IEEE 15th International Conference on, pages 233–237, 2015.
  • [28] S. Marks, J. E. Estevez, and A. M. Connor. Towards the holodeck: fully immersive virtual reality visualisation of scientific and engineering data. In Proceedings of the 29th International Conference on Image and Vision Computing New Zealand, pages 42–47. ACM, 2014.
  • [29] B. Meixner and H. Kosch. Interactive non-linear video: definition and XML structure. In Proceedings of the 2012 ACM symposium on Document engineering, pages 49–58, 2012.
  • [30] B. Meixner, B. Siegel, G. Hölbling, F. Lehner, and H. Kosch. SIVA suite: authoring system and player for interactive non-linear videos. In Proceedings of the 18th ACM international conference on Multimedia, pages 1563–1566, 2010.
  • [31] J. Noguera and J. R. Jimenez. Mobile volume rendering: Past, present and future. 2016.
  • [32] M. Ponder, B. Herbelin, T. Molet, S. Schertenlieb, B. Ulicny, G. Papagiannakis, N. Magnenat-Thalmann, and D. Thalmann. Immersive VR decision training: telling interactive stories featuring advanced virtual human simulation technologies. In Proceedings of the workshop on Virtual environments 2003, pages 97–106. ACM, 2003.
  • [33] K. Reda, A. Knoll, K.-i. Nomura, M. E. Papka, A. E. Johnson, and J. Leigh. Visualizing large-scale atomistic simulations in ultra-resolution immersive environments. In LDAV, pages 59–65, 2013.
  • [34] M. Roussou. Immersive interactive virtual reality and informal education. In Proceedings of User Interfaces for All: Interactive Learning Environments for Children, 2000.
  • [35] M. Roussou. Learning by doing and learning through play: an exploration of interactivity in virtual environments for children. Computers in Entertainment (CIE), 2(1):10–10, 2004.
  • [36] H.-J. Schulz and C. Hurter. Grooming the hairball-how to tidy up network visualizations? In INFOVIS 2013, IEEE Information Visualization Conference, 2013.
  • [37] F. Shipman, A. Girgensohn, and L. Wilcox. Hyper-hitchcock: Towards the easy authoring of interactive video. In Human-Computer Interaction INTERACT, volume 3, pages 33–40, 2003.
  • [38] M. Spaniol, R. Klamma, N. Sharda, and M. Jarke. Web-based learning with non-linear multimedia stories. In Advances in Web Based Learning–ICWL 2006, pages 249–263. Springer, 2006.
  • [39] J. Steele and N. Iliinsky. Beautiful Visualization: Looking at Data Through the Eyes of Experts. O'Reilly Media, Inc., 2010.
  • [40] J. E. Stone, W. R. Sherman, and K. Schulten. Immersive molecular visualization with omnidirectional stereoscopic ray tracing and remote rendering. In 2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pages 1048–1057. IEEE, 2016.
  • [41] A. Van Dam, A. S. Forsberg, D. H. Laidlaw, J. J. LaViola Jr, and R. M. Simpson. Immersive VR for scientific visualization: A progress report. Computer Graphics and Applications, IEEE, 20(6):26–52, 2000.
  • [42] M. Wohlfart. Story telling aspects in medical applications. In Central European Seminar on Computer Graphics, 2006.
  • [43] M. Wohlfart and H. Hauser. Story telling for presentation in volume visualization. In Proceedings of the 9th Joint Eurographics/IEEE VGTC conference on Visualization, pages 91–98. Eurographics Association, 2007.
  • [44] D. Zhang, L. Zhou, R. O. Briggs, and J. F. Nunamaker. Instructional video in e-learning: Assessing the impact of interactive video on learning effectiveness. Information & management, 43(1):15–27, 2006.
  • [45] S. Zhang, C. Demiralp, D. F. Keefe, M. DaSilva, D. H. Laidlaw, B. Greenberg, P. J. Basser, C. Pierpaoli, E. A. Chiocca, and T. S. Deisboeck. An immersive virtual environment for DT-MRI volume visualization applications: a case study. In Visualization, 2001. VIS’01. Proceedings, pages 437–584. IEEE, 2001.