On-the-fly Vertex Reuse for Massively-Parallel Software Geometry Processing

05/22/2018
by Michael Kenzel, et al.

Compute-mode rendering is becoming increasingly attractive for non-standard rendering applications because of the high flexibility of compute-mode execution. These newly designed pipelines often include streaming vertex and geometry processing stages. In typical triangle meshes, the same transformed vertex is required six times on average during rendering. To avoid redundant computation, a post-transform cache is traditionally suggested to enable reuse of vertex processing results. However, traditional caching neither scales well as hardware becomes more parallel, nor can it be implemented efficiently in software. We investigate alternative strategies for reusing vertex shading results on-the-fly in massively parallel software geometry processing. Forming static and dynamic batches on the input data stream, we analyze how effectively potential local reuse can be identified by means of sorting, hashing, and efficient intra-thread-group communication. Altogether, we present four vertex reuse strategies tailored to modern parallel architectures. Our simulations show that our batch-based strategies significantly outperform parallel caches in terms of reuse. On actual GPU hardware, our evaluation shows that our strategies not only achieve good reuse of processing results, but also boost performance by 2-3× compared to naïvely ignoring reuse in a variety of practical applications.
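To make the idea of batch-local reuse concrete, the following is a minimal sketch, not the authors' implementation. It assumes static batches of a fixed number of indices and uses a sort-based deduplication within each batch; the function names `batch_reuse` and `shading_invocations` and the `batch_size` parameter are hypothetical and chosen for illustration only. It runs sequentially on the CPU, whereas the abstract targets thread groups on a GPU.

```python
def batch_reuse(indices, batch_size):
    """Split an index buffer into static batches and deduplicate each batch.

    Hypothetical illustration: returns one (unique_vertices, local_indices)
    pair per batch. unique_vertices lists the vertex ids that would need to
    be shaded once for that batch; local_indices re-expresses the batch's
    triangles as offsets into unique_vertices.
    """
    if batch_size <= 0 or batch_size % 3 != 0:
        raise ValueError("batch_size must be a positive multiple of 3")
    batches = []
    for start in range(0, len(indices), batch_size):
        chunk = indices[start:start + batch_size]
        # Sort-based deduplication: duplicates collapse to a single entry.
        unique = sorted(set(chunk))
        slot = {vertex: i for i, vertex in enumerate(unique)}
        local = [slot[vertex] for vertex in chunk]
        batches.append((unique, local))
    return batches


def shading_invocations(batches):
    """Count how many vertex shader invocations the batches would require."""
    return sum(len(unique) for unique, _ in batches)


# Four triangles of a strip, stored as an indexed triangle list.
indices = [0, 1, 2, 2, 1, 3, 2, 3, 4, 4, 3, 5]
batches = batch_reuse(indices, batch_size=6)
print(shading_invocations(batches), "invocations for", len(indices), "index references")
```

In this example, the 12 index references need 8 invocations with batches of 6 indices, because vertices 2 and 3 are shaded once in each of the two batches. Reuse is only found inside a batch, so larger batches find more reuse at the cost of more per-batch work; a single batch of 12 indices would need 6 invocations.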


research
09/20/2021

GPGPU-Parallel Re-indexing of Triangle Meshes with Duplicate-Vertex and Unused-Vertex Removal

We describe a simple yet highly parallel method for re-indexing "indexed...
research
08/13/2020

Strategies for Efficient Executions of Irregular Message-Driven Parallel Applications on GPU Systems

Message-driven executions with over-decomposition of tasks constitute an...
research
04/15/2021

Rendering Point Clouds with Compute Shaders and Vertex Order Optimization

While commodity GPUs provide a continuously growing range of features an...
research
09/10/2021

An Effective Early Multi-core System Shared Cache Design Method Based on Reuse-distance Analysis

In this paper, we proposed an effective and efficient multi-core shared-...
research
07/29/2019

Modeling Shared Cache Performance of OpenMP Programs using Reuse Distance

Performance modeling of parallel applications on multicore computers rem...
research
05/08/2023

TauBench 1.1: A Dynamic Benchmark for Graphics Rendering

Many graphics rendering algorithms used in both real-time games and virt...
research
10/31/2019

Run-time Parameter Sensitivity Analysis Optimizations

Efficient execution of parameter sensitivity analysis (SA) is critical t...
