To Use or Not to Use: CPUs' Cache Optimization Techniques on GPGPUs

10/09/2018
by Vajira Thambawita, et al.

General-Purpose Graphics Processing Units (GPGPUs) are widely used to achieve high performance or high throughput in parallel programming. This capability has made GPGPUs well known in recent years, and they are mostly used for scientific computing, which requires more processing power than ordinary personal computers provide. As a result, many programmers, researchers and industry practitioners now use GPGPUs in their work. However, achieving high performance or high throughput with GPGPUs is not an easy task compared with conventional programming on the CPU side. In this research, CPU cache memory optimization techniques have been adapted to the GPGPU's cache memory to identify performance improvement techniques that are rarely considered in GPGPU best practices. The cache optimization techniques of blocking, loop fusion, array merging and array transpose were tested on GPGPUs to determine whether they are suitable there. Finally, we found that some CPU cache optimization techniques work well with the GPGPU's cache memory system and improve performance, while others have the opposite effect on GPGPUs compared with CPUs.
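The abstract names the four CPU cache optimization techniques without showing them. For reference, here is a minimal C sketch of each one in its common textbook CPU form, assuming row-major n x n `double` matrices. All function names, the tile size parameter `bs`, and `struct pair` are hypothetical illustrations, not the authors' benchmark code, and the sketch says nothing about how these techniques behave on a GPGPU.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: C = A * B (row-major, n x n). The inner loop reads B down a
   column with stride n, which makes poor use of CPU cache lines. */
void matmul_naive(size_t n, const double *A, const double *B, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

/* Blocking (tiling): work on bs x bs tiles so the tiles being reused stay
   resident in cache. Edge tiles are clipped, so n need not divide by bs. */
void matmul_blocked(size_t n, size_t bs, const double *A, const double *B, double *C) {
    assert(bs > 0); /* a zero tile size would never advance the tile loops */
    for (size_t i = 0; i < n * n; i++) C[i] = 0.0;
    for (size_t ii = 0; ii < n; ii += bs)
        for (size_t kk = 0; kk < n; kk += bs)
            for (size_t jj = 0; jj < n; jj += bs)
                for (size_t i = ii; i < ii + bs && i < n; i++)
                    for (size_t k = kk; k < kk + bs && k < n; k++) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + bs && j < n; j++)
                            C[i * n + j] += a * B[k * n + j];
                    }
}

/* Array transpose: copy B into the caller-provided buffer Bt = B^T first,
   so the inner product reads both A and Bt with unit stride. */
void matmul_transposed(size_t n, const double *A, const double *B, double *Bt, double *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            Bt[j * n + i] = B[i * n + j];
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * Bt[j * n + k];
            C[i * n + j] = sum;
        }
}

/* Loop fusion, before: two separate passes, so x and y are streamed twice. */
void scale_then_add_separate(size_t n, const double *x, double *y, double *z) {
    for (size_t i = 0; i < n; i++) y[i] = 2.0 * x[i];
    for (size_t i = 0; i < n; i++) z[i] = y[i] + x[i];
}

/* Loop fusion, after: one pass; x[i] and y[i] are reused while still hot.
   Legal because iteration i of the second loop only needs y[i]. */
void scale_then_add_fused(size_t n, const double *x, double *y, double *z) {
    for (size_t i = 0; i < n; i++) {
        y[i] = 2.0 * x[i];
        z[i] = y[i] + x[i];
    }
}

/* Array merging, before: two parallel arrays that are always read together. */
long sum_products_separate(size_t n, const int *key, const int *val) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += (long)key[i] * val[i];
    return s;
}

/* Array merging, after: the two arrays interleaved into one array of structs,
   so each element's key and val usually share a cache line. */
struct pair { int key; int val; };

long sum_products_merged(size_t n, const struct pair *p) {
    long s = 0;
    for (size_t i = 0; i < n; i++) s += (long)p[i].key * p[i].val;
    return s;
}
```

Each optimized variant computes the same result as its baseline; only the memory access pattern changes. With general floating-point data, `matmul_blocked` may differ from `matmul_naive` in the last bits because the order of summation changes.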

research
05/08/2019

SAWL: A Self-adaptive Wear-leveling NVM Scheme for High Performance Storage Systems

In order to meet the needs of high performance computing (HPC) in terms ...
research
07/26/2019

Massively Scaling Seismic Processing on Sunway TaihuLight Supercomputer

Common Midpoint (CMP) and Common Reflection Surface (CRS) are widely use...
research
11/23/2020

Proximu: Efficiently Scaling DNN Inference in Multi-core CPUs through Near-Cache Compute

Deep Neural Network (DNN) inference is emerging as the fundamental bedro...
research
06/03/2019

Cache Contention on Multicore Systems: An Ontology-based Approach

Multicore processors have proved to be the right choice for both desktop...
research
07/23/2020

Observing the Invisible: Live Cache Inspection for High-Performance Embedded Systems

The vast majority of high-performance embedded systems implement multi-l...
research
08/02/2022

A Framework for High-throughput Sequence Alignment using Real Processing-in-Memory Systems

Sequence alignment is a fundamentally memory bound computation whose per...
research
11/26/2018

An optimized Parallel Failure-less Aho-Corasick algorithm for DNA sequence matching

The Aho-Corasick algorithm is multiple patterns searching algorithm runn...
