Satoshi Matsuoka

research

∙ 06/06/2023

Exploiting Scratchpad Memory for Deep Temporal Blocking: A case study for 2D Jacobian 5-point iterative stencil kernel (j2d5pt)

General Purpose Graphics Processing Units (GPGPU) are used in most of th...

Lingqi Zhang, et al.

research

∙ 05/12/2023

Revisiting Temporal Blocking Stencil Optimizations

Iterative stencils are used widely across the spectrum of High Performan...

Lingqi Zhang, et al.

research

∙ 01/06/2023

Myths and Legends in High-Performance Computing

In this humorous and thought provoking article, we discuss certain myths...

Satoshi Matsuoka, et al.

research

∙ 04/15/2022

Preparing for the Future – Rethinking Proxy Apps

A considerable amount of research and engineering went into designing pr...

Satoshi Matsuoka, et al.

research

∙ 04/05/2022

Persistent Kernels for Iterative Memory-bound GPU Applications

Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU ...

Lingqi Zhang, et al.

research

∙ 10/21/2021

MLPerf HPC: A Holistic Benchmark Suite for Scientific Machine Learning on HPC Systems

Scientific communities are increasingly adopting machine learning and de...

Steven Farrell, et al.

research

∙ 10/19/2021

Digital transformation of droplet/aerosol infection risk assessment realized on "Fugaku" for the fight against COVID-19

The fastest supercomputer in 2020, Fugaku, has not only achieved digital...

Kazuto Ando, et al.

research

∙ 04/27/2021

Performance Portable Back-projection Algorithms on CPUs: Agnostic Data Locality and Vectorization Optimizations

Computed Tomography (CT) is a key 3D imaging technology that fundamental...

Peng Chen, et al.

research

∙ 10/27/2020

Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?

Matrix engines or units, in different forms and affinities, are becoming...

Jens Domke, et al.

research

∙ 08/26/2020

Scaling Distributed Deep Learning Workloads beyond the Memory Capacity with KARMA

The dedicated memory of hardware accelerators can be insufficient to sto...

Mohamed Wahib, et al.

research

∙ 07/25/2020

The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism

We present scalable hybrid-parallel algorithms for training large-scale ...

Yosuke Oyama, et al.

research

∙ 04/11/2020

A Study of Single and Multi-device Synchronization Methods in Nvidia GPUs

GPUs are playing an increasingly important role in general-purpose compu...

Lingqi Zhang, et al.

research

∙ 04/09/2020

A Survey on Coarse-Grained Reconfigurable Architectures from a Performance Perspective

With the end of both Dennard's scaling and Moore's law, computer users a...

Artur Podobas, et al.

research

∙ 02/14/2020

High-Performance High-Order Stencil Computation on FPGAs Using OpenCL

In this paper we evaluate the performance of FPGAs for high-order stenci...

Hamid Reza Zohouri, et al.

research

∙ 01/06/2020

AN5D: Automated Stencil Framework for High-Degree Temporal Blocking on GPUs

Stencil computation is one of the most widely-used compute patterns in h...

Kazuaki Matsumura, et al.

research

∙ 10/15/2019

The Memory Controller Wall: Benchmarking the Intel FPGA SDK for OpenCL Memory Interface

Supported by their high power efficiency and recent advancements in High...

Hamid Reza Zohouri, et al.

research

∙ 09/06/2019

iFDK: A Scalable Framework for Instant High-resolution Image Reconstruction

Computed Tomography (CT) is a widely used technology that requires compu...

Peng Chen, et al.

research

∙ 07/14/2019

A Versatile Software Systolic Execution Model for GPU Memory-Bound Kernels

This paper proposes a versatile high-performance execution model, inspir...

Peng Chen, et al.

research

∙ 03/27/2019

Batched Sparse Matrix Multiplication for Accelerating Graph Convolutional Networks

Graph Convolutional Networks (GCNs) are recently getting much attention ...

Yusuke Nagasaka, et al.

research

∙ 02/26/2019

A New Linear Time Correctness Condition for Multiplicative Linear Logic

In this paper, we give a new linear time correctness condition for proof...

Satoshi Matsuoka, et al.

research

∙ 12/21/2018

Adaptive Pattern Matching with Reinforcement Learning for Dynamic Graphs

Graph pattern matching algorithms to handle million-scale dynamic graphs...

Hiroki Kanezashi, et al.

research

∙ 11/29/2018

Second-order Optimization Method for Large Mini-batch: Training ResNet-50 on ImageNet in 35 Epochs

Large-scale distributed training of deep neural networks suffer from the...

Kazuki Osawa, et al.

research

∙ 10/22/2018

Double-precision FPUs in High-Performance Computing: an Embarrassment of Riches?

Among the (uncontended) common wisdom in High-Performance Computing (HPC...

Jens Domke, et al.

research

∙ 04/13/2018

μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching

NVIDIA cuDNN is a low-level library that provides GPU kernels frequently...

Yosuke Oyama, et al.

research

∙ 04/05/2018

High-performance sparse matrix-matrix products on Intel KNL and multicore architectures

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitiv...

Yusuke Nagasaka, et al.

research

∙ 02/01/2018

Combined Spatial and Temporal Blocking for High-Performance Stencil Computation on FPGAs Using OpenCL

Recent developments in High Level Synthesis tools have attracted softwar...

Hamid Reza Zohouri, et al.
