Optimizing Xeon Phi for Interactive Data Analysis

by Chansup Byun, et al.

The Intel Xeon Phi manycore processor is designed to provide high-performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving optimal performance of matrix operations within these environments requires tuning the Xeon Phi OpenMP settings, process pinning, and memory modes. This paper describes matrix multiplication performance results for Matlab and GNU Octave over a variety of combinations of process counts, OpenMP thread counts, and Xeon Phi memory modes. These results indicate that using KMP_AFFINITY=granularity=fine, taskset pinning, and all2all cache memory mode allows both Matlab and GNU Octave to achieve 66% of the practical peak performance for process counts ranging from 1 to 64 and OpenMP threads ranging from 1 to 64. These settings have resulted in generally improved performance across a range of applications and have enabled our Xeon Phi system to deliver significant results in a number of real-world applications.
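As a rough illustration of how the affinity setting and taskset pinning named above might be combined, the following Python sketch builds one launch command per process. It is not the authors' launcher: the helper names `core_ranges` and `launch_plan`, the script name `bench.m`, the 64-CPU default, and the contiguous assignment of logical CPUs to processes are all assumptions made for the example. Only `taskset -c`, `OMP_NUM_THREADS`, and `KMP_AFFINITY=granularity=fine` are existing interfaces. The sketch only constructs the commands and environments; it does not start any process.

```python
def core_ranges(num_procs, threads_per_proc, total_cores=64):
    """Split logical CPUs 0..total_cores-1 into contiguous blocks, one per process.

    Assumption: each process gets threads_per_proc consecutive logical CPUs,
    and the node exposes at least total_cores of them.
    """
    if num_procs < 1 or threads_per_proc < 1:
        raise ValueError("process and thread counts must be positive")
    if num_procs * threads_per_proc > total_cores:
        raise ValueError("requested processes x threads exceeds available cores")
    ranges = []
    for rank in range(num_procs):
        first = rank * threads_per_proc
        last = first + threads_per_proc - 1
        ranges.append((first, last))
    return ranges


def launch_plan(command, num_procs, threads_per_proc, total_cores=64):
    """Return a list of (env, argv) pairs, one per pinned process."""
    plan = []
    for first, last in core_ranges(num_procs, threads_per_proc, total_cores):
        # OpenMP settings exported to each process.
        env = {
            "OMP_NUM_THREADS": str(threads_per_proc),
            "KMP_AFFINITY": "granularity=fine",
        }
        # taskset -c takes a CPU list such as "16-31" or a single CPU "5".
        cpu_list = f"{first}-{last}" if last > first else str(first)
        argv = ["taskset", "-c", cpu_list] + list(command)
        plan.append((env, argv))
    return plan


if __name__ == "__main__":
    # Hypothetical benchmark script; 4 processes with 16 OpenMP threads each.
    for env, argv in launch_plan(["octave", "bench.m"], 4, 16):
        exports = " ".join(f"{key}={value}" for key, value in env.items())
        print(exports, " ".join(argv))
```

Each `(env, argv)` pair could be handed to `subprocess.Popen(argv, env={**os.environ, **env})` to start the pinned process. Whether contiguous CPU blocks are the right layout depends on how the operating system numbers hardware threads on the node, which this sketch does not inspect.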