
Software prefetching for irregular memory access patterns

    SparCity

Motivation and Background

Computational kernels with irregular memory access patterns, such as sparse matrix-vector multiplication (SpMV), occur in many scientific and machine learning applications. A typical and important example is the conjugate gradient method used in iterative solvers. The performance of these applications often suffers from high memory access penalties due to low locality of reference in the irregular access pattern.
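To make the irregular access concrete, here is a minimal sketch of a scalar SpMV kernel, assuming the matrix is stored in the compressed sparse row (CSR) format. The function name `spmv_csr` and the array names `row_ptr`, `col_idx`, and `val` are illustrative choices, not taken from the thesis description. The matrix values and column indices are read sequentially, while the load `x[col_idx[j]]` depends on the sparsity structure and is the source of the low locality.

```c
#include <stddef.h>

/* Illustrative CSR SpMV: computes y = A * x.
 * row_ptr has n_rows + 1 entries; row i owns the nonzeros
 * row_ptr[i] .. row_ptr[i + 1] - 1 of col_idx and val. */
void spmv_csr(size_t n_rows,
              const size_t *row_ptr,
              const size_t *col_idx,
              const double *val,
              const double *x,
              double *y)
{
    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
            /* val[j] and col_idx[j] are streamed sequentially;
             * x[col_idx[j]] is an indirect, data-dependent access. */
            sum += val[j] * x[col_idx[j]];
        }
        y[i] = sum;
    }
}
```

For a matrix with scattered nonzeros, consecutive values of `col_idx[j]` can point to distant entries of `x`, so each load may touch a different cache line.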

Prefetching, that is, speculatively loading data into fast memory ahead of time, is a common technique for hiding memory access latency. Modern processors typically include a hardware prefetcher that analyzes the address stream to predict the addresses of future memory accesses. Because this prediction relies on regular, predictable access patterns, hardware prefetchers are usually ineffective for irregular ones. However, prefetching can also be implemented in software by inserting prefetch instructions into the code. These instructions explicitly request data ahead of its use, even when the access pattern is irregular.
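As a rough illustration of software prefetching, the sketch below adds a prefetch hint to a CSR SpMV loop. It assumes a GCC- or Clang-compatible compiler, which provides the `__builtin_prefetch` builtin. The function name `spmv_csr_prefetch` and the parameter `pf_dist` (how many nonzeros ahead to prefetch) are hypothetical, and a suitable distance would have to be determined experimentally for each architecture and matrix.

```c
#include <stddef.h>

/* Illustrative CSR SpMV with software prefetching: computes y = A * x.
 * pf_dist is a hypothetical tuning parameter: the number of nonzeros
 * to look ahead when prefetching entries of x. */
void spmv_csr_prefetch(size_t n_rows,
                       const size_t *row_ptr,
                       const size_t *col_idx,
                       const double *val,
                       const double *x,
                       double *y,
                       size_t pf_dist)
{
    const size_t nnz = row_ptr[n_rows]; /* total number of nonzeros */

    for (size_t i = 0; i < n_rows; ++i) {
        double sum = 0.0;
        for (size_t j = row_ptr[i]; j < row_ptr[i + 1]; ++j) {
            /* col_idx is read sequentially, so the future index is
             * cheap to obtain; the bounds check avoids reading past
             * the end of col_idx. */
            if (j + pf_dist < nnz) {
                /* Arguments: address, 0 = read access, 1 = low
                 * temporal locality (GCC/Clang builtin). */
                __builtin_prefetch(&x[col_idx[j + pf_dist]], 0, 1);
            }
            sum += val[j] * x[col_idx[j]];
        }
        y[i] = sum;
    }
}
```

The prefetch is only a hint and does not change the result. Whether it helps depends on the trade-off between hidden latency and the extra instructions and cache traffic it causes, which is the kind of question the experiments in this thesis are meant to answer.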

Goals and Tasks

In this thesis, we want to explore the potential performance benefits and drawbacks of inserting prefetching instructions into SpMV kernels on a variety of modern hardware architectures. In the context of this thesis, you will:

  • Implement or modify (vectorized) SpMV kernels for several sparse data storage formats
  • Select a collection of suitable sparse matrices with different archetypal matrix structures
  • Design and conduct experiments on the hardware architectures of the BEAST system
  • Evaluate the implementation and interpret the results with respect to SpMV kernel and storage format, matrix structure, and hardware architecture

Prerequisites

  • Proficiency in a low-level programming language (C/C++ knowledge from the Systempraktikum or similar)
  • Basic understanding of and interest in modern computer architecture and numerical algorithms
  • Advantageous: Knowledge of the contents of the lecture Parallel and High Performance Computing or similar

Literature

Organizational Matters