GPGPU Applications

Objectives:
The course demonstrates how the computing power of modern graphics cards can be used for general-purpose computation, based on a generalized model of the hardware. It introduces the architecture of the graphics card and the OpenCL general-purpose computing environment, and presents various algorithms designed for massively parallel architectures through practical examples.
 
Synopsis:
1. Overview of the architecture of the GPU
 
The lecture discusses the massively parallel architecture of graphics hardware and its limitations, as well as the fundamentals of parallel programming. An overview of programming environments for graphics hardware is also given.
 
2. Introduction to OpenCL and GPGPU
 
OpenCL, a general-purpose environment for GPU programming, is presented, including an overview of its virtual platform model, its memory and programming models, the OpenCL C language, and the related host-side (CPU) API.
 
Vector processing operations on large data sets are discussed, together with the parallelization of scatter- and gather-type algorithms, including their limitations and implementation details.
 
3. Implementation issues of basic parallel primitives in OpenCL
 
Introduction to parallel programming primitives, the fundamental building blocks of scalable algorithms. The primitives discussed include the map, reduce, amplify, scan and compact operators.
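
Of these primitives, scan and compact are the least self-explanatory. The following C sketch emulates sequentially what a GPU would execute in lockstep, assuming the Hillis-Steele formulation of scan (the course may use a different variant, such as the work-efficient Blelloch scan). Compaction is built from the other primitives: a map produces 0/1 flags, a scan turns them into output positions, and a scatter writes the kept elements. The names MAX_N, inclusive_scan and compact_nonzero are hypothetical, chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_N 1024 /* hypothetical fixed capacity for this sketch */

/* Hillis-Steele inclusive scan, emulated sequentially. Each pass of the
   outer loop is one synchronized parallel step: every "work-item" i reads
   the previous buffer (tmp) and writes the new one (out). */
void inclusive_scan(const int *in, int *out, size_t n)
{
    int tmp[MAX_N];
    assert(n <= MAX_N);
    memcpy(out, in, n * sizeof *in);
    for (size_t offset = 1; offset < n; offset *= 2) {
        memcpy(tmp, out, n * sizeof *out); /* double buffer: read old values */
        for (size_t i = 0; i < n; ++i)     /* "parallel for" over work-items */
            if (i >= offset)
                out[i] = tmp[i] + tmp[i - offset];
    }
}

/* Stream compaction: keep the non-zero elements, preserving their order.
   Returns the number of elements written to out. */
size_t compact_nonzero(const int *in, int *out, size_t n)
{
    int flags[MAX_N], pos[MAX_N];
    assert(n <= MAX_N);
    if (n == 0)
        return 0;
    for (size_t i = 0; i < n; ++i) /* map: predicate -> 0/1 flag */
        flags[i] = in[i] != 0;
    inclusive_scan(flags, pos, n); /* pos[i] = kept items in [0, i] */
    for (size_t i = 0; i < n; ++i) /* scatter: each kept item knows its slot */
        if (flags[i])
            out[pos[i] - 1] = in[i];
    return (size_t)pos[n - 1];
}
```

In an OpenCL version, the inner loop over i would become the kernel body executed by one work-item per element, and the double buffer would be swapped between kernel launches (or between barrier-separated steps within a single work-group).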
 
Sorting algorithms designed for parallel architectures are presented: brick sort, radix sort, merge sort and a parallel variant of quicksort. The parallel algorithms are compared to traditional sorting algorithms with respect to complexity and running time.
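
Brick sort (also known as odd-even transposition sort) shows why an algorithm that is poor sequentially can suit parallel hardware: it performs O(n^2) comparisons in total, but they are grouped into n phases whose compare-exchange operations are independent, so with about n/2 work-items the sort finishes in O(n) parallel steps. A minimal sequential emulation in C follows; the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Swap a[i] and a[j] if they are out of order. */
static void compare_exchange(int *a, size_t i, size_t j)
{
    if (a[i] > a[j]) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }
}

/* Odd-even transposition ("brick") sort. n phases are sufficient. */
void brick_sort(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    for (size_t phase = 0; phase < n; ++phase) {
        /* Even phases compare pairs (0,1),(2,3),...; odd phases (1,2),(3,4),...
           The pairs within one phase do not overlap, so on a GPU each pair
           could be handled by its own work-item. */
        for (size_t i = phase % 2; i + 1 < n; i += 2)
            compare_exchange(a, i, i + 1);
    }
}
```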
 
4. Solution of linear equation systems 
 
Parallel algorithms for solving systems of linear equations are presented. We discuss implementation issues of matrix and vector operations, as well as storage formats for sparse matrices and operations on them.
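
The syllabus does not say which sparse storage formats are covered; as one common example, the sketch below assumes compressed sparse row (CSR) storage and computes a sparse matrix-vector product, the core operation of iterative solvers (for example, the conjugate gradient method). The csr_matrix type and csr_spmv function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse matrix in compressed sparse row (CSR) format (hypothetical type). */
typedef struct {
    size_t rows;
    const size_t *row_ptr; /* rows + 1 entries; row r is [row_ptr[r], row_ptr[r+1]) */
    const size_t *col_idx; /* column index of each stored non-zero */
    const double *values;  /* value of each stored non-zero */
} csr_matrix;

/* y = A x. Rows are independent, so the most direct GPU mapping assigns
   one work-item per row (the outer loop). */
void csr_spmv(const csr_matrix *A, const double *x, double *y)
{
    assert(A != NULL && x != NULL && y != NULL);
    for (size_t r = 0; r < A->rows; ++r) { /* parallel over rows */
        double sum = 0.0;
        for (size_t k = A->row_ptr[r]; k < A->row_ptr[r + 1]; ++k)
            sum += A->values[k] * x[A->col_idx[k]];
        y[r] = sum;
    }
}
```

One work-item per row is the simplest mapping; rows with very different numbers of non-zeros cause load imbalance and irregular memory access, which is one motivation for alternative formats such as ELLPACK.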

5. Monte Carlo methods on the GPU
 
Introduction to Monte Carlo methods, their implementation issues and possible applications. As the generation of high-quality random numbers is the main issue in these algorithms, we discuss pseudo-random and quasi-random number generators that can be used on parallel architectures.
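
As one example of a quasi-random generator that fits parallel hardware well, the sketch below assumes the radical-inverse construction behind the van der Corput and Halton sequences (which generators the course actually covers is not specified). Sample i depends only on i, so each work-item can compute its own sample from its global index without any shared generator state. radical_inverse is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Radical inverse of index i in the given base: the base-b digits of i are
   mirrored around the radix point. Base 2 gives the van der Corput sequence;
   a different prime base per dimension gives a Halton point. */
double radical_inverse(uint32_t i, uint32_t base)
{
    assert(base >= 2);
    double inv_base = 1.0 / base;
    double factor = inv_base;
    double result = 0.0;
    while (i > 0) {
        result += (double)(i % base) * factor; /* next digit, mirrored */
        i /= base;
        factor *= inv_base;
    }
    return result;
}
```

For a 2D Halton point, a work-item with index i could use (radical_inverse(i, 2), radical_inverse(i, 3)).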
 
6. Physical simulations on the GPU
 
Examples of physical models that can be evaluated efficiently on GPUs. The efficiency and scalability of the presented algorithms are also discussed.
 
7. Adjoint Monte Carlo methods

Fundamental issues of adjoint Monte Carlo methods. We discuss the theoretical requirements and implementation issues of a gathering-type algorithm in the context of GPU-based PET reconstruction.
 
8. Introduction to CUDA programming
 
Similarities and differences between OpenCL and CUDA.
 
9. Optimization issues of GPGPU applications
 
Optimization and performance measurement issues of parallel algorithms are discussed. Possibilities for algorithmic optimization based on theoretical and practical metrics are presented.
 
10. Efficient interoperability with the graphics API (OpenGL)
 
The basic tools for connecting the OpenCL environment with the OpenGL graphics API are discussed. These can be used to create efficient visualizations for general-purpose computing applications, which can be extremely helpful when evaluating computational results.