We discuss a range of algorithms and codes for the solution of sparse systems that we developed in the EU Horizon 2020 project NLAFET, which finished on 30 April 2019.
We used two approaches to obtain good single-node performance. For symmetric systems we used task-based algorithms built on an assembly-tree representation of the factorization. We then used runtime systems for...
Hierarchical matrices (H-matrices) lie in between the dense and sparse scenarios. It is therefore natural to tackle the LU factorization of H-matrices via a task-parallel approach, which has recently yielded successful results for related linear algebra problems. In this work, we describe how to discover the data-flow parallelism intrinsic to the operation at execution time, via the analysis...
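As a minimal illustration of the general idea (not the H-matrix code itself), the sketch below applies data-flow task parallelism to a plain tiled LU factorization: each task only declares what it reads and writes, and the runtime discovers the task graph at execution time from those declarations. The tile layout, kernel names, and the use of OpenMP tasks as the runtime are assumptions of the sketch; pivoting is omitted for brevity.

    /* A is an nt x nt array of pointers to b x b row-major tiles. */

    /* factor a b x b tile in place: A = L*U with unit-diagonal L (no pivoting) */
    static void lu_tile(double *A, int b) {
        for (int k = 0; k < b; ++k)
            for (int i = k + 1; i < b; ++i) {
                A[i*b + k] /= A[k*b + k];
                for (int j = k + 1; j < b; ++j)
                    A[i*b + j] -= A[i*b + k] * A[k*b + j];
            }
    }

    /* X := X * U^{-1}, with U the upper triangle of a factored diagonal tile */
    static void trsm_right_upper(const double *U, double *X, int b) {
        for (int j = 0; j < b; ++j)
            for (int i = 0; i < b; ++i) {
                double s = X[i*b + j];
                for (int m = 0; m < j; ++m) s -= X[i*b + m] * U[m*b + j];
                X[i*b + j] = s / U[j*b + j];
            }
    }

    /* X := L^{-1} * X, with L the unit lower triangle of a factored diagonal tile */
    static void trsm_left_lower(const double *L, double *X, int b) {
        for (int i = 0; i < b; ++i)
            for (int j = 0; j < b; ++j) {
                double s = X[i*b + j];
                for (int m = 0; m < i; ++m) s -= L[i*b + m] * X[m*b + j];
                X[i*b + j] = s;                 /* unit diagonal: no division */
            }
    }

    /* C := C - A * B for b x b tiles */
    static void gemm_update(const double *A, const double *B, double *C, int b) {
        for (int i = 0; i < b; ++i)
            for (int j = 0; j < b; ++j) {
                double s = 0.0;
                for (int m = 0; m < b; ++m) s += A[i*b + m] * B[m*b + j];
                C[i*b + j] -= s;
            }
    }

    /* The driver only declares what each task reads and writes; the runtime
       builds the task graph (the data-flow parallelism) from the depend clauses. */
    void tiled_lu(double ***A, int nt, int b)
    {
        #pragma omp parallel
        #pragma omp single
        for (int k = 0; k < nt; ++k) {
            #pragma omp task depend(inout: A[k][k][0:b*b])
            lu_tile(A[k][k], b);

            for (int i = k + 1; i < nt; ++i) {
                #pragma omp task depend(in: A[k][k][0:b*b]) depend(inout: A[i][k][0:b*b])
                trsm_right_upper(A[k][k], A[i][k], b);
                #pragma omp task depend(in: A[k][k][0:b*b]) depend(inout: A[k][i][0:b*b])
                trsm_left_lower(A[k][k], A[k][i], b);
            }
            for (int i = k + 1; i < nt; ++i)
                for (int j = k + 1; j < nt; ++j) {
                    #pragma omp task depend(in: A[i][k][0:b*b], A[k][j][0:b*b]) depend(inout: A[i][j][0:b*b])
                    gemm_update(A[i][k], A[k][j], A[i][j], b);
                }
        }
    }

In the H-matrix setting the tiles become low-rank or dense blocks of varying size, but the principle is the same: the task graph is not written down explicitly, it emerges from the declared data accesses.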
Field-Programmable Gate Arrays (FPGAs) as hardware accelerators offer great flexibility and performance, and have recently been emerging as a more energy-efficient alternative to other many-core devices.
The traditional methods for FPGA design involve the use of low-level Hardware Description Languages such as VHDL or Verilog. These impose a vastly different programming model from standard...
The modeling of physical phenomena as Linear Time-Invariant (LTI) systems is a common practice across science and industry. It is often the case that the order of these models is so large that it renders them impractical when simulating the studied system. In these cases, practitioners can appeal to Model Order Reduction (MOR) techniques, which, starting from the original model, produce a...
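As a concrete setting (standard notation, not specific to this talk), an order-$n$ LTI system is
$$\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t),$$
and a projection-based reduced model of order $r \ll n$ is obtained from matrices $V, W \in \mathbb{R}^{n\times r}$ with $W^{T}V = I$ as
$$\hat{A} = W^{T}AV, \quad \hat{B} = W^{T}B, \quad \hat{C} = CV, \quad \hat{D} = D,$$
so that $\dot{\hat{x}} = \hat{A}\hat{x} + \hat{B}u$, $\hat{y} = \hat{C}\hat{x} + \hat{D}u$ approximates the input-output map $u \mapsto y$ at a fraction of the simulation cost.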
Preconditioned Krylov-subspace methods for solving general sparse linear systems are the computational bottleneck in the solution of many science and engineering problems. In particular, the application of the preconditioner at each iteration of the solver is the stage that concentrates most of the processing time. This stage often involves the solution of a number of sparse triangular linear...
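To make the kernel in question concrete, a sequential sparse triangular solve (forward substitution with a lower-triangular factor in CSR) might look as follows; this is a minimal sketch, with the assumption that the diagonal entry is stored as the last nonzero of each row.

    /* Solve L x = b, with L sparse lower triangular in CSR (rowptr/colidx/val).
       The loop over rows carries a dependency on all earlier rows referenced
       by row i, which is what makes this kernel hard to parallelize. */
    void sptrsv_lower_csr(int n, const int *rowptr, const int *colidx,
                          const double *val, const double *b, double *x)
    {
        for (int i = 0; i < n; ++i) {
            double s = b[i];
            int end = rowptr[i + 1] - 1;          /* last entry of the row: the diagonal */
            for (int k = rowptr[i]; k < end; ++k)
                s -= val[k] * x[colidx[k]];
            x[i] = s / val[end];
        }
    }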
Efficiently processing unbalanced and irregular matrices on manycore architectures is a challenging problem. Building on the coordinate format (COO), we have designed a load-balancing Sparse Matrix-Vector Multiplication (SpMV) kernel that provides attractive performance across a wide range of matrices. In this contribution, we present the load-balancing COO SpMV kernel, elaborate...
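A simplified OpenMP sketch of the underlying idea (not the kernel from the contribution): work is distributed over the nonzeros rather than over the rows, so the load is balanced regardless of how irregular the row lengths are, and rows whose entries are split between threads are resolved with atomic updates.

    /* Load-balanced COO SpMV sketch: each thread gets an (almost) equal,
       contiguous slice of the nonzeros.  The output vector y must be
       zero-initialized by the caller; the kernel accumulates into it. */
    void spmv_coo_balanced(int nnz, const int *row, const int *col,
                           const double *val, const double *x, double *y)
    {
        #pragma omp parallel for schedule(static)
        for (int k = 0; k < nnz; ++k) {
            double contrib = val[k] * x[col[k]];
            #pragma omp atomic
            y[row[k]] += contrib;
        }
    }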
Support for floating-point arithmetic in multiple precisions is becoming increasingly common in emerging architectures.
For example, half precision is now available on the NVIDIA V100 GPU, on which it runs twice as fast as single precision with proportional savings in energy consumption. Furthermore, the NVIDIA V100's half-precision tensor cores can provide up to a 16x speedup over double...
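One common pattern for exploiting such hardware is mixed-precision iterative refinement: factorize once in low precision, then refine the solution using residuals computed in the working precision. The sketch below uses single precision as a stand-in for the low format and double as the working precision; all routine names are illustrative, and pivoting is omitted for brevity.

    #include <stdlib.h>
    #include <string.h>

    /* in-place LU without pivoting on an n x n single-precision copy of A */
    static void lu_factor_f32(float *A, int n) {
        for (int k = 0; k < n; ++k)
            for (int i = k + 1; i < n; ++i) {
                A[i*n + k] /= A[k*n + k];
                for (int j = k + 1; j < n; ++j)
                    A[i*n + j] -= A[i*n + k] * A[k*n + j];
            }
    }

    /* solve LU z = r with the single-precision factors, accumulating in double */
    static void lu_solve_f32(const float *LU, int n, const double *r, double *z) {
        for (int i = 0; i < n; ++i) {             /* forward substitution, unit L */
            double s = r[i];
            for (int j = 0; j < i; ++j) s -= (double)LU[i*n + j] * z[j];
            z[i] = s;
        }
        for (int i = n - 1; i >= 0; --i) {        /* backward substitution */
            double s = z[i];
            for (int j = i + 1; j < n; ++j) s -= (double)LU[i*n + j] * z[j];
            z[i] = s / (double)LU[i*n + i];
        }
    }

    /* mixed-precision iterative refinement: low-precision factorization,
       double-precision residuals and solution updates */
    void ir_solve(const double *A, const double *b, double *x, int n, int iters) {
        float  *LU = malloc((size_t)n * n * sizeof *LU);
        double *r  = malloc((size_t)n * sizeof *r);
        double *z  = malloc((size_t)n * sizeof *z);
        for (int i = 0; i < n * n; ++i) LU[i] = (float)A[i];   /* round A once */
        lu_factor_f32(LU, n);

        memset(x, 0, (size_t)n * sizeof *x);
        for (int it = 0; it < iters; ++it) {
            for (int i = 0; i < n; ++i) {                      /* r = b - A x in double */
                double s = b[i];
                for (int j = 0; j < n; ++j) s -= A[i*n + j] * x[j];
                r[i] = s;
            }
            lu_solve_f32(LU, n, r, z);                         /* cheap correction solve */
            for (int i = 0; i < n; ++i) x[i] += z[i];          /* update in double */
        }
        free(LU); free(r); free(z);
    }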
The use of multiprecision numerics is becoming increasingly attractive as modern processor architectures often achieve significantly higher performance and throughput rates when using precisions lower than IEEE double precision. Error analysis aims to investigate how rounding errors introduced by using different precision formats propagate throughout the algorithms and potentially impact the...
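For reference (standard values, not specific to this talk), the usual model for a single floating-point operation is $fl(x \,\mathrm{op}\, y) = (x \,\mathrm{op}\, y)(1+\delta)$ with $|\delta| \le u$, where the unit roundoff is $u = 2^{-11} \approx 4.9\times 10^{-4}$ for IEEE half precision, $2^{-24} \approx 6.0\times 10^{-8}$ for single, and $2^{-53} \approx 1.1\times 10^{-16}$ for double; error analysis tracks how these very differently sized perturbations accumulate through an algorithm.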
Power-aware computing introduces an additional dimension into the development of efficient parallel codes for heterogeneous computing architectures. Along with experimental frameworks that facilitate taking experimental measurements, there is a need for data analysis strategies and programming guidelines that jointly consider speed and energy consumption, among...
Modeling the execution time and the energy efficiency of the Sparse Matrix-Vector product (SpMV) on a current CPU architecture is especially complex due to i) irregular memory accesses; ii) indirect memory referencing; and iii) low arithmetic intensity. While analytical models may yield accurate estimates for the total number of cache hits/misses, they often fail to predict...
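As an illustration of the kind of estimate such a model produces (generic CSR reasoning, not the model of this contribution): for a matrix with $n$ rows and $nnz$ nonzeros stored with 8-byte values and 4-byte indices, the memory traffic of one SpMV is roughly
$$V \approx 12\,nnz + 4(n+1) + 8n + V_x, \qquad 8n \le V_x \le 8\,nnz,$$
and, since the arithmetic intensity is low, the execution time is approximately $T \approx V/B$ for a sustained memory bandwidth $B$. The term $V_x$, the traffic caused by the irregular accesses to the input vector, is exactly the part that depends on cache behaviour and is hard to predict.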
At the PACO workshop 2017, we presented the idea of decoupling the memory precision from the arithmetic precision, and storing a block-Jacobi preconditioner such that the precision format of each diagonal block is optimized for its numerical characteristics. The idea is to reduce pressure on the memory bandwidth while preserving the regularity of the preconditioner and the convergence of the...
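A simplified sketch of the selection logic (the thresholds, names, and storage enum are made up for illustration, not the actual implementation): each inverted diagonal block is stored in the cheapest format whose unit roundoff is still small relative to the block's conditioning, and is converted back to the arithmetic precision before it is applied.

    #include <math.h>

    enum storage_prec { PREC_HALF, PREC_SINGLE, PREC_DOUBLE };

    /* pick the storage format for one inverted diagonal block from a
       condition-number estimate; 1e-2 is an illustrative safety threshold */
    enum storage_prec choose_block_precision(double cond_estimate)
    {
        const double u_half   = ldexp(1.0, -11);   /* ~4.9e-4  */
        const double u_single = ldexp(1.0, -24);   /* ~6.0e-8  */

        if (cond_estimate * u_half   < 1e-2) return PREC_HALF;
        if (cond_estimate * u_single < 1e-2) return PREC_SINGLE;
        return PREC_DOUBLE;
    }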
The design of energy-efficient applications requires proper energy measurement inside the compute servers. The Megware SlideSX chassis for HPC servers provides an energy measurement directly between the power supply and the mainboard of the system, with a sampling rate of up to 100 Hz. This enables users to detect the energy-consuming parts of their applications. In order to obtain the energy...
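To obtain the energy from such a power trace one typically integrates the samples over time, e.g. with the trapezoidal rule; a generic sketch (not the measurement software of the chassis):

    /* Integrate a sampled power trace (watts) into energy (joules) with the
       trapezoidal rule; dt is the sampling interval in seconds (0.01 s at 100 Hz). */
    double energy_from_power_trace(const double *power_w, int nsamples, double dt)
    {
        double energy_j = 0.0;
        for (int i = 1; i < nsamples; ++i)
            energy_j += 0.5 * (power_w[i - 1] + power_w[i]) * dt;
        return energy_j;
    }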
The aim of this talk is to present and discuss how modern and future High Performance Computing (HPC) facilities, that is, massively parallel hardware with millions of cores together with very fast but low-precision accelerator hardware, can be exploited in numerical simulations so that very high computational, numerical, and hence energy efficiency can be obtained. Here, as...
Multidimensional data, coming from scientific applications such as numerical simulation, can often overwhelm the memory or computational resources of a single workstation. In this talk, we will describe parallel algorithms and available software implementations for computing CP, Tucker, and Tensor Train decompositions of large tensors. The open-source software is designed for clusters of...
A nonlinear domain decomposition (DD) solver is considered with respect to improved energy efficiency. In this method, nonlinear problems are solved using Newton’s method on the subdomains in parallel and in asynchronous iterations. The method is compared to the more standard Newton-Krylov approach, where a linear domain decomposition solver is applied to the overall nonlinear problem after...
The idea behind ffddm is to simplify the use of parallel solvers in the open source finite element software FreeFEM. The ffddm framework is entirely written in the FreeFEM language. Thanks to ffddm, FreeFEM users have access to high-level functionalities for specifying and solving their finite element problems in parallel using scalable two-level Schwarz domain decomposition methods. The...
With the commencement of the exascale computing era, we realize that the majority of leadership supercomputers are heterogeneous and massively parallel even within a single node, with multiple co-processors such as GPUs alongside multiple cores on each node. For example, each node of ORNL's Summit combines six NVIDIA Tesla V100 GPUs with 42 IBM Power9 cores.
At this scale of parallelism, the...
In many numerical simulations, there is a need to solve a sparse linear system ($Ax=b$) at every iteration. The solution of these linear systems, using iterative methods such as Krylov subspace methods, consumes around 80% of the simulation's runtime on modern architectures. Recently, enlarged Krylov subspace methods were introduced with the aim of reducing communication and speeding up the...
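The starting point of the enlarged methods is a splitting of the residual according to a partition of the unknowns into $t$ domains, so that a block of $t$ directions is generated per iteration instead of one. A sketch of just this splitting step (the function and array names are illustrative):

    #include <string.h>

    /* Scatter r into an n x t block R (column-major): column j holds the
       entries of r belonging to domain j and zeros elsewhere, so the columns
       of R sum back to r.  The rest of the method builds a block Krylov
       basis from R instead of from the single vector r. */
    void split_residual(int n, int t, const double *r, const int *domain_of,
                        double *R)
    {
        memset(R, 0, (size_t)n * (size_t)t * sizeof *R);
        for (int i = 0; i < n; ++i)
            R[(size_t)domain_of[i] * n + i] = r[i];
    }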