Speaker
Description
Multidimensional data, coming from scientific applications such as numerical simulation, can often overwhelm the memory or computational resources of a single workstation. In this talk, we will describe parallel algorithms and available software implementations for computing CP, Tucker, and Tensor Train decompositions of large tensors. The open-source software is designed for clusters of computers and has been benchmarked on various supercomputers. The algorithms are scalable, able to process terabyte-sized tensors and maintain high computational efficiency for 100s to 1000s of processing nodes. We will detail the data distribution and parallelization strategies for the key computational kernels within the algorithms, which include the matricized-tensor times Khatri-Rao product, computing (structured) Gram matrices, and tall-skinny QR decompositions.