CS 598: Communication Cost Analysis of Algorithms
Mon/Wed 9:30-10:45, 1214 Siebel
Instructor: Edgar Solomonik
solomon2@illinois.edu
4229 Siebel Center
Lec1 | Introduction, motivation, course administration; collective communication protocols src |
Lec2 | Optimal packet-based broadcast, communication cost models (LogP, LogGP, BSP) src |
Lec3 | Communication avoiding algorithms for matrix multiplication src |
Lec4 |
Communication avoiding algorithms for LU factorization src notes on rectangular matrix multiplication |
Lec5 | Memory- and communication-efficient LU factorization src video 1 2 |
Lec6 | Communication avoiding algorithms for LU with pivoting and QR intro src video 2 (QR) |
Lec7 | Parallel algorithms for QR factorization src video 1 2 |
Lec8 | Communication avoiding algorithms for rectangular QR src video 1 2 3 |
Lec9 | Ideal cache model and the discrete Fourier transform src video 1 2 3 |
Lec10 | Parallel FFT and intro to communication lower bounds src video 1 2 3 |
Lec11 | Cache-efficient and parallel sorting src video 1 2 3 |
Lec12 | Bitonic sort and single-source shortest path graph algorithms src video 1 2 3 |
Lec13 | All-pairs shortest-paths src video 1 2 3 |
Lec14 | Betweenness centrality and sample sort revisited src video 1 2 3 |
Lec15 | Pipelined parallel sorting, PRAM, tree contraction src video 1 2 3 |
Lec16 | Tree contraction, Euler tour, list ranking, connectivity and MST src video 1 2 3 |
Lec17 | Sparse direct methods, intro to iterative methods src video 1 2 3 |
Lec18 | Avoiding communication in iterative solvers for sparse linear systems src video 1 2 |
Lec19 | Preconditioning, incomplete LU src video 1 2 3 |
Lec20 | Parallel preconditioning, domain decomposition, graph partitioning src video 1 2 |
Lec21 | Approximate low-rank dense matrix and tensor factorizations src video 1 2 |
Lec22 | Randomized algorithms for low-rank matrix factorization and least-squares src video 2 |
Lec23 | Matrix and tensor completion (ALS, SGD, CCD) src video 1 2 3 |
Lec24 | Molecular dynamics src video 1 2 3 |
Lec25 | Fast integral equation methods, hierarchically structured matrices src video 1 |
Lec26 | Hierarchically semi-separable (HSS) matrices src video 1 2 3 |
Lec27 | HSS matrix construction, electronic structure calculations src video 1 2 |
Lec28 | Convolutional neural networks src video 1 2 |
Lec29 | Network interconnect topologies and topology-aware algorithms src video 1 2 3 |
Efficiency and parallel scalability of data-intensive applications are most often constrained by data movement in the memory hierarchy and the network. This course will focus on analysis of algorithms through the lens of communication and synchronization models. Communication lower bounds and algorithms that attain them will be surveyed for fundamental combinatorial and numerical problems. The course will emphasize general analytical techniques, but will also connect to full-scale applications.
algorithms, linear algebra, and basic parallel programming (e.g. CS 473, 420, 450)
Here is a short list of closely relevant courses of which I am aware, with a theoretical focus and available online material. Most of them cover some subset of the topics/material presented in this course plus topics we will not cover. Predominantly they also have a different focus.
Michael Heath: Parallel Numerical Algorithms, 2015
Guy Blelloch: Parallel Algorithms, 2009
James Demmel and Oded Schwarz: Communication-Avoiding Algorithms, 2011
James Demmel: Applications of Parallel Computers, 2015 (other years available)
Satish Rao: Foundations of Parallel and Distributed Systems, 2012
Pavel Tvrdik: Topics in Parallel Computing, 1999
Yousef Saad's Iterative Methods for Sparse Linear Systems formed the basis of the preconditioning lectures and has much more information on stability and convergence of the methods.
James Demmel's Applied Numerical Linear Algebra is a great reference for information on matrix factorizations and other topics. The presentation of FFT given in the course was partially based on this book.
Joseph JaJa's Introduction to Parallel Algorithms covers PRAM complexity including list ranking and tree-based algorithms presented in the course.