Syllabus

This course aims to teach you how to write fast code for parallel computers and hardware accelerators. It is a spiritual successor to Software Performance Engineering (SPE): here we graduate from mainstream CPUs to accelerators such as GPUs.

Recommended prerequisite: SPE, or comparable experience with low-level programming, performance analysis, computer architecture, and performance optimization.

Key things we expect you to learn include:

Why do parallel accelerators like GPUs look the way they do? We will both motivate and explain key architectural issues like SIMD, multithreading, latency hiding, memory systems, matrix units, and spatial architectures (FPGAs), and see how similar ideas manifest in multicore CPUs, GPUs, TPUs, and DSPs.
How can we program them for high performance? We emphasize practical experience writing and optimizing code on modern accelerators (CUDA on GPUs), and show how the major architectural ideas shape the way we program them.
How can we reason about accelerator performance from first principles?

Grading

The overall course grade comprises:

85% lab assignments
10% final project
5% participation (primarily live lab attendance)

Late submissions

You have 6 free “late days” to use on lab submissions over the course of the term.
Beyond those 6, each additional day a submission is late incurs a 25% penalty.
Late days may be used for both checkpoints and final submissions.
We will automatically apply your late days across your submissions over the semester in whichever way maximizes your lab grade.
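
To make the late-day rules above concrete, here is a minimal sketch of how such an allocation could be computed. It is an illustration under stated assumptions, not the course's actual grading script. The syllabus gives only the 6 free days and the 25% penalty per additional day. This sketch further assumes that the penalty removes 25% of a submission's earned score per uncovered day (floored at zero), that late days are counted in whole days, and that all submissions are weighted equally. The function names and the example scores are hypothetical.

```python
from itertools import product

FREE_LATE_DAYS = 6      # stated in the syllabus
PENALTY_PER_DAY = 0.25  # stated in the syllabus; how it is applied is an assumption


def penalized_score(score, uncovered_days):
    """Assumed rule: lose 25% of the earned score per uncovered late day, floored at zero."""
    return score * max(0.0, 1.0 - PENALTY_PER_DAY * uncovered_days)


def best_late_day_allocation(submissions, free_days=FREE_LATE_DAYS):
    """Brute-force the assignment of free late days that maximizes the summed score.

    submissions: list of (score, days_late) pairs (hypothetical, equally weighted).
    Returns (best_total, days_used), where days_used[i] is the number of free
    late days spent on submission i.
    """
    best_total, best_used = -1.0, ()
    # Each submission can absorb between 0 and days_late free days.
    options = [range(days_late + 1) for _, days_late in submissions]
    for used in product(*options):
        if sum(used) > free_days:
            continue  # this assignment spends more free days than are available
        total = sum(
            penalized_score(score, days_late - u)
            for (score, days_late), u in zip(submissions, used)
        )
        if total > best_total:
            best_total, best_used = total, used
    return best_total, best_used


# Hypothetical term: three submissions that are 2, 3, and 4 days late (9 days in total).
total, used = best_late_day_allocation([(90, 2), (60, 3), (100, 4)])
print(total, used)
```

Under these assumptions the example prints `205.0 (2, 0, 4)`: the six free days cover the two highest-scoring submissions, and the three uncovered days fall on the submission where the penalty costs the fewest points. Exhaustive search is used only because it is easy to check by hand for a handful of submissions.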

We will generally also allow a short grace period after the deadline to accommodate minor technical hiccups, at our discretion. You don’t need to ask for accommodation if a bug delayed your submission by 5 minutes.

Late days are intended to cover everyday disruptions like a mild illness, interview travel, or a confluence of deadlines with other classes. Additional extensions over and above this generous baseline will only be considered in exceptional circumstances.

Attendance

You may miss up to 2 live labs, no questions asked. As with late days for labs, this allowance is intended to cover common minor disruptions. If you miss more than 2, you risk failing the class.

Collaboration and LLM Policy

You are welcome and encouraged to discuss and learn from each other, but what you ultimately submit should represent your own individual final implementation. You should not use LLMs to write your final implementation. You may not post your solutions in a publicly accessible repository, such as one on GitHub.