Syllabus

This course aims to teach you how to write fast code for parallel computers and hardware accelerators. It is a spiritual follow-on to Software Performance Engineering (SPE): here we graduate from mainstream CPUs to hardware like GPUs.

Recommended Prerequisite: We expect you to have taken SPE or to have similar experience with low-level programming, performance analysis, computer architecture, and performance optimization.

Key things we expect you to learn include:

Grading

The overall course grade comprises:

Late submissions

We will generally also allow a short grace period after the deadline to accommodate minor technical hiccups, at our discretion. You don’t need to ask for accommodation if a bug delayed your submission by 5 minutes.

Late days are intended to cover everyday disruptions like a mild illness, interview travel, or a confluence of deadlines with other classes. Additional extensions over and above this generous baseline will only be considered in exceptional circumstances.

Attendance

You may miss up to 2 live labs, no questions asked. As with late days for labs, this allowance is intended to cover common minor disruptions. If you miss more than 2 live labs, you risk failing the class.

Collaboration and LLM Policy

You are welcome and encouraged to discuss the material with and learn from each other, but what you ultimately submit should represent your own individual final implementation. You should not use LLMs to write your final implementation. You may not post your solutions in a publicly available repository, such as one hosted on GitHub.