Intel CPU: CPU cores + integrated GPU (itself multi-core); a shared memory system and L3 cache enable low-latency, high-bandwidth communication between CPU and GPU
Workstation: add high-end discrete GPU across PCIe bus
The GPU is itself heterogeneous! Some units are purely for graphics processing; not everything is general-purpose SIMD processing
Mobile: System-on-a-Chip (SoC) with dedicated units for GPU, neural network acceleration, video compression/decompression, etc.
Digital signal processors (DSPs): programmable processors, but with simpler instruction stream control paths
Complex instructions: perform many ops per instruction (amortize cost of control)
SIMD
Very Long Instruction Word (VLIW): a single instruction specifies multiple operations
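The amortization argument above can be made concrete with a toy count of instructions issued. This is only a sketch: the function name `instructions_issued`, the 8-wide SIMD width, and the 4-slot VLIW bundle are illustrative assumptions, and the model assumes all operations are independent and pack perfectly.

```python
import math

def instructions_issued(num_ops: int, ops_per_instruction: int) -> int:
    """Instructions that must be fetched and decoded to perform num_ops
    operations when each instruction carries ops_per_instruction of them.
    Toy model: assumes independent operations and perfect packing."""
    return math.ceil(num_ops / ops_per_instruction)

num_ops = 1024                             # e.g. 1024 independent multiply-adds
scalar = instructions_issued(num_ops, 1)   # one operation per instruction
simd8 = instructions_issued(num_ops, 8)    # 8-wide SIMD (assumed width)
vliw4 = instructions_issued(num_ops, 4)    # 4-slot VLIW bundle (assumed size)

print(scalar, simd8, vliw4)  # prints: 1024 128 256
```

The work done is the same in all three cases; only the number of trips through fetch/decode (the control cost) shrinks.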
Anton supercomputer for molecular dynamics (D.E. Shaw Research)
Simulates the time evolution of proteins (how a protein's atoms move over time)
Custom ASIC for computing particle-particle interactions (512 ASICs per machine)
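To give a sense of the kernel such an ASIC hard-wires, here is a minimal software sketch of all-pairs particle interactions. It is not Anton's actual method: a toy attractive inverse-square force stands in for a real molecular dynamics force field, and the names `pairwise_forces`, `strength`, and `softening` are hypothetical.

```python
import math

def pairwise_forces(positions, strength=1.0, softening=1e-9):
    """Accumulate a toy inverse-square force over every pair of particles.
    positions: list of (x, y, z) tuples. Returns one [fx, fy, fz] per particle.
    The force law is an assumption for illustration, not a real MD force field."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):          # visit each pair once: N(N-1)/2 pairs
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            r2 = sum(c * c for c in d) + softening   # softening avoids divide-by-zero
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                f = strength * d[k] * inv_r3
                forces[i][k] += f          # force on i points toward j
                forces[j][k] -= f          # equal and opposite force on j
    return forces

forces = pairwise_forces([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)])
```

The inner loop body is a short, fixed sequence of arithmetic repeated for every pair, which is the kind of computation that suits a dedicated hardware pipeline.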
Google TPU pods for ML
Google Pixel Visual Core
Programmable image-processing unit (IPU)
Each core: 16x16 grid of 16-bit mul-add ALUs
10-20x more efficient than GPU at image processing tasks
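A rough software model of what a 16x16 grid of mul-add ALUs buys for image processing: a 3x3 filter over a 16x16 tile takes 9 lockstep steps, with all 256 ALUs doing one mul-add per step. This is a sketch only, not the chip's actual programming model; the names `mul_add_16` and `filter_tile_3x3`, the wraparound-on-overflow behavior, and the zero padding at tile borders are all assumptions.

```python
GRID = 16  # 16x16 grid of ALUs, per the notes

def mul_add_16(acc, a, b):
    """One 16-bit multiply-add. Wraparound on overflow is an assumption."""
    return (acc + a * b) & 0xFFFF

def filter_tile_3x3(tile, weights):
    """Apply a 3x3 filter to a GRID x GRID tile of integers.
    Pixels outside the tile are treated as 0 (assumed border handling).
    Returns (output tile, number of lockstep steps)."""
    out = [[0] * GRID for _ in range(GRID)]
    steps = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            w = weights[dy + 1][dx + 1]
            # One lockstep step: conceptually, ALU (y, x) performs a single
            # mul-add at the same time as the other 255 ALUs.
            for y in range(GRID):
                for x in range(GRID):
                    yy, xx = y + dy, x + dx
                    inside = 0 <= yy < GRID and 0 <= xx < GRID
                    pixel = tile[yy][xx] if inside else 0
                    out[y][x] = mul_add_16(out[y][x], pixel, w)
            steps += 1
    return out, steps

ones = [[1] * GRID for _ in range(GRID)]
out, steps = filter_tile_3x3(ones, [[1, 1, 1]] * 3)
print(out[0][0], out[0][1], out[1][1], steps)  # prints: 4 6 9 9
```

In this model the two inner loops over `y` and `x` are what the hardware grid would execute in parallel, so the sequential cost is the 9 filter taps rather than 9 x 256 mul-adds.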
FPGAs (Field-Programmable Gate Arrays)
Middle ground between ASIC and processor: provides an array of logic blocks connected by a programmable interconnect
Programmer-defined logic implemented directly by FPGA
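One way to picture "programmer-defined logic" is through lookup tables (LUTs), which many FPGA logic blocks are built around; that detail is background rather than something stated in these notes. The sketch below models a LUT as a precomputed truth table and composes two of them into a full adder. The helper names `make_lut`, `lut_eval`, and `full_adder` are hypothetical.

```python
def make_lut(func, num_inputs):
    """'Program' a LUT: precompute func for every combination of input bits."""
    table = []
    for index in range(2 ** num_inputs):
        bits = [(index >> i) & 1 for i in range(num_inputs)]
        table.append(func(*bits) & 1)
    return table

def lut_eval(table, *bits):
    """Evaluate a LUT: the input bits form an index into the truth table."""
    index = sum(bit << i for i, bit in enumerate(bits))
    return table[index]

# Two 3-input LUTs configured as the sum and carry outputs of a full adder.
sum_lut = make_lut(lambda a, b, cin: a ^ b ^ cin, 3)
carry_lut = make_lut(lambda a, b, cin: (a & b) | (cin & (a ^ b)), 3)

def full_adder(a, b, cin):
    """Return (sum bit, carry-out bit) using only table lookups."""
    return lut_eval(sum_lut, a, b, cin), lut_eval(carry_lut, a, b, cin)

print(full_adder(1, 1, 0))  # prints: (0, 1)
```

Changing the table contents changes the logic function without changing the hardware structure, which is the sense in which an FPGA sits between a fixed ASIC and an instruction-driven processor.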