Caches

Key ideas:
- Small amounts of unusually fast memory: data cache (D$), instruction cache (I$), Translation Lookaside Buffer (TLB)
- Cache misses are expensive; the hardware speculatively prefetches
- Does it fit in cache? Small is fast - there is no time/space tradeoff at the hardware level
- Locality counts: stay in the cache
- Predictability helps (the prefetcher can keep up)

A modern multi-core machine has a multi-level cache hierarchy, where the faster, smaller caches belong to individual processors. When one processor modifies a value in its cache, other processors can no longer use the old value.

Parallelism

Kinds of parallelism:
- Bit level
- Instruction level (ILP)
- Data (DLP/SIMD)
- Task (TLP/MIMD)

See YouTube/MIT - parallel processing.

Examples:
- Distributed processing over networks
- Multiple CPUs
- Multiple CPU cores on the same chip
- Pipelines (a deeper and wider pipeline means more control hazards)
- ILP - instruction-level parallelism (at best a x2 speedup)
- MLP - memory-level parallelism: the ability to have multiple memory operations pending at the same time, in particular cache misses or translation lookaside buffer (TLB) misses
- Loop unrolling
- Out-of-order (OoO) execution of multiple instructions simultaneously
- Single Instruction, Multiple Data (SIMD) operations in vector registers
- Speculative execution
- Branch prediction versus branch target prediction
- SSE and AVX (Advanced Vector Extensions) - xmm, ymm, zmm registers
- Vector processing - think about it like explicitly managing giant cache lines
- GPGPU

As Moore's law hits the roof, multiple cores are the way we're heading; working out how to use them is the difficult part - a SMOP (small matter of programming). Tools include OpenMP and C++ AMP (Accelerated Massive Parallelism); see Pluralsight - High-performance Computing in C++.

Amdahl's law shows that the maximum speedup achievable by parallelising a pipeline is limited by the proportion of the work that can be done in parallel.