Kinds of parallelism

  • bit level
  • instruction level (ILP)
  • data level (DLP/SIMD)
  • task level (TLP/MIMD)

See YouTube/MIT - parallel processing.

Examples

  • Distributed processing over networks
  • Multiple CPUs
  • Multiple cores
  • Pipelines (deeper and wider pipeline = more control hazards)
  • ILP - instruction level parallelism (in practice roughly x2 speed-up at best)
  • MLP - memory-level parallelism: the ability to have multiple memory operations pending at the same time, in particular cache misses or translation lookaside buffer (TLB) misses
  • Loop unrolling
  • Out-of-order (OoO) execution - instructions execute as their operands become ready rather than in program order, so several can be in flight simultaneously
  • Single Instruction, Multiple Data (SIMD) operations in vector registers
  • Multiple CPU cores on the same chip
  • Speculative execution
  • Branch prediction versus branch target prediction
  • SSE and AVX
  • Moore’s law hits the roof
  • OpenMP
  • C++ AMP - Accelerated Massive Parallelism
  • Pluralsight - High-performance Computing in C++
  • SMOP - small matter of programming: multiple cores are the way we’re heading, working out how to use them is the difficult part
  • Vector processing - think about it like explicitly managing giant cache lines
  • GPGPU
  • Advanced Vector Extensions (AVX) - registers: xmm (128-bit), ymm (256-bit), zmm (512-bit, AVX-512)
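
As a rough illustration of how loop unrolling can expose instruction level parallelism, here is a minimal C++ sketch. The function names sum_simple and sum_unrolled are made up for this example. It assumes that four independent accumulators give an out-of-order core additions it can overlap; whether that is actually faster depends on the compiler and CPU, and an optimising compiler may already unroll or vectorise the simple loop.

```cpp
#include <cstddef>
#include <vector>

// Straightforward sum: every addition depends on the previous one,
// so the additions form one long dependency chain.
double sum_simple(const std::vector<double>& v) {
    double total = 0.0;
    for (double x : v) {
        total += x;
    }
    return total;
}

// Hypothetical unrolled variant: four independent accumulators, so the
// four additions in each iteration do not depend on one another.
double sum_unrolled(const std::vector<double>& v) {
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;
    const std::size_t n = v.size();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a0 += v[i];
        a1 += v[i + 1];
        a2 += v[i + 2];
        a3 += v[i + 3];
    }
    // Handle the zero to three elements left over after the unrolled loop.
    for (; i < n; ++i) {
        a0 += v[i];
    }
    return (a0 + a1) + (a2 + a3);
}
```

Note that the unrolled version adds the elements in a different order, so with floating-point data its result can differ from the simple loop in the last few bits.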

Amdahl’s law

Amdahl’s law shows that the maximum speed-up achievable by parallelising a process is limited by the proportion of it that can run in parallel. If only 10% can be done in parallel, then even if that part takes zero time the overall process still takes 90% of the original time, a speed-up of at most about 1.11x.
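
The usual statement of the law is speed-up = 1 / ((1 - p) + p / n), where p is the parallelisable fraction and n the number of workers. A minimal C++ sketch of that arithmetic follows. The names amdahl_speedup and amdahl_limit are invented for this example, and it assumes the parallel part divides perfectly across workers with no overhead.

```cpp
#include <cassert>

// Overall speed-up when a fraction `parallel_fraction` of the run time
// is split perfectly across `workers` workers (idealised, no overhead).
double amdahl_speedup(double parallel_fraction, double workers) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(workers >= 1.0);
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers);
}

// Upper bound on the speed-up: the parallel part takes zero time,
// so only the serial fraction remains.
double amdahl_limit(double parallel_fraction) {
    assert(parallel_fraction >= 0.0 && parallel_fraction < 1.0);
    return 1.0 / (1.0 - parallel_fraction);
}
```

For the 10% case, amdahl_limit(0.1) is 1 / 0.9, about 1.11, however many cores are added.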

Extra cores are great but making use of them is difficult.

See wiki/Amdahl’s law.